Core Kafka Concepts

Broker
A Kafka server that stores data and serves client requests. A Kafka cluster is composed of one or more brokers. See more in Kafka Architecture.
Topic
A category or feed name to which records are published. Topics in Kafka are partitioned and replicated. Covered in the Introduction to Kafka.
Partition
A subsection of a topic. Partitions are the fundamental unit of parallelism in Kafka. Each partition is an ordered, immutable sequence of records. Details in Kafka Architecture.
Offset
A unique, sequential identifier assigned to each record within a partition. Consumers use offsets to track their position in a partition. Read more about Offset Management.
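As a toy illustration (not Kafka's implementation), a partition can be modeled as an append-only list whose index is the offset, with the consumer's position simply being the next offset to read:

```python
# Toy model of a partition: an append-only list where the index is the offset.
partition = []

def append(record):
    """Append a record and return the offset it was assigned."""
    partition.append(record)
    return len(partition) - 1  # offsets are sequential, starting at 0

first = append("order-created")   # offset 0
second = append("order-paid")     # offset 1

# A consumer tracks its progress as the next offset to read.
position = 0
consumed = []
while position < len(partition):
    consumed.append(partition[position])
    position += 1  # advancing (and later committing) the offset records progress
```

Because offsets are per-partition, two records in different partitions can share the same offset value; the pair (partition, offset) is what uniquely identifies a record within a topic.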
Producer
A client application that publishes (writes) records to Kafka topics. Learn about Developing Kafka Producers.
Consumer
A client application that subscribes to (reads and processes) records from Kafka topics. Explore Developing Kafka Consumers.
Consumer Group
A group of consumers that jointly consume records from one or more topics. Each partition is consumed by at most one consumer within a group. This enables parallel processing and fault tolerance. See Consumer Groups.
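A minimal sketch of the idea behind group assignment (Kafka's actual assignors, such as range or cooperative-sticky, are more sophisticated): partitions are spread across the group's consumers so that each partition has exactly one owner.

```python
# Toy round-robin assignment of partitions to consumers in one group.
# Each partition goes to exactly one consumer; a consumer may own several.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owner = consumers[i % len(consumers)]  # round-robin for simplicity
        assignment[owner].append(p)
    return assignment

result = assign([0, 1, 2, 3, 4, 5], ["consumer-a", "consumer-b"])
# consumer-a owns partitions 0, 2, 4; consumer-b owns 1, 3, 5.
```

If a consumer leaves or joins, the group rebalances and partitions are reassigned, which is what provides fault tolerance.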
Leader (Partition Leader)
For each partition, one broker acts as the leader. All read and write requests for that partition go through the leader. Explained in Partition Leaders and Replicas.
Follower (Partition Follower)
Brokers that host replicas of a partition but are not the leader. Followers passively replicate data from the leader. See Partition Leaders and Replicas.
Replication
The process of copying partition data across multiple brokers for fault tolerance. The number of replicas is the replication factor.
ISR (In-Sync Replicas)
The set of partition replicas that are fully caught up with the leader. By default, only in-sync replicas are eligible for election as the new leader if the current leader fails (unless unclean leader election is enabled).
Log (Commit Log)
The underlying storage mechanism for Kafka partitions. Each partition is essentially an append-only commit log.
Segment (Log Segment)
Partitions are divided into log segments, which are individual files on disk. Segments are rolled based on size or time.
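Size-based rolling can be sketched as follows (a simplification; real brokers roll on size via `log.segment.bytes` or on time via `log.roll.ms`, among other triggers):

```python
# Toy size-based segment rolling: start a new segment once the active
# segment would exceed a byte threshold.
SEGMENT_BYTES = 10  # tiny threshold for illustration

segments = [[]]     # list of segments; the last one is the "active" segment
active_size = 0

def append(record: bytes):
    global active_size
    if active_size + len(record) > SEGMENT_BYTES and segments[-1]:
        segments.append([])  # roll: open a fresh segment file
        active_size = 0
    segments[-1].append(record)
    active_size += len(record)

for r in [b"aaaa", b"bbbb", b"cccc", b"dddd"]:
    append(r)
# The third record would push the first segment past 10 bytes, so it rolls.
```

Rolling into fixed segments is what makes retention cheap: old data is deleted by removing whole segment files rather than rewriting the log.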
Event / Message / Record
The unit of data in Kafka. It typically consists of a key, a value, a timestamp, and optional headers.

Kafka Ecosystem and Advanced Concepts

ZooKeeper
A distributed coordination service historically used by Kafka for cluster metadata management, controller election, and configuration storage. See Coordination: ZooKeeper and KRaft.
KRaft (Kafka Raft Metadata mode)
A consensus protocol that allows Kafka to manage its metadata internally, removing the dependency on ZooKeeper. More in Coordination: ZooKeeper and KRaft.
Kafka Streams
A client library for building stream processing applications and microservices where input and output data are stored in Kafka. Dive into Kafka Streams API.
KStream
An abstraction in Kafka Streams representing an unbounded stream of key-value records, where each record is an independent event. See Streams Core Concepts.
KTable
An abstraction in Kafka Streams representing a changelog stream, modeling a dataset where each key has a single, updatable value. See Streams Core Concepts.
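The stream-versus-table distinction can be shown with a toy sketch: given the same changelog, a stream view keeps every record, while a table view keeps only the latest value per key.

```python
# The same records, viewed two ways.
changelog = [("alice", 1), ("bob", 5), ("alice", 3)]

# Stream view (KStream-like): every event, in order.
stream_view = list(changelog)

# Table view (KTable-like): later values for a key overwrite earlier ones.
table_view = {}
for key, value in changelog:
    table_view[key] = value
# table_view ends up as {"alice": 3, "bob": 5}
```

This is the same duality behind log compaction: a compacted topic retains just the latest record per key, which is exactly the state a KTable materializes.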
Kafka Connect
A framework for scalably and reliably streaming data between Apache Kafka and other systems using connectors. Learn about Kafka Connect.
Connector (Source/Sink)
A component in Kafka Connect that moves data. Source connectors import data into Kafka, and Sink connectors export data from Kafka. Explained in Source vs. Sink Connectors.
Schema Registry
A service that stores and retrieves Avro, JSON Schema, or Protobuf schemas, helping producers and consumers agree on data formats and manage schema evolution in Kafka.
Serialization / Deserialization
The process of converting objects into a byte stream (serialization) for transmission or storage, and converting a byte stream back into objects (deserialization). Crucial for producers and consumers.
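A simple JSON-based serializer/deserializer pair illustrates the round trip (Kafka clients are configured with serializer and deserializer classes that play exactly this role; the function names here are illustrative):

```python
import json

def serialize(obj) -> bytes:
    """Object -> byte stream, as sent by a producer."""
    return json.dumps(obj).encode("utf-8")

def deserialize(data: bytes):
    """Byte stream -> object, as reconstructed by a consumer."""
    return json.loads(data.decode("utf-8"))

event = {"order_id": 42, "status": "paid"}
wire_bytes = serialize(event)       # what travels to and from the broker
decoded = deserialize(wire_bytes)   # what the consumer works with
```

The broker itself only ever sees bytes; producer and consumer must agree on the format, which is where a schema registry becomes useful.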
Exactly-Once Semantics (EOS)
A processing guarantee ensuring that each message is processed exactly once, even in the presence of failures. Kafka supports EOS for producers and Kafka Streams applications.
Idempotent Producer
A producer configured to ensure that retried sends do not result in duplicate messages within a partition. A step towards EOS.
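The mechanism can be sketched as sequence-number deduplication (a simplified model; real brokers track this per producer ID and partition, for a window of recent batches):

```python
# Toy model of idempotent-producer deduplication: the broker remembers the
# highest sequence number accepted per producer and drops retried duplicates.
last_seq = {}   # producer_id -> highest sequence number accepted
log = []

def broker_append(producer_id, seq, record):
    if seq <= last_seq.get(producer_id, -1):
        return "duplicate"          # a retry of an already-written batch
    last_seq[producer_id] = seq
    log.append(record)
    return "appended"

r1 = broker_append("p1", 0, "payment")
r2 = broker_append("p1", 0, "payment")  # network retry: not written twice
```

This is why enabling idempotence turns "at least once" delivery into "exactly once within a partition" for the producer side.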
Lag (Consumer Lag)
The difference in offsets between the last message produced to a partition and the last message consumed by a consumer group from that partition. Indicates how far behind a consumer is.
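The arithmetic is straightforward, as a small sketch shows (the offset values here are made up for illustration):

```python
# Consumer lag per partition: how far the committed offset trails the
# log end offset (the offset of the next record to be written).
log_end_offsets = {0: 1000, 1: 500}   # latest offsets per partition
committed = {0: 950, 1: 500}          # the group's committed offsets

lag = {p: log_end_offsets[p] - committed[p] for p in log_end_offsets}
total_lag = sum(lag.values())
# Partition 0 is 50 records behind; partition 1 is fully caught up.
```

Steadily growing lag usually means consumers cannot keep up with producers and the group needs more instances or faster processing.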
Throughput
The rate at which Kafka can process data, typically measured in messages per second or megabytes per second.
Latency
The time delay in processing data. In Kafka, this can refer to producer publish latency, end-to-end latency (producer to consumer), or processing latency within Kafka Streams.
Bootstrap Servers
A list of one or more Kafka broker addresses that a client (producer, consumer, or application) uses to make its initial connection to the Kafka cluster to discover the full set of brokers.
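A typical client configuration looks like the following (the key names follow standard Kafka client conventions; the hostnames are placeholders). Note that the list need not contain every broker:

```python
# Bootstrap servers are only entry points: the client contacts any one of
# them, fetches cluster metadata, and discovers the full set of brokers.
config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "client.id": "example-app",
}

bootstrap_list = config["bootstrap.servers"].split(",")
```

Listing two or three brokers is common practice, so the client can still connect if one of the listed brokers happens to be down.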

Understanding these terms is crucial for anyone working with Apache Kafka, from developers to administrators.
