Your quick reference guide to common terms and concepts in the Apache Kafka ecosystem.
Core Kafka Concepts
Broker
A Kafka server that stores data and serves client requests. A Kafka cluster is composed of one or more brokers.
Topic
A category or feed name to which records are published. Topics in Kafka are partitioned and replicated.
Partition
A subsection of a topic. Partitions are the fundamental unit of parallelism in Kafka. Each partition is an ordered, immutable sequence of records.
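As a rough illustration of how keyed records are spread over partitions, the sketch below hashes a key modulo the partition count. It is a toy model, not Kafka's implementation: Kafka's default partitioner hashes keys with murmur2, `zlib.crc32` is only a stand-in, and `choose_partition` is a hypothetical helper name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (toy stand-in for a partitioner)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is used purely for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition while the partition count is fixed.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Because the mapping depends on the partition count, adding partitions to a topic can change which partition a given key lands in.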
Offset
A unique, sequential identifier assigned to each record within a partition. Consumers use offsets to track their position in a partition.
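To make the idea concrete, here is a minimal in-memory model of one partition, assuming nothing about Kafka's actual storage format. `PartitionLog` and its methods are hypothetical names used only for this sketch.

```python
class PartitionLog:
    """Toy model of a single partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, value) pairs starting at the given offset."""
        return [(i, v) for i, v in enumerate(self._records) if i >= offset]


log = PartitionLog()
first = log.append("a")   # offset 0
second = log.append("b")  # offset 1
third = log.append("c")   # offset 2

# A consumer that has already processed offset 0 resumes from offset 1.
resumed = log.read_from(1)
```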
Producer
A client application that publishes (writes) records to Kafka topics.
Consumer
A client application that subscribes to (reads and processes) records from Kafka topics.
Consumer Group
A group of consumers that jointly consume records from one or more topics. Each partition is consumed by at most one consumer within a group.
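The one-owner-per-partition rule can be sketched with a simplified round-robin assignment. Kafka ships several assignment strategies (range, round-robin, sticky, and others); the function below is a hypothetical, stripped-down illustration and not any of those implementations.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Five partitions shared by two consumers in the same group.
assignment = assign_round_robin([0, 1, 2, 3, 4], ["c1", "c2"])

# With more consumers than partitions, the extra consumer sits idle.
idle = assign_round_robin([0], ["c1", "c2"])
```

This is why adding consumers beyond the number of partitions does not increase a group's throughput.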
Leader (Partition Leader)
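For each partition, one broker acts as the leader. All writes to that partition go through the leader, and by default reads do too; since Kafka 2.4, consumers can optionally be configured to fetch from a follower replica.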
Follower (Partition Follower)
Brokers that host replicas of a partition but are not the leader. Followers passively replicate data from the leader.
Replication
The process of copying partition data across multiple brokers for fault tolerance. The number of replicas is the replication factor.
ISR (In-Sync Replicas)
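The set of partition replicas, including the leader, that have kept up with the leader within a configurable time bound. By default, only in-sync replicas are eligible to be elected as the new leader if the current leader fails; unclean leader election, which is disabled by default, relaxes this at the risk of data loss.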
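The election rule can be sketched as follows, under simplifying assumptions: broker liveness is reduced to "everything except the failed leader is alive", and `elect_leader` and its `allow_unclean` flag are hypothetical names that loosely mirror the idea of unclean leader election rather than the controller's actual code.

```python
def elect_leader(replicas, isr, failed_leader, allow_unclean=False):
    """Pick a new leader, preferring replicas that are in the ISR.

    replicas: ordered list of broker ids hosting the partition
    isr: set of broker ids currently in sync
    """
    survivors = [b for b in replicas if b != failed_leader]
    in_sync = [b for b in survivors if b in isr]
    if in_sync:
        return in_sync[0]
    if allow_unclean and survivors:
        # An out-of-sync replica may be missing records: possible data loss.
        return survivors[0]
    return None  # the partition stays offline until an in-sync replica returns


clean = elect_leader([1, 2, 3], {1, 3}, failed_leader=1)
offline = elect_leader([1, 2, 3], {1}, failed_leader=1)
unclean = elect_leader([1, 2, 3], {1}, failed_leader=1, allow_unclean=True)
```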
Event / Message / Record
The unit of data in Kafka. It typically consists of a key, a value, a timestamp, and optional headers.
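One way to picture this structure is a small immutable data class. The field names and types below are assumptions chosen for illustration; they are not taken from any particular Kafka client library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Illustrative shape of a Kafka record (field names are hypothetical)."""
    key: Optional[bytes]      # may be absent; often used to choose a partition
    value: Optional[bytes]    # the payload
    timestamp_ms: int         # milliseconds since the Unix epoch
    headers: Tuple[Tuple[str, bytes], ...] = ()  # optional metadata pairs


rec = Record(
    key=b"user-42",
    value=b'{"action": "login"}',
    timestamp_ms=1700000000000,
    headers=(("trace-id", b"abc"),),
)
```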
Kafka Ecosystem and Advanced Concepts
Kafka Streams
A client library for building stream processing applications and microservices where input and output data are stored in Kafka.
KStream
An abstraction in Kafka Streams representing an unbounded sequence of key-value records, where each record is treated as an independent event.
KTable
An abstraction in Kafka Streams representing a changelog stream, modeling a dataset where each key has a single, updatable value.
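The relationship between the two abstractions can be shown with a conceptual model: replaying a stream of records and keeping only the latest value per key yields the table view. Kafka Streams itself is a Java library; the Python below is not its API, and `as_table` is a hypothetical helper.

```python
def as_table(stream):
    """Fold a sequence of (key, value) records into the latest value per key."""
    table = {}
    for key, value in stream:
        if value is None:
            # A record with a null value acts as a tombstone and deletes the key.
            table.pop(key, None)
        else:
            table[key] = value
    return table


# Stream view: four independent events.
stream = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]

# Table view: one current value per key after replaying the changelog.
table = as_table(stream)
```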
Kafka Connect
A framework for scalably and reliably streaming data between Apache Kafka and other systems using connectors.
Connector (Source/Sink)
A component in Kafka Connect that moves data. Source connectors import data into Kafka, and Sink connectors export data from Kafka.
Exactly-Once Semantics (EOS)
A processing guarantee ensuring that the effect of each record is applied exactly once, even in the presence of failures. Kafka supports EOS through idempotent producers, transactions, and Kafka Streams applications.
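One ingredient of this guarantee is the idempotent producer, where the broker uses a producer ID and per-record sequence numbers to discard duplicates caused by retries. The sketch below is a heavily simplified model of that idea: `DedupingLog` is a hypothetical class, and a real broker tracks sequences per partition and rejects out-of-order sequences rather than silently accepting gaps.

```python
class DedupingLog:
    """Toy log that drops retried writes using (producer_id, sequence) tracking."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, value):
        """Return True if the record was written, False if it was a duplicate."""
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            return False  # already written; this is a retry of an earlier send
        self._last_seq[producer_id] = seq
        self.records.append(value)
        return True


log = DedupingLog()
wrote_a = log.append(producer_id=1, seq=0, value="a")
retry_a = log.append(producer_id=1, seq=0, value="a")  # network retry of the same send
wrote_b = log.append(producer_id=1, seq=1, value="b")
```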
Lag (Consumer Lag)
The difference between the offset of the last record produced to a partition (the log end offset) and the offset last committed by a consumer group for that partition.
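The arithmetic is a per-partition subtraction, sketched below. The function name and the dictionary inputs are hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption made for this example.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Partition 0 has 10 unread records; partition 1 is fully caught up.
lag = consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50})
total_lag = sum(lag.values())
```

A lag that keeps growing usually means the group is consuming more slowly than producers are writing.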
Bootstrap Servers
A list of one or more Kafka broker addresses that a client uses to make its initial connection to the Kafka cluster.
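Clients typically receive this list as a comma-separated string of `host:port` entries. The parser below is a minimal sketch of that format; the host names are placeholders, `parse_bootstrap_servers` is a hypothetical helper, and bracketed IPv6 addresses are not handled.

```python
def parse_bootstrap_servers(value):
    """Split a 'host:port,host:port' string into (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers


# Placeholder addresses; any reachable broker is enough to discover the rest of the cluster.
servers = parse_bootstrap_servers("broker1.example.com:9092, broker2.example.com:9092")
```

Listing more than one broker means the client can still connect if one of the listed brokers is down.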
Understanding these terms is crucial for anyone working with Apache Kafka, from developers to administrators.