Apache Kafka Interview Questions and Answers
by sonia, on May 27, 2017 11:32:26 AM
Q1. What is Apache Kafka?
Ans: Apache Kafka is a publish-subscribe messaging system maintained by the Apache Software Foundation and written in Scala. It is a distributed, partitioned and replicated log service.
Q2. What are the traditional methods of message transfer?
Ans: There are two traditional messaging models:
- Queuing: A pool of consumers reads from the server, and each message goes to exactly one of them
- Publish-Subscribe: Messages are broadcast to all consumers
Kafka offers a single consumer abstraction that generalizes both of the above: the consumer group.
Q3. What are the benefits of Apache Kafka over traditional techniques?
Ans: Apache Kafka has the following benefits over traditional messaging techniques:
- Fast: A single Kafka broker can serve thousands of clients, handling megabytes of reads and writes per second
- Scalable: Data is partitioned and spread over a cluster of machines, so it can handle more data than a single machine could hold
- Durable: Messages are persisted on disk and replicated within the cluster to prevent data loss
- Distributed by Design: It provides fault-tolerance and durability guarantees
Q4. What does the term broker mean in Kafka?
Ans: In a Kafka cluster, the term broker refers to a server.
Q5. Compare Kafka & Flume
|Criteria|Kafka|Flume|
|Functionality|Publish-subscribe model messaging system|System for data collection, aggregation & movement|
Q6. What role does ZooKeeper play in a Kafka cluster?
Ans: Kafka is an open source distributed system that is built to use ZooKeeper. ZooKeeper’s basic responsibility is to coordinate the different nodes in a cluster. Offsets are also committed to ZooKeeper periodically, so that if a node fails, consumption can resume from the previously committed offset.
ZooKeeper is also responsible for configuration management, leader detection, detecting when a node leaves or joins the cluster, synchronization, etc.
Q7. What is Kafka?
Ans: Kafka is a message broker project written in Scala. It was originally developed by LinkedIn and was open sourced in early 2011. The purpose of the project is to provide a unified platform for handling real-time data feeds.
Q8. Why is replication critical in Kafka?
Ans: Replication ensures that published messages remain available and can be consumed in the event of a machine error, a program fault or frequent software upgrades.
Q9. What major role does the Kafka Producer API play?
Ans: It wraps the two producers, kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer. The main aim is to expose all the producer functionality to clients through a single API.
Q10. Distinguish between Kafka and Flume.
Ans: Flume’s major use case is ingesting data into Hadoop. Flume is integrated with Hadoop’s monitoring system, file formats, file system and utilities such as Morphlines. Flume’s design of sinks, sources and channels means that it can also move data between other systems flexibly, but its main feature is its Hadoop integration.
Flume is the best option when you have non-relational data sources, such as a long file, to stream into Hadoop.
Kafka’s major use case is as a distributed publish-subscribe messaging system. Kafka was not developed specifically for Hadoop, and using Kafka to read and write data to Hadoop is considerably trickier than it is with Flume.
Kafka can be used when you particularly need a highly reliable and scalable enterprise messaging system to connect multiple systems, Hadoop among them.
Q11. Describe the partitioning key.
Ans: Its role is to specify the target partition of a message within the producer. Usually, a hash-based partitioner determines the partition ID from the key. Users can also plug in custom partitioners.
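A hash-based choice of partition from a key can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and `zlib.crc32` is used as a stand-in hash, so the partition numbers will not match what a real Kafka client computes.

```python
import zlib


def hash_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition ID with a stable hash.

    Assumption: crc32 stands in for whatever hash the real client uses.
    """
    return zlib.crc32(key) % num_partitions


def vip_partition(key: bytes, num_partitions: int) -> int:
    """A hypothetical custom partitioner: pin "vip-" keys to partition 0."""
    if key.startswith(b"vip-"):
        return 0
    return hash_partition(key, num_partitions)
```

Because the hash is deterministic, every message with the same key lands in the same partition, which is what preserves per-key ordering.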
Q12. In the producer, when does a QueueFullException occur?
Ans: A QueueFullException typically occurs when the producer tries to send messages at a pace the broker cannot handle. Because the producer does not block, users need to add enough brokers to collaboratively handle the increased load.
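The non-blocking behaviour behind this exception can be illustrated with a bounded in-memory queue. This is a sketch using Python's standard `queue` module rather than Kafka's producer, and `try_send` is a hypothetical helper name.

```python
import queue


def try_send(buffer: queue.Queue, message) -> bool:
    """Enqueue without blocking; report False when the buffer is full."""
    try:
        buffer.put_nowait(message)
        return True
    except queue.Full:
        # The sender is outpacing whatever drains the buffer.
        return False
```

With `queue.Queue(maxsize=2)`, the third send fails until something removes a message from the buffer, which mirrors a producer outrunning the brokers.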
Q13. Can Kafka be used without ZooKeeper?
Ans: It is not possible to use Kafka without ZooKeeper, because clients cannot bypass ZooKeeper and connect directly to the server. If ZooKeeper is down for any reason, Kafka will not be able to serve any client request.
Q14. What are consumers?
Ans: Kafka provides a single consumer abstraction that covers both queuing and publish-subscribe: the consumer group. Consumers label themselves with a consumer group name, and every message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can run in separate processes. The consumer groups determine the messaging model:
- If all consumer instances have the same consumer group, this works like a traditional queue, balancing load over the consumers.
- If all consumer instances have different consumer groups, this works like publish-subscribe, and all messages are broadcast to all the consumers.
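The two cases can be made concrete with a small in-memory model. It is a sketch only: `deliver` is a hypothetical function, and round-robin delivery within a group stands in for Kafka's real partition assignment.

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, consumers):
    """Deliver every message once per consumer group.

    `consumers` maps a consumer name to its group name. Inside a group,
    messages are handed out round-robin (an assumption for illustration).
    """
    groups = defaultdict(list)
    for name, group in consumers.items():
        groups[group].append(name)

    received = defaultdict(list)
    for members in groups.values():
        targets = cycle(sorted(members))
        for message in messages:
            received[next(targets)].append(message)
    return dict(received)
```

Calling `deliver(["m1", "m2"], {"c1": "g", "c2": "g"})` splits the messages between the two consumers (queue behaviour), while giving each consumer its own group makes both receive every message (publish-subscribe behaviour).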
Q15. Describe an offset.
Ans: Each message in a partition is assigned a sequential ID number known as an offset, which uniquely identifies the message within that partition. With the aid of ZooKeeper, Kafka stores the offsets of the messages that a consumer group has consumed for a specific topic and partition.
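A minimal sketch of how offsets behave, assuming a single in-memory partition. The `Partition` class and its method names are hypothetical, and a plain dictionary stands in for the external store that keeps each group's committed offset.

```python
class Partition:
    """An append-only log with per-group committed offsets."""

    def __init__(self):
        self._log = []
        self._next_offset = {}  # group name -> next offset to read

    def append(self, message):
        """Store a message and return the offset assigned to it."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Return (offset, message) pairs starting at the group's position."""
        start = self._next_offset.get(group, 0)
        batch = self._log[start:start + max_messages]
        return list(enumerate(batch, start=start))

    def commit(self, group, offset):
        """Record that `group` has consumed everything up to `offset`."""
        self._next_offset[group] = offset + 1
```

Each group tracks its own position, so a second group polling the same partition starts again from offset 0.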
Q16. What do you know about the partitioning key?
Ans: In the Kafka producer, a partition key can be specified to indicate the target partition of a message. Usually, a hash-based partitioner determines the partition ID from the key, and users can also supply customized partitioners.
Q17. Why is Kafka technology significant to use?
Ans: As a distributed publish-subscribe system, Kafka has the following advantages:
- Fast: A single Kafka broker can serve thousands of clients, handling megabytes of reads and writes per second
- Scalable: Data is partitioned and spread over a cluster of machines to handle large volumes of data
- Durable: Messages are persisted and replicated within the cluster to prevent data loss
- Distributed by Design: It provides fault-tolerance guarantees and is robust
Q18. What is the maximum size of a message that a Kafka server can receive?
Ans: By default, the maximum size of a message that a Kafka server can receive is 1000000 bytes. The limit is controlled by the broker’s message.max.bytes setting.
Q19. What are the elements of Kafka?
Ans: The most important elements of Kafka are:
- Topic – a collection of messages of a similar kind
- Producer – publishes messages to a topic
- Consumer – subscribes to one or more topics and pulls data from the brokers
- Brokers – the servers where the published messages are stored
Q20. What is ZooKeeper in Kafka? Can we use Kafka without ZooKeeper?
Ans: ZooKeeper is an open source, high-performance coordination service for distributed applications, and Kafka builds on it.
No, it is not possible to bypass ZooKeeper and connect directly to the Kafka broker. Once ZooKeeper is down, Kafka cannot serve client requests.
- ZooKeeper is basically used for communication between the different nodes in a cluster
- In Kafka, it is used to commit offsets, so if a node fails, consumption can resume from the previously committed offset
- Apart from this, it also handles leader detection, distributed synchronization, configuration management, detecting when a node leaves or joins the cluster, real-time node status, etc.
Q21. How is a message consumed by a consumer in Kafka?
Ans: Messages in Kafka are transferred using the sendfile API. It moves bytes from disk to the socket within kernel space, saving the copies and the calls between kernel space and user space that a read-then-write approach would need.
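The same operating-system facility is reachable from Python through `socket.sendfile`, which uses `os.sendfile` where it is available and falls back to ordinary `send()` calls otherwise. The sketch below is an illustration of the idea, not Kafka's broker code; `send_file` and `demo` are hypothetical names, and a local socket pair stands in for the broker-to-consumer connection.

```python
import os
import socket
import tempfile


def send_file(path: str, sock: socket.socket) -> int:
    """Send a whole file over a stream socket; return the bytes sent."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    """Write a small file, send it over a socket pair, read it back."""
    payload = b"kafka log segment bytes"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    broker_side, consumer_side = socket.socketpair()
    try:
        sent = send_file(path, broker_side)
        broker_side.close()  # closing signals end of stream to the reader
        chunks = []
        while True:
            chunk = consumer_side.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        broker_side.close()
        consumer_side.close()
        os.unlink(path)
    return sent, b"".join(chunks)
```

The application never reads the file contents into its own buffer in `send_file`; it only hands the file object to the socket.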
Q22. How can you improve the throughput of a remote consumer?
Ans: If the consumer is located in a different data center from the broker, you may need to tune the socket buffer size to amortize the long network latency.
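One common rule of thumb for choosing a buffer size on a long-latency link is the bandwidth-delay product: the number of bytes that can be in flight during one round trip. This rule comes from general networking practice rather than from the answer above, and the helper name and the link figures in the example are assumptions.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bytes in flight on a link: bandwidth (bits/s) x round-trip time (ms).

    Dividing by 8000 converts bits to bytes and milliseconds to seconds.
    """
    return bandwidth_bits_per_sec * rtt_ms // 8000
```

For a hypothetical 1 Gbit/s link with an 80 ms round trip, `bandwidth_delay_product_bytes(1_000_000_000, 80)` gives 10,000,000 bytes, so a socket buffer much smaller than about 10 MB would leave the link idle while waiting for acknowledgements.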
Q23. How can you get exactly-once messaging from Kafka during data production?
Ans: To get exactly-once messaging from Kafka, you have to do two things: avoid duplicates during data consumption and avoid duplicates during data production.
Here are two ways to get exactly-once semantics during data production:
- Use a single writer per partition, and every time you get a network error, check the last message in that partition to see whether your last write succeeded
- Include a primary key (a UUID or something similar) in the message and de-duplicate on the consumer
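The second approach, a unique key per message plus de-duplication on the consumer, can be sketched as follows. The message layout and the `DedupingConsumer` class are hypothetical, and the set of seen IDs lives in memory here, whereas a real consumer would need to persist it.

```python
import uuid


def make_message(payload):
    """Attach a unique ID so that retried sends can be recognised."""
    return {"id": str(uuid.uuid4()), "payload": payload}


class DedupingConsumer:
    """Process each message ID at most once."""

    def __init__(self):
        self._seen = set()
        self.processed = []

    def handle(self, message) -> bool:
        """Return True if the message was processed, False if a duplicate."""
        if message["id"] in self._seen:
            return False
        self._seen.add(message["id"])
        self.processed.append(message["payload"])
        return True
```

If a producer retries after a network error and the same message arrives twice, the second `handle` call returns False and the payload is processed only once.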
Q24. How can you reduce churn in the ISR? When does a broker leave the ISR?
Ans: The ISR (in-sync replicas) is the set of message replicas that are completely synced up with the leader; in other words, the ISR has all messages that are committed. The ISR should always include all replicas until there is a real failure. A replica is dropped from the ISR if it diverges from the leader beyond a certain threshold.
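The membership rule can be illustrated with a simple lag check. This is a sketch under stated assumptions: lag is measured in messages and the threshold is an arbitrary number, whereas a real broker applies its own configured criteria; `in_sync_replicas` is a hypothetical function.

```python
def in_sync_replicas(leader_end_offset, follower_offsets, max_lag):
    """Return the followers whose lag behind the leader is within max_lag.

    `follower_offsets` maps a replica name to the last offset it has fetched.
    """
    return sorted(
        replica
        for replica, offset in follower_offsets.items()
        if leader_end_offset - offset <= max_lag
    )
```

With the leader at offset 100 and a threshold of 10, a follower at offset 98 stays in the set while a follower at offset 40 is dropped until it catches up.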
Q25. Why is replication required in Kafka?
Ans: Replicating messages in Kafka ensures that a published message is not lost and can still be consumed after a machine error, a program error or, more commonly, a software upgrade.
Q26. What does it indicate if a replica stays out of the ISR for a long time?
Ans: If a replica remains out of the ISR for an extended time, it indicates that the follower is unable to fetch data as fast as data accumulates at the leader.
Q27. What happens if the preferred replica is not in the ISR?
Ans: If the preferred replica is not in the ISR, the controller will fail to move leadership to the preferred replica.
Q28. Is it possible to get the message offset after producing?
Ans: You cannot do that from a class that behaves as a producer. As in most queue systems, its role is to fire and forget the messages. The broker does the rest of the work, such as the appropriate metadata handling with IDs, offsets, etc.
As a consumer of the message, you can get the offset from a Kafka broker. If you look at the SimpleConsumer class, you will notice that it fetches MultiFetchResponse objects that include offsets as a list. In addition, when you iterate over the Kafka messages, you get MessageAndOffset objects that include both the offset and the message sent.
Q29. Elaborate on the Kafka architecture.
Ans: Since Kafka is a distributed system, a cluster contains multiple brokers. Each topic is divided into multiple partitions, and each broker stores one or more of those partitions, so that multiple producers and consumers can publish and retrieve messages at the same time.
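How partitions might be spread over brokers can be sketched with a round-robin placement. This is a deliberate simplification: the function name is hypothetical, and real placement also has to account for replicas, which this sketch ignores.

```python
def assign_partitions(num_partitions, brokers):
    """Place partition p on broker p mod len(brokers) (round-robin)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}
```

With four partitions and brokers `["b0", "b1", "b2"]`, broker b0 holds partitions 0 and 3, while b1 and b2 hold one partition each, so producers and consumers working on different partitions talk to different machines.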
Q30. How do you start a Kafka server?
Ans: Since Kafka uses ZooKeeper, we have to start the ZooKeeper server first.
One can use the convenience script packaged with Kafka to get a crude but effective single-node ZooKeeper instance:
> bin/zookeeper-server-start.sh config/zookeeper.properties
Now the Kafka server can be started:
> bin/kafka-server-start.sh config/server.properties