Apache Flume Vs Apache Kafka

| Kafka | Flume |
| --- | --- |
| A publish-subscribe messaging system. | A service for collecting, aggregating, and moving large amounts of data into Hadoop, or for processing the data and persisting it to a relational database. |
| Messages are replicated across multiple broker nodes, so they can still be retrieved if a node fails. | Events are not replicated, so if an agent node fails, the data buffered on that node can be lost. |
| Pull-based: consumers pull messages, which stay available for a configurable retention period (for example, a number of days), so clients in different consumer groups can each read the same messages. | Push-based: data is pushed to the destination, which can be a logger, Hadoop, or a custom sink, so messages are not retained the way they are in Kafka. |

The two systems can be used together: messages pushed to Kafka can be consumed by a Flume agent through KafkaSource, and a Flume agent can also push data to Kafka through KafkaSink.
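
As an illustration of the KafkaSink direction, here is a minimal sketch of an agent that reads lines from a netcat source and publishes them to a Kafka topic. It is an assumption-laden example, not a configuration from this post: the agent name a2, the port, the broker address, and the topic name are all placeholders, and the sink properties follow the Flume 1.6 naming (newer releases use kafka.bootstrap.servers and kafka.topic instead of brokerList and topic).

```properties
# Hypothetical agent "a2": netcat source -> memory channel -> Kafka sink
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# Source: listens for newline-terminated text on a local port (placeholder values)
a2.sources.r1.type = netcat
a2.sources.r1.bind = localhost
a2.sources.r1.port = 44444
a2.sources.r1.channels = c1

# Channel: in-memory buffer; events held here are lost if the agent dies
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Sink: publishes each event to Kafka (Flume 1.6 property names; placeholder broker and topic)
a2.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a2.sinks.k1.brokerList = localhost:9092
a2.sinks.k1.topic = example-topic
a2.sinks.k1.channel = c1
```

Once the events are in Kafka, any number of consumer groups can read them independently, which is what Flume alone does not offer.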

Flume Agent for consuming Kafka messages

Follow the steps below to create a single-node Flume agent that consumes Kafka messages:

1. Download and install Flume.
2. Check out the GitHub repo [https://github.com/dkbalachandar/flume-kafka-agent].
3. Copy flume-kafka-conf.properties [available in /src/main/resources] into the FLUME_INSTALLED_DIRECTORY/conf folder, and update the ZooKeeper node and topic details in it.
4. Run mvn package, then copy /target/flume-kafka-agent-1.0-SNAPSHOT-jar-with-dependencies.jar into the FLUME_INSTALLED_DIRECTORY/lib folder.
5. From FLUME_INSTALLED_DIRECTORY, run the Flume agent with the command below:

bin/flume-ng agent --conf conf --conf-file conf/flume-kafka-conf.properties --name a1 -Dflume.root.logger=INFO,console
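
The contents of flume-kafka-conf.properties are not reproduced in this post. To give a rough idea of what a Kafka-consuming agent configuration can look like, here is a minimal sketch; it is not the repository's actual file. It assumes the ZooKeeper-based KafkaSource properties of Flume 1.6 (newer releases configure the source with kafka.bootstrap.servers and kafka.topics instead), and the ZooKeeper address, topic, and group ID are placeholders. The agent name a1 matches the --name a1 argument in the command above.

```properties
# Agent "a1": Kafka source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: consumes messages from a Kafka topic (Flume 1.6 property names)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# Placeholder ZooKeeper node; replace with your own
a1.sources.r1.zookeeperConnect = localhost:2181
# Placeholder topic and consumer group
a1.sources.r1.topic = example-topic
a1.sources.r1.groupId = flume
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes each event to the Flume log, which the command prints to the console
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

With a configuration along these lines, messages published to the topic should appear in the agent's console output, because the command sets -Dflume.root.logger=INFO,console.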