Flume is a standard, simple, robust, flexible, and extensible tool for ingesting data from various data producers (such as web servers) into Hadoop. In this book, we use simple, illustrative examples to explain the basics of Apache Flume and how to use it in practice.
This book is meant for professionals who want to learn how to transfer log and streaming data from various web servers to HDFS or HBase using Apache Flume.
To make the most of this book, you should have a good understanding of Hadoop basics and HDFS commands. Big Data is a collection of large datasets that cannot be processed using traditional computing techniques; when analyzed, it yields valuable insights. Hadoop is an open-source framework that allows storing and processing Big Data in a distributed environment across clusters of computers using simple programming models.
Features of Flume: Some of the notable features of Flume are as follows:
- Flume ingests log data from multiple web servers into a centralized store (HDFS, HBase) efficiently.
- Using Flume, we can get data from multiple servers into Hadoop immediately.
- Along with log files, Flume is also used to import huge volumes of event data produced by social networking sites such as Facebook and Twitter, and e-commerce websites such as Amazon and Flipkart.
- Flume supports a large set of source and destination types.
- Flume supports multi-hop flows, fan-in and fan-out flows, contextual routing, and more.
- Flume can be scaled horizontally.
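To make the log-ingestion flow above concrete, a minimal Flume agent configuration might look like the sketch below. It wires an exec source (tailing a web server log) through a memory channel into an HDFS sink. The agent name `a1`, the component names, the log file path, and the HDFS URL are all illustrative assumptions, not values from this book; adapt them to your environment.

```properties
# a1 is the agent name (assumed for illustration); list its components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail a web server access log (path is hypothetical)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/apache2/access.log

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into HDFS (URL and path are hypothetical)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/weblogs
a1.sinks.k1.hdfs.fileType = DataStream

# Bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such a configuration is typically saved to a file (for example, `flume.conf`) and started with the `flume-ng agent` command, naming the agent with `-n a1` and the configuration file with `-f`.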