Tuesday, September 16, 2014

Apache Flume NG

Data ingestion tool - Flume

Source - web logs, HTTP, REST servers, Avro, Thrift, Syslog, Netcat

Channel - memory buffer, file, database, or other storage

Sink - target / destination (e.g. HDFS)

To import multiple data sources into HDFS, agents are used.

Each agent consists of a Source, a Channel and a Sink.

-------------------------------------------------------------------------
Running Flume Command from Flume Terminal [as below]:
-------------------------------------------------------------------------
Flume > bin/flume-ng agent \
              --conf ./conf/ \
              -f conf/flume-conf.conf \
              -Dflume.root.logger=DEBUG,console \
              -n agent1
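The exec source in the config below tails /tmp/esplog.log, so that file needs some content before the agent has anything to ship. A minimal sketch for generating a few sample events (the path matches the config; the log-line format is just an illustration):

```shell
# Append a few sample events to the file the exec source tails.
# /tmp/esplog.log matches agent1.sources.s1.command in flume-conf.conf.
LOG=/tmp/esplog.log
for i in 1 2 3; do
  echo "$(date '+%Y-%m-%d %H:%M:%S') INFO sample event $i" >> "$LOG"
done
# Show what the source will pick up.
tail -n 3 "$LOG"
```

With the agent running, each appended line becomes one Flume event on channel c1.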

-----------------------------------------------------
Contents of flume-conf.conf [as below]:
-----------------------------------------------------

agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1

# Define source and type of event (in this case it is exec)
agent1.sources.s1.type = exec
agent1.sources.s1.command = tail -f /tmp/esplog.log

# Define channel and type (in this case channel is stored in memory, other types are: file, database etc)
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

# Define sink and type (in this case it is hdfs, i.e an event type result is stored in hdfs)
# Default fileType outputs Writable (LongWritable) contents.
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://<IP>:<Port>/user/flume/esplog.log
agent1.sinks.k1.hdfs.fileType = DataStream

# Bind source and sink to a channel
agent1.sources.s1.channels = c1
agent1.sinks.k1.channel = c1
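The HDFS sink's default roll settings close output files frequently (every 30 seconds or every 10 events), which tends to leave many small files in HDFS. The following optional properties control rolling; the values below are illustrative tuning, not defaults:

```
# Roll the output file by time (seconds), size (bytes) and event count;
# setting a trigger to 0 disables it. Values here are examples only.
agent1.sinks.k1.hdfs.rollInterval = 300
agent1.sinks.k1.hdfs.rollSize = 134217728
agent1.sinks.k1.hdfs.rollCount = 0
```

Setting rollCount = 0 here means files roll only by time or size, a common choice when event sizes vary widely.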
