Data ingestion tool - Flume
Source - web logs, HTTP, REST servers, Avro, Thrift, Syslog, Netcat
Channel - memory buffer, file, database, or other storage
Sink - target / destination (e.g. HDFS)
To import multiple data sources into HDFS, agents are used.
Each agent consists of a Source, a Channel, and a Sink.
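The fan-in idea above (several sources feeding one sink) can be sketched as a config fragment. This is a hypothetical example: the agent name, source names, log paths, and port are illustrative, not from any real deployment.

```
# Hypothetical agent with two sources feeding one channel (fan-in)
agent2.sources = web syslog
agent2.channels = c1
agent2.sinks = k1

# Source 1: tail a web server access log (path is illustrative)
agent2.sources.web.type = exec
agent2.sources.web.command = tail -F /var/log/httpd/access_log

# Source 2: receive syslog events over UDP (port is illustrative)
agent2.sources.syslog.type = syslogudp
agent2.sources.syslog.port = 5140
agent2.sources.syslog.host = 0.0.0.0

agent2.channels.c1.type = memory

agent2.sinks.k1.type = hdfs
agent2.sinks.k1.hdfs.path = hdfs://<IP>:<Port>/user/flume/multi

# Both sources share the same channel; the sink drains it
agent2.sources.web.channels = c1
agent2.sources.syslog.channels = c1
agent2.sinks.k1.channel = c1
```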
-------------------------------------------------------------------------
Running the Flume agent from the terminal [as below]:
-------------------------------------------------------------------------
$ bin/flume-ng agent --conf ./conf/ \
    -f conf/flume-conf.conf \
    -Dflume.root.logger=DEBUG,console \
    -n agent1
-----------------------------------------------------
Contents of flume-conf.conf [as below]:
-----------------------------------------------------
agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1
# Define source and type of event (in this case it is exec)
agent1.sources.s1.type = exec
agent1.sources.s1.command = tail -f /tmp/esplog.log
# Define channel and type (here the channel is held in memory; other types include file, JDBC, etc.)
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Define sink and type (in this case it is hdfs, i.e. events are written to HDFS)
# Default fileType is SequenceFile (Writable contents); DataStream writes the raw event body.
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://<IP>:<Port>/user/flume/esplog.log
agent1.sinks.k1.hdfs.fileType = DataStream
# Bind source and sink to a channel
agent1.sources.s1.channels = c1
agent1.sinks.k1.channel = c1
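A quick way to exercise this pipeline once the agent is running: append a few lines to the file the exec source is tailing, then check the HDFS side. The paths match the config above; the hadoop CLI commands are commented out because they assume a reachable cluster.

```shell
# Append sample events; the exec source (tail -f /tmp/esplog.log)
# picks them up and hands them to the memory channel
for i in 1 2 3; do
  echo "sample event $i" >> /tmp/esplog.log
done

# Then verify delivery on the HDFS side (cluster assumed reachable):
#   hadoop fs -ls hdfs://<IP>:<Port>/user/flume/esplog.log
#   hadoop fs -cat hdfs://<IP>:<Port>/user/flume/esplog.log/FlumeData.*
```

Note that the HDFS sink writes its own rolled files (FlumeData.NNN by default) under the configured path rather than a single file.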