Apache Flume Event Serializers

The Event Serializer is an interface for serializing Flume events in an arbitrary format. It is the mechanism that converts a Flume event into another format for output. Both the file_roll sink and the HDFS sink support the EventSerializer interface.

Below are the details of the different Event Serializers:

  1. Body text serializer
  2. Text with headers serializer
  3. Avro event serializer

Body text serializer -

This serializer writes the event body to the output stream without any modification or transformation. Event headers are ignored, so any headers present in the event are discarded. This is the default serializer if none is specified.

<agent_name>.sinks = <sink-name>
<agent_name>.sinks.<sink-name>.type = file_roll
<agent_name>.sinks.<sink-name>.channel = <channel-name>
<agent_name>.sinks.<sink-name>.sink.directory = /var/log/flume
<agent_name>.sinks.<sink-name>.sink.serializer = text
<agent_name>.sinks.<sink-name>.sink.serializer.appendNewline = false

Example for agent named agt -

agt.sinks = sn1
agt.sinks.sn1.type = file_roll
agt.sinks.sn1.channel = chn1
agt.sinks.sn1.sink.directory = /var/log/flume
agt.sinks.sn1.sink.serializer = text
agt.sinks.sn1.sink.serializer.appendNewline = false

Configuration options are as follows -

Property Name    Default    Description
appendNewline    true       Whether a newline will be appended to each event at write time.
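
The behaviour of the body text serializer can be illustrated with a small Python sketch. This is not Flume's implementation (Flume itself is written in Java); the function name serialize_body and the dictionary-based events are hypothetical stand-ins chosen only to show what ends up in the output stream.

```python
import io

def serialize_body(event, out, append_newline=True):
    """Write only the event body to `out`; headers are ignored."""
    out.write(event["body"])
    if append_newline:
        out.write(b"\n")

# Hypothetical events: a dict with string headers and a bytes body.
events = [
    {"headers": {"host": "web01"}, "body": b"first line"},
    {"headers": {}, "body": b"second line"},
]

with_newline = io.BytesIO()
without_newline = io.BytesIO()
for ev in events:
    serialize_body(ev, with_newline)                           # like appendNewline = true
    serialize_body(ev, without_newline, append_newline=False)  # like appendNewline = false

print(with_newline.getvalue())
print(without_newline.getvalue())
```

In this sketch, disabling the newline makes consecutive events run together in the output, so that setting is mainly useful when each event body already ends with its own line terminator.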

Text with headers serializer -

The text with headers serializer writes both the headers and the body of each event, so the headers are preserved alongside the body. For each event it first writes the headers to the output stream, then the body, and finally a newline (unless appendNewline is set to false).

<agent_name>.sinks = <sink-name>
<agent_name>.sinks.<sink-name>.type = file_roll
<agent_name>.sinks.<sink-name>.channel = <channel-name>
<agent_name>.sinks.<sink-name>.sink.directory = /var/log/flume
<agent_name>.sinks.<sink-name>.sink.serializer = header_and_text
<agent_name>.sinks.<sink-name>.sink.serializer.appendNewline = false

Example for agent named agt -

agt.sinks = sn1
agt.sinks.sn1.type = file_roll
agt.sinks.sn1.channel = chn1
agt.sinks.sn1.sink.directory = /var/log/flume
agt.sinks.sn1.sink.serializer = header_and_text
agt.sinks.sn1.sink.serializer.appendNewline = false

Configuration options are as follows -

Property Name    Default    Description
appendNewline    true       Whether a newline will be appended to each event at write time.
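
The write order described for this serializer (headers, then body, then an optional newline) can be sketched in Python as follows. This is an illustration, not Flume's code: the function names are hypothetical, and the header rendering assumes the headers are printed the way Java's Map.toString() prints a map, which may differ from what your Flume version produces.

```python
import io

def format_headers(headers):
    # Assumption: mimic Java's Map.toString(), e.g. {host=web01, type=access}
    return "{" + ", ".join(f"{k}={v}" for k, v in headers.items()) + "}"

def serialize_header_and_body(event, out, append_newline=True):
    """Write the headers, a separating space, the body, then an optional newline."""
    out.write((format_headers(event["headers"]) + " ").encode("utf-8"))
    out.write(event["body"])
    if append_newline:
        out.write(b"\n")

# Hypothetical event used only for illustration.
event = {
    "headers": {"host": "web01", "type": "access"},
    "body": b"GET /index.html",
}

out = io.BytesIO()
serialize_header_and_body(event, out)
print(out.getvalue())
```

Compared with the body text serializer, the only difference in this sketch is the header prefix written before each body.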

Avro event serializer -

The Avro event serializer turns Flume events into an Avro container file, just like the "Flume Event" serializer, but it lets you customize the record schema.

The "Flume Event" serializer, selected with the avro_event alias, writes each event using a fixed schema instead. Avro is a language-neutral data serialization format, which is its main benefit over serialization formats that are tied to specific platforms or programming languages.

Example for agent named agt -

agt.sinks.sn1.type = hdfs
agt.sinks.sn1.channel = chn
agt.sinks.sn1.hdfs.path = /flume/events/%y-%m-%d/%H%M%S
agt.sinks.sn1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
agt.sinks.sn1.serializer.compressionCodec = snappy
agt.sinks.sn1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc

Configuration options are as follows -

Property Name        Default    Description
syncIntervalBytes    2048000    Approximate Avro sync interval, in bytes.
compressionCodec     null       Avro compression codec.
schemaURL            null       Avro schema URL.
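
The schemaURL property points at an Avro schema file, which is a JSON document. As a rough illustration, the Python sketch below builds a hypothetical record schema and renders it as the JSON text one might save as schema.avsc. The record name, namespace, and fields are invented for this example; they do not come from Flume, and a real schema has to match the data your events actually carry.

```python
import json

# Hypothetical record schema, invented purely for illustration.
schema = {
    "type": "record",
    "name": "AccessLog",
    "namespace": "com.example.logs",
    "fields": [
        {"name": "host", "type": "string"},
        {"name": "path", "type": "string"},
        {"name": "status", "type": "int"},
    ],
}

# Avro schema files (.avsc) are plain JSON, so json.dumps is enough to render one.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed text could then be uploaded to the location named by schemaURL, for example the hdfs://namenode/path/to/schema.avsc path used in the configuration above.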