Apache Flume Channel Selectors

The channel selector is the Flume component that decides which channel, or channels, an event is written to when a source is connected to more than one channel.

The selection happens inside the agent, between the source and its channels. Flume offers several types of channel selector for handling multiple channels, described below.

Replicating channel selector -

This is the default channel selector. If no selector type is configured for a source, Flume uses the replicating channel selector.

It writes a copy of each event to every channel configured for the source.

# Replicating channel selector
# (each property must stay on a single line)
<Agent_name>.sources = <Source-name>
<Agent_name>.channels = <Channel1> <Channel2> ... <Channeln>
<Agent_name>.sources.<Source-name>.selector.type = replicating
<Agent_name>.sources.<Source-name>.channels = <Channel1> <Channel2> ... <Channeln>
<Agent_name>.sources.<Source-name>.selector.optional = <Optional-channel-names>

Example for agent named agt1 and its source called src1 -

agt1.sources = src1
agt1.channels = chnl1 chnl2 chnl3
agt1.sources.src1.selector.type = replicating
agt1.sources.src1.channels = chnl1 chnl2 chnl3
agt1.sources.src1.selector.optional = chnl1

In this setup, chnl1 is an optional channel, so a failure to write to it is simply ignored. But if chnl2 or chnl3 fails to accept the event, the whole transaction fails, because those channels are not optional.
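
The required-versus-optional behaviour described above can be illustrated with a small Python model. This is only a sketch, not Flume code: ToyChannel, ChannelFull and replicate are hypothetical names invented for the illustration, and the model ignores Flume's real transaction handling.

```python
class ChannelFull(Exception):
    """Raised by a toy channel that cannot accept another event."""


class ToyChannel:
    """Hypothetical in-memory stand-in for a Flume channel."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.events = []

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise ChannelFull(self.name)
        self.events.append(event)


def replicate(event, channels, optional_names):
    """Copy the event to every channel; only required failures propagate."""
    required = [c for c in channels if c.name not in optional_names]
    optional = [c for c in channels if c.name in optional_names]
    for channel in required:
        channel.put(event)  # a failure here aborts the whole write
    for channel in optional:
        try:
            channel.put(event)
        except ChannelFull:
            pass  # failures on optional channels are ignored


chnl1 = ToyChannel("chnl1", capacity=0)   # always full, but optional
chnl2 = ToyChannel("chnl2", capacity=10)
chnl3 = ToyChannel("chnl3", capacity=10)
replicate({"body": "hello"}, [chnl1, chnl2, chnl3], optional_names={"chnl1"})
print(len(chnl1.events), len(chnl2.events), len(chnl3.events))  # 0 1 1
```

If chnl2 or chnl3 were given a capacity of 0 instead, replicate would raise ChannelFull, which mirrors the failed transaction for a required channel.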

Below are the two properties for the replicating channel selector.

Property Name | Default | Description
selector.type | replicating | The component type name; must be replicating
selector.optional | - | Set of channels to be marked as optional

Multiplexing channel selector -

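This channel selector routes each Flume event to one or more channels based on the value of a configured event header. Events whose header value has no mapping, or that lack the header altogether, go to the default channel or channels.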

# Multiplexing channel selector
# (each property must stay on a single line; add one mapping line per header value)
<Agent_name>.sources = <Source-name>
<Agent_name>.channels = <Channel1> <Channel2> ... <Channeln>
<Agent_name>.sources.<Source-name>.selector.type = multiplexing
<Agent_name>.sources.<Source-name>.selector.header = <header-name>
<Agent_name>.sources.<Source-name>.selector.mapping.<header-category1> = <Channel1> <Channel2> ... <Channeln>
<Agent_name>.sources.<Source-name>.selector.mapping.<header-category2> = <Channel1> <Channel2> ... <Channeln>
<Agent_name>.sources.<Source-name>.selector.default = <Channel1> <Channel2> ... <Channeln>

Example for agent named agt1 and source called src1 -

agt1.sources = src1
agt1.channels = chnl1 chnl2 chnl3 chnl4
agt1.sources.src1.selector.type = multiplexing
agt1.sources.src1.selector.header = grade
agt1.sources.src1.selector.mapping.grade1 = chnl1
agt1.sources.src1.selector.mapping.grade2 = chnl2
agt1.sources.src1.selector.mapping.grade3 = chnl3
agt1.sources.src1.selector.default = chnl4

In this setup, events are routed by the value of the grade header: events with the value grade1 go to chnl1, grade2 to chnl2, and grade3 to chnl3. Any other event goes to chnl4, the default channel.
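
The routing rule above amounts to a dictionary lookup with a fallback. The following Python sketch models it for the agt1 example; select_channels is a hypothetical helper written for this illustration, not part of Flume.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the channel names for an event, based on one header value."""
    value = headers.get(header_name)
    if not value:
        # Header missing or empty: fall back to the default channels.
        return default
    # Unmapped values also fall back to the default channels.
    return mapping.get(value, default)


# Mirrors the selector.mapping.* and selector.default lines of the example.
mapping = {"grade1": ["chnl1"], "grade2": ["chnl2"], "grade3": ["chnl3"]}
default = ["chnl4"]

print(select_channels({"grade": "grade2"}, "grade", mapping, default))  # ['chnl2']
print(select_channels({"grade": "grade9"}, "grade", mapping, default))  # ['chnl4']
print(select_channels({}, "grade", mapping, default))                   # ['chnl4']
```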

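Below are the four properties for the multiplexing channel selector.

Property Name | Default | Description
selector.type | replicating | The component type name; must be multiplexing
selector.header | flume.selector.header | Name of the event header whose value is used for routing
selector.default | - | Channel(s) used when the header is missing or its value has no mapping
selector.mapping.* | - | Channel(s) for a given header value, one property per value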

Custom channel selector -

A custom channel selector is your own implementation of the ChannelSelector interface. The selector's class and its dependencies must be on the agent's classpath when the Flume agent starts, so add them before starting Flume.

# Custom channel selector
# (each property must stay on a single line)
<Agent_name>.sources = <Source-name>
<Agent_name>.channels = <Channel1>
<Agent_name>.sources.<Source-name>.selector.type = <FQCN of the custom selector class>

Example for agent named agt1 and its source called src1 -

agt1.sources = src1
agt1.channels = chnl
agt1.sources.src1.selector.type = example.ChannelSelector
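
A real custom selector is a Java class implementing Flume's ChannelSelector interface, which is asked, per event, for the required and the optional channels. To show the shape of that decision without depending on the Flume libraries, here is a Python model. Everything in it is an assumption made for illustration: the Selector base class, the SizeBasedSelector rule, and the channel names small_chnl and large_chnl do not come from Flume or from the example above.

```python
from abc import ABC, abstractmethod


class Selector(ABC):
    """Hypothetical Python model of a selector; not the Flume Java interface."""

    @abstractmethod
    def required_channels(self, event):
        """Channels that must accept the event."""

    @abstractmethod
    def optional_channels(self, event):
        """Channels whose write failures may be ignored."""


class SizeBasedSelector(Selector):
    """Illustrative custom rule: large event bodies go to a separate channel."""

    def __init__(self, small, large, threshold):
        self.small = small
        self.large = large
        self.threshold = threshold

    def required_channels(self, event):
        if len(event["body"]) > self.threshold:
            return [self.large]
        return [self.small]

    def optional_channels(self, event):
        return []  # this sketch marks no channel as optional


selector = SizeBasedSelector("small_chnl", "large_chnl", threshold=10)
print(selector.required_channels({"body": "short"}))
print(selector.required_channels({"body": "a much longer event body"}))
```

The point of the sketch is only that a custom selector can apply any rule it likes, not just a header lookup, as long as it returns a list of channels for each event.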

Below is the property for the custom channel selector.

Property Name | Default | Description
selector.type | - | The component type name; must be the fully qualified class name (FQCN) of your selector

The selector is configured per source, using that source's 'selector.type' property. Channel selectors work between the source and its channels: they decide which channel or channels each event is written to. The sinks attached to those channels then determine the final destination, for example a particular HDFS cluster or HBase system, so choosing a channel effectively chooses where the event ends up.