Split source value and create a custom field from the split result

Hi All,

I have merged a few YARN logs into Elasticsearch and am viewing them in Kibana.

source=/data06/yarn/logs/application_1510757389739_0634/container_e15_1510757554319_0034_01_000001/hive2kafka-export29.log

From source, I want to extract 'application_1510757389739_0634' and add it to a new field applicationID.

I have created the filter below in Logstash:

if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}" }
    }
    mutate {
        split => ["source", "/"]
        add_field => { "applicationID" => "%{source[4]}" }
    }
}

I could extract 'application_1510757389739_0634'. However, the source field itself was overwritten and now shows up as a comma-separated list of path components:

, data06, yarn, logs, application_1510757389739_0634, container_e15_1510757327319_0034_01_000001, hive2kafka-export30.log

The screenshot below shows this.

How do I tackle this?

Thanks in advance.

You asked the mutate filter to split source into an array and that's what it did.

There are many ways to solve this, including making a copy of source before you split it or using a grok filter to extract the correct directory component.

Thanks Magnus.

I tried the two suggested approaches:

1. Copy source into a temporary field, as below:

if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}" }
    }
    mutate {
        copy => { "source" => "source_tmp" }
        split => ["source_tmp", "/"]
        add_field => { "applicationID" => "%{source_tmp[3]}" }
    }
}

The applicationID was not extracted. The output is as below:

2. Grok filter on source

Snippet:
if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}" }

        match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}" }
    }
    mutate {
        add_field => { "applicationID" => "%{subDir3}" }
    }
}

I have the same issue here as well: applicationID ends up with the literal value %{subDir3}.

Could you please correct me on this?

mutate {
    copy => { "source" => "source_tmp" }
    split => ["source_tmp", "/"]
    add_field => { "applicationID" => "%{source_tmp[3]}" }
}

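Within a single mutate filter the operations run in a fixed order, not the order in which they are written, and split is applied before copy. When split runs, source_tmp does not exist yet, so you need at least two mutate filters, one after the other: the first to copy, the second to split and add the field. Note also that the application directory is element 4 of the array, not 3, because the leading "/" produces an empty first element.

The grok filter solution has a similar problem. By default a grok filter stops after the first successful match, so the pattern for source is never tried. Match against source in a separate grok filter.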
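
For the grok case, one alternative to a second filter might be to keep both patterns in a single grok and turn off break_on_match, which defaults to true and is what makes grok stop after the first successful match. The fragment below is only a sketch: it has not been run against this pipeline, and it reuses the patterns and field names from the snippet in question.

```
grok {
    # With break_on_match disabled, grok continues after the "message"
    # pattern matches and also applies the pattern for "source".
    break_on_match => false
    match => {
        "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}"
        "source"  => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}"
    }
}
```

Two separate grok filters keep the two concerns apart and are arguably easier to debug, so this is a matter of preference.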

This resolved the issue. Thanks a ton!

Working code snippets for both approaches:

1. Copy source into a temporary field

if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
    }
    # Copy in its own mutate: inside a single mutate, split would run before copy.
    mutate {
        copy => { "source" => "source_tmp" }
    }
    # Element 0 is the empty string before the leading "/", so the
    # application directory is element 4.
    mutate {
        split => ["source_tmp", "/"]
        add_field => { "applicationID" => "%{source_tmp[4]}" }
    }
}

2. grok filter on source

if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
    }
    # Separate grok for source: by default grok stops after the first successful match.
    grok {
        match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}" }
    }
    mutate {
        add_field => { "applicationID" => "%{subDir3}" }
    }
}
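
A possible simplification of the second approach, offered as a sketch and not something tested in this thread: since only the application directory is needed, a grok named capture can pull it straight out of source, with no intermediate fields and no mutate. The regular expression is an assumption based on the example path, namely that the directory name always has the form application_<digits>_<digits>.

```
if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
        # The named capture writes the matched directory name directly
        # to applicationID.
        # Assumes names like application_1510757389739_0634
        # (digits, underscore, digits).
        match => { "source" => "/(?<applicationID>application_[0-9]+_[0-9]+)/" }
    }
}
```

Because this pattern is not tied to a position in the path, it should keep working if the log directory sits at a different depth, whereas the positional GREEDYDATA pattern depends on the application directory being the fourth path component.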