How to split fluentd log data?

LmYjQ · April 27, 2017, 6:15am

Here is one of my fluentd log records:
2017-04-23T16:20:31+08:00 cv.product.access.mobile {"race":"album","video_id":43633036,"ip":"117.177.78.48","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"AdExchange","agent":"Mozilla/5.0 (Linux; Android 5.1.1; vivo X6SPlus D Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043128 Safari/537.36 MicroMessenger/6.5.7.1041 NetType/WIFI Language/zh_CN","author_zone":0,"author_registered_at":"2015-08-02 13:06:05","post_id":0,"published_at":"","referer":"","reference_id":0,"sessid":"7e620e610447414bafe5091470f8b0b2","duration":144,"author_is_priest":0,"download_type":"myapp","author_udid":"d850a4a042d6382","status_404":"","author_version":"and-3.6.13-gdt","mold_id":10006,"url":"http://video.colorv.cn/play/43633036?from=timeline&isappinstalled=0&from=share","author_os":"and","page_kind":"mini","request_id":"ff6925b4c86b4d34be534a6609edfa2d","referrer_id":"","author_id":3934438,"play_time":60,"method":"GET","published":0}

I want to make a 2-step split.

The first step is to split it into 3 parts:
timestamp: 2017-04-23T16:20:31+08:00
log_type: cv.product.access.mobile
log_content: {the big json}

The second step is to split log_content into separate fields, one per key in the JSON.

I have tried the following split for the first step, but it didn't work.

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "log_test"
    type => "redis-input"
  }
}

filter {
  split {
    terminator => "\t"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
    index => "test_index"
  }
}

Could you give me some advice? Thank you in advance.

magnusbaeck · April 27, 2017, 6:28am

The split filter doesn't just split strings, it splits one event into multiple events. Use the mutate filter's split option or a grok filter to split the string. Then apply a json filter to the field with the JSON data.
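
As a minimal sketch of that last step, assuming the JSON text has already been extracted into a field named log_content (the name used in the question, not something Logstash sets by itself), the json filter could be configured like this:

```
filter {
  json {
    # Assumption: "log_content" already holds the raw JSON string.
    source => "log_content"
  }
}
```

By default the parsed keys are placed at the top level of the event. The filter's target option can be used to nest them under a single field instead.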

LmYjQ · April 27, 2017, 6:50am

Thank you Magnus.
I have tried the following config:

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "log_test"
    type => "redis-input"
  }
}

filter {
  mutate {
    split => { "message" => "\t" }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
    index => "test_index"
  }
}

I have tried the split option in mutate, but it doesn't seem to work.
The Logstash debug output looks like this:

14:37:06.296 [[main]>worker12] DEBUG logstash.pipeline - filter received {"event"=>{"path"=>"/data1/logs/logs/20170423/access.log.20170423_1205.log", "@timestamp"=>2017-04-26T11:57:00.093Z, "@version"=>"1", "host"=>"superman", "message"=>"2017-04-23T17:03:22+08:00\tcv.product.access.mobile\t{"race":"video","video_id":82047461,"ip":"14.221.50.230","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"vip","agent":"Mozilla/5.0 (Linux; Android 4.2.2; S39h Build/16.0.A.0.47; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043205 Safari/537.36 MicroMessenger/6.5.6.1020 NetType/WIFI Language/zh_CN","author_zone":6,"author_registered_at":"2017-01-05 23:19:54","post_id":0,"published_at":"2017-04-23 07:08:32","referer":"","reference_id":0,"sessid":"a18dc4330cfa49479a22cd45a26c8496","duration":350,"author_is_priest":0,"download_type":"myapp","author_udid":"82f45d148c38b1a6","status_404":"","author_version":"and-4.2.2","mold_id":10018,"url":"http://video.colorv.cn/play/82047461?u=82f45d148c38b1a6&p=video&c=1&cat=create&from=groupmessage&from=share\",\"author_os\":\"and\",\"page_kind\":\"mini\",\"request_id\":\"d5fe57f8f50a457da57e4c188a584c2d\",\"referrer_id\":\"\",\"author_id\":9267280,\"play_time\":283,\"method\":\"GET\",\"published\":1}", "type"=>"type_count"}}

14:37:06.296 [[main]>worker5] DEBUG logstash.pipeline - output received {"event"=>{"path"=>"/data1/logs/logs/20170423/access.log.20170423_1205.log", "@timestamp"=>2017-04-26T11:56:58.004Z, "@version"=>"1", "host"=>"superman", "message"=>["2017-04-23T17:03:21+08:00\tcv.product.access.mobile\t{"race":"video","video_id":82017250,"ip":"123.188.169.133","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"vip","agent":"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36 QBCore/3.43.27.400 QQBrowser/9.0.2524.400","author_zone":3,"author_registered_at":"2016-02-16 22:46:51","post_id":204,"published_at":"2017-04-22 18:26:28","referer":"","reference_id":0,"sessid":"e56e6d2afae7486cbc39278c22c62014","duration":260,"author_is_priest":0,"download_type":"myapp","author_udid":"d74ec40d817e1b0f","status_404":"","author_version":"and-4.2.2","mold_id":10001,"url":"http://video.colorv.cn/play/82017250?u=d74ec40d817e1b0f&p=my&c=6&cat=upload&from=share\",\"author_os\":\"and\",\"page_kind\":\"mini\",\"request_id\":\"53c5d840228a4ab4aacd4854c8bd87f9\",\"referrer_id\":\"\",\"author_id\":7095205,\"play_time\":15,\"method\":\"GET\",\"published\":9}"], "type"=>"type_count"}}

Am I missing some other settings?

magnusbaeck · April 27, 2017, 7:35am

The problem is probably that Logstash doesn't handle escape sequences like \t consistently. The easiest workaround is to use a grok filter for the parsing.
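
A sketch of what such a grok filter might look like, combined with the json filter for the second step. The field names timestamp, log_type and log_content come from the original question. Matching the separators with \s+ rather than \t is an assumption made here so that the pattern does not depend on how the config parser treats \t:

```
filter {
  grok {
    # Three whitespace-separated parts: ISO 8601 time, tag, and the rest of the line.
    match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp}\s+%{NOTSPACE:log_type}\s+%{GREEDYDATA:log_content}$" }
  }
  json {
    # Expand the JSON string captured by grok into individual fields.
    source => "log_content"
  }
}
```

Events that do not match the pattern are tagged _grokparsefailure by default, which makes lines in a different format easy to spot.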

Kurt_S · April 27, 2017, 2:33pm

Try putting in the tab literally: split => { "message" => "	" } # the character between the quotes is a literal tab

That worked for me as the separator character in the csv filter plugin, and from what I remember it had something to do with how Ruby interpreted \t and the tab.
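
A sketch of how the literal-tab approach might continue, assuming the split succeeds and leaves message as a three-element array. The add_field references and the target field names are illustrative (taken from the question), not from a tested config:

```
filter {
  mutate {
    # The character between the quotes must be a real tab (U+0009), not spaces.
    split => { "message" => "	" }
  }
  # Written as a second mutate block so it clearly runs after the split.
  mutate {
    add_field => {
      "timestamp"   => "%{[message][0]}"
      "log_type"    => "%{[message][1]}"
      "log_content" => "%{[message][2]}"
    }
  }
  json {
    # Second step: expand the JSON string into individual fields.
    source => "log_content"
  }
}
```

Some editors silently convert tabs to spaces on save, so it is worth checking that the tab survives in the config file.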

LmYjQ · April 28, 2017, 1:33am

That's a good idea! Thank you Kurt. I will try it.

