How to split a fluentd log line?

Here is one line of my fluentd log data:
2017-04-23T16:20:31+08:00 cv.product.access.mobile {"race":"album","video_id":43633036,"ip":"117.177.78.48","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"AdExchange","agent":"Mozilla/5.0 (Linux; Android 5.1.1; vivo X6SPlus D Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043128 Safari/537.36 MicroMessenger/6.5.7.1041 NetType/WIFI Language/zh_CN","author_zone":0,"author_registered_at":"2015-08-02 13:06:05","post_id":0,"published_at":"","referer":"","reference_id":0,"sessid":"7e620e610447414bafe5091470f8b0b2","duration":144,"author_is_priest":0,"download_type":"myapp","author_udid":"d850a4a042d6382","status_404":"","author_version":"and-3.6.13-gdt","mold_id":10006,"url":"http://video.colorv.cn/play/43633036?from=timeline&isappinstalled=0&from=share","author_os":"and","page_kind":"mini","request_id":"ff6925b4c86b4d34be534a6609edfa2d","referrer_id":"","author_id":3934438,"play_time":60,"method":"GET","published":0}

I want to make a 2-step split.

  1. the first step is to split it into 3 parts:
    timestamp: 2017-04-23T16:20:31+08:00
    log_type: cv.product.access.mobile
    log_content: {the big json}
  2. the second step is to split the log_content into many fields by each key in the json.

I have tried the following split for the first step, but it didn't work.

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "log_test"
    type => "redis-input"
  }
}

filter {
  split {
    terminator => "\t"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
    index => "test_index"
  }
}

Could you give me some advice? Thank you in advance.

The split filter doesn't just split strings, it splits one event into multiple events. Use the mutate filter's split option or a grok filter to split the string. Then apply a json filter to the field with the JSON data.
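
As a minimal sketch of the second step, assuming the first step has already put the JSON text into a field named log_content (the name used in the question, not something Logstash creates by itself), the json filter could look like this:

```conf
filter {
  json {
    # Field holding the raw JSON string; log_content is assumed to exist already.
    source => "log_content"
    # With no target set, the parsed keys (race, video_id, ip, ...)
    # become top-level fields on the event.
  }
}
```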

Thank you Magnus.
I have tried the following config:

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "log_test"
    type => "redis-input"
  }
}

filter {
  mutate {
    split => { "message" => "\t" }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
    index => "test_index"
  }
}

I have tried the split option in mutate, but it doesn't seem to work.
The Logstash debug output looks like this:

14:37:06.296 [[main]>worker12] DEBUG logstash.pipeline - filter received {"event"=>{"path"=>"/data1/logs/logs/20170423/access.log.20170423_1205.log", "@timestamp"=>2017-04-26T11:57:00.093Z, "@version"=>"1", "host"=>"superman", "message"=>"2017-04-23T17:03:22+08:00\tcv.product.access.mobile\t{"race":"video","video_id":82047461,"ip":"14.221.50.230","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"vip","agent":"Mozilla/5.0 (Linux; Android 4.2.2; S39h Build/16.0.A.0.47; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043205 Safari/537.36 MicroMessenger/6.5.6.1020 NetType/WIFI Language/zh_CN","author_zone":6,"author_registered_at":"2017-01-05 23:19:54","post_id":0,"published_at":"2017-04-23 07:08:32","referer":"","reference_id":0,"sessid":"a18dc4330cfa49479a22cd45a26c8496","duration":350,"author_is_priest":0,"download_type":"myapp","author_udid":"82f45d148c38b1a6","status_404":"","author_version":"and-4.2.2","mold_id":10018,"url":"http://video.colorv.cn/play/82047461?u=82f45d148c38b1a6&p=video&c=1&cat=create&from=groupmessage&from=share\",\"author_os\":\"and\",\"page_kind\":\"mini\",\"request_id\":\"d5fe57f8f50a457da57e4c188a584c2d\",\"referrer_id\":\"\",\"author_id\":9267280,\"play_time\":283,\"method\":\"GET\",\"published\":1}", "type"=>"type_count"}}

14:37:06.296 [[main]>worker5] DEBUG logstash.pipeline - output received {"event"=>{"path"=>"/data1/logs/logs/20170423/access.log.20170423_1205.log", "@timestamp"=>2017-04-26T11:56:58.004Z, "@version"=>"1", "host"=>"superman", "message"=>["2017-04-23T17:03:21+08:00\tcv.product.access.mobile\t{"race":"video","video_id":82017250,"ip":"123.188.169.133","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"vip","agent":"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36 QBCore/3.43.27.400 QQBrowser/9.0.2524.400","author_zone":3,"author_registered_at":"2016-02-16 22:46:51","post_id":204,"published_at":"2017-04-22 18:26:28","referer":"","reference_id":0,"sessid":"e56e6d2afae7486cbc39278c22c62014","duration":260,"author_is_priest":0,"download_type":"myapp","author_udid":"d74ec40d817e1b0f","status_404":"","author_version":"and-4.2.2","mold_id":10001,"url":"http://video.colorv.cn/play/82017250?u=d74ec40d817e1b0f&p=my&c=6&cat=upload&from=share\",\"author_os\":\"and\",\"page_kind\":\"mini\",\"request_id\":\"53c5d840228a4ab4aacd4854c8bd87f9\",\"referrer_id\":\"\",\"author_id\":7095205,\"play_time\":15,\"method\":\"GET\",\"published\":9}"], "type"=>"type_count"}}

Am I missing some other settings?

The problem is probably that Logstash doesn't deal with escape sequences like \t in a consistent way. The easiest workaround is probably to use a grok filter for the parsing.
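
A grok filter for the first step might look like the sketch below. It assumes the three parts are separated by whitespace and that the first two contain none; the field names timestamp, log_type, and log_content are taken from the question. TIMESTAMP_ISO8601, NOTSPACE, and GREEDYDATA are stock grok patterns, and \s+ is handled by the regex engine, so the config parser never has to interpret \t.

```conf
filter {
  grok {
    match => {
      # timestamp, then log type, then everything else as the JSON payload
      "message" => "^%{TIMESTAMP_ISO8601:timestamp}\s+%{NOTSPACE:log_type}\s+%{GREEDYDATA:log_content}$"
    }
  }
}
```

Events that do not match are tagged _grokparsefailure by default, which makes malformed lines easy to find.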

Try putting the tab in literally: split => { "message" => " " } # type a real tab character between the quotes

This worked for me as the separator in the csv filter plugin. From what I remember, it had something to do with how Ruby interprets \t versus a literal tab.

That's a good idea! Thank you Kurt. I will try it.
