Multiline JSON file not ingested with logstash to ElasticSearch


(Gaurav) #1

My Logstash config file is:
input {
  file {
    type => "json"
    path => "/home/$user/MOCK_DATA.json"
    start_position => "beginning"
    codec => multiline {
      pattern => '^{'
      what => previous
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}

Logstash starts and keeps printing the following:
[2019-04-17T00:48:16,622][DEBUG][logstash.outputs.stdout ] config LogStash::Outputs::Stdout/@id = "7987e564e4590724503a51b0f02cf750dbde09f3-6"
[2019-04-17T00:48:16,622][DEBUG][logstash.outputs.stdout ] config LogStash::Outputs::Stdout/@enable_metric = true
[2019-04-17T00:48:16,622][DEBUG][logstash.outputs.stdout ] config LogStash::Outputs::Stdout/@workers = 1
[2019-04-17T00:48:16,628][DEBUG][logstash.agent ] starting agent
[2019-04-17T00:48:16,632][DEBUG][logstash.agent ] starting pipeline {:id=>"main"}
[2019-04-17T00:48:16,636][DEBUG][logstash.filters.csv ] CSV parsing options {:col_sep=>",", :quote_char=>"""}
[2019-04-17T00:48:16,639][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1000}
[2019-04-17T00:48:16,826][INFO ][logstash.pipeline ] Pipeline main started
[2019-04-17T00:48:16,830][DEBUG][logstash.inputs.file ] _globbed_files: /home/MOCK_DATA.json"]
[2019-04-17T00:48:16,830][DEBUG][logstash.inputs.file ] _discover_file: /home/MOCK_DATA.json (exclude is )
[2019-04-17T00:48:16,831][DEBUG][logstash.inputs.file ] _open_file: /home/MOCK_DATA.json: opening
[2019-04-17T00:48:16,832][DEBUG][logstash.inputs.file ] /home/MOCK_DATA.json: sincedb last value 143901, cur size 143901
[2019-04-17T00:48:16,832][DEBUG][logstash.inputs.file ] /home/MOCK_DATA.json: sincedb: seeking to 143901
[2019-04-17T00:48:16,837][DEBUG][logstash.agent ] Starting puma
[2019-04-17T00:48:16,838][DEBUG][logstash.agent ] Trying to start WebServer {:port=>9600}
[2019-04-17T00:48:16,840][DEBUG][logstash.api.service ] [api-service] start
[2019-04-17T00:48:16,890][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-04-17T00:48:20,963][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2019-04-17T00:48:21,832][DEBUG][logstash.pipeline ] Pushing flush onto pipeline
[2019-04-17T00:48:25,969][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2019-04-17T00:48:26,834][DEBUG][logstash.pipeline ] Pushing flush onto pipeline
[2019-04-17T00:48:30,887][DEBUG][logstash.inputs.file ] _globbed_files: /home/MOCK_DATA.json"]
[2019-04-17T00:48:30,973][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu

The JSON file has the following content:

[
{"employee_id":780,"first_name":"Zuzana","last_name":"Ames","email":"zamesln@jugem.jp","gender":"Female","ip_address":"84.193.133.88"},
{"employee_id":781,"first_name":"Emilee","last_name":"Glavias","email":"eglaviaslo@omniture.com","gender":"Female","ip_address":"100.119.249.85"},
{"employee_id":782,"first_name":"Ford","last_name":"De Robertis","email":"fderobertislp@wikispaces.com","gender":"Male","ip_address":"159.81.140.145"}
]

There is no data in Elasticsearch. Can someone help me figure out what's wrong here?


#2
[2019-04-17T00:48:16,831][DEBUG][logstash.inputs.file ] _open_file: /home/MOCK_DATA.json: opening
[2019-04-17T00:48:16,832][DEBUG][logstash.inputs.file ] /home/MOCK_DATA.json: sincedb last value 143901, cur size 143901
[2019-04-17T00:48:16,832][DEBUG][logstash.inputs.file ] /home/MOCK_DATA.json: sincedb: seeking to 143901

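Logstash has already read the file: the sincedb records offset 143901, which equals the current file size, so it seeks to the end of the file and waits for data to be appended. If you want to force Logstash to re-read the file on every run, you can set the following in the file input: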

sincedb_path => "/dev/null"
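
For context, here is a minimal sketch of where that option would sit, assuming the file input from the question is otherwise left unchanged. The path is copied from the question as-is; "/dev/null" applies to Linux-like systems (on Windows the usual equivalent is "NUL").

```conf
input {
  file {
    path => "/home/$user/MOCK_DATA.json"   # path copied from the question
    start_position => "beginning"
    # Do not persist read offsets, so the file is read from the start on each run.
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => '^{'
      what => previous
    }
  }
}
```

This is convenient while testing. For a file that keeps growing in production you would normally keep a real sincedb so that restarts do not re-ingest old lines.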

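With the multiline codec configured this way, every line that starts with '{' is appended to the previous line. You will therefore get one event containing the JSON array without its final ']', followed by a separate event that contains only the ']'. You can drop the latter using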

if [message] == "]" { drop {} }

If you want to parse the JSON, first restore the closing ']' with mutate and then apply a json filter:

mutate { gsub => [ "message", "\Z", "]" ] }
json { source => "message" target => "someField" }
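
Put together, the pieces above might look like the following filter section. This is only a sketch: arranging them in an if/else is my assumption about how they fit, and "someField" is just the placeholder name used above.

```conf
filter {
  if [message] == "]" {
    # The trailing event that holds only the closing bracket.
    drop {}
  } else {
    # Append the "]" that the multiline codec split off.
    mutate { gsub => [ "message", "\Z", "]" ] }
    # Parse the restored array into a field.
    json {
      source => "message"
      target => "someField"
    }
  }
}
```

The target matters here because the parsed value is an array rather than an object, so it needs a field to live in. If you want one event per employee, a split filter on someField after the json filter would be one way to get there; that step is not part of the answer above.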