JSON parsing question - Elasticsearch+Kibana+Logstash 6.5


(Thiago) #1

Hi,
I'm a new Elastic user and I'm trying to parse and index a JSON file using Logstash. So far I was able to load the data, but the fields were not indexed as expected: none of my fields were indexed individually, so I can only search the data as a single string.
Could someone help me and let me know what I'm probably doing wrong?

Each object in my JSON file is on a single line (the file has thousands of objects). See an example:

{"_index":"test","_type":"log","_id":"jfdsjhadhjsdddshjkl","_score":1,"_source":{"@timestamp":"2019-01-07T02:02:21.567Z","beat":{"hostname":"ip-111-11-11-111","name":"ip-111-11-11-111","version":"1.0.0"},"field1":"test","field2":"test",..."fieldx":"test","fieldxy":{"__cachedRelations":{},"__data":{"field1":"test","field2":"test",..."fieldn":"test"},"__persisted":false,"__strict":false,"field1":"test"}],"source":"test","type":"log"}}

My Logstash conf:

input {
 stdin {
  type => "stdin-type"
 }
 file {
  path => ["mypath/myfile.json"]
  sincedb_path => "nul"
  start_position => "beginning"
 }
}
filter {
}
output {
 elasticsearch {
   hosts => ["localhost:9200"]
 }
}

I also tried to insert a filter - json { source => "message" } - but I started to face a lot of mapping issues (the _id, _type and _index metadata fields) that I could not find a way to solve.

Thanks


#2

Try adding the json_lines codec to your input: https://www.elastic.co/guide/en/logstash/current/plugins-codecs-json_lines.html


(Thiago) #3

No luck, Chris. Logstash starts with no errors, but I can't see the index being created.

input {
 file {
  path => ["mypath/myfile.json"]
  sincedb_path => "null"
  start_position => "beginning"
  codec => "json_lines"
 }
}
filter {
}
output {
 elasticsearch {
   hosts => ["localhost:9200"]
 }
 stdout { 
  codec => rubydebug
 }
}

I also tried some new modifications after additional research here on the discussion site, but now I am facing another issue:

Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2019.01.08", :_type=>"doc", :routing=>nil}, #LogStash::Event:0x39d97ce5], :response=>{"index"=>{"_index"=>"logstash-2019.01.08", "_type"=>"doc", "_id"=>"G4Z1LmgBAfcjVD_FA9s7", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Field [_source] is defined both as an object and a field in [doc]"}}}}

Here is my conf file:

input {
 file {
  path => ["mypath/myfile.json"]
  sincedb_path => "null"
  start_position => "beginning"
 }
}
filter {
 json {
  source => "message"
 }
 mutate {
  rename => { "_id" => "idoriginal" }
  rename => { "_index" => "indexoriginal" }
  rename => { "_type" => "typeoriginal" }
 }
}
output {
 elasticsearch {
   hosts => ["localhost:9200"]
 }
 stdout { 
  codec => rubydebug
 }
}

Any other ideas?

Thanks


#4

I believe your JSON is not valid JSON. I took your example above and cleaned it up:

{"_index":"test","_type":"log","_id":"jfdsjhadhjsdddshjkl","_score":1,"_source":{"@timestamp":"2019-01-07T02:02:21.567Z","beat":{"hostname":"ip-111-11-11-111","name":"ip-111-11-11-111","version":"1.0.0"},"field1":"test","field2":"test","fieldx":"test","fieldxy":{"__cachedRelations":{},"__data":{"field1":"test","field2":"test","fieldn":"test"},"__persisted":false,"__strict":false,"field1":"test"},"source":"test","type":"log"}}

This is my LS config for testing:

/usr/share/logstash/bin/logstash -e 'input{stdin{codec=>"json_lines"}}output{stdout{codec=>rubydebug}}'

Output:

{
      "@version" => "1",
         "_type" => "log",
       "_source" => {
        "source" => "test",
          "beat" => {
            "name" => "ip-111-11-11-111",
            "version" => "1.0.0",
            "hostname" => "ip-111-11-11-111"
        },
        "field2" => "test",
        "fieldx" => "test",
          "type" => "log",
       "fieldxy" => {
            "__cachedRelations" => {},
            "__data" => {
                "field1" => "test",
                "field2" => "test",
                "fieldn" => "test"
            },
            "field1" => "test",
            "__strict" => false,
            "__persisted" => false
        },
        "field1" => "test",
        "@timestamp" => "2019-01-07T02:02:21.567Z"
    },
           "_id" => "jfdsjhadhjsdddshjkl",
          "host" => "",
        "_index" => "test",
    "@timestamp" => 2019-01-10T04:33:46.976Z,
        "_score" => 1
}


#5

Your JSON document includes a top-level field called _source, which is an object. However, elasticsearch has its own use for that field.

If you start off with an empty index does the error go away?


(Thiago) #6

Thanks @Badger, but what do you mean by "start off with an empty index"?
I did not create or map the fields before running this conf file. The index should be created automatically, right? As for the mapping, is there a way to make it dynamic? I know that some specific lines can have additional fields.

Thanks


#7

You appear to be using daily indexes. Does it happen for the first document that you insert on a new day (noting that ES rolls daily indexes at midnight UTC)?


#8

Badger is right, and I overlooked your full JSON content. You are using metadata fields in the JSON, and you accounted for most of them, with the exception of _source; apparently _score just gets ignored. Why do you have those fields in your JSON? It's almost as if you pulled the documents from Elasticsearch.

I changed the field names in the JSON and the document indexes into ES perfectly:

{"my_index":"test","my_type":"log","my_id":"jfdsjhadhjsdddshjkl","_score":100,"my_source":{"@timestamp":"2019-01-07T02:02:21.567Z","beat":{"hostname":"ip-111-11-11-111","name":"ip-111-11-11-111","version":"1.0.0"},"field1":"test","field2":"test","fieldx":"test","fieldxy":{"__cachedRelations":{},"__data":{"field1":"test","field2":"test","fieldn":"test"},"__persisted":false,"__strict":false,"field1":"test"},"source":"test","type":"log"}}

You can control the type, index, and document_id via the ES output plugin if needed.
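
As a sketch (the my_index and my_id field names here are only illustrative, taken from the renamed example above), the elasticsearch output supports index and document_id options that can be populated from event fields:

```
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "%{[my_index]}"
    document_id => "%{[my_id]}"
  }
}
```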

Elasticsearch Metadata fields

This is the ES document created, checkout the metadata fields that are added:

{
  "_index": "testing",
  "_type": "doc",
  "_id": "4GSSPWgBiOp7-GDzLdvq",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-01-11T15:40:16.633Z",
    "my_index": "test",
    "my_id": "jfdsjhadhjsdddshjkl",
    "my_source": {
      "field1": "test",
      "@timestamp": "2019-01-07T02:02:21.567Z",
      "fieldx": "test",
      "field2": "test",
      "fieldxy": {
        "__persisted": false,
        "field1": "test",
        "__strict": false,
        "__data": {
          "field1": "test",
          "field2": "test",
          "fieldn": "test"
        },
        "__cachedRelations": {}
      },
      "source": "test",
      "beat": {
        "hostname": "ip-111-11-11-111",
        "version": "1.0.0",
        "name": "ip-111-11-11-111"
      },
      "type": "log"
    },
    "my_type": "log",
    "_score": 100,
    "host": "",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-01-11T15:40:16.633Z"
    ],
    "my_source.@timestamp": [
      "2019-01-07T02:02:21.567Z"
    ]
  },
  "sort": [
    1547221216633
  ]
}


(Thiago) #9

Thanks @Badger and @Chris_Lyons.

My file is an Elasticsearch extraction from another server (made with elasticdump), which is why I now have to import it into my server.
Unfortunately the file has around 100 million records, so it would be difficult to rename the "_source" field in the file itself. Is there a way to rename it in my Logstash conf file?

Thanks


#10

Yes, you can use mutate+rename
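
A minimal sketch, building on the filter from post #3 (the "*original" target names are only illustrative) and adding _source to the list of renamed metadata fields:

```
filter {
  json {
    source => "message"
  }
  mutate {
    rename => { "_id"     => "idoriginal" }
    rename => { "_index"  => "indexoriginal" }
    rename => { "_type"   => "typeoriginal" }
    rename => { "_source" => "sourceoriginal" }
  }
}
```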


#11

You can give this a try too. You will need to install the prune filter:

input {
 file {
  path => ["mypath/myfile.json"]
  sincedb_path => "null"
  start_position => "beginning"
  codec => "json_lines"
 }
}
filter {
 # Place your _source into a new field, which converts it back to a string
 mutate { add_field => { "temp" => "%{_source}" } }

 # Use the json filter to parse that string again, placing the contents of _source at the root of the event
 json { source => "temp" }

 # Run prune to clean things up
 prune { blacklist_names => ["^_id$","^_index$","^_score$","^_type$","^_source$","^temp$"] }
}

Working With Plugins


(Thiago) #12

@Chris_Lyons @Badger
I used "mutate+rename" and that worked.

Thanks a lot for your support and help.


(system) closed #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.