[simple question] import JSON into elasticsearch

I am new to Elasticsearch and I just need to use it once.

I've just set up a fresh Elasticsearch 6.2.2, and Kibana is working in my browser. I've got no indices or other data; it has just booted.

I've got one simple JSON file which has to be imported, see: https://pastebin.com/xm2iad4q
Is someone willing to provide me with all the technical details on how to do it, instead of referring me to the guides?

For a senior this takes one minute.

So you should have a look at Filebeat to read your file content and stream each line to Elasticsearch.

Logstash can do it as well. With a stdin input plugin, an elasticsearch output plugin, and a json codec, I think you could do something like:

cat myfile | bin/logstash -f myconf.conf
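
A minimal sketch of what such a myconf.conf could look like (the host and index name are assumptions, not taken from your setup):

input {
  stdin {
    codec => json               # parse each incoming line as a JSON document
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]      # assumed: Elasticsearch on the default port 9200
    index => "myindex"          # hypothetical index name
  }
}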

Or write a shell script which takes every single line and adds it to a _bulk request, where you need to provide an action header line in addition to each document line.
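
For illustration, a minimal sketch of such a bulk request (the index name myindex and the documents are hypothetical; note that every line, including the last, must end with a newline):

curl -XPOST 'localhost:9200/myindex/doc/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"index":{}}\n{"doc":1}\n{"index":{}}\n{"doc":2}\n'

Each document line is preceded by an action line ({"index":{}} here), which is the header mentioned above.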

Does it help?

I chose Filebeat.
The log file states:

2018-03-02T11:41:50.844Z	INFO	log/harvester.go:216	Harvester started for file: /home/kcsuka/test.json
2018-03-02T11:42:20.847Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":30,"time":36},"total":{"ticks":40,"time":52,"value":0},"user":{"ticks":10,"time":16}},"info":{"ephemeral_id":"e6916c25-f65b-4b5a-b684-19f7e35f6100","uptime":{"ms":30015}},"memstats":{"gc_next":4194304,"memory_alloc":1590768,"memory_total":3203456,"rss":21905408}},"filebeat":{"events":{"added":3,"done":3},"harvester":{"open_files":1,"running":1,"started":1}},"libbeat":{"config":{"module":{"running":0},"reloads":1},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0,"filtered":3,"total":3}}},"registrar":{"states":{"cleanup":1,"current":1,"update":3},"writes":3},"system":{"cpu":{"cores":4},"load":{"1":0,"15":0,"5":0.01,"norm":{"1":0,"15":0,"5":0.0025}}}}}}

Filebeat config:

filebeat.prospectors:
- paths:
    - /home/kcsuka/*.json
  document_type: myapp
  json.keys_under_root: true
  json.add_error_key: true

https://imgur.com/a/A37xD

How do I move forward? Seems so simple...

What does

GET yourindexname/_search

give?
name@host:~$ curl -XGET '<ip>:9200/filebeat-6.2.2-2018.03.02/_search?pretty'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "filebeat-6.2.2-2018.03.02",
        "_type" : "doc",
        "_id" : "SuiB5mEBV_lG4tafRHrT",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-03-02T11:38:04.776Z",
          "offset" : 50806,
          "error" : {
            "message" : "Error decoding JSON: invalid character ',' looking for beginning of value",
            "type" : "json"
          },
          "beat" : {
            "name" : "<hostname>",
            "hostname" : "<hostname>",
            "version" : "6.2.2"
          },
          "source" : "/home/<username>/file.json"
        }
      }
    ]
  }
}

I thought that you had one document per line, which is incorrect.
I looked again and it's a one-line file with an array of JSON docs:

[{doc1},{doc2},{doc3},....]

I moved your question to #logstash as I think that you will need some way to parse this and transform it into individual JSON docs.
Maybe the Logstash team can help better.

Anyway, I don't think you can use Filebeat with the "json" options here. Either do that single-document generation on your side or do it in Logstash.
In the end, Elasticsearch will need you to send individual JSON documents.
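
If you want to do the splitting on your side, a one-liner with the jq command-line tool (assuming you have it installed) turns a top-level JSON array into one compact document per line:

jq -c '.[]' file.json > file.ndjson

The resulting newline-delimited file is something Filebeat's json options or a hand-built bulk request can then consume.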

Can anyone from the Logstash team provide me with the required details?
Thanks.

First problem: the file you have posted in the pastebin isn't valid JSON; in order for any tool to handle an input reliably, that input must adhere to the relevant specification.

I used a command-line tool, jsonlint, to validate, but you can get similar results with jsonlint.com:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array.json | jsonlint --compact
line 1, col 408, found: ',' - expected: 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['.
[error: 1]

When we look at line 1, column 408, we see that there is an extra comma (,) character: },,{
When the superfluous comma is removed, it is still not valid JSON:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array-2.json | jsonlint --compact
line 1, col 50803, found: '}' - expected: 'EOF'.
[error: 1]

Once that final superfluous closing } was removed, we have a valid JSON array:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array-3.json | jsonlint --compact
[
  # SNIP
]
[success]
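
As an aside, if jsonlint isn't installed, Python's built-in json.tool module gives similar validation (a sketch, assuming python is on your PATH):

python -m json.tool input-array-3.json > /dev/null && echo valid

On invalid input it prints the offending line and column and exits non-zero.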

A Logstash pipeline can be configured to read your file with logstash-input-file, which reads the file, emits each line to its codec, and continues to watch the file for additions; the input can be configured to use logstash-codec-json to create events, and when that codec is presented with a JSON array, it creates one event per element in that array.

You may want to use one or more filters to modify or enrich your data (for example, you may want to use logstash-filter-date to set the @timestamp field used by Kibana to the event's dateOfSleep).

You'll then want to add logstash-output-elasticsearch to the end of your pipeline to tell it where to put the events; by default, this output creates one index per day of data, which probably isn't what you want here, so we can tell it to create a single named index instead.

The resulting pipeline configuration will look something like this:

input {
  file {
    codec => json
    path => "/absolute/path/to/json/files/to/read"
    # additional file input configuration ...
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-MM-dd"]
  }
  # any filters you want
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep-quality"
    # additional elasticsearch output configuration
  }
}

We'll run this pipeline with logstash:

$ bin/logstash -f path/to/your/sleep_quality_pipeline.conf

The logstash-input-file plugin was made to track changes to files, emitting new lines to the codec so that it can emit new events into the pipeline every time a new file shows up or an existing file gets appended to. This means that you'll need to interrupt this pipeline (ctrl+c) once it is done; otherwise the process will just keep running.
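
Once it has run, you can check that the index was created and see its document count with the cat-indices API (host assumed to be localhost):

curl 'localhost:9200/_cat/indices/sleep-quality?v'

The docs.count column should match the number of elements in your array.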

Thanks for your reply. I've got a valid JSON file now.
I followed your steps, but I do not see any index created in Elasticsearch.

Output of Logstash:

[2018-03-14T15:34:10,365][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2018-03-14T15:34:10,371][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2018-03-14T15:34:10,680][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.2"}
[2018-03-14T15:34:10,813][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-14T15:34:11,284][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-14T15:34:11,490][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-03-14T15:34:11,492][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-03-14T15:34:11,564][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-03-14T15:34:11,602][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>nil}
[2018-03-14T15:34:11,602][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-03-14T15:34:11,605][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-14T15:34:11,608][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-03-14T15:34:11,617][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost"]}
[2018-03-14T15:34:11,765][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x58af2551@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 sleep>"}
[2018-03-14T15:34:11,782][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-03-14T15:36:51,677][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2018-03-14T15:36:51,920][FATAL][logstash.runner          ] SIGINT received. Terminating immediately..
[2018-03-14T15:36:52,087][FATAL][logstash.runner          ] SIGINT received. Terminating immediately..
[2018-03-14T15:36:52,126][ERROR][org.logstash.Logstash    ] org.jruby.exceptions.ThreadKill

Config file:

➜  bin cat  /etc/logstash/conf.d/test.conf
input {
  file {
    codec => json
    path => "/tmp/sleep.json"
    # additional file input configuration ...
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-MM-dd"]
  }
  # any filters you want
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep-quality"
    # additional elasticsearch output configuration
  }
}

The JSON file, showing it is present at the provided location:

➜  bin cat /tmp/sleep.json
[{
	"dateOfSleep": "2018-03-01",
  ....

Full log with --debug option:
https://pastebin.com/5njEgsnS

Whoops; logstash-input-file keeps track of where it left off using a "sincedb", and by default won't re-emit bytes from a file that it has already read (that way, when people use this plugin to find "new" logs, they don't re-read everything each time they restart Logstash).

If we tell it that its sincedb is on your system's null device, it won't keep a record and will always read all files from the beginning:

input {
  file {
    codec => "json"
    path => "/absolute/path/to/json/files/to/read"
    sincedb_path => "/dev/null"
  }
}
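
Worth checking as well: the file input defaults to tailing, i.e. it starts reading pre-existing files at the end. For a one-off import of a file that already exists, adding start_position may also be needed (a sketch using the same path as your config):

input {
  file {
    codec => "json"
    path => "/tmp/sleep.json"
    sincedb_path => "/dev/null"
    start_position => "beginning"   # read pre-existing files from the start
  }
}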

Sorry, but no index is added to Elasticsearch.
Full debug log: https://pastebin.com/ZSjv5bnW

Config:

➜  logstash cat /etc/logstash/conf.d/test.conf 
input {
  file {
    codec => "json"
    path => "/tmp/sleep.json"
    sincedb_path => "/dev/null"
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-MM-dd"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep_quality_two"
  }
}

Does your JSON have a trailing newline? If not, the json codec never gets the signal that it's done reading a chunk :weary:
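
A minimal shell check and fix, using the path from your config:

tail -c 1 /tmp/sleep.json | od -c    # if the last byte shown isn't \n, the newline is missing
echo >> /tmp/sleep.json              # appends exactly one newline

(od -c prints the byte so you can see it; echo with no arguments writes a single newline.)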

How should I ensure there is a trailing newline present?

JSON file: https://pastebin.com/0JeZ0UWp

I added multiple 'enters' at the end of the file and restarted Logstash. Still no data is added to Elasticsearch.

Any idea :)?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.