[simple question] import JSON into elasticsearch

I am new to Elasticsearch and I just need to use it once.

I've just set up a fresh, working Elasticsearch 6.2.2, and Kibana is working in my browser. I've got no indices or anything else yet; it has only just booted.

I've got one simple JSON file which has to be imported, see: https://pastebin.com/xm2iad4q
Is someone willing to provide me all the technical details on how to do it, instead of referring me to the guides?

For a senior this takes one minute.

So you should have a look at Filebeat to read your file content and stream each line to Elasticsearch.
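
Something like this minimal Filebeat 6.x sketch could be a starting point (the path and hosts values are placeholders to adjust; untested):

filebeat.prospectors:
- type: log
  paths:
    - /path/to/your/file.json
  json.keys_under_root: true
  json.add_error_key: true

output.elasticsearch:
  hosts: ["localhost:9200"]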

Logstash can do it as well. With a stdin input plugin, an elasticsearch output plugin, and a json codec, I think you could do something like:

cat myfile | bin/logstash -f myconf.conf
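
The myconf.conf referenced above could be as small as this sketch (assuming Elasticsearch runs on localhost):

input {
  stdin {
    codec => json
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
  }
}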

Or write a shell script which takes every single line and adds it to a _bulk request, where you need to provide an action header in addition to each line.
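
A rough sketch of that approach (assuming one JSON document per line and a hypothetical target index named myindex):

# Prepend a bulk action header to each document line.
while read -r line; do
  printf '{"index":{"_index":"myindex","_type":"doc"}}\n%s\n' "$line"
done < myfile.json > bulk.ndjson

# The bulk body must be newline-delimited JSON and end with a newline.
curl -s -H 'Content-Type: application/x-ndjson' \
  -XPOST 'localhost:9200/_bulk' --data-binary @bulk.ndjson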

Does it help?

I chose Filebeat.
The logfile states:

2018-03-02T11:41:50.844Z	INFO	log/harvester.go:216	Harvester started for file: /home/kcsuka/test.json
2018-03-02T11:42:20.847Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":30,"time":36},"total":{"ticks":40,"time":52,"value":0},"user":{"ticks":10,"time":16}},"info":{"ephemeral_id":"e6916c25-f65b-4b5a-b684-19f7e35f6100","uptime":{"ms":30015}},"memstats":{"gc_next":4194304,"memory_alloc":1590768,"memory_total":3203456,"rss":21905408}},"filebeat":{"events":{"added":3,"done":3},"harvester":{"open_files":1,"running":1,"started":1}},"libbeat":{"config":{"module":{"running":0},"reloads":1},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0,"filtered":3,"total":3}}},"registrar":{"states":{"cleanup":1,"current":1,"update":3},"writes":3},"system":{"cpu":{"cores":4},"load":{"1":0,"15":0,"5":0.01,"norm":{"1":0,"15":0,"5":0.0025}}}}}}

Filebeat config:

filebeat.prospectors:
- paths:
    - /home/kcsuka/*.json
  document_type: myapp
  json.keys_under_root: true
  json.add_error_key: true

How do I move forward? Seems so simple...

What does the following give?

GET yourindexname/_search
name@host:~$ curl -XGET '<ip>:9200/filebeat-6.2.2-2018.03.02/_search?pretty'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "filebeat-6.2.2-2018.03.02",
        "_type" : "doc",
        "_id" : "SuiB5mEBV_lG4tafRHrT",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-03-02T11:38:04.776Z",
          "offset" : 50806,
          "error" : {
            "message" : "Error decoding JSON: invalid character ',' looking for beginning of value",
            "type" : "json"
          },
          "beat" : {
            "name" : "<hostname>",
            "hostname" : "<hostname>",
            "version" : "6.2.2"
          },
          "source" : "/home/<username>/file.json"
        }
      }
    ]
  }
}

I thought that you had one document per line, which is incorrect.
I looked again and it's a one-line file with an array of JSON docs:

[{doc1},{doc2},{doc3},....]

I moved your question to #logstash, as I think you will need some way to parse this and transform it into individual JSON docs.
Maybe the Logstash team can help better.

Anyway, I don't think you can use Filebeat with the "json" options here. Either do that single-document generation on your side or do it in Logstash.
In the end, Elasticsearch needs you to send individual JSON documents.
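
If you want to do that generation yourself, a small sketch with jq would do it (assuming the file contains a single top-level JSON array):

# Emit one compact JSON document per line (NDJSON).
jq -c '.[]' file.json > one-doc-per-line.json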

Can anyone from the Logstash team provide me with the required details?
Thanks.

First problem: the file you have posted in the pastebin isn't valid JSON; in order for any tool to handle an input reliably, that input must adhere to the relevant specification.

I used the command-line tool jsonlint to validate, but you can get similar results with jsonlint.com:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array.json | jsonlint --compact
line 1, col 408, found: ',' - expected: 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['.
[error: 1]

When we look at line 1, column 408, we see that there is an extra comma (,) character: },,{.
When the superfluous comma is removed, it is still not valid JSON:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array-2.json | jsonlint --compact
line 1, col 50803, found: '}' - expected: 'EOF'.
[error: 1]

Once that final superfluous closing } is removed, we have a valid JSON array:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/122123-one-off-import-json-array }
╰─○ cat input-array-3.json | jsonlint --compact
[
  # SNIP
]
[success]

A Logstash pipeline can be configured to read your file with logstash-input-file, which reads the file, emits each line to its codec, and keeps watching the file for additions. The input can be configured to use logstash-codec-json to create events; when presented with a JSON array, this codec creates one event per element in the array.

You may want to use one or more filters to modify or enrich your data (for example, you may want to use logstash-filter-date to set the @timestamp field used by Kibana to the event's dateOfSleep).

You'll then want to add logstash-output-elasticsearch to the end of your pipeline, to tell it where to put the events. By default, this output creates one index per day of data, which probably isn't what you want here, so we can tell it to create a single named index instead.

The resulting pipeline configuration will look something like this:

input {
  file {
    codec => json
    path => "/absolute/path/to/json/files/to/read"
    # additional file input configuration ...
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-dd-MM"]
  }
  # any filters you want
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep-quality"
    # additional elasticsearch output configuration
  }
}

We'll run this pipeline with logstash:

$ bin/logstash -f path/to/your/sleep_quality_pipeline.conf

The logstash-input-file plugin was made to track changes to the files matching its path, emitting new lines to the codec so that new events enter the pipeline every time a new file shows up or an existing file gets appended to. This means that you'll need to interrupt this pipeline (ctrl+c) once it is done; otherwise the process will just keep running.

Thanks for your reply. I've got a valid JSON file now.
I followed your steps, but I do not see any index added in Elasticsearch.

Output of Logstash

[2018-03-14T15:34:10,365][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2018-03-14T15:34:10,371][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2018-03-14T15:34:10,680][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.2"}
[2018-03-14T15:34:10,813][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-14T15:34:11,284][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-14T15:34:11,490][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-03-14T15:34:11,492][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-03-14T15:34:11,564][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-03-14T15:34:11,602][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>nil}
[2018-03-14T15:34:11,602][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-03-14T15:34:11,605][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-14T15:34:11,608][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-03-14T15:34:11,617][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost"]}
[2018-03-14T15:34:11,765][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x58af2551@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 sleep>"}
[2018-03-14T15:34:11,782][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-03-14T15:36:51,677][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2018-03-14T15:36:51,920][FATAL][logstash.runner          ] SIGINT received. Terminating immediately..
[2018-03-14T15:36:52,087][FATAL][logstash.runner          ] SIGINT received. Terminating immediately..
[2018-03-14T15:36:52,126][ERROR][org.logstash.Logstash    ] org.jruby.exceptions.ThreadKill

Config file:

➜  bin cat  /etc/logstash/conf.d/test.conf
input {
  file {
    codec => json
    path => "/tmp/sleep.json"
    # additional file input configuration ...
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-dd-MM"]
  }
  # any filters you want
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep-quality"
    # additional elasticsearch output configuration
  }
}

The JSON file, showing it is present at the provided location:

➜  bin cat /tmp/sleep.json
[{
	"dateOfSleep": "2018-03-01",
  ....

Full log with --debug option:
https://pastebin.com/5njEgsnS

Whoops; logstash-input-file keeps track of where it left off using a "sincedb", and by default won't re-emit bytes from a file that it has already read (that way, when people use this plugin to find "new" logs, they don't re-read everything each time they restart Logstash).

If we tell it that its sincedb is on your system's null device, it won't keep a record and will always read all files from the beginning:

input {
  file {
    codec => "json"
    path => "/absolute/path/to/json/files/to/read"
    sincedb_path => "/dev/null"
  }
}

Sorry, but no index is added to Elasticsearch.
Full debug log: https://pastebin.com/ZSjv5bnW

Config:

➜  logstash cat /etc/logstash/conf.d/test.conf 
input {
  file {
    codec => "json"
    path => "/tmp/sleep.json"
    sincedb_path => "/dev/null"
  }
}

filter {
  date {
    match => ["dateOfSleep", "yyyy-dd-MM"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
    index => "sleep_quality_two"
  }
}

Does your JSON file have a trailing newline? If not, the json codec never gets the signal that it's done reading a chunk :weary:
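
One way to check and fix that from the shell (a small sketch using standard tools; adjust the path):

# Print the file's last byte; a trailing newline shows up as \n.
tail -c 1 /tmp/sleep.json | od -c

# Append a newline only if the last byte is not already one.
[ -n "$(tail -c 1 /tmp/sleep.json)" ] && echo >> /tmp/sleep.json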

How should I ensure there is a trailing newline present?

JSON file: https://pastebin.com/0JeZ0UWp

I added multiple 'enters' at the end of the file and restarted Logstash. Still no data is added in Elasticsearch.

Any idea :)?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.