Looking for working example data set to bulk index into ES6

I created a thread before, which I then closed, you can view it here:

I'm looking for a sample dataset that I can index into Elasticsearch 6 using bulk indexing.

I tried using this example:

I downloaded the dataset, removed the metadata+action using awk and then tried to add my own index and types so I could practice how to index a json dataset that wasn't correctly formatted for bulk indexing.
This example on this page: Exploring Your Data | Elasticsearch Reference [6.2] | Elastic would throw errors when I tried to index it with:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @out.json

Btw this page has a typo:

The example shows:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

The example should be:

curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

I tried indexing after formatting it like this:

{ "index": { "_index": "bank_account", "_type": "account" }}
(0, '{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(1, '{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(2, '{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(3, '{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(4, '{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(5, '{"account_number":25,"balance":40540,"firstname":"Virginia","lastname":"Ayala","age":39,"gender":"F","address":"171 Putnam Avenue","employer":"Filodyne","email":"virginiaayala@filodyne.com","city":"Nicholson","state":"PA"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}
(6, '{"account_number":32,"balance":48086,"firstname":"Dillard","lastname":"Mcpherson","age":34,"gender":"F","address":"702 Quentin Street","employer":"Quailcom","email":"dillardmcpherson@quailcom.com","city":"Veguita","state":"IN"}\n')
{ "index": { "_index": "bank_account", "_type": "account" }}

I try index it with:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @out.json

Then I get this error:

{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"o3fb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"pHfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"pXfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"pnfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"p3fb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"qHfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"qXfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"qnfb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}},{"index":{"_index":"bank_account","_type":"account","_id":"q3fb3WIBobNcAQzMkHb4","status":400,"error":{"type":"mapper_parsing_exception","reas

I got the auto generate id posting right with the bulk api via:

$ chmod +x convert.py 
$ cat convert.py 
#!/usr/bin/env python

src_file = 'src_file.json'
dest_file = 'dest_file.json'
metadata = '{"index": {"_index": "bank_accounts", "_type": "account"}}'

with open(src_file) as open_file:
    lines = open_file.readlines()

lines = [line.replace(' ', '') for line in lines]

with open(dest_file, 'w') as f:
    for each_line in lines:
        f.write(metadata + '\n')
        f.writelines(each_line)

The original file:

$ head -4 file.json 
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}

Removing the initial metadata:

$ cat file.json | grep account_number >> src_file.json
$ ./convert.py 

Previewing the destination file:

$ head -4 dest_file.json 
{"index": {"_index": "bank_accounts", "_type": "account"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880HolmesLane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index": {"_index": "bank_accounts", "_type": "account"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671BristolStreet","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}

Looking at my current indices:

$ curl http://localhost:9200/_cat/indices?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .monitoring-es-6-2018.05.06 3OgdIbDWQWCR8WJlQTXr9Q   1   1     114715            6      104mb           50mb

Ingesting the data via Bulk API:

$ curl -s -H 'Content-Type: application/json' -XPOST localhost:9200/_bulk --data-binary @dest_file.json 

Looking at my indices to verify that the index exist:

$ curl http://localhost:9200/_cat/indices?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   bank_accounts               u37MQvzhSPe97BJzp1u49Q   5   1       1000            0    296.4kb           690b
green  open   .monitoring-es-6-2018.05.06 3OgdIbDWQWCR8WJlQTXr9Q   1   1     114750            6    103.9mb         49.9mb

Looking at one document: :smiley:

$ curl 'http://localhost:9200/bank_accounts/_search?pretty&size=1'
{
  "took" : 641,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank_accounts",
        "_type" : "account",
        "_id" : "cohJN2MBCa89A-FEmiJs",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671BristolStreet",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        }
      }
    ]
  }
}
1 Like

Thanks Ruan. Seemed like I had an issue with my code(silly me).
Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.