Where to find examples of bulk indexing for ES6?


(Timogoosen) #1

I have a bunch of notes on bulk indexing, with examples and code that I used to use for indexing data into Elasticsearch via the bulk index API. None of the examples seem to work and I run into all kinds of errors. Can anyone point me to a good tutorial to help me figure out what I could be doing wrong? I know a lot has changed since version 6, for example this: https://www.elastic.co/blog/strict-content-type-checking-for-elasticsearch-rest-requests where ES6 checks the content type of the body of REST requests. This means that if you are doing bulk index operations, for example of a JSON file using curl, you now have to add the content type, so the request has to look something like this:

$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @out.json
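
For anyone building the request from Python rather than curl, here is a minimal sketch of the same call using only the standard library. The function name `build_bulk_request`, the index/type values, and the host are placeholders of mine, not anything from the Elasticsearch docs; it assumes a node listening on localhost:9200. It only constructs the request, so nothing is sent until you call `urlopen` yourself.

```python
import json
import urllib.request


def build_bulk_request(docs, index, doc_type, host="http://localhost:9200"):
    """Build (but do not send) a POST to /_bulk for the given documents."""
    lines = []
    for doc in docs:
        # Action line: no "_id" here, so Elasticsearch generates one.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk body is newline-delimited and must end with a newline.
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        host + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


req = build_bulk_request([{"title": "My first blog post"}], "website", "blog")
# To actually send it (requires a running node):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Using `json.dumps` for every line keeps each action and each document on exactly one line, which is what the bulk format expects.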

Are there any other changes that I need to be aware of, and are there any changes to the format accepted by Elasticsearch's _bulk API? Thanks


(Alexander Reelsen) #2

The format itself has not changed. See https://www.elastic.co/guide/en/elasticsearch/reference/6.2/docs-bulk.html

Can you be more specific about what you mean by "all kinds of errors", so we can see what we can do to improve the documentation in that regard?

Thanks a lot!


(Timogoosen) #4

I created another post about this that was a bit more descriptive.
See here: Looking for working example data set to bulk index into ES6
It seems I haven't solved the issue yet.

You can see a video of what I'm trying to do here:
https://asciinema.org/a/iqONeVdBwCjAO0cHJ07HPWlbU

It comes down to this: I want to be able to index data and only specify the index and type. The documentation says that you don't need to specify the id for each entry of a type.

The video shows step by step what I'm trying to do and the error I'm getting. Let me know if you can't open the video and I'll just copy and paste all my steps in here.


(Timogoosen) #5

From what I understand, the IDs should be autogenerated. See this doc: https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html
"If no _id is specified, an ID will be autogenerated:"

Then the documentation shows the request body where the id is not specified (implying it will be autogenerated):

{ "index": { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
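
One way to confirm that IDs really were autogenerated is to read them back from the bulk response, which reports one item per action. The sketch below is only an illustration: `sample_response` is hand-written in the general shape of a _bulk response (the `_id` value is made up, and a real response carries more fields), and `summarize_bulk_response` is a helper name of my own.

```python
import json

# Hand-written sample in the shape of a _bulk response.
# The "_id" value is invented for illustration; real ones are generated by ES.
sample_response = json.loads("""
{
  "took": 30,
  "errors": false,
  "items": [
    {"index": {"_index": "website", "_type": "blog",
               "_id": "example-generated-id", "status": 201}}
  ]
}
""")


def summarize_bulk_response(response):
    """Split a bulk response into generated IDs and failed items."""
    ids, failures = [], []
    for item in response["items"]:
        # Each item is a one-key dict: {"index": {...}}, {"create": {...}}, etc.
        for _action, result in item.items():
            if "error" in result:
                failures.append(result)
            else:
                ids.append(result["_id"])
    return ids, failures


ids, failures = summarize_bulk_response(sample_response)
print(ids, failures)
```

If `failures` is non-empty, each entry carries an `error` object describing why that particular action was rejected, which is usually more useful than the top-level `errors` flag alone.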

(Christian Dahlqvist) #6

These examples work for me:

curl -XPOST localhost:9200/_bulk -H 'Content-Type: application/json' -d'
{ "index": { "_index": "website", "_type": "blog" }}
{ "title":    "My first blog post" }
'
curl -XPOST localhost:9200/website/blog/_bulk -H 'Content-Type: application/json' -d'
{ "index": {}}
{ "title":    "My second blog post" }
'

(Timogoosen) #7

Your examples work for me too. If you have a look at my examples they were:

Which means that my examples were making use of a different content type which is
"Content-Type: application/x-ndjson"

whereas your examples are making use of "Content-Type: application/json".

I can give you a step-by-step account of what I've been doing so far, but it is also in the video that I linked earlier.


(Christian Dahlqvist) #8

Looking at the file at the 3:30 mark in your video, it looks like your document lines might not be correctly formatted.
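
A quick way to check a file like that before sending it is to verify that every line parses as JSON on its own and that action lines and document lines alternate. This is a rough sketch, not an official validator: `check_bulk_body` is a name I made up, and it only checks structure, not whether the index, type, or fields make sense.

```python
import json

# Action names accepted by the bulk API; "delete" has no source line after it.
BULK_ACTIONS = {"index", "create", "update", "delete"}


def check_bulk_body(text):
    """Return a list of structural problems in a bulk body (empty if none)."""
    problems = []
    if not text.endswith("\n"):
        problems.append("body must end with a newline")
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        # Expect an action line such as {"index": {...}}.
        try:
            action = json.loads(lines[i])
        except ValueError:
            problems.append(f"line {i + 1}: not valid JSON")
            i += 1
            continue
        if (not isinstance(action, dict) or len(action) != 1
                or next(iter(action)) not in BULK_ACTIONS):
            problems.append(f"line {i + 1}: expected an action line")
            i += 1
            continue
        op = next(iter(action))
        i += 1
        if op == "delete":
            continue
        # Every other action must be followed by a one-line JSON document.
        if i >= len(lines):
            problems.append(f"line {i}: action has no document line after it")
            break
        try:
            if not isinstance(json.loads(lines[i]), dict):
                problems.append(f"line {i + 1}: document must be a JSON object")
        except ValueError:
            problems.append(f"line {i + 1}: document is not valid JSON")
        i += 1
    return problems
```

Usage would be something like `check_bulk_body(open("out.json").read())`. A file containing only document lines, or documents pretty-printed across several lines, will show up as a list of per-line problems.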


(Timogoosen) #9

I see someone else also solved it in my old thread.

It seems I had a small issue with how I was iterating over each document in my Python code. Thanks for your help!

