Use saveJsonToEs and always keep the same _id field

Hello,

I would like to know whether there is a workaround for the following issue with indexing parsed logs:

Before Elasticsearch 2.x, I used this Scala call:
rdd.saveJsonToEs(indexAndType), with JSON documents containing an _id field.

My ES mapping declared the _id field as a string.

This field was computed as the SHA-256 of the log line, so each log line had a stable identifier in ES: if I reindexed all the data, an existing line simply produced "version 2" of the document with the same _id.
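A minimal sketch of that id scheme (the helper name `sha256Id` is mine, not from the original setup): hash each raw log line with SHA-256 and hex-encode it, so the same line always maps to the same document id.

```scala
import java.security.MessageDigest

// Derive a stable document id from a raw log line via SHA-256,
// so reindexing the same line always targets the same _id.
def sha256Id(line: String): String =
  MessageDigest.getInstance("SHA-256")
    .digest(line.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
```

Because the hash is deterministic, re-running the job over the same input only bumps document versions instead of creating duplicates.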

Now, with the 2.x handling of _id, I can no longer use saveJsonToEs, because metadata fields such as _id are not allowed inside a JSON document.

I noticed that with the bulk API, I can still set an _id field:

[root@server ~]# cat all.json
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-o"}}
{ "text": "toto" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-a"}}
{ "text": "tata" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-i"}}
{ "text": "titi" }

However, in Scala, I just have a file with a huge number of JSON lines:

{ "text": "toto", "_id":"uniq1"}
{ "text": "tata", "_id":"uniq2" }
{ "text": "titi", "_id":"uniq3" }
{ "text": "tutu", "_id":"uniq4" } 
[...]

How can I insert these data into ES and be sure that, if I reindex everything, the documents will always keep the same _id?

Thanks.

You should be able to achieve the same result by using saveToEsWithMeta, which accepts key/value pairs (the key is the metadata, the value the document), and by setting es.input.json to true.
If the key has to be determined from the document itself, you can do so by setting an extractor (es.mapping.id).
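The suggestion above can be sketched as follows, assuming elasticsearch-spark 2.x. The regex-based _id extraction is purely illustrative (a real job should use a JSON parser), and the index/type name is taken from the bulk example earlier in the thread:

```scala
// Turn each JSON line embedding an "_id" into an (id, document)
// pair suitable for saveToEsWithMeta. Illustrative only: a real
// job should parse the JSON instead of using a regex.
val IdPattern = """,?\s*"_id"\s*:\s*"([^"]+)"""".r

def toPair(line: String): (String, String) = {
  val id = IdPattern.findFirstMatchIn(line)
    .map(_.group(1))
    .getOrElse(sys.error(s"no _id in: $line"))
  // Strip _id from the source so ES 2.x does not reject it.
  (id, IdPattern.replaceAllIn(line, ""))
}

val pairs = Seq(
  """{ "text": "toto", "_id":"uniq1"}""",
  """{ "text": "tata", "_id":"uniq2" }"""
).map(toPair)

// In the actual Spark job (assuming import org.elasticsearch.spark._
// and an active SparkContext sc):
//   sc.textFile("docs.json").map(toPair)
//     .saveToEsWithMeta("testindex/typeblabla",
//       Map("es.input.json" -> "true"))
```

With es.input.json set to true the connector sends the value string as-is, and the key of each pair becomes the document's _id, so reindexing the same file keeps the same ids.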

Thanks for your help, I'll try this :slight_smile:
I'll post the results of these tests here.