Use saveJsonToEs and always keep the same _id field

Christophe_Journel · March 11, 2016, 4:04pm

Hello,

I would like to know whether there's a workaround to this issue with log parsing:

Before Elasticsearch 2.x, I used this Scala call:
rdd.saveJsonToEs(indexAndType), with JSON documents that contained an _id field.

And my ES mapping contained the _id field as a string.

This field was the SHA-256 hash of the log line. Each log line therefore mapped to exactly one document in ES, and if I reindexed all the data, the re-sent line simply created a "version 2" of the document with the same _id.
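
For illustration, an ID of this kind could be computed as in the sketch below. It only assumes the ID is the hex-encoded SHA-256 digest of the raw line; the object and method names (`LogLineId`, `lineId`) are hypothetical and not part of the original job.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object LogLineId {
  // Hypothetical helper: hex-encoded SHA-256 digest of one raw log line.
  // The same input line always yields the same 64-character ID.
  def lineId(line: String): String = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(line.getBytes(StandardCharsets.UTF_8))
    // Mask each byte to 0..255 so negative bytes format as two hex digits.
    digest.map(b => "%02x".format(b & 0xff)).mkString
  }
}
```

Because the ID depends only on the line content, re-running the job over the same logs targets the same documents.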

Now, with the way 2.x handles _id, I cannot use saveJsonToEs like this anymore, because an _id field is no longer allowed inside the JSON document.

I noticed that with the bulk API, I can still set an _id in the action line:

[root@server ~]# cat all.json
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-o"}}
{ "text": "toto" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-a"}}
{ "text": "tata" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-i"}}
{ "text": "titi" }

However, in Scala, I just have a file with a huge number of JSON lines:

{ "text": "toto", "_id":"uniq1"}
{ "text": "tata", "_id":"uniq2" }
{ "text": "titi", "_id":"uniq3" }
{ "text": "tutu", "_id":"uniq4" } 
[...]

How can I insert this data into ES and be sure that, if I reindex everything, each document always gets the same _id?

Thanks.

costin · March 12, 2016, 2:32pm

You should be able to achieve the same result by using saveToEsWithMeta, which accepts key/value pairs (the key is the metadata, the value is the document), and by setting es.input.json to true.
If the key can be derived from the document itself, you can get it by setting an extractor instead.
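
A minimal sketch of the first suggestion, assuming Spark and the elasticsearch-spark connector are on the classpath and an Elasticsearch node is reachable on localhost. The object name, the sample pairs, and the local master setting are illustrative assumptions; the index/type string is taken from the bulk example in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEsWithMeta to pair RDDs

object SaveJsonWithFixedIds {
  // Sample (id, JSON document) pairs. The ID travels as the key,
  // so the JSON bodies themselves carry no _id field.
  val docs: Seq[(String, String)] = Seq(
    "uniq1" -> """{"text":"toto"}""",
    "uniq2" -> """{"text":"tata"}""",
    "uniq3" -> """{"text":"titi"}"""
  )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("save-json-with-fixed-ids")
      .setMaster("local[*]")       // assumption: local test run
      .set("es.nodes", "localhost") // assumption: ES node on this host
    val sc = new SparkContext(conf)
    try {
      // es.input.json tells the connector the values are already JSON strings.
      sc.makeRDD(docs)
        .saveToEsWithMeta("testindex/typeblabla", Map("es.input.json" -> "true"))
    } finally {
      sc.stop()
    }
  }
}
```

For the second suggestion, the connector's `es.mapping.id` setting names a document field to use as the ID, for example `rdd.saveJsonToEs(indexAndType, Map("es.mapping.id" -> "doc_id"))` with a hypothetical, non-reserved field `doc_id`. That this is the extractor meant here is an assumption.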

Christophe_Journel · March 14, 2016, 10:51am

Thanks for your help, I'll try this
and post the results of my tests here.
