Hello,
I would like to know whether there is a workaround for the following issue with my log-parsing pipeline:
Before Elasticsearch 2.x, I used this Scala feature:
rdd.saveJsonToEs(indexAndType)
with JSON documents containing an _id field.
My ES mapping declared the _id field as a string.
That field was the SHA-256 hash of the log line, so every log line ended up in ES exactly once, and if I reindexed all the data, each line simply produced a "version 2" of the document with the same _id.
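For reference, this is roughly what my pre-2.x job looked like (a simplified sketch: the file path, index/type and the naive JSON escaping are placeholders, not my real code):

import java.security.MessageDigest
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object OldIndexer {
  // Hex-encoded SHA-256 of the raw log line, used as the document _id
  def sha256(line: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(line.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("old-indexer"))
    val indexAndType = "testindex/typeblabla"
    val json = sc.textFile("logs.txt").map { line =>
      // naive escaping, just for the sketch
      val text = line.replace("\\", "\\\\").replace("\"", "\\\"")
      s"""{"text": "$text", "_id": "${sha256(line)}"}"""
    }
    json.saveJsonToEs(indexAndType) // worked fine before ES 2.x
  }
}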
Now, with the 2.x metadata-field restrictions, I cannot use saveJsonToEs anymore, because I am no longer allowed to put an _id field inside a JSON document.
I noticed that with the bulk API, I can still specify an _id:
[root@server ~]# cat all.json
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-o"}}
{ "text": "toto" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-a"}}
{ "text": "tata" }
{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "uniq1-i"}}
{ "text": "titi" }
However, in Scala, I just have a file with a huge number of JSON lines:
{ "text": "toto", "_id":"uniq1"}
{ "text": "tata", "_id":"uniq2" }
{ "text": "titi", "_id":"uniq3" }
{ "text": "tutu", "_id":"uniq4" }
[...]
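I could of course rewrite this file into the bulk format myself before loading it. A rough sketch (assuming Scala 2.10's built-in scala.util.parsing.json, which is deprecated in newer Scala; the file names are made up):

import java.io.PrintWriter
import scala.io.Source
import scala.util.parsing.json.{JSON, JSONObject}

object ToBulk {
  def main(args: Array[String]): Unit = {
    val out = new PrintWriter("all.json")
    for (line <- Source.fromFile("lines.json").getLines()) {
      val doc = JSON.parseFull(line).get.asInstanceOf[Map[String, Any]]
      // the action line carries the _id; the source line must no longer contain it
      out.println(s"""{"index": {"_index": "testindex", "_type": "typeblabla", "_id": "${doc("_id")}"}}""")
      out.println(JSONObject(doc - "_id").toString())
    }
    out.close()
  }
}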
But that is an extra pass and a temporary file. How can I insert this data into ES directly from Spark and be sure that, if I reindex everything, each document will always get the same _id?
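For what it's worth, I saw es.mapping.id in the es-hadoop configuration docs. Would something like this be the right approach? Just a sketch: I am assuming es.mapping.id is honored for raw JSON input, and the docId field name is my own invention, chosen because source fields starting with an underscore are rejected.

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object NewIndexer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("new-indexer"))
    // rename _id to a plain field so ES 2.x accepts the document source
    val json = sc.textFile("lines.json").map(_.replaceFirst("\"_id\"", "\"docId\""))
    // es-hadoop should then use that field as the document id, so reindexing
    // the same lines always yields the same _id
    json.saveJsonToEs("testindex/typeblabla", Map("es.mapping.id" -> "docId"))
  }
}

Alternatively, I guess I could parse each line into an (id, document) pair and use saveToEsWithMeta, but I would prefer to keep the raw JSON path if possible.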
Thanks.