I haven't been able to find a way to tell ES to use the _id property in my
source instead of auto-generating one.
I think my use case justifies a need for this. I write all of my indexed
JSON objects to S3 for backup purposes. When I read these files out of S3, I
read them into a byte array and I would like to directly index these values.
I was hoping that if I created my IndexRequest without specifying an id, ES
would use the _id value from my source. Instead, ES generates a new id,
compares it to my source's id, and tells me there is a mismatch.
Elasticsearch needs to know the id in order to route the indexed document to
a shard. It could, potentially, extract the id for you (at the cost of
parsing the source twice), or you can do it yourself. Open an issue to allow
for a mapping setting to "search" for the id (something like an id path,
similar in nature to the routing path).
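
As a rough illustration of the "do it yourself" option, here is a minimal
Python sketch that reads the id out of the backed-up bytes before indexing.
The helper name extract_id and the sample document are hypothetical, and it
assumes the source is UTF-8 JSON with a top-level _id field; the returned
value would then be set explicitly as the id on the index request, with the
original bytes passed through unchanged as the source.

```python
import json


def extract_id(source_bytes, id_field="_id"):
    """Return the value of the top-level id_field in a JSON document, as a string.

    Hypothetical helper: assumes source_bytes is a UTF-8 encoded JSON object.
    """
    doc = json.loads(source_bytes.decode("utf-8"))
    if not isinstance(doc, dict) or id_field not in doc:
        raise ValueError("source has no top-level %r field" % id_field)
    return str(doc[id_field])


# Example: bytes as they might be read back from a backup file (made-up data).
backup = b'{"_id": "abc-123", "title": "example"}'
doc_id = extract_id(backup)
# doc_id is what you would pass as the explicit id when building the request.
```

This does parse the document once on the client side, which is the same
extra parse the server would otherwise have to do to find the id.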