Hello! I want to use search_after to process all documents, but my documents don't have any unique field other than _id, so I want to sort on _id. However, there is an important note in the search_after docs about exactly this case. I have several questions:
Is the overhead really that big?
Can I create a script field with source doc['_id'].value, or will the overhead remain?
Can I set doc_values = true for the _id field (I'm using IDs I generate myself, not ones auto-generated by Elasticsearch)?
You mention wanting to use search-after to "process all documents" - before I answer your question, I want to ask if the Scroll API might be suited to your use case - this would alleviate the issues you're having with search-after. If you're processing all documents returned from a search all at once and just need the results returned in batches, consider using the scroll API instead.
Search-after is the correct choice, though, if your workload 1) has large delays between retrieving batches of results, or 2) has many clients which need to maintain independent contexts.
To answer your question:
The overhead is pretty significant - we generally don't make recommendations like that in our docs if we aren't pretty sure that it will cause problems.
I don't believe using a script field would be any better - the problem is how the _id field is stored on disk in comparison to fields with doc_values enabled.
No, unfortunately this is not currently possible, which is why we recommend copying the _id into a regular document field.
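For what it's worth, here is a rough sketch in Python of how the copy-the-ID approach could look. Treat everything named here as an assumption for illustration: the field name `doc_id`, the `EXAMPLE_MAPPING` dict, and the `search` callable (a stand-in for whatever client call you use to send a search body and get the response back as a dict) are all made up. The only parts taken from how search_after works are the request shape: sort on a unique field, then pass the `sort` values of the last hit as `search_after` on the next request.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical mapping: "doc_id" is an invented field that holds a copy of _id.
# Keyword fields have doc_values enabled by default, which is what makes
# sorting on them cheap compared with sorting on _id.
EXAMPLE_MAPPING: Dict[str, Any] = {
    "properties": {
        "doc_id": {"type": "keyword"},
    }
}


def build_page_request(
    page_size: int,
    sort_field: str,
    after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a search body for one page, sorted on a unique field."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": [{sort_field: "asc"}],
    }
    if after is not None:
        # Sort values of the last hit from the previous page.
        body["search_after"] = after
    return body


def iterate_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 100,
    sort_field: str = "doc_id",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, until a page comes back empty.

    `search` is a placeholder: any function that takes a search body and
    returns the response as a dict.
    """
    after: Optional[List[Any]] = None
    while True:
        response = search(build_page_request(page_size, sort_field, after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        after = hits[-1]["sort"]
```

When indexing, you would write the same value into both the document ID and the `doc_id` field, so that the sort key stays unique across the index.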
The Scroll API documentation has a note: "Scrolling is not intended for real time user requests". Real-time user requests are exactly my case, so that API is not a good fit for me.
Well, then I will copy _id to a document field as recommended.
Can you tell me, or maybe share an article that explains, why _id has these significant restrictions?