Hi,
I was going through the online documentation to find out whether there is a way to solve my use case while using ES for document search.
My use case is like this:
A bunch of documents are loaded in ES, let's say 10k.
A client can search these documents with filters via the standard API/SDK/POST, while also passing some custom information as a separate JSON object.
custom request {
// ES-specific search,
extraParams: { "firstParam": "firstValue",
"secondParam": "secondValue"
}
}
While processing the search, each ES shard node should call a REST endpoint, passing "extraParams" and some fields from each matching document. The endpoint's result should be stored on that document and returned with it in the response. If the client requests it, the results should also be sortable on this newly calculated field.
Pagination should also work when sorting on this newly calculated field.
It would be great if multiple documents could be batched into a single POST call to the REST endpoint.
Note: I looked at the options and found the Ingest API, which does something similar by adding a new field to a document before indexing. My case is the reverse: I want to calculate a new field on the filtered results and return it to the requester.
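To make the desired behaviour concrete, here is a minimal Python sketch of the same flow done outside ES, at the client or orchestration layer. It is only an illustration under assumptions: `split_request`, `enrich_sort_paginate`, the `score_batch` callable (a stand-in for the batched REST endpoint) and the `calculated` field name are all hypothetical and not part of any ES API.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]


def split_request(custom_request: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the ES search body from the client's extraParams."""
    es_body = {k: v for k, v in custom_request.items() if k != "extraParams"}
    return es_body, custom_request.get("extraParams", {})


def enrich_sort_paginate(
    hits: List[Doc],
    extra_params: Dict[str, Any],
    score_batch: Callable[[List[Doc], Dict[str, Any]], List[Any]],
    batch_size: int = 1000,
    page: int = 0,
    page_size: int = 10,
    field: str = "calculated",  # hypothetical name for the new field
) -> List[Doc]:
    """Attach a computed field to every hit, sort on it, return one page."""
    enriched: List[Doc] = []
    for start in range(0, len(hits), batch_size):
        batch = hits[start:start + batch_size]
        # One call per batch; score_batch stands in for the POST to the
        # REST endpoint and must return one value per document, in order.
        values = score_batch(batch, extra_params)
        for doc, value in zip(batch, values):
            enriched.append({**doc, field: value})
    # Sorting needs the computed value for every hit, so all hits are
    # enriched before the requested page is cut out.
    enriched.sort(key=lambda d: d[field], reverse=True)
    offset = page * page_size
    return enriched[offset:offset + page_size]
```

Note that sorting on the calculated field requires the value for every filtered document before any page can be returned, which is why the sketch enriches all hits first.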
As far as I know there is no built-in way to call out during a search. However, it sounds like this would be a very slow and expensive way to search. If you can describe at a high level what you are trying to achieve, someone might be able to come up with more efficient alternatives.
At a high level, we are trying to process the filtered documents (ES search results) together with some other static data (supplied by the client and applied uniformly to all search results) through a REST call.
The other alternative is to fetch all the filtered data and apply this logic at the orchestration layer, but then we need to fetch all results from ES (approximately 500K) in under one second to meet our SLA. That amounts to deep pagination using search or scan/scroll. To fetch that much in under one second, the only option is a parallel fetch of all the pages of the search, which seems to be a problem with ES because:
Search is limited to 10K results by default; even if we raise that limit, deep pages will be slow because of the large offset.
Scan/scroll gives you a cursor for the next fetch, which means it is designed for sequential fetching, and we also cannot move backwards.
Any help or guidance would really help us move forward.
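One option that may be worth checking for the parallel fetch is sliced scroll, where a scroll request carries a `slice` object with an `id` and a `max` so that each slice can be consumed independently. The Python sketch below only builds the per-slice request bodies and fans them out over a thread pool. `sliced_bodies`, `fetch_all` and the `fetch_slice` callable (which would open and drain one scroll per body) are hypothetical helpers, and whether this reaches 500K documents in under a second is untested here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Body = Dict[str, Any]


def sliced_bodies(query: Dict[str, Any], num_slices: int, size: int = 1000) -> List[Body]:
    """Build one scroll search body per slice of the same query."""
    if num_slices < 2:
        # Assumption: ES expects the slice "max" to be greater than 1.
        raise ValueError("num_slices must be at least 2")
    return [
        {"slice": {"id": i, "max": num_slices}, "size": size, "query": query}
        for i in range(num_slices)
    ]


def fetch_all(bodies: List[Body], fetch_slice: Callable[[Body], List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Drain every slice in parallel and concatenate the hits.

    fetch_slice is a hypothetical callable that runs the scroll for one
    body to completion and returns that slice's hits.
    """
    with ThreadPoolExecutor(max_workers=len(bodies)) as pool:
        # pool.map returns results in the order of the input bodies.
        per_slice = list(pool.map(fetch_slice, bodies))
    return [hit for hits in per_slice for hit in hits]
```

Each slice is still a forward-only cursor, so this addresses the sequential-fetch concern but not the inability to move backwards.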
Performing individual call-outs for several thousand records from within Elasticsearch would probably not be much faster. I still do not understand what you are looking to do based on your description. A more concrete example would be helpful.
Hi,
Our case is that we need to do some further processing on the filtered results from ES, and the whole processing should be completed in, say, 3-4 secs.
So we have two options:
Either fetch all the filtered results from ES within 12 secs and process the retrieved data in parallel to achieve the required performance.
Or ask ES to post-process the resulting data, as it does with the Ingest API for incoming data.
The processing calls a REST endpoint that uses the above-mentioned data sets and returns a response. The logic behind the REST call is implemented to support batch processing of requests within milliseconds.
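For the first option, the parallel processing step could look roughly like the Python sketch below. It assumes the documents have already been fetched and that `call_endpoint` is a hypothetical wrapper around the batch REST endpoint that returns one result per document; the batch size and worker count are placeholders to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterator, List

Doc = Dict[str, Any]


def chunked(items: List[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive batches of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_parallel(
    docs: List[Doc],
    extra_params: Dict[str, Any],
    call_endpoint: Callable[[List[Doc], Dict[str, Any]], List[Any]],
    batch_size: int = 5000,
    workers: int = 8,
) -> List[Any]:
    """Send batches to the endpoint concurrently, keeping document order."""
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # call_endpoint is hypothetical: one POST per batch, with the same
        # extraParams passed alongside every batch.
        results = pool.map(lambda batch: call_endpoint(batch, extra_params), batches)
        return [value for batch_result in results for value in batch_result]
```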