Query Millions of records in Elasticsearch

Hi,

I have an index with 6 crore (60 million) records. My use case is to read the
entire index and check whether each record is present in a new index. If it
is not, I have to index it into the new index. I used the scan and scroll
operation to read the index through the Java API, but the process is very
slow: processing 50,000 records takes 8 minutes. Can anyone suggest how I
can configure or change my queries?

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4014e3f0-2ce6-48d9-afdd-e438857e85f0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Mon, Dec 8, 2014 at 9:11 AM, Sushmitha Chakka <sushmitha@sigmoidanalytics.com> wrote:

> Hi,
>
> I have an index with 6 crore (60 million) records. My use case is to read the
> entire index and check whether each record is present in a new index. If it
> is not, I have to index it into the new index. I used the scan and scroll
> operation to read the index through the Java API, but the process is very
> slow: processing 50,000 records takes 8 minutes. Can anyone suggest how I
> can configure or change my queries?

I know from experience that scan/scroll can handle batch sizes in the low
thousands without trouble, so you should give that a shot. Each scroll call
should be quite quick. It might be a good idea to post a JSON recreation
of your problem so we can see what is happening. Usually the slow part of
a scan/scroll copy into a new index is the batch calls that add the documents
to the new index. Whether that is "slow" really depends on the size of the
documents, the complexity of any scripts you use on import, your disk speed,
the complexity of your analysis, your CPU speed, and the merge settings you
use. That list is roughly ordered by how often I've seen each factor affect
import speed.
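
To make the batching point concrete, here is a minimal Java sketch. It
deliberately does not use the real Elasticsearch client: Target,
InMemoryTarget and copyMissing are hypothetical names, with existing()
standing in for something like a multi-get and bulkIndex() for a bulk
request. The only idea it tries to show is one existence check and one bulk
write per scroll page, rather than one request per record.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BatchedCopy {

    /** Hypothetical stand-in for the new index; NOT an Elasticsearch API. */
    interface Target {
        // One round trip: which of these ids already exist? (think multi-get)
        Set<String> existing(List<String> ids);

        // One round trip: index all of these documents (think bulk request)
        void bulkIndex(Map<String, String> docs);
    }

    /** In-memory Target that counts round trips, for illustration only. */
    static class InMemoryTarget implements Target {
        final Map<String, String> docs = new LinkedHashMap<>();
        int roundTrips = 0;

        public Set<String> existing(List<String> ids) {
            roundTrips++;
            Set<String> found = new HashSet<>();
            for (String id : ids) {
                if (docs.containsKey(id)) {
                    found.add(id);
                }
            }
            return found;
        }

        public void bulkIndex(Map<String, String> batch) {
            roundTrips++;
            docs.putAll(batch);
        }
    }

    /**
     * Walks the source one page at a time (a page plays the role of one
     * scroll response) and copies only the documents the target lacks.
     * Returns the number of documents copied.
     */
    static int copyMissing(Map<String, String> source, Target target, int batchSize) {
        int copied = 0;
        List<String> ids = new ArrayList<>(source.keySet());
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<String> page = ids.subList(from, Math.min(from + batchSize, ids.size()));

            // Single existence check for the whole page.
            Set<String> present = target.existing(page);

            Map<String, String> missing = new LinkedHashMap<>();
            for (String id : page) {
                if (!present.contains(id)) {
                    missing.put(id, source.get(id));
                }
            }

            // Single bulk write for everything the page was missing.
            if (!missing.isEmpty()) {
                target.bulkIndex(missing);
                copied += missing.size();
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        for (int i = 0; i < 5000; i++) {
            source.put("doc-" + i, "{\"n\":" + i + "}");
        }
        InMemoryTarget target = new InMemoryTarget();
        target.docs.put("doc-0", source.get("doc-0")); // already present

        int copied = copyMissing(source, target, 1000);
        System.out.println(copied + " copied in " + target.roundTrips + " round trips");
        // prints "4999 copied in 10 round trips"
    }
}
```

With a batch size of 1000 the 5,000 sample records need 10 round trips. A
version that checks and indexes each record individually would need roughly
10,000 for the same data, which is the kind of overhead that could explain
a rate like 50,000 records in 8 minutes.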

Nik
