Efficient way to store the result of a large slow query

Chen_Wang · June 20, 2014, 7:58am

Hi guys,
I'm wondering what the most efficient way is to execute a slow query
(parent/child documents) that returns a large number of entries, and to
store the result on HDFS in random, evenly sized blocks. For example, the
query will return 100 million records, and I want each random 1 million
stored in a different location (file/folder) on HDFS.

I assume I could execute the query with scroll and, whenever I have
received 1 million records, spawn another thread to commit them to HDFS.
Is there a way to run the query in a distributed way, with 100 threads
querying ES at the same time and each getting a random 1 million records
back (without duplicates)? Would ES-Hadoop help in this case?
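
To make the single-reader idea concrete, here is a rough sketch of the routing step only. It assumes the hits arrive as a plain Python iterable (for example from a scroll via the Python client's `elasticsearch.helpers.scan`, which is not shown here), and `assign_blocks` is a hypothetical helper name. Each record is sent to a randomly chosen block that still has room, so no record is duplicated and no block exceeds its size. This is not a uniformly random partition in the strict statistical sense, and the HDFS write itself is left out.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def assign_blocks(
    records: Iterable[T],
    num_blocks: int,
    block_size: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, T]]:
    """Yield (block_index, record) pairs for a stream of records.

    Hypothetical helper: every record goes to a randomly chosen block
    that is not yet full, so each block ends up with at most
    block_size records and each record appears exactly once.
    """
    rng = random.Random(seed)
    open_blocks = list(range(num_blocks))  # blocks that still have room
    counts = [0] * num_blocks              # records assigned per block

    for record in records:
        if not open_blocks:
            raise ValueError("more records than num_blocks * block_size")
        pos = rng.randrange(len(open_blocks))
        block = open_blocks[pos]
        counts[block] += 1
        if counts[block] == block_size:
            # Block is full: swap-remove it so it is never picked again.
            open_blocks[pos] = open_blocks[-1]
            open_blocks.pop()
        yield block, record


if __name__ == "__main__":
    # Toy stand-in for scrolled hits: 12 records into 3 blocks of 4.
    for block, record in assign_blocks(range(12), num_blocks=3, block_size=4):
        # In a real job this is where the record would be appended to
        # the writer for that block's file or folder.
        print(block, record)
```

With 100 blocks of 1 million each, the caller would keep one open writer per block and append each record as it is yielded, so the full result set never has to sit in memory.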

Appreciate your input!
Chen

