Using custom_score to pull randomly ordered records yields slow query

I am using the following query on an index of 200 million records:

curl -XPOST "http://xxxx:9200/indexX/documentY/_search" -d '
{
  "query": {
    "custom_score": {
      "script": "round(random()*100000)",
      "query": {"match_all": {}}
    }
  },
  "sort": {"_score": {"order": "desc"}},
  "size": 20000
}'

The goal is to pull 20K records from the index in random order; match_all
represents the worst-case scenario.

The query takes a very long time, over 7 minutes for 1 shard.

I would like to know if there is a better way to approach this problem of
random results.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Can you index each element with a random number? If so, in your query you
can ask for the first elements whose random number field is greater than
some new random number.
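
A minimal sketch of this idea in Python, under stated assumptions: the field
name "rand" is hypothetical (a random float stored on each document at index
time), the request body is only one plausible way to express "first elements
whose random field is at or above a fresh random number", and simulate() is an
in-memory stand-in for the search, not a client call.

```python
import json
import random

def build_random_slice_query(threshold, size, field="rand"):
    """Build a search body: docs whose `field` >= threshold, ordered by `field`.

    `field` is a hypothetical name for the random number indexed per document.
    """
    return {
        "query": {"range": {field: {"gte": threshold}}},
        "sort": [{field: {"order": "asc"}}],
        "size": size,
    }

def simulate(docs, body, field="rand"):
    """In-memory stand-in for the search, to show which docs would come back."""
    threshold = body["query"]["range"][field]["gte"]
    hits = sorted((d for d in docs if d[field] >= threshold),
                  key=lambda d: d[field])
    return hits[: body["size"]]

rng = random.Random(42)
# Each document gets its random number once, at "index time".
docs = [{"id": i, "rand": rng.random()} for i in range(1000)]

# Each request draws a fresh threshold.
body = build_random_slice_query(threshold=rng.random(), size=20)
print(json.dumps(body, indent=2))

sample = simulate(docs, body)
print(len(sample), "documents selected")
```

Because the random number is computed once per document rather than once per
document per query, the search only has to filter and sort on a stored field
instead of running a script over every match.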

Are you asking for 20K documents in 1 call?

--
Ivan

On Wed, Jul 17, 2013 at 2:49 PM, David MZ david.mazvovsky@gmail.com wrote:

I am using the following query on an index of 200 million records:

curl -XPOST "http://xxxx:9200/indexX/documentY/_search" -d '
{
  "query": {
    "custom_score": {
      "script": "round(random()*100000)",
      "query": {"match_all": {}}
    }
  },
  "sort": {"_score": {"order": "desc"}},
  "size": 20000
}'

The goal is to pull 20K records from the index in random order; match_all
represents the worst-case scenario.

The query takes a very long time, over 7 minutes for 1 shard.

I would like to know if there is a better way to approach this problem of
random results.

Thanks


Yes, I am asking for 20K records, and I would like to repeat this, asking
for 20K more records each time until I reach, say, 500K.
The total number of records is 55 million.

The problem is that I may not be able to control how many records I get
each time, since it depends on how many random numbers are greater than the
number I give, and that may be fewer than the number of records I require.
Will this not add roundtrips?
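
One way to reason about the roundtrip question, sketched in Python under
stated assumptions: results are sorted ascending on a hypothetical per-document
field "rand" and each call asks for a fixed size, so a call only comes back
short when the top of the random range is reached. At that point the sketch
wraps around once to the bottom of the range and stops at the starting point.
next_batch() is an in-memory stand-in for one search call, and the random
values are assumed to be distinct.

```python
import random

def next_batch(docs_sorted, cursor, size):
    """Stand-in for one search call: up to `size` docs with rand > cursor,
    in ascending order of rand."""
    return [d for d in docs_sorted if d["rand"] > cursor][:size]

def pull_random(docs, total, batch_size, rng):
    """Collect up to `total` docs in batches, starting at a random point."""
    docs_sorted = sorted(docs, key=lambda d: d["rand"])
    start = rng.random()            # fresh random starting point
    cursor, wrapped, out = start, False, []
    while len(out) < total:
        requested = min(batch_size, total - len(out))
        batch = next_batch(docs_sorted, cursor, requested)
        if batch:
            cursor = batch[-1]["rand"]   # continue after the last value seen
        if wrapped:
            # After wrapping, never go past the starting point again.
            batch = [d for d in batch if d["rand"] <= start]
        out.extend(batch)
        if len(batch) < requested:
            if wrapped:
                break                    # full circle: nothing left to return
            cursor, wrapped = -1.0, True  # restart below the smallest value
    return out

rng = random.Random(7)
docs = [{"id": i, "rand": rng.random()} for i in range(1000)]
out = pull_random(docs, total=500, batch_size=20, rng=rng)
print(len(out), "documents collected")
```

With this shape, pulling 500K records in batches of 20K costs at most one
extra call for the wrap-around. The trade-off is that each pull is a contiguous
run in random-field order, so two pulls that start near each other overlap
unless the stored random numbers are regenerated.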

I did not mention it, but this is not for real-time purposes; I would
tolerate a 1 minute query for 20K.

On Fri, Jul 19, 2013 at 12:48 AM, Ivan Brusic ivan@brusic.com wrote:

Can you index each element with a random number? If so, in your query you
can ask for the first elements whose random number field is greater than
some new random number.

Are you asking for 20K documents in 1 call?

--
Ivan

