Number of returned results and search time

Hi,

I need to return 100,000 results, and it really slows down the search. Compared to a straight Lucene search, it takes 4 times longer.

Is there a way to improve search time with such a big number of returned
results?

Thanks,

Ophir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Also - I'm filtering the 100K returned results against a set of another 2,000
results. Is there a way to save time by doing the filtering in a single query
in Elasticsearch?


Did you try the scan & scroll feature?
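A minimal sketch of the scan & scroll flow, in Python. The `search` and `scroll` callables stand in for the HTTP calls to `_search?search_type=scan&scroll=5m` and `_search/scroll`; their names and the stub responses are illustrative, not a real client API:

```python
# Sketch of the scan & scroll pattern: the initial scan request returns no
# hits, only a scroll id and the total count; repeated scroll calls then
# page through the full result set until an empty page comes back.
def scroll_all(search, scroll):
    """Yield hits one by one, following the scroll id returned by each call."""
    resp = search()                      # initial scan request
    scroll_id = resp["_scroll_id"]       # no hits yet, just a scroll id
    while True:
        page = scroll(scroll_id)         # fetch the next batch of hits
        hits = page["hits"]["hits"]
        if not hits:                     # an empty page means we are done
            break
        scroll_id = page.get("_scroll_id", scroll_id)
        for hit in hits:
            yield hit
```

Because the hits are streamed in batches rather than materialized in one response, the client never holds all 100K documents at once.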

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


I want to extend the scenario Ophir described:

We have 2,000 entities (their IDs are kept in a "stored, not analyzed" field
in Elasticsearch). We want to run a text query on another field, but we are
interested only in results that belong to the 2,000 set. Prior to
Elasticsearch we fetched 100,000 results from Lucene (the number 100,000 was
picked fairly arbitrarily, just because we saw that a 1M-result query in
Lucene took too much time) and manually checked those results one by one
against the 2,000 set, using the ID field.

Now we want to do the same with Elasticsearch. Naturally it takes much
more time to do it the same way, since those 100,000 results now go
over the network.

So the question is whether we can do something different in Elasticsearch
to get the same functionality with reasonable performance.

Thanks in advance!


Can't you simply add a field to every document that is true ("T") if the
document is a member of this special set and false ("F") otherwise?

Then this turns into a filter applied to your query. Fast and efficient.

Even if the set of 2,000 is constantly changing, you are only talking about
updating the T values.
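As a rough sketch of this flag-field approach (the `in_set` field name is an assumption, not something from the thread), the request body would pair the text query with a constant-size term filter:

```python
# Sketch of the flag-field idea: each document carries a boolean "in_set"
# field (hypothetical name) marking membership in the 2,000-ID set, so the
# query-time filter stays the same size no matter how large the set grows.
flagged_query = {
    "query": {"match": {"_all": "car"}},
    "filter": {"term": {"in_set": True}},
}
```

The trade-off is at index time: every change to the set means updating the flag on the affected documents.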


We are talking about billions of documents. And yes, that 2,000 set is
definitely changing constantly. There is absolutely no way we can update
our huge index at that speed for this purpose.
It could be just another condition in the query, but it would be
2,000 conditions (for example: query="car" AND (IdField="X1" OR
IdField="X2" OR ... OR IdField="X2000")). I don't think Elasticsearch could
handle such a huge query in a straightforward fashion. Hence the topic:
we are trying to understand whether there is some special mechanism that would
let us run a query and then filter the results by a set of values of one
of the fields, without transferring all 100,000 results back to the
Elasticsearch client.


David -

Thanks for the reply.

As I understand your suggestion, I do a scan search for the 100,000; instead
of getting 100K results, I get the total number of hits and a scroll id.

How do I use this scroll id to ask Elasticsearch which of the 100K results
match one of the fields of the other 2,000 documents I have, and to return
only those documents?


Maybe you can use the terms filter (
http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter/).


Yes, ES can.

{
  "query" : {
    "match" : {
      "_all" : "car"
    }
  },
  "filter" : {
    "terms" : {
      "IdField" : [ "X1", "X2", ... "X2000" ]
    }
  }
}

Note: because of a limit in Lucene, you have to break the filter terms down
into batches of at most 1024 terms (or 1000, for more convenient counting):

{
  "query" : {
    "match" : {
      "_all" : "car"
    }
  },
  "filter" : {
    "terms" : {
      "IdField" : [ "X1", "X2", ... "X1000" ]
    }
  }
}

and

{
  "query" : {
    "match" : {
      "_all" : "car"
    }
  },
  "filter" : {
    "terms" : {
      "IdField" : [ "X1001", "X1002", ... "X2000" ]
    }
  }
}
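If you do have to split a long terms list into batches like this (a later reply in this thread notes the 1024-clause limit applies to the terms query, not the terms filter), a small helper can generate the filters. This is a sketch, not part of any Elasticsearch client:

```python
def chunked_terms_filters(field, values, size=1000):
    """Split a long list of terms into terms filters of at most `size` each."""
    return [
        {"terms": {field: values[i:i + size]}}
        for i in range(0, len(values), size)
    ]
```

For the 2,000 IDs above this yields two filters of 1,000 terms each, which could then be combined under a single `or` filter in the request body.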

Jörg


Are you sure this limit even applies to the terms filter?


It applies to the terms QUERY, but not the terms filter.


Thanks for all the replies - it works using the terms filter:

{
  "size": 2000,
  "query": {
    "query_string": {
      "query": "air",
      "fields": [
        "board^5",
        "user^1",
        "description^10"
      ],
      "analyzer": "snowball",
      "phrase_slop": 1000.0
    }
  },
  "filter": {
    "terms": {
      "iDPicture": [
        "0x33381c1f5d80a787ab669e3a974fc67c",
        "0xcd8c566cf3cd3d5a9b1a4d8216ed81ff"
      ]
    }
  },
  "fields": [
    "iDPin",
    "iDPicture"
  ]
}
