Also, I'm filtering the 100K returned results against a separate set of 2,000
results. Is there a way to save time by doing the filtering in a single query
in Elasticsearch?
On Tuesday, May 7, 2013 1:04:17 PM UTC+3, Ophir Michaeli wrote:
Hi,
I need to return 100,000 results and it really slows down the search time.
Compared to a straight Lucene search, it takes four times as long.
Is there a way to improve search time with such a large number of returned
results?
We have 2,000 entities (each entity's ID is one of the fields, "stored, not
analyzed", in Elasticsearch). We want to run a text query on another field,
but we are interested only in results that belong to the "2,000 set".
Prior to Elasticsearch we fetched 100,000 results from Lucene (the number
100,000 was picked fairly arbitrarily, just because we saw that a 1M-result
query in Lucene took too much time) and manually checked those results one by
one against the "2,000 set", according to the ID field.
Now we want to do the same with Elasticsearch. Naturally it takes much more
time to do it the same way, since those 100,000 results now go over the
network.
So the question is whether we can do something different in Elasticsearch in
order to get the same functionality with acceptable performance.
Can't you simply add a field to every document which is true ("T") if a
document is a member of this special set and is false "F" otherwise?
Then this turns into a filter applied to your query. Fast and efficient.
Even if the set of 2000 is constantly changing you are only talking about
updating the T values.
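
The flag-field suggestion above could be sketched as a request body along these lines. This is only an illustration: the field names `body` and `in_set` are hypothetical, and the `bool`/`filter` form comes from the current Elasticsearch query DSL rather than from anything confirmed in this thread.

```python
import json


def flagged_query(text, text_field="body", flag_field="in_set"):
    """Build a search body: full-text match restricted to flagged documents.

    text_field and flag_field are hypothetical names, not from the thread.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [{"match": {text_field: text}}],
                # Non-scoring restriction to documents flagged "T".
                "filter": [{"term": {flag_field: "T"}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(flagged_query("car"), indent=2))
```

The restriction sits in the `filter` clause so that it narrows the result set without affecting relevance scores.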
On Wed, May 8, 2013 at 6:01 AM, Maxim Terletsky sxamt33@gmail.com wrote:
I want to extend the scenario Ophir described:
We are talking about billions of documents. And yes, definitely, that 2,000
set is constantly changing. There is absolutely no way to update our huge
index fast enough for that purpose.
It could be just an additional condition on the query, but it would be 2,000
conditions (for example: query="car" and (IdField="X1" or IdField="X2" or
... or IdField="X2000")). I don't think Elasticsearch could handle such a
huge query in a straightforward fashion. Hence the topic: we are trying to
understand whether there is some special mechanism that would let us run a
query and then filter the results by a set of values for one of the fields,
without transferring all 100,000 results back to the Elasticsearch client.
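
For comparison, the long OR chain described above can also be written as a single `terms` clause instead of 2,000 separate conditions. A minimal sketch, assuming the current Elasticsearch query DSL; `body` is a hypothetical text field name, and `IdField` follows the example above. Whether this performs acceptably on an index of this size is not established in this thread.

```python
def query_with_id_set(text, ids, text_field="body", id_field="IdField"):
    """Build a search body: full-text match limited to a set of IDs.

    text_field is a hypothetical name; id_field follows the thread's example.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [{"match": {text_field: text}}],
                # One terms clause stands in for the long OR chain;
                # duplicates are dropped and the list is sorted for stability.
                "filter": [{"terms": {id_field: sorted(set(ids))}}],
            }
        }
    }


if __name__ == "__main__":
    ids = [f"X{i}" for i in range(1, 2001)]
    body = query_with_id_set("car", ids)
    print(len(body["query"]["bool"]["filter"][0]["terms"]["IdField"]))
```

With this shape the set of IDs travels with each request, so nothing in the index has to be updated when the set changes.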
On Wednesday, May 8, 2013 7:57:26 PM UTC+3, RKM wrote:
Can't you simply add a field to every document which is true ("T") if a
document is a member of this special set and is false "F" otherwise?
Then this turns into a filter applied to your query. Fast and efficient.
Even if the set of 2000 is constantly changing you are only talking about
updating the T values.
Thanks in advance!
As I understand your suggestion, I do a scan search for the 100,000, and
instead of getting 100K results I get the total number of hits and a scroll id.
How do I use this scroll id to ask Elasticsearch which of the 100K match on
one of the fields of the other 2,000 documents I have, and return only those
documents?
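
As far as the scroll API goes, a scroll id is only a cursor for fetching the next batch of the same search; it cannot itself be queried or filtered against. A rough sketch of the paging loop with client-side filtering follows. The `first_page` and `next_page` callables are hypothetical stand-ins for the initial search request and the follow-up scroll requests, and the response shape (`_scroll_id`, `hits.hits`, `_source`) is an assumption about the parsed JSON responses.

```python
def scroll_filtered(first_page, next_page, wanted_ids, id_field="IdField"):
    """Yield hits whose id_field value is in wanted_ids, batch by batch.

    first_page() and next_page(scroll_id) are hypothetical callables that
    each return a parsed search response (a dict).
    """
    wanted = set(wanted_ids)
    response = first_page()
    # A scan-style first response may carry no hits, only a scroll id.
    hits = response["hits"]["hits"]
    while True:
        for hit in hits:
            if hit["_source"].get(id_field) in wanted:
                yield hit
        # The scroll id may change between batches, so use the latest one.
        response = next_page(response["_scroll_id"])
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted


if __name__ == "__main__":
    # In-memory fake of the scroll batches, for illustration only.
    batches = {
        "s1": {"_scroll_id": "s2", "hits": {"hits": [
            {"_source": {"IdField": "X1"}},
            {"_source": {"IdField": "Y9"}},
        ]}},
        "s2": {"_scroll_id": "s3", "hits": {"hits": []}},
    }

    def first():
        return {"_scroll_id": "s1", "hits": {"total": 2, "hits": []}}

    print(list(scroll_filtered(first, batches.__getitem__, {"X1"})))
```

Note that this loop still pulls every hit over the network before discarding most of them, so it pages the results without reducing the transfer cost described earlier in the thread.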