Hey guys. Just wanted to say I really like the product but I'm having an
issue.
Right now when I search, I'm doing this.
searchRequestBuilder.setSize( 99999999 );
I've indexed data from a huge table in MySQL.
I want to make sure I always get all results. This number may work now, but
what if the data gets bigger? If I pass -1, the size falls back to the
default of 10. Is there a way I can always fetch all results? Thanks for
your help.
If you just want to run searches, you don't have to call setSize. By
default a search returns the first 10 docs of the whole result set. You
then only have to adjust the from parameter to fetch the next 10 results.
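As a rough illustration of the from/size idea above, here is a minimal sketch in plain Java. It does not call Elasticsearch at all: an in-memory list stands in for the result set, and the class and method names (FromSizePaging, fromForPage, page) are hypothetical. In the Java client the two numbers would be passed to setFrom and setSize on the request builder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of any client API.
class FromSizePaging {

    // Offset of the first hit on a zero-based page.
    static int fromForPage(int pageIndex, int pageSize) {
        return Math.multiplyExact(pageIndex, pageSize);
    }

    // Stand-in for a search call: returns at most `size` hits starting at `from`.
    static <T> List<T> page(List<T> hits, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        int start = Math.min(from, hits.size());
        int end = (int) Math.min((long) start + size, hits.size());
        return new ArrayList<>(hits.subList(start, end));
    }

    public static void main(String[] args) {
        // Assumed data: 25 fake hits, so the third page is only partly filled.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add(i);
        }
        int pageSize = 10;
        for (int p = 0; p < 3; p++) {
            int from = fromForPage(p, pageSize);
            System.out.println("from=" + from + " -> " + page(hits, from, pageSize));
        }
    }
}
```

Each request only ever transfers one page of documents, which is why the size does not need to grow with the index.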
Hi David, thanks for your response. In this case I don't think I can use
scan & scroll. It clearly states that it's not intended for real-time user
requests. What I have to do is return the result set for a date range in
response to an HTTP request. We have looked at Elasticsearch to speed up
this process. Right now, even with a highly optimized MySQL database, the
request still takes a long time. It's simply a lot of data, and it brings
down the website because of the response times.
Maybe Elasticsearch is not intended for this use case, but it has
greatly improved our response times in initial testing.
(In reply to David Pilato's message of Monday, April 2, 2012, 10:39:46 AM UTC-4.)
To make it clear, setSize() defines how many hits, in the form of
full-fledged documents, are transferred from Elasticsearch to the
client.
It has nothing to do with how much of the index is searched: it does
not limit the search to that number of indexed documents.
Nobody wants to transfer 99,999,999 full documents over the wire on
every query. You won't be able to do that in "real time". For
downloading large numbers of documents, scan/scroll is the method.
Like David suggests, for querying date ranges in an inverted index, a
discretization of date periods is recommended because of the
performance gains. Often, high resolutions like seconds or minutes are
not needed. If you want to select documents by day ranges, index just
the day number since the beginning of your data, as an integer. Then
convert the user-supplied dates to day numbers and use them as a filter
in an Elasticsearch query. Integer-based filters are compact and fast.
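To make the day-number idea concrete, here is a minimal sketch in plain Java using java.time. The start date (2012-01-01) and the names DayNumbers, toDayNumber, and toDayRange are assumptions made up for the example; the resulting integers would be stored in an integer field at index time and used as the bounds of a range filter at query time.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration; names and start date are assumptions.
class DayNumbers {

    // Assumed first day covered by the data set; this date is day 0.
    static final LocalDate DATA_START = LocalDate.of(2012, 1, 1);

    // Whole days between DATA_START and the given date (negative if earlier).
    static int toDayNumber(LocalDate date) {
        return Math.toIntExact(ChronoUnit.DAYS.between(DATA_START, date));
    }

    // Converts an inclusive date range from the user into inclusive day numbers.
    static int[] toDayRange(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("range end is before range start");
        }
        return new int[] { toDayNumber(from), toDayNumber(to) };
    }

    public static void main(String[] args) {
        int[] range = toDayRange(LocalDate.of(2012, 3, 1), LocalDate.of(2012, 4, 2));
        // These two integers are what the range filter would compare against.
        System.out.println("day >= " + range[0] + " and day <= " + range[1]);
    }
}
```

The same conversion has to be applied when indexing and when querying, otherwise the stored day numbers and the filter bounds will not line up.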