Always view all results from search? (Java Client)


(Drew H-2) #1

Hey guys. Just wanted to say I really like the product but I'm having an
issue.

Right now when I search, I'm doing this.

searchRequestBuilder.setSize( 99999999 );

I've indexed data from a huge table in MySQL.

I want to make sure I always get all results. This number may work now, but
what if the table gets bigger? If I pass -1, the size falls back to the
default of 10. Is there a way I can always fetch all results? Thanks for
your help.


(David Pilato) #2

If you need to extract docs from ES (which is slightly different from
searching), then you should use the scan & scroll APIs:
http://www.elasticsearch.org/guide/reference/api/search/scroll.html

If you just want to search, you don't have to call setSize. By default the
search returns the first 10 docs of the whole result set. You then only
have to adjust the from parameter to fetch the next 10 results:
http://www.elasticsearch.org/guide/reference/api/search/from-size.html
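
The from/size arithmetic described above might be sketched as follows. This
is only an illustration, not code from the thread: the class PagingSketch
and its methods are hypothetical names, and the total hit count is assumed
to be read from the search response. The computed values are what one would
pass to setFrom and setSize on the request builder.

```java
/** Sketch of from/size paging arithmetic; all names here are illustrative. */
class PagingSketch {

    /** Offset of the first hit on a zero-based page. */
    static int fromForPage(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact(page, pageSize);
    }

    /** Number of pages needed to show totalHits hits (rounds up). */
    static long pageCount(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize > 0");
        }
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 10; // the default page size mentioned above
        for (int page = 0; page < 3; page++) {
            // Assumption: these values would be handed to setFrom(...) and setSize(...)
            System.out.println("page " + page + ": from=" + fromForPage(page, pageSize)
                    + ", size=" + pageSize);
        }
    }
}
```

Each request then transfers only one page of documents, however large the
full result set is.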

HTH
David.

On 2 April 2012 at 16:11, Drew H drewhjava@gmail.com wrote:


--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Drew H-2) #3

Hi David, thanks for your response. In this case I don't think I can use
scan & scroll. The documentation clearly states that it's not intended for
real-time user requests. What I have to do is return a result set for a
date range in response to an HTTP request. We have looked at Elasticsearch
to speed up this process. Right now, even with a highly optimized MySQL
database, the request still takes a long time. It's simply a lot of data,
and the slow response times bring down the website.

Maybe Elasticsearch is not intended for this use case, but it sure has
greatly improved our response times in initial testing.

On Monday, April 2, 2012 10:39:46 AM UTC-4, David Pilato wrote:



(David Pilato) #4

In your use case, does the user really need to fetch and display all the
data on screen?
Or do you simply need to display the most relevant docs?

You could also create indexes based on dates (one per month, for example)
and then fetch the content month by month.
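
As a rough sketch of the per-month idea, the snippet below lists the index
names that a date range would touch. Everything specific in it is an
assumption made for illustration: the class MonthlyIndexSketch, the
"events-" prefix, and the yyyy-MM naming scheme do not come from this
thread. The resulting names could then be used as the list of indices for
a search request.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Sketch: map a date range to hypothetical per-month index names. */
class MonthlyIndexSketch {

    // Assumed naming scheme: <prefix><year>-<month>, e.g. events-2012-04
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuu-MM");

    /** Names of all monthly indices overlapping [start, end], in order. */
    static List<String> indicesFor(String prefix, LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<String> names = new ArrayList<>();
        YearMonth last = YearMonth.from(end);
        for (YearMonth m = YearMonth.from(start); !m.isAfter(last); m = m.plusMonths(1)) {
            names.add(prefix + m.format(SUFFIX));
        }
        return names;
    }

    public static void main(String[] args) {
        // "events-" is a made-up prefix used only for this example
        System.out.println(indicesFor("events-",
                LocalDate.of(2012, 2, 15), LocalDate.of(2012, 4, 2)));
    }
}
```

A query for a short date range then only touches the one or two indices
that can contain matching documents.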

Not sure that answers your needs :wink:

David.

On 2 April 2012 at 17:44, Drew H drewhjava@gmail.com wrote:



(Jörg Prante) #5

To be clear, setSize() defines how many hits, in the form of full-fledged
documents, are transferred from Elasticsearch to the client.

It has nothing to do with how much of the index is searched; it does not
limit the search to that number of indexed documents.

Nobody wants to transfer 99,999,999 full documents over the wire for
each query. You won't be able to do that in "real time". For
downloading large numbers of documents, scan/scroll is the right method.

As David suggests, for querying date ranges in an inverted index,
discretizing the date periods is recommended because of the performance
gains. Often, high resolutions like seconds or minutes are not needed.
Say you want to select documents by day ranges: index just the day
number since the beginning of your data, as an integer. Then convert
user-supplied dates to day numbers and use them as a filter in an
Elasticsearch query. Integer-based filters are compact and fast.
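
A minimal sketch of the day-number conversion, under stated assumptions:
the start date 2010-01-01 is invented for the example, and the class
DayNumberSketch with its methods is hypothetical. The same conversion
would be applied at index time (to fill the integer field) and at query
time (to turn the user's dates into the bounds of an integer range
filter).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Sketch: convert calendar dates to integer day numbers. */
class DayNumberSketch {

    // Assumption: the first day covered by the data; adjust to your data set
    static final LocalDate DATA_START = LocalDate.of(2010, 1, 1);

    /** Whole days between DATA_START and the given date (0 for DATA_START itself). */
    static int dayNumber(LocalDate date) {
        return Math.toIntExact(ChronoUnit.DAYS.between(DATA_START, date));
    }

    /** Inclusive {from, to} bounds for a range filter on the day-number field. */
    static int[] dayRange(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("to is before from");
        }
        return new int[] { dayNumber(from), dayNumber(to) };
    }

    public static void main(String[] args) {
        int[] range = dayRange(LocalDate.of(2012, 3, 1), LocalDate.of(2012, 4, 2));
        System.out.println("filter day numbers from " + range[0] + " to " + range[1]);
    }
}
```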

Jörg

On Apr 2, 6:03 pm, da...@pilato.fr wrote:



(David Pilato) #6

Thanks, Jörg! Now I understand what I was trying to say :wink:

Cheers,
David.

-----Original Message-----
From: elasticsearch@googlegroups.com
[mailto:elasticsearch@googlegroups.com] On behalf of Jörg Prante
Sent: Monday, 2 April 2012 18:43
To: elasticsearch
Subject: Re: Always view all results from search? (Java Client)


