Search finds 42 hits, yet hits object empty

searchersteve · May 13, 2011, 2:22am

I've suddenly gotten an inexplicable search result:

{"took":36,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":42,"max_score":4.101285,"hits":[]}}

How can this be? I've never encountered this before.

Adriano_Ferreira · May 13, 2011, 3:32am

Hm, are you sure you are not using size=0 in the search request?

Lukas_Vlcek1 · May 13, 2011, 6:52am

Which version of ES are you using? I think there was some issue
which has been fixed.

Lukas

On Friday, May 13, 2011, searchersteve stevesuo@gmail.com wrote:

I've suddenly gotten an inexplicable search result:

{"took":36,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":42,"max_score":4.101285,"hits":[]}}

How can this be? I've never encountered this before.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Search-finds-42-hits-yet-hits-object-empty-tp2934377p2934377.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Lukas_Vlcek1 · May 13, 2011, 6:53am

But generally if you can provide full recreation script then this
would be really helpful.

kimchy · May 13, 2011, 9:30am

This can happen when size is set to 0, or when from is large enough to point past the last hit.
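
To illustrate, here is a toy model in Python (not Elasticsearch code; the function name and the fake document list are made up for the example). The total counts every matching document, while the hits array is only the window selected by from and size, so the two can disagree:

```python
def page_of_hits(matches, from_=0, size=10):
    """Mimic the shape of a search response for a list of matching docs."""
    return {
        "total": len(matches),                # counts every match
        "hits": matches[from_:from_ + size],  # only the requested window
    }

matches = [f"doc-{i}" for i in range(42)]

print(page_of_hits(matches, from_=0, size=10))    # 42 total, 10 hits
print(page_of_hits(matches, from_=0, size=0))     # 42 total, empty hits
print(page_of_hits(matches, from_=100, size=10))  # 42 total, empty hits
```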

searchersteve · May 14, 2011, 5:07pm

I haven't upgraded to the April release yet. I hope that fixes it.

searchersteve · May 14, 2011, 9:02pm

I don't have my query set to size=0, so I think it was your other explanation -- if by "large from" you mean: from = 0 and size of index=100,000. That's roughly my situation.

If I understand the issue correctly, I'd like to find a workaround. I want users to be able to find results from the entire index. Should I do it like this?

Query to get the total number of hits.
Requery with a reasonably recent FROM = x to get most recent hits.
Tell users there are X more hits on server, and allow them to click "search older results?"
Requery using FROM = u and TO = x, where x minus u is a reasonably tight interval.

Clinton_Gormley · May 15, 2011, 8:49am

Hi Steve

On Sat, 2011-05-14 at 14:02 -0700, searchersteve wrote:

If I understand the issue correctly, I'd like to find a workaround. I want
users to be able to find results from the entire index. Should I do it like
this?

Query to get the total number of hits.

Requery with a reasonably recent FROM = x to get most recent hits.

Tell users there are X more hits on server, and allow them to click
"search older results?"

Requery using FROM = u and TO = x, where x minus u is a reasonably tight
interval.

This can be really hard on ES. Think about it like this:

You ask for the first 10 results that contain the words "foo bar",
sorted by last_modified time.

That query is sent off to all 5 of your shards, each of which returns
its top 10 results, so you now have 50. The node handling the request
then combines those 50 results, and returns the top 10 from that list.

Now, you ask for results 100,000 to 100,010. Each shard has to return
its top 100,010 results, so the main node now needs to work through
500,050 results to figure out which ones to return.
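
The arithmetic above can be sketched in Python. This is a simplified model, not Elasticsearch internals: it assumes each shard returns its top from + size candidates to the node handling the request, and that every shard holds at least that many matches. The function names and the random scores are only illustrative.

```python
import heapq
import random

def candidates_to_merge(num_shards, from_, size):
    """Number of results the node handling the request must sort through."""
    return num_shards * (from_ + size)

def merged_page(shards, from_, size):
    """Merge per-shard top lists and return the requested window."""
    per_shard_top = [heapq.nlargest(from_ + size, shard) for shard in shards]
    merged = heapq.nlargest(from_ + size, (s for top in per_shard_top for s in top))
    return merged[from_:from_ + size]

print(candidates_to_merge(5, 0, 10))        # 50
print(candidates_to_merge(5, 100_000, 10))  # 500050

# Small check: merging per-shard tops gives the same page as a global sort.
random.seed(0)
shards = [[random.random() for _ in range(200)] for _ in range(5)]
all_scores = sorted((s for shard in shards for s in shard), reverse=True)
assert merged_page(shards, 20, 10) == all_scores[20:30]
```

The cost grows with from + size rather than with size alone, which is why deep pages are expensive even when only 10 results come back.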

The first question you should ask yourself is: do I really need to
return 100,000 results to my user? Is anybody actually interested in
seeing 100,000 results? Google never returns more than 1,000 for that
very reason.

You can use 'scroll' to ask for 10 results, then another 10 in order,
which is more efficient. But you can't randomly access page 42 that way.

Also, and MUCH more efficient, is to use the 'scan' search_type with
scroll. This will allow you to return all results efficiently, BUT the
results are not sorted in any way.

This is particularly useful for things like reindexing all of your data,
but generally is not useful for user search.
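
As a rough mental model of scrolling (plain Python, not the Elasticsearch scroll API): a scroll behaves like a cursor that remembers its position, so each call hands back the next batch, but there is no way to jump straight to an arbitrary page. The generator name and the batch size are made up for the illustration.

```python
from itertools import islice

def scroll(results, batch_size=10):
    """Yield consecutive batches from an iterable of results, in order."""
    it = iter(results)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

cursor = scroll(range(35), batch_size=10)
print(next(cursor))  # first batch: 0..9
print(next(cursor))  # second batch: 10..19

# Reaching a later batch means consuming every batch before it.
remaining = list(cursor)
print(len(remaining))  # 2 batches left: 20..29 and 30..34
```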

Again: Do my users really need to see 100,000 results?

clint

searchersteve · May 27, 2011, 4:59pm

Clint:

Belated thanks for that detailed analysis. It was very helpful.

Now that I think about the design more deliberately, the reason I wanted a user to be able to retrieve all 100,000 documents would be for the purpose of dumping the documents out to a file. I think the better thing to do is to set up two separate modes of operation for the user:

search, with a limit on the number of date-sorted results that can be retrieved;
download, which creates a file of unsorted results using the scan method you suggested.
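
A minimal sketch of that split, in Python. Everything here is hypothetical: the 1,000-result cap, the function names, the last_modified sort field, and the request dictionaries are placeholders for whatever the application actually uses. The from, size, search_type and scroll keys just mirror the parameters discussed in this thread.

```python
MAX_SEARCH_RESULTS = 1000  # hypothetical cap on how deep a user may page

def search_request(query, page, page_size=10):
    """Build a date-sorted, paged request, refusing to page past the cap."""
    from_ = page * page_size
    if from_ + page_size > MAX_SEARCH_RESULTS:
        raise ValueError("page is beyond the search limit; use download mode")
    return {
        "query": query,
        "from": from_,
        "size": page_size,
        "sort": [{"last_modified": "desc"}],
    }

def download_request(query, batch_size=100):
    """Build an unsorted request meant to be consumed with scan + scroll."""
    return {
        "query": query,
        "size": batch_size,
        "params": {"search_type": "scan", "scroll": "5m"},
    }

q = {"query_string": {"query": "foo bar"}}
print(search_request(q, page=0))
print(download_request(q))
```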

Again, I appreciate your insights.

Best,
Steve