Search finds 42 hits, yet hits object empty

I've suddenly gotten an inexplicable search result:

{"took":36,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":42,"max_score":4.101285,"hits":[]}}

How can this be? I've never encountered this before.

On Thu, May 12, 2011 at 7:22 PM, searchersteve stevesuo@gmail.com wrote:

I've suddenly gotten an inexplicable search result:

{"took":36,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":42,"max_score":4.101285,"hits":[]}}

How can this be? I've never encountered this before.

Hm, are you sure you are not using size=0 in the search request?

Which version of ES are you using? I think there was some issue
which has been fixed.

Lukas

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Search-finds-42-hits-yet-hits-object-empty-tp2934377p2934377.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

But generally, if you can provide a full recreation script, that
would be really helpful.

On Friday, May 13, 2011, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Which version of ES are you using? I think there was some issue
which has been fixed.

Lukas

This can happen when you set size to 0, or use a large from.
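
To make the two cases concrete, here is a minimal Python sketch of how a paginated response can report a non-zero total next to an empty hits array. It is a toy model, not Elasticsearch code: `page_of_hits` and its parameters are made-up names, and a real cluster does the slicing across shards.

```python
def page_of_hits(matching_ids, from_=0, size=10):
    """Toy model of a paginated search response.

    `total` counts every matching document, while `hits` holds only the
    requested window [from_, from_ + size) of those matches.
    """
    total = len(matching_ids)
    hits = matching_ids[from_:from_ + size]
    return total, hits


if __name__ == "__main__":
    matches = list(range(42))  # pretend 42 documents matched the query

    # size=0: the total is still counted, but no hits are returned.
    print(page_of_hits(matches, from_=0, size=0))

    # A from value past the last match: same symptom.
    print(page_of_hits(matches, from_=100, size=10))
```

In both calls the total stays at 42 while the list of hits is empty, which matches the response shape in the original question.
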
On Friday, May 13, 2011 at 9:53 AM, Lukáš Vlček wrote:

But generally, if you can provide a full recreation script, that
would be really helpful.

I haven't upgraded to the April release yet. I hope that fixes it.

I don't have my query set to size=0, so I think it was your other explanation -- if by "large from" you mean: from = 0 and size of index=100,000. That's roughly my situation.

If I understand the issue correctly, I'd like to find a workaround. I want users to be able to find results from the entire index. Should I do it like this?

  1. Query to get the total number of hits.
  2. Requery with a reasonably recent FROM = x to get most recent hits.
  3. Tell users there are X more hits on server, and allow them to click "search older results?"
  4. Requery using FROM = u and TO = x, where x minus u is a reasonably tight interval.

Hi Steve

On Sat, 2011-05-14 at 14:02 -0700, searchersteve wrote:

If I understand the issue correctly, I'd like to find a workaround. I want
users to be able to find results from the entire index. Should I do it like
this?

  1. Query to get the total number of hits.
  2. Requery with a reasonably recent FROM = x to get most recent hits.
  3. Tell users there are X more hits on server, and allow them to click
    "search older results?"
  4. Requery using FROM = u and TO = x, where x minus u is a reasonably tight
    interval.

This can be really hard on ES. Think about it like this:

You ask for the first 10 results that contain the words "foo bar",
sorted by last_modified time.

That query is sent off to all 5 of your shards, each of which returns
its top 10 results, so you now have 50. The node handling the request
then combines those 50 results, and returns the top 10 from that list.

Now, you ask for results 100,000 to 100,010. Each of the 5 shards has to
return its top 100,010 results, so the main node now needs to work
through 500,050 results to figure out which ones to return.
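
The arithmetic in this example can be checked with a small Python sketch. It assumes, as the explanation does, that every shard returns its top from + size entries and that one node merges them. The function names and the integer "sort keys" are illustrative only, not part of any ES API.

```python
import heapq
from itertools import islice


def candidates_to_merge(num_shards, from_, size):
    """Number of entries the coordinating node has to look through."""
    return num_shards * (from_ + size)


def merged_page(shards, from_, size):
    """Merge per-shard results (each sorted ascending) and cut out one page.

    Each shard only needs to contribute its first from_ + size entries,
    because nothing ranked lower on a shard can land in the requested page.
    """
    per_shard = [shard[:from_ + size] for shard in shards]
    merged = heapq.merge(*per_shard)
    return list(islice(merged, from_, from_ + size))


if __name__ == "__main__":
    print(candidates_to_merge(5, 0, 10))        # first page: 50
    print(candidates_to_merge(5, 100_000, 10))  # deep page: 500050
```

The work grows with from, not with the page size, which is why deep pages get expensive even though only 10 results come back.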

The first question you should ask yourself is: do I really need to
return 100,000 results to my user? Is anybody actually interested in
seeing 100,000 results? Google never returns more than 1,000 for that
very reason.

You can use 'scroll' to ask for 10 results, then another 10 in order,
which is more efficient. But you can't randomly access page 42 that way.

Also, and MUCH more efficient, is to use the 'scan' search_type with
scroll. This will allow you to return all results efficiently, BUT the
results are not sorted in any way.

This is particularly useful for things like reindexing all of your data,
but generally is not useful for user search.
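
As a rough sketch of the control flow behind scroll, the Python generator below keeps asking for the next batch until an empty one comes back. It deliberately avoids any real client library: `start` and `fetch_next` are hypothetical callables standing in for the initial search request and the follow-up scroll requests, and the fake backend exists only to make the example runnable. The loop does not assume the first response carries hits, since (as far as I recall) a 'scan' search returns only a scroll id on the first call.

```python
def scroll_all(start, fetch_next):
    """Yield every hit from a scroll-style cursor.

    start()              -> (scroll_id, hits) for the initial request
    fetch_next(scroll_id) -> (scroll_id, hits) for each follow-up request
    Both callables are placeholders for real requests to the server.
    """
    scroll_id, hits = start()
    yield from hits  # may be empty, e.g. for a scan-style first response
    while True:
        scroll_id, hits = fetch_next(scroll_id)
        if not hits:  # an empty batch means the cursor is exhausted
            return
        yield from hits


def make_fake_backend(docs, batch_size):
    """In-memory stand-in for a server, for demonstration only."""
    position = 0

    def start():
        return "cursor-0", []  # scan-style: no hits in the first response

    def fetch_next(scroll_id):
        nonlocal position
        chunk = docs[position:position + batch_size]
        position += len(chunk)
        return f"cursor-{position}", chunk

    return start, fetch_next


if __name__ == "__main__":
    start, fetch_next = make_fake_backend([{"id": i} for i in range(25)], 10)
    print(sum(1 for _ in scroll_all(start, fetch_next)))  # 25
```

Because the consumer only ever moves forward through the cursor, this pattern suits exports and reindexing but cannot jump to an arbitrary page.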

Again: Do my users really need to see 100,000 results?

clint

Clint:

Belated thanks for that detailed analysis. It was very helpful.

Now that I think about the design more deliberately, the reason I wanted a user to be able to retrieve all 100,000 documents would be for the purpose of dumping the documents out to a file. I think the better thing to do is to set up two separate modes of operation for the user:

  1. search, with a limit on the number of date-sorted results that can be retrieved;
  2. download, which creates a file of unsorted results using the scan method you suggested.
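
One way the search mode's limit could look is sketched below in Python. The cap of 1,000 is only an assumption borrowed from the Google remark earlier in the thread, and `clamp_window` is a hypothetical helper, not an ES setting. The download mode would bypass it and read everything through scan and scroll instead.

```python
MAX_RESULTS = 1000  # hypothetical cap on how deep interactive search may page


def clamp_window(from_, size, max_results=MAX_RESULTS):
    """Shrink a requested (from, size) window so it never reaches past the cap."""
    from_ = max(0, min(from_, max_results))
    size = max(0, min(size, max_results - from_))
    return from_, size


if __name__ == "__main__":
    print(clamp_window(0, 10))        # (0, 10): untouched
    print(clamp_window(995, 10))      # (995, 5): trimmed at the cap
    print(clamp_window(100_000, 10))  # (1000, 0): nothing left to show
```

When the clamped size comes back as 0, the interface can point the user at the download mode rather than issuing a deep query.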

Again, I appreciate your insights.

Best,
Steve

Hi,

I know this is an old post, but I had this problem and came up with a different fix. I also had TotalHits > 0 but no hits in response.hits().hits(), because a refresh was missing!
So don't forget it if you have this problem too :wink:

By missing refresh, do you mean not calling the refresh API? This should still not happen; can you recreate it?

On Monday, July 11, 2011 at 12:34 PM, cnotin wrote:

Hi,

I know this is an old post, but I had this problem and came up with a
different fix. I also had TotalHits > 0 but no hits in
response.hits().hits(), because a refresh was missing!
So don't forget it if you have this problem too :wink:
