Refreshing index preventing results being returned?


(Frederick Cheung) #1

Hi,

I've been having a problem where refreshing an index at the wrong point
seems to prevent results being returned. I've managed to cut down my
problem to the following sequence:

curl -XDELETE 'http://localhost:9201/test__garments'

curl -XPOST 'http://localhost:9201/test__garments/garment/1' -d '{"id":1, "name":"Some Garment"}'
curl -XPOST 'http://localhost:9201/test__garments/_refresh'
curl -XPUT 'http://localhost:9201/test__garments/verdict/_mapping' -d '{"verdict":{"_parent":{"type":"garment"},"properties":{"id":{"type":"integer"}}}}'
curl -XPOST 'http://localhost:9201/test__garments/verdict/1?parent=1' -d '{"id":1}'

curl -XPOST 'http://localhost:9201/test__garments/_refresh'
curl -XPOST 'http://localhost:9201/test__garments/verdict/_search' -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
          "has_parent": {
            "type": "garment",
            "query": {
              "match_all": {}
            }
          }
      }
    }
  }
}
  '

This should produce 1 result: I'm indexing a document (garment), adding a
child document (verdict) and then querying for child documents that have a
parent.

If I remove the first call to refresh then the search does return the
expected document and everything behaves normally. If I leave that refresh
in place then I get no results until I do something like

curl -XPOST 'localhost:9201/test__garments/garment/1/_update' -d '{"script":""}'

Has anyone ever seen something like this before? I'm using ES 0.90.9 on OS
X 10.9.1 (java -version is 1.7.0_45). This seems to work fine on ES 0.20.6;
the problem showed up while I was (belatedly) testing an upgrade to 0.90.

Fred

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8310e078-052f-4c7a-a44c-6cf548266578%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Before the _search, you can issue a

curl -XPOST 'http://localhost:9200/_cache/clear?id_cache=true'

to make it work.

See also https://gist.github.com/jprante/8174209
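
As a rough sketch of where that call would sit in the sequence from the first post, the script below clears the ID cache and then reruns the has_parent search. The `run` helper, the `ES_URL` and `DRY_RUN` variables and the function name are made up for illustration. Port 9201 is taken from the first post (the call above uses 9200, so adjust to wherever your node listens). By default the script only prints the curl commands.

```shell
# Hypothetical helper names; only the two endpoints come from this thread.
ES_URL="${ES_URL:-http://localhost:9201}"   # assumption: port used in post #1
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = send them

# Print the curl command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "curl $*"
  else
    curl "$@"
  fi
}

# The has_parent search from post #1, collapsed onto one line.
QUERY='{"query":{"filtered":{"query":{"match_all":{}},"filter":{"has_parent":{"type":"garment","query":{"match_all":{}}}}}}}'

clear_id_cache_then_search() {
  # Drop the possibly stale ID cache so the next parent/child query rebuilds it.
  run -XPOST "$ES_URL/_cache/clear?id_cache=true"
  # Rerun the child search.
  run -XPOST "$ES_URL/test__garments/verdict/_search" -d "$QUERY"
}

clear_id_cache_then_search
```

Run it with `DRY_RUN=0` to actually send the two requests.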

Jörg



(Frederick Cheung) #3

On Sunday, December 29, 2013 8:53:00 PM UTC, Jörg Prante wrote:

Before the _search, you can issue a

curl -XPOST 'http://localhost:9200/_cache/clear?id_cache=true'

to make it work.

See also https://gist.github.com/jprante/8174209

Thanks, that definitely helps narrow down what's happening although I don't
see why the cache is in a stale state to begin with.

Fred



(Jörg Prante) #4

You can also stay safe by creating all the mappings first and then creating
the docs.
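
A minimal sketch of that ordering, reusing the index, types and documents from the first post. The explicit index creation step is an addition that does not appear in the thread: once the index has been deleted, it has to exist before a mapping can be put on it. The `run` helper, `ES_URL`, `DRY_RUN` and the function name are hypothetical, and the script only prints the curl commands unless `DRY_RUN=0` is set.

```shell
ES_URL="${ES_URL:-http://localhost:9201}"   # assumption: port used in post #1
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = send them

# Print the curl command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "curl $*"
  else
    curl "$@"
  fi
}

setup_mappings_first() {
  # Start from a clean slate, then create the empty index explicitly
  # (assumption: not part of the original sequence).
  run -XDELETE "$ES_URL/test__garments"
  run -XPUT "$ES_URL/test__garments"
  # Declare the child mapping before any document is indexed.
  run -XPUT "$ES_URL/test__garments/verdict/_mapping" \
    -d '{"verdict":{"_parent":{"type":"garment"},"properties":{"id":{"type":"integer"}}}}'
  # Only now index the parent and the child.
  run -XPOST "$ES_URL/test__garments/garment/1" -d '{"id":1, "name":"Some Garment"}'
  run -XPOST "$ES_URL/test__garments/verdict/1?parent=1" -d '{"id":1}'
  # A single refresh once everything is in place.
  run -XPOST "$ES_URL/test__garments/_refresh"
}

setup_mappings_first
```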

Jörg



(Frederick Cheung) #5

On 30 Dec 2013, at 09:03, "joergprante@gmail.com" joergprante@gmail.com wrote:

You can also be safe when you create all the mappings first, then creating the docs.

Ok. What I'd really like to understand is what is causing this, so I can decide whether this only affects the setup of my unit tests (in which case something like changing the order of those steps is fine). If this could happen in production then I need a finer handle on what I need to avoid, because as it is this feels like a bug that I could accidentally trigger.

Fred



(Jörg Prante) #6

The semantics of "_refresh" is to execute a Lucene "maybeRefresh" with a
force attribute. This might also affect ES ID cache access that is not part
of the Lucene refresh operation.

There is a separate clear ID cache API beside the refresh API, so the ES ID
cache can be cleared by its own API request. For convenience, it might be
more intuitive to include a clear ID cache operation in each refresh
API request. On the other hand, invalidating caches can be expensive, so
there are two API calls for good reason. What can be expected from a refresh
API call is therefore a matter of interpretation. I would call this a
glitch, and it seems specific to parent/child.

My rule of thumb: if you run a parent/child query on docs you created
between separate forced refresh calls, call the clear ID cache API first, so
that a valid ID cache gets populated again for the parent/child query.
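
One way to apply that rule in a test setup is to wrap both calls in a single helper. This is only a sketch: the function name, the `run` helper, `ES_URL` and `DRY_RUN` are hypothetical, and the cache clear is the cluster-wide call shown earlier in this thread, so it is not limited to one index and may be expensive on a busy node.

```shell
ES_URL="${ES_URL:-http://localhost:9201}"   # assumption: port used in post #1
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = send them

# Print the curl command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "curl $*"
  else
    curl "$@"
  fi
}

# Force a refresh of one index, then clear the ID cache so the next
# parent/child query repopulates it.
refresh_and_clear_id_cache() {
  index="$1"
  run -XPOST "$ES_URL/$index/_refresh"
  run -XPOST "$ES_URL/_cache/clear?id_cache=true"
}

refresh_and_clear_id_cache test__garments
```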

Maybe you should open an issue to get this tracked by the ES team, because
the current behavior does not follow the principle of least surprise.

Jörg


