Inconsistency between GET and POST searches

Hi,

I've got a search query that fails with "CircuitBreakingException: Data
too large" when POSTed, but succeeds when the identical query is sent as a
GET (with the JSON in the query string).

As far as I can tell, the search query itself may be buggy (the "size"
parameter is in the wrong place), but the difference in behaviour between
the two test cases is the bug I'm interested in.

This is on version 1.0.3, with two nodes. Presumably something is causing
too many field values to be loaded into memory in the POST version (note
"nested: QueryPhaseExecutionException" in the error output). I can't
reproduce it locally with a small dataset, only in production with a 10G
index. I guess I would first have to create a very large test dataset (and
maybe set the circuit breaker limits low?), but I've run out of time for
debugging this morning and thought I'd check whether this is a known issue.
The "nested" message might be a meaningful clue to someone who knows more
about it.

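Gist here: https://gist.github.com/sebbacon/7b5e67aaae7f0e0a31aa
(it demonstrates the different behaviour between GET and POST; it is hard
to reproduce on a fresh install because it depends on index size, but it
reproduces on our production server, which is 10Gi with 13,286,926 items)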

Thanks

Seb

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/adb7d64d-49f2-4bb6-9cf1-cdc4280b24c8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

OK, so it turns out the GET version just wasn't getting parsed at all.

curl -XPOST -G http://localhost:9200/bork/user/_search -d '
  something-nonsense'

This always returns everything: parameters in the URL have to be in the
form key=val. The docs already say so; I was misled by the behaviour of
the elasticsearch-head plugin, which I assumed was handling the JSON
correctly.
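A minimal Python sketch of why the query-string version appeared to "work": it roughly models `curl -G`, which appends the `-d` data to the URL as a query string instead of sending it as a request body. A server that reads key=val pairs finds none in a bare token or in raw JSON, so nothing constrains the search. The `curl_g_url` and `url_params` helpers and the `name` field are hypothetical; `q` and `size` are used only as examples of URI-search style parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:9200/bork/user/_search"


def curl_g_url(base: str, data: str) -> str:
    """Rough model of `curl -G -d DATA` (hypothetical helper): the data is
    appended to the URL as the query string rather than sent as a body."""
    sep = "&" if "?" in base else "?"
    return base + sep + data


def url_params(url: str) -> dict:
    """Return only the key=val pairs found in the URL's query string."""
    return parse_qs(urlsplit(url).query)


# A bare token or raw JSON contains no key=val pairs, so nothing is parsed
# and the search runs with its defaults, i.e. it "returns everything".
print(url_params(curl_g_url(BASE, "something-nonsense")))  # {}
print(url_params(curl_g_url(BASE, '{"size": 5}')))         # {}

# Parameters in the URL have to be key=val (`name` is a hypothetical field).
good = BASE + "?" + urlencode({"q": "name:seb", "size": 10})
print(url_params(good))  # {'q': ['name:seb'], 'size': ['10']}
```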

Back to the drawing board. I've returned to my original assumption (from
before this red herring) that the problem is the query faceting across the
entire dataset, which is simply too big.

I had assumed that, by including the type in the URL, the faceting would
only happen across that type (which has only 101 records), but I suppose
this is not the case?
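A simplified Python model of the question above, assuming that, as in Elasticsearch 1.x, fielddata for a field is loaded for every document in the index that has the field, regardless of which documents the query or the type in the URL selects. The documents, the `country` field and the helper functions are hypothetical; this is not Elasticsearch's implementation, only an illustration of why the memory cost would track the whole index rather than the 101 matching records.

```python
from collections import Counter

# Hypothetical index: 101 "user" docs plus many docs of another type
# that happen to use the same field name.
INDEX = (
    [{"_type": "user", "country": "uk"} for _ in range(101)]
    + [{"_type": "event", "country": c} for c in ("uk", "fr", "de") * 1000]
)


def load_fielddata(index, field):
    """Assumed behaviour: values are loaded for every doc that has the field,
    whatever the query or type filter."""
    return [doc[field] for doc in index if field in doc]


def terms_facet(index, field, doc_type):
    fielddata = load_fielddata(index, field)      # memory ~ whole index
    counts = Counter(doc[field] for doc in index  # counting ~ matching docs only
                     if doc["_type"] == doc_type and field in doc)
    return counts, len(fielddata)


counts, loaded = terms_facet(INDEX, "country", "user")
print(counts)  # Counter({'uk': 101})
print(loaded)  # 3101
```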

Thanks

Seb

On Friday, 25 July 2014 10:41:59 UTC+1, Seb Bacon wrote: [first message above]

It turns out that computing this facet only takes about 12MB, but the
fielddata cache was completely full. Restarting the nodes emptied the
cache, and everything started working again.
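A toy Python model of how a full fielddata cache can make a small facet trip the breaker: a memory circuit breaker compares what is already held plus the estimate for the new load against a limit, so a ~12 MB facet fails when the cache has already used nearly all of it, and succeeds again once the memory is released. The classes, the 1000 MB limit and the 995 MB starting usage are hypothetical; in 1.x the real limit is configured through a setting such as `indices.fielddata.breaker.limit`, which is worth checking against the docs for the version in use.

```python
class CircuitBreakingException(Exception):
    """Stand-in for the error seen in the thread (hypothetical class)."""


class FielddataBreaker:
    """Toy breaker: refuse a load if used + estimate would exceed the limit."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.used = 0

    def add_estimate(self, nbytes: int) -> None:
        if self.used + nbytes > self.limit:
            raise CircuitBreakingException(
                f"Data too large: {self.used + nbytes} > {self.limit} bytes")
        self.used += nbytes

    def release_all(self) -> None:
        # Roughly what restarting a node does to an in-memory cache.
        self.used = 0


MB = 1024 * 1024
breaker = FielddataBreaker(limit_bytes=1000 * MB)  # hypothetical limit
breaker.add_estimate(995 * MB)                     # cache already almost full

try:
    breaker.add_estimate(12 * MB)                  # the ~12 MB facet
except CircuitBreakingException as exc:
    print("tripped:", exc)

breaker.release_all()                              # "restart the nodes"
breaker.add_estimate(12 * MB)                      # now fits
print(breaker.used // MB)                          # 12
```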

I note that there's a setting:

indices.fielddata.cache.expire

which is off by default. I guess I need to set it to something sensible
and see what happens. What's the reason for it being off by default?
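A toy Python model of an inactivity-based expiry like `indices.fielddata.cache.expire`, assuming it drops an entry only once the entry has gone unused for the configured time: fields that keep being faceted never expire, so this kind of expiry reclaims rarely used fielddata but does not by itself cap the cache's total size. The `ExpiringCache` class, the field names and the timings are hypothetical, not Elasticsearch's implementation.

```python
class ExpiringCache:
    """Toy cache: an entry is evicted once it has been idle for
    `expire_after` seconds (checked lazily on each access)."""

    def __init__(self, expire_after: float):
        self.expire_after = expire_after
        self.last_access: dict[str, float] = {}
        self.sizes: dict[str, int] = {}

    def _evict_idle(self, now: float) -> None:
        idle = [k for k, t in self.last_access.items()
                if now - t >= self.expire_after]
        for key in idle:
            del self.last_access[key]
            del self.sizes[key]

    def get_or_load(self, key: str, size: int, now: float) -> None:
        self._evict_idle(now)
        self.last_access[key] = now       # touching an entry resets its idle timer
        self.sizes.setdefault(key, size)  # "load" only if not already cached

    def total(self) -> int:
        return sum(self.sizes.values())


cache = ExpiringCache(expire_after=600)      # hypothetical 10-minute expiry
cache.get_or_load("rare_field", 500, now=0)  # faceted once, then never again
for t in range(0, 1201, 60):                 # two fields faceted every minute
    cache.get_or_load("busy_a", 100, now=t)
    cache.get_or_load("busy_b", 100, now=t)

print(sorted(cache.sizes))  # ['busy_a', 'busy_b']
print(cache.total())        # 200
```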

Thanks

Seb

On Friday, 25 July 2014 11:51:03 UTC+1, Seb Bacon wrote: [second message above]