Only partial results returned for aggregation + ElasticsearchIllegalStateException when trying to scroll

Over a sample dataset of ~2.5M documents, where each record holds a geo
point and some other data, I wanted Elasticsearch 1.4.1 to provide the
following:

For all results in a given geo_bounding_box:
group results by (geohash of length 8, a term, day), and
for each group provide 2 sums and 2 distinct-value counts
over the documents in the group.

The nested aggregation looked like:

geohash_grid
  terms
    date_histogram
      sum
      sum
      cardinality
      cardinality
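As an illustration, that tree could be expressed as a search body along
these lines. This is a sketch only: the field names (location, category,
timestamp, value_a, value_b, user_id, session_id) and the bounding-box
coordinates are placeholders, not taken from the original post.

```python
# A sketch of the nested aggregation described above; all field names
# and coordinates are hypothetical placeholders.
def build_aggs():
    """Build the geohash_grid > terms > date_histogram tree, with two
    sums and two cardinalities per innermost bucket, using the
    Elasticsearch 1.x query DSL (the "filtered" query syntax)."""
    return {
        "size": 0,  # only aggregation buckets are wanted, not hits
        "query": {
            "filtered": {
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": {"lat": 40.8, "lon": -74.1},
                            "bottom_right": {"lat": 40.7, "lon": -73.9},
                        }
                    }
                }
            }
        },
        "aggs": {
            "by_cell": {
                "geohash_grid": {"field": "location", "precision": 8},
                "aggs": {
                    "by_term": {
                        "terms": {"field": "category"},
                        "aggs": {
                            "by_day": {
                                "date_histogram": {
                                    "field": "timestamp",
                                    "interval": "day",
                                },
                                "aggs": {
                                    "sum_a": {"sum": {"field": "value_a"}},
                                    "sum_b": {"sum": {"field": "value_b"}},
                                    "uniq_a": {"cardinality": {"field": "user_id"}},
                                    "uniq_b": {"cardinality": {"field": "session_id"}},
                                },
                            }
                        },
                    }
                },
            }
        },
    }
```

Such a body would be POSTed to the index's `_search` endpoint.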

I had two issues:

  1. I seem to have received only part of the response. The response's
    "hits.total" was 174054, yet when I summed the doc_count values of the
    geohash_grid (outermost) aggregation, I got only ~13K. I tried passing a
    large "size", but this had no effect. Is there a way to get the complete
    response?
  2. The next logical step was to try pagination, but when I added
    &scroll=60s to the URL, I received an ElasticsearchIllegalStateException
    and a 503 status. From the logs, the stack trace was:

[DEBUG][action.search.type ] [zoidberg] [listener][4]: Failed to execute [org.elasticsearch.action.search.SearchRequest@77526f86] while moving to second phase
org.elasticsearch.ElasticsearchIllegalStateException
    at org.elasticsearch.action.search.type.TransportSearchHelper.buildScrollId(TransportSearchHelper.java:65)
    at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.moveToSecondPhase(TransportSearchCountAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:397)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:198)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:174)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:171)
    at org.elasticsearch.search.action.SearchServiceTransportAction$6.handleResponse(SearchServiceTransportAction.java:244)
    at org.elasticsearch.search.action.SearchServiceTransportAction$6.handleResponse(SearchServiceTransportAction.java:235)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:158)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:127)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This occurs regardless of the geohash precision.

Questions:

  1. To get the data I need, is the aggregation I built the
    correct/optimal approach?
  2. Why can't I see all results in a non-paginated aggregation with a
    large response? Is there a hard limit?
  3. What is the cause of the exception?

Thanks
Eran

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c8192343-557b-4c0b-afab-0563eddcef07%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Thu, Dec 18, 2014 at 11:02 AM, Eran Duchan pavius@gmail.com wrote:

  1. I seem to have received only part of the response. The response's
    "hits.total" was 174054, yet when I summed the doc_count values of the
    geohash_grid (outermost) aggregation, I got only ~13K. I tried passing a
    large "size", but this had no effect. Is there a way to get the complete
    response?

The geohash grid aggregation has the same accuracy issues as the terms
aggregation (it works very similarly), so what is written at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
also applies to the geohash grid aggregation. To work around the issue,
you could consider increasing the shard size (it defaults to 10,000),
though don't go too large or it might consume too many resources, or
decreasing the precision of your geohashes (e.g. length 7 or 6, so that
there are fewer terms overall).
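The two knobs above can be sketched on the geohash_grid aggregation like
this (the field name `location` and the chosen numbers are assumptions
for illustration, not values from the thread):

```python
# Sketch of the two workarounds above: raise shard_size and/or lower the
# geohash precision. The field name "location" is a placeholder.
def grid_agg(precision=7, shard_size=50000):
    """geohash_grid aggregation with an explicit shard_size: each shard
    returns up to shard_size buckets to the coordinating node, improving
    doc_count accuracy at the cost of memory and network traffic.
    A coarser precision (7 or 6 instead of 8) yields fewer cells overall."""
    return {
        "by_cell": {
            "geohash_grid": {
                "field": "location",      # assumed field name
                "precision": precision,   # coarser => fewer buckets
                "shard_size": shard_size, # raised from the 10,000 default
            }
        }
    }
```

The returned dict would go under `"aggs"` in the search body.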

  2. The next logical step was to try pagination, but when I added
    &scroll=60s to the URL, I received an ElasticsearchIllegalStateException
    and a 503 status. From the logs, the stack trace was: [stack trace
    snipped, quoted above]

I agree the message is not user-friendly... Did you try to add pagination
on a request of type COUNT? Unfortunately, aggregations don't support
pagination; scrolling only works for hits, with the (DFS_)QUERY_THEN_FETCH
search types.
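To illustrate the distinction, here is a hypothetical client-side guard
that builds the query-string parameters and rejects the invalid
combination. The parameter names mirror the 1.x REST API; the helper
itself is not part of Elasticsearch.

```python
# Hypothetical helper: scroll paginates hits, so combining it with
# search_type=count (aggregations only, no hits) is rejected up front.
def search_params(search_type="query_then_fetch", scroll=None):
    """Build query-string parameters for a 1.x _search request,
    refusing scroll on a COUNT-type request, since aggregation
    buckets cannot be paginated."""
    if scroll is not None and search_type == "count":
        raise ValueError(
            "scroll cannot paginate aggregations; use query_then_fetch "
            "and page through hits instead"
        )
    params = {"search_type": search_type}
    if scroll is not None:
        params["scroll"] = scroll  # e.g. "60s" keep-alive
    return params
```

For example, `search_params(scroll="60s")` is valid, while
`search_params(search_type="count", scroll="60s")` raises.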

--
Adrien Grand


Thanks Adrien

> Did you try to add pagination on a request of type COUNT?
Yes, I ran the aggregation with search_type=count.

The thing is, I need accurate results, not super-fast execution. Scoring is
something we don't use or need, so I would like all relevant results
(i.e. results that pass the supplied query filter) across all shards/nodes
to be added to the result set. I tried doing this by setting size to a
high number (1000000), but to no avail. I see the documentation you
referred to indicates that size:0 removes any limit, but shouldn't a high
size work as well? Is there an inherent limitation in running the query
originally posted and expecting accurate results?
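For reference, the 1.x terms aggregation documentation describes
`"size": 0` as a special value meaning "return all terms", distinct from
any finite size. A minimal sketch of both variants (the field name
`category` is an assumed placeholder):

```python
# Sketch of the size parameter discussed above; "category" is a
# placeholder field name.
def terms_agg(size):
    """terms aggregation with an explicit size. In the 1.x API,
    size=0 is documented as the special "return all buckets" value,
    while any positive size is a finite cap."""
    return {"by_term": {"terms": {"field": "category", "size": size}}}

unlimited = terms_agg(0)       # documented "no limit" form
large_cap = terms_agg(1000000) # a finite cap, however large
```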

Eran


I second that. Many of us need accurate results at the expense of
performance, so an optional two-step execution for result correction (for
buckets not present in all shard responses) would be very helpful!
A great first step would be to do so on a single node (if not already done)
when aggregating its shards, as it does not impose as much overhead (no
need for extra network calls).
For me personally it would be super helpful, because we have a large number
of fairly small datasets which fit on one good server, and we need exact
analysis.

Also see my (admittedly naive) comment
at https://groups.google.com/forum/#!topic/elasticsearch/aLBv2QB7VMg

Nick,

I am not an expert in this area either, but with multi-core processors (24,
32, 48 cores) it is not uncommon to have a fairly large number of shards on
a node, so 30 shards is not out of the question.
I assumed that ES aggregates shard results on a node prior to shipping them
to the master, but I do not know if that is true. It may very well be that
a node sends per-shard aggregations to the master, in which case it is
32 x ShardResultSize for our 32-shard node. Reducing the size of the
network packet by 32 (even if it were just 8), and the work for the master
by the same ratio, is no small change. Somehow I think ES is already doing
this :) but who knows.

Another potential benefit of doing node-level aggregation is that on a
single node, when aggregating multiple shards, ES could resolve potential
errors by aggregating all buckets and re-calculating buckets not present in
every shard at a fairly low cost, while doing so across nodes is costly. On
the other hand, it may amplify the error across nodes; I don't know.

- s
