Mget randomly fails on large sets of records


(Jay Taylor) #1

I am trying to fetch approximately 1,500 IDs from ES using the _mget
endpoint. To do this, I split my ID list into groups of 1,000
records and execute a query for each 1K group.
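
For reference, the batching described above can be sketched roughly as follows. This is only an illustration: the helper names (chunked, build_mget_body) are made up for this example, the IDs are placeholders, and it assumes the "docs" form of the _mget request body posted to the index's _mget endpoint. No HTTP request is made here.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_mget_body(ids):
    """Build a JSON _mget request body for one batch of IDs.

    Assumption: the request goes to /profiles/_mget, so each entry
    only needs an "_id" field.
    """
    return json.dumps({"docs": [{"_id": str(doc_id)} for doc_id in ids]})


# Placeholder IDs standing in for the real list of roughly 1,500.
ids = list(range(1500))

# One request body per group of 1,000, as described above.
bodies = [build_mget_body(batch) for batch in chunked(ids, 1000)]
```

With 1,500 IDs and a group size of 1,000, this produces two request bodies: one with 1,000 entries and one with 500.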

Usually the first group goes through, but the second group returns
almost all failures that look like this:

{"_id":"123092822","_index":"profiles","_type":null,"error":"RemoteTransportException[[Brute II][inet[/192.168.1.47:9300]][indices/mget/shard/s]]; nested: "}

I experimented with breaking the set of 1,500 into groups of 10. When
I did that, the first query returned okay, but from there I got
very mixed results, where numbers 1, 2, 5, and 9 returned okay but 0, 3,
4, 6, 7, and 8 all failed.
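
To pin down exactly which IDs fail in a given response, something like the sketch below could be used. It assumes the _mget response wraps its per-document results in a top-level "docs" array and that failed entries carry an "error" field like the one shown above. The function name and the successful sample entry are hypothetical.

```python
def split_mget_response(response):
    """Separate successful entries from the IDs of failed ones.

    Assumption: `response` is the parsed JSON of an _mget reply, with
    per-document results under "docs" and an "error" key on failures.
    """
    found, failed_ids = [], []
    for doc in response.get("docs", []):
        if "error" in doc:
            failed_ids.append(doc["_id"])
        else:
            found.append(doc)
    return found, failed_ids


# Sample reply: one made-up successful entry plus the failure shown above.
sample = {
    "docs": [
        {"_id": "1", "_index": "profiles", "_source": {"name": "example"}},
        {
            "_id": "123092822",
            "_index": "profiles",
            "_type": None,
            "error": "RemoteTransportException[[Brute II]"
                     "[inet[/192.168.1.47:9300]][indices/mget/shard/s]]; nested: ",
        },
    ]
}

found, failed_ids = split_mget_response(sample)
```

The failed IDs could then be logged or retried in a follow-up request to see whether the failures are tied to specific documents or vary from run to run.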

I suspect that there is some kind of timeout occurring, but I haven't
been able to find any elasticsearch.yml configuration directives that
would help.

Any insight or suggestions about what could be going on would be a
tremendous help.

Thanks,
Jay


(Shay Banon) #2

This looks like a problem in elasticsearch. Sadly, there isn't proper
logging in place to narrow down why it happens. I just pushed proper
logging for it to master and the 0.17 branch; can you give it a go? If not,
pop on IRC and I can provide a custom build.

On Sat, Jul 30, 2011 at 2:02 AM, Jay Taylor outtatime@gmail.com wrote:



(system) #3