UnavailableShardsException after loading 1.5M documents

Hi all,

I'm using the Python API (pyes) to bulk load our data; here's the relevant part of the code:

import os
from pyes import ES

max_docs = 10000
es = ES(server='hadoop42.robinsystems.com:9200')

for prefix in xrange(1, 105):
    f_name = os.path.join('data', str(prefix) + '.json')
    with open(f_name, 'rb') as f:
        for line in f:
            es.index(line, str(prefix), 'my_type', bulk=True)

It loops through the files (1.json, 2.json, 3.json, etc.) and loads each one into
its own index ('1', '2', '3', etc.). The API flushes 400 documents at a time. It hums
along until about 1.5M documents, then the process fails with the following
error:

Traceback (most recent call last):
  File "load_data.py", line 24, in <module>
    es.index(line, str(prefix), 'my_type', bulk=True)
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 729, in index
    return self.flush_bulk()
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 763, in flush_bulk
    return self.bulker.flush_bulk(forced)
  File "/usr/local/lib/python2.7/site-packages/pyes/models.py", line 204, in flush_bulk
    "\n".join(batch) + "\n")
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 441, in _send_request
    response = self.connection.execute(request)
  File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 109, in execute
    self._local.server = server = self._get_server()
  File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 145, in _get_server
    raise NoServerAvailable(ex)
pyes.exceptions.NoServerAvailable: list index out of range

After that, I can't even load one document into the system:

curl -XPOST http://hadoop42.robinsystems.com:9200/_bulk --data-binary @t.json

{"took":60001,"errors":true,"items":[{"create":{"_index":"21","_type":"my_type","_id":"unj0OWVgQZCNXYqfChaOVg","status":503,"error":"UnavailableShardsException[[21][5] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5e27693e]"}}]}

The t.json file has one document in it. I restarted the cluster and I still get
the same error. All my primary shards are active, and the replicas are coming
up slowly. The current state of the cluster is yellow. I would expect to still be
able to load documents in this state.
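
For reference, this is how I'm checking the cluster state (just the standard
cluster health API, pointed at one of my nodes):

curl -XGET 'http://hadoop42.robinsystems.com:9200/_cluster/health?pretty'

That's where I'm seeing the yellow status and the count of initializing replica
shards.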

Here are some more details of our setup:

  • 6-node cluster, 256GB RAM per node, 120GB set as ES_HEAP
  • 104 indexes with 10 shards each and 2 replicas
  • Each index holds 80,000 documents, and each document is about 20KB
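
That works out to 104 × 10 = 1,040 primary shards, and with 2 replicas each,
104 × 10 × 3 = 3,120 shards in total, or roughly 520 shards per node.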

Any idea why I'd be unable to load documents into my cluster after this
point?

Thanks,
-Chris


It looks like the problem is on my end. I misplaced the HEAP size parameter
and was only running with 1GB. After bumping it up to a more respectable
amount, the loading is humming along again.
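
In case anyone else hits this: the heap is normally set through the ES_HEAP_SIZE
environment variable before starting each node (where exactly you set it depends
on how Elasticsearch was installed), e.g.:

export ES_HEAP_SIZE=120g

The startup script (bin/elasticsearch.in.sh) uses that value for both -Xms and
-Xmx.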

-Chris


Well, I hit the same error again after 6.7M documents, even with the increased
heap size.

After a restart of the cluster, it's taking a very long time to bring up
the replica shards. Until I have about 50% of my shards (primary + replica)
initialized, I'm unable to load any more documents. When I try to bulk
load, I fail after about another 1000 documents. Here are my current cluster
stats from Marvel:

Nodes: 6
Indices: 106
Shards: 2342
Data: 1.14 TB
CPU: 238%
Memory: 44.21 GB / 719.90 GB
Up time: 3.4 h
Version: 1.1.1

My total number of shards is actually 3124; I'm still waiting on several
hundred to come back after three and a half hours. I have the following
questions:

  • Is there an index size to heap size ratio that we should be adhering to?
  • What would prevent us from indexing additional documents? We seem to
    be nowhere close to filling the memory allocated to Elasticsearch, and the
    CPU usage has remained low.
  • I just changed the es.logger.level to DEBUG, restarted the cluster,
    waited for all the primary shards to initialize, and submitted a document
    for indexing. It failed with the UnavailableShardsException and nothing
    appeared in the logs on the node where I submitted the request. Is there
    somewhere else I should be looking?
  • I've given no tuning parameters to the indexes or the system as a
    whole. Is there something I may be missing?
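
In case it helps, something like the following should show how far along the
recovery is (these are the standard _cat APIs, which should be available in
1.1.1; the host is just one of my nodes):

curl -XGET 'http://hadoop42.robinsystems.com:9200/_cat/shards?v'
curl -XGET 'http://hadoop42.robinsystems.com:9200/_cat/recovery?v'

Shards that are still recovering show up with state INITIALIZING in the first
one.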

I'm sure this is something that could be solved on my side with some change
in parameters. Any ideas what I can try?

Thanks,
-Chris

