Hi all,
I'm using the Python API (pyes) to bulk load our data; here's the relevant
part of the code:
import os
from pyes import ES

max_docs = 10000
es = ES(server='hadoop42.robinsystems.com:9200')
for prefix in xrange(1, 105):
    f_name = os.path.join('data', str(prefix) + '.json')
    with open(f_name, 'rb') as f:
        for line in f:
            es.index(line, str(prefix), 'my_type', bulk=True)
It loops through the files (1.json, 2.json, 3.json, etc.) and loads them into
indexes ('1', '2', '3', etc.). The API flushes 400 documents at a time. It hums
along until about 1.5M documents, then the process fails with the following
error:
Traceback (most recent call last):
  File "load_data.py", line 24, in <module>
    es.index(line, str(prefix), 'my_type', bulk=True)
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 729, in index
    return self.flush_bulk()
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 763, in flush_bulk
    return self.bulker.flush_bulk(forced)
  File "/usr/local/lib/python2.7/site-packages/pyes/models.py", line 204, in flush_bulk
    "\n".join(batch) + "\n")
  File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 441, in _send_request
    response = self.connection.execute(request)
  File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 109, in execute
    self._local.server = server = self._get_server()
  File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 145, in _get_server
    raise NoServerAvailable(ex)
pyes.exceptions.NoServerAvailable: list index out of range
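For what it's worth, I'm considering wrapping the index call in a simple retry with exponential backoff so a transient connection drop doesn't kill the whole load. This is just a generic sketch, not a pyes API; the commented usage line shows how I'd apply it to the call above:

    import time

    def with_retries(fn, max_attempts=5, base_delay=1.0):
        """Call fn(), retrying with exponential backoff on any exception."""
        for attempt in range(max_attempts):
            try:
                return fn()
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts; re-raise to the caller
                time.sleep(base_delay * (2 ** attempt))

    # Hypothetical usage around the pyes call:
    # with_retries(lambda: es.index(line, str(prefix), 'my_type', bulk=True))
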
After that, I can't even load one document into the system:

curl -XPOST http://hadoop42.robinsystems.com:9200/_bulk --data-binary @t.json

{"took":60001,"errors":true,"items":[{"create":{"_index":"21","_type":"my_type","_id":"unj0OWVgQZCNXYqfChaOVg","status":503,"error":"UnavailableShardsException[[21][5] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5e27693e]"}}]}
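In case it helps anyone reproduce, this is how I'm reading that bulk response: a bulk request can partially succeed, so each item carries its own status. Just a sketch of inspecting the JSON above (with the long error string truncated):

    import json

    # The bulk response returned by Elasticsearch (error string shortened).
    response = json.loads(
        '{"took":60001,"errors":true,"items":[{"create":{"_index":"21",'
        '"_type":"my_type","_id":"unj0OWVgQZCNXYqfChaOVg","status":503,'
        '"error":"UnavailableShardsException[...]"}}]}'
    )

    # Check every item, since some may succeed while others fail.
    if response['errors']:
        for item in response['items']:
            action, result = next(iter(item.items()))
            if result.get('status', 200) >= 300:
                print('%s on index %s failed: %d %s'
                      % (action, result['_index'], result['status'], result['error']))
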
The t.json file has one document in it. I restarted the cluster and get the
same error. All my primary shards are active and the replicas are coming up
slowly; the current state of the cluster is yellow. I would expect to still be
able to load documents in this state.
Here are some more details of our setup:
- 6-node cluster with 256GB RAM, 120GB set as ES_HEAP
- 104 indexes with 10 shards each and 2 replicas
- Each index holds 80,000 documents and each document is about 20KB
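That works out to a lot of shards for six nodes; here's the back-of-the-envelope arithmetic from the numbers above:

    # Shard count implied by the setup above.
    indexes = 104
    shards_per_index = 10
    copies = 1 + 2  # one primary plus two replicas per shard

    total_shards = indexes * shards_per_index * copies
    print(total_shards)       # 3120 shards cluster-wide
    print(total_shards // 6)  # 520 shards per node across 6 nodes
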
Any idea why I'd be unable to load documents into my cluster after this
point?
Thanks,
-Chris
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/912ea701-bc14-455b-a023-f0f644b9f5de%40googlegroups.com.