Elasticsearch Thrift client time-out


(Harish Ved) #1

Hey,
I am trying to index some documents in Elasticsearch 1.2.0 over a Thrift
connection, using the Python client
(http://elasticsearch-py.readthedocs.org/en/master/) on my virtual machine.
But after indexing around 1200-1800 documents I get a TSocket timeout.
Here is the traceback -

Traceback (most recent call last):
  File "new_insert_bulk.py", line 400, in <module>
    actions = define_products(namespace_id, store_ids, int(sys.argv[3]),
        category_ids, category_names, actions, cluster_id)
  File "new_insert_bulk.py", line 371, in define_products
    print helpers.bulk(es, actions)
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/helpers.py", line 148, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
    resp = client.bulk(bulk_actions, **kwargs)
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 70, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 568, in bulk
    params=params, body=self._bulk_body(body))
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 274, in perform_request
    status, headers, data = connection.perform_request(method, url, params,
        body, ignore=ignore)
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/connection/thrift.py", line 62, in perform_request
    response = tclient.execute(request)
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/connection/esthrift/Rest.py", line 42, in execute
    return self.recv_execute()
  File "/home/ubuntu/.virtualenvs/elasticsearch/local/lib/python2.7/site-packages/elasticsearch/connection/esthrift/Rest.py", line 53, in recv_execute
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.linux-x86_64/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
  File "build/bdist.linux-x86_64/egg/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
  File "build/bdist.linux-x86_64/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.linux-x86_64/egg/thrift/transport/TTransport.py", line 159, in read
  File "build/bdist.linux-x86_64/egg/thrift/transport/TSocket.py", line 105, in read
socket.timeout: timed out

I have attached the server logs in log2.txt, along with screenshots from
BigDesk which may help. Previously, when the documents were less complex,
bulk requests of around 1300 docs each would be indexed without any
hiccups. Currently I am sending only around 200 docs per request, but it
still times out. I have set the logging level to 'TRACE'.
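For reference, my loading loop is roughly the sketch below (simplified; the names and the `send` callable are placeholders for my actual wrapper around helpers.bulk, not the exact code). One thing I am considering is retrying each fixed-size chunk with a short back-off when it times out:

```python
import time

def send_in_chunks(actions, send, chunk_size=200, retries=3, backoff=2.0):
    """Send `actions` in fixed-size chunks, retrying each chunk on failure.

    `send` is whatever performs one bulk request (e.g. a thin wrapper
    around helpers.bulk); it is expected to raise on timeout/error.
    """
    for start in range(0, len(actions), chunk_size):
        chunk = actions[start:start + chunk_size]
        for attempt in range(retries):
            try:
                send(chunk)
                break  # chunk went through, move to the next one
            except Exception:
                if attempt == retries - 1:
                    raise  # exhausted retries, give up on this chunk
                time.sleep(backoff * (attempt + 1))  # linear back-off
```

But that only papers over the timeouts rather than explaining them, which is why I am asking here.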

This chunk of the logs -

[2014-07-07 22:24:56,980][TRACE][lucene.iw ]
[Blaquesmith][in_0][0] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,277][TRACE][lucene.iw ]
[Blaquesmith][in_0][1] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,277][TRACE][lucene.iw ]
[Blaquesmith][in_0][1] elasticsearch[Blaquesmith][scheduler][T#1] IW:
nrtIsCurrent: infoVersion matches: true; DW changes: false; BD changes:
false
[2014-07-07 22:24:57,277][TRACE][lucene.iw ]
[Blaquesmith][in_0][1] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,496][TRACE][lucene.iw ]
[Blaquesmith][in_0][2] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,496][TRACE][lucene.iw ]
[Blaquesmith][in_0][2] elasticsearch[Blaquesmith][scheduler][T#1] IW:
nrtIsCurrent: infoVersion matches: true; DW changes: false; BD changes:
false
[2014-07-07 22:24:57,496][TRACE][lucene.iw ]
[Blaquesmith][in_0][2] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,980][TRACE][lucene.iw ]
[Blaquesmith][in_0][0] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-07 22:24:57,981][TRACE][lucene.iw ]
[Blaquesmith][in_0][0] elasticsearch[Blaquesmith][scheduler][T#1] IW:
nrtIsCurrent: infoVersion matches: true; DW changes: false; BD changes:
false
[2014-07-07 22:24:57,981][TRACE][lucene.iw ]
[Blaquesmith][in_0][0] elasticsearch[Blaquesmith][scheduler][T#1] DW:
anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false

keeps repeating long after the timeout; the timed-out indexing request was
around 22:21:00. I don't think the document itself is invalid (with respect
to the mapping or anything like that), because I am re-indexing the same
document with only 2 fields changed. How can I avoid this time-out? And how
can I be sure it won't happen occasionally in a production system? I would
be glad for any help.


