I have a strange problem with ES. It's running on a cluster with 64 cores
etc., so I don't think hardware capacity is the issue.
I want to index a lot of documents with elasticsearch-hadoop.
After some problems I now have everything in place and it seems to work.
So I wrote a simple Pig script which loads all the files (~500) and stores
them in an ES index.
However, after ~22h the job failed, because of connection problems between
But during that time, there wasn't any heavy usage of network bandwidth or
After that I tried running the Pig script on only one document at a time,
so that I know what is indexed and what is missing.
After about 3 documents had been indexed successfully this way, the jobs
started to fail again due to network problems, although there wasn't any
significant load.
I observed that even after the indexing jobs had stopped, the index was
still active. The number of documents kept growing for quite some time,
and the number of translog operations went up and down, staying mostly
around half a million.
To me this looks like ES takes longer to index the documents than the Pig
script takes to write them, so that after some time a buffer somewhere
fills up.
Is this possible? I would expect elasticsearch-hadoop to get throttled in
this case.
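To illustrate the kind of throttling I mean, here is a minimal Python
sketch. It is not elasticsearch-hadoop code, and every name in it
(run_pipeline, indexer, index_delay) is made up for illustration. A fast
writer puts documents into a bounded buffer and simply blocks whenever the
slower indexing side has not caught up, so the buffer can never overflow.

```python
import queue
import threading
import time


def run_pipeline(num_docs, buffer_size, index_delay=0.001):
    """Push num_docs through a bounded buffer; return (indexed, max_backlog)."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded: put() blocks when full
    indexed = []
    max_backlog = 0

    def indexer():
        # Hypothetical slow consumer standing in for the indexing side.
        while True:
            doc = buffer.get()
            if doc is None:  # sentinel: no more documents will arrive
                break
            time.sleep(index_delay)  # indexing is slower than writing
            indexed.append(doc)

    worker = threading.Thread(target=indexer)
    worker.start()

    for doc_id in range(num_docs):
        buffer.put(doc_id)  # blocks while the buffer is full: the throttle
        max_backlog = max(max_backlog, buffer.qsize())

    buffer.put(None)  # tell the indexer to stop
    worker.join()
    return indexed, max_backlog


if __name__ == "__main__":
    docs, backlog = run_pipeline(num_docs=50, buffer_size=5)
    print(len(docs), "documents indexed, backlog never exceeded", backlog)
```

With a writer that behaves like this, the job would slow down instead of
failing, which is what I expected to happen.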
The only documentation about the translog is what I found here:
which I find a bit thin. I still don't know what the number of translog
operations implies.
The linked page says I could increase the numbers when doing bulk
indexing, but I don't understand how this would help.
Also what's TPS?