Overloading by indexing?

Julius_K · June 24, 2014, 2:33pm

Hi,

I have a strange problem with ES. It's running on a cluster with 64 cores,
so I don't think hardware capacity is the issue.

I want to index a lot of documents with elasticsearch-hadoop.
After some initial problems I now have everything in place and it seems to
work fine.

So I wrote a simple Pig script which loads all the files (~500) and stores
them into an ES index.
However, after ~22h the job failed because of connection problems between
the nodes.
But during that time, there wasn't any heavy usage of network bandwidth or
other resources.

After that I tried running the Pig script for only one document at a time,
so I know what is indexed and what is missing.
After about 3 documents had been indexed successfully this way, the jobs
started to fail again due to network problems, although there wasn't any
significant load.

I observed that even after the indexing jobs stopped, there was still
activity on the index. The number of documents kept growing for quite
some time, and the number of translog operations went up and down, mostly
staying at about half a million.

To me this looks like the index takes longer to index the documents than
the Pig script takes to write them, and after some time a buffer somewhere
gets too full.

Is this possible? I would expect elasticsearch-hadoop to be throttled in
this case.
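
[Editor's sketch] The suspicion above (a writer that outpaces the indexer until a buffer overflows) can be made concrete with a little arithmetic. This is a toy model, not a description of how Elasticsearch or elasticsearch-hadoop buffer internally; the rates, the capacity, and the function names are hypothetical.

```python
import math


def simulate_backlog(write_rate, index_rate, capacity, seconds):
    """Return (backlog, overflow_second) for a writer that may outpace the indexer.

    write_rate and index_rate are documents per second; capacity is the number
    of pending documents the (hypothetical) buffer can hold.
    overflow_second is None if the buffer never overflows within `seconds`.
    """
    backlog = 0
    for second in range(1, seconds + 1):
        # Each second the writer adds documents and the indexer removes some.
        backlog = max(0, backlog + write_rate - index_rate)
        if backlog > capacity:
            return backlog, second
    return backlog, None


def drain_seconds(backlog, index_rate):
    """Seconds the indexer keeps working after the writer has stopped."""
    return math.ceil(backlog / index_rate)


if __name__ == "__main__":
    print(simulate_backlog(1000, 800, 10000, 3600))
    print(drain_seconds(500000, 800))
```

In this model a backlog also explains why the document count keeps growing after the job has stopped: the pending work still has to drain at the indexing rate.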

The only documentation about the translog that I found is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html
which I find rather sparse. I still don't know what the number
of translog operations implies.

The linked page says I could increase the values when doing bulk
indexing, but I don't understand how this would help.
Also, what's TPS?
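
[Editor's sketch] As far as the 1.x documentation describes it, a flush performs a Lucene commit and clears the translog, so a higher flush threshold means fewer flushes during a bulk load. The snippet below only builds the request for an index settings update and sends nothing. The setting names are the ones documented for Elasticsearch 1.x and may differ in other versions; the host, the index name `docs`, and the values are assumptions for illustration.

```python
import json

# Setting names as documented for Elasticsearch 1.x (assumption for any other
# version); the values are illustrative, not recommendations.
BULK_LOAD_SETTINGS = {
    "index.translog.flush_threshold_size": "1gb",
    "index.refresh_interval": "-1",
}


def settings_request(host, index, settings):
    """Build (method, url, body) for an index settings update; sends nothing."""
    url = "http://{}/{}/_settings".format(host, index)
    body = json.dumps(settings, sort_keys=True)
    return "PUT", url, body


if __name__ == "__main__":
    # "localhost:9200" and "docs" are hypothetical.
    method, url, body = settings_request("localhost:9200", "docs", BULK_LOAD_SETTINGS)
    print(method, url)
    print(body)
```

After the bulk load, the same request with the previous values would restore the normal behaviour.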

Best regards
Julius

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d17e1231-da99-4bc2-b019-806046ffd34e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · June 24, 2014, 11:23pm

TPS is usually transactions per second.

Are you monitoring your cluster, and your memory/heap usage? How are you
coming to the conclusion that it's a networking issue?
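
[Editor's sketch] One way to check heap usage per node is to summarize a nodes stats response. The function below works on an already parsed response and makes no network call; it assumes the 1.x response layout (`nodes.<id>.jvm.mem.heap_used_in_bytes`, `heap_max_in_bytes`, and `thread_pool.bulk.rejected`), such as `GET /_nodes/stats/jvm,thread_pool` would return. The function name, the 75% warning threshold, and the sample data are hypothetical.

```python
def summarize_node_stats(stats, heap_warn_percent=75):
    """Return one summary dict per node from a nodes-stats style response."""
    summary = []
    for node_id, node in sorted(stats.get("nodes", {}).items()):
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_in_bytes", 0)
        limit = mem.get("heap_max_in_bytes", 0)
        heap_percent = round(100.0 * used / limit, 1) if limit else None
        # Rejected bulk requests suggest the node could not keep up with writes.
        rejected = node.get("thread_pool", {}).get("bulk", {}).get("rejected", 0)
        heap_high = heap_percent is not None and heap_percent >= heap_warn_percent
        summary.append({
            "node": node.get("name", node_id),
            "heap_percent": heap_percent,
            "bulk_rejected": rejected,
            "warn": heap_high or rejected > 0,
        })
    return summary


if __name__ == "__main__":
    gib = 1024 ** 3
    sample = {  # made-up numbers, only to show the shape of the input
        "nodes": {
            "a": {"name": "node-1",
                  "jvm": {"mem": {"heap_used_in_bytes": 3 * gib,
                                  "heap_max_in_bytes": 8 * gib}},
                  "thread_pool": {"bulk": {"rejected": 12}}},
            "b": {"name": "node-2",
                  "jvm": {"mem": {"heap_used_in_bytes": 2 * gib,
                                  "heap_max_in_bytes": 8 * gib}},
                  "thread_pool": {"bulk": {"rejected": 0}}},
        }
    }
    for row in summarize_node_stats(sample):
        print(row)
```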

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Julius_K · June 25, 2014, 4:51pm

2014-06-25 1:23 GMT+02:00 Mark Walkom markw@campaignmonitor.com:

> TPS is usually transactions per second.

Ok, thanks.

> Are you monitoring your cluster, and your memory/heap usage?

Yes, that is what I meant by no significant load and no heavy usage
of resources.

> How are you coming to the conclusion that it's a networking issue?

The error messages say something like "connection timed out" and similar.
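
[Editor's sketch] If the timeouts come from a cluster that is temporarily too busy to answer, a client can slow itself down by retrying with growing pauses. The helper below is a generic illustration of exponential backoff, not what elasticsearch-hadoop does internally; `retry_with_backoff`, its parameters, and the `flaky_write` stand-in are hypothetical.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a connection error or timeout, wait and retry.

    The pause doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_write():
        # Stand-in for a bulk write that times out twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TimeoutError("connection timed out")
        return "ok"

    print(retry_with_backoff(flaky_write, base_delay=0.01))
```

Backing off only helps if the failures are caused by load; it would not fix a genuine network fault between the nodes.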

