What happens when Storm ESBolt capacity is over 1 for a long time?

Hello,

We are using the Storm ESBolt to index documents into Elasticsearch. Our topology reads the documents from Kafka and indexes them into Elasticsearch using the ESBolt.
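To give a bit more context, the wiring is roughly like the sketch below. This is only an illustration: the index, topic, and host names are placeholders, and the exact Kafka spout API depends on the storm-kafka-client version in use.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;
import org.elasticsearch.storm.EsBolt;

public class IndexingTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Kafka spout reading the documents (broker and topic names are placeholders)
        builder.setSpout("docs", new KafkaSpout<>(
                KafkaSpoutConfig.builder("kafka:9092", "documents").build()), 1);

        // EsBolt indexing each tuple into the target index (index name is a placeholder)
        builder.setBolt("es-bolt", new EsBolt("docs/doc"), 2)
               .shuffleGrouping("docs");

        Config conf = new Config();
        conf.put("es.nodes", "es-node:9200");  // Elasticsearch connection settings picked up by es-hadoop
        StormSubmitter.submitTopology("kafka-to-es", conf, builder.createTopology());
    }
}
```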

We have peak periods during the day when the capacity of the ESBolt stays over 1 for 30 minutes to an hour, but for the rest of the day its capacity is around 0.1. What happens when the ESBolt runs at a capacity above 1.0 for 30 minutes or an hour? Could it drop documents?

We don't see any failed tuples (we have es.storm.bolt.write.ack: true) and there are no error or warning messages in the worker logs. The bolt is configured with es.batch.write.refresh: false.
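For reference, this is roughly how we pass those settings to the bolt. It is only a sketch: the index name is a placeholder, and the same keys could equally be set in the topology Config instead of the per-bolt map.

```java
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.storm.EsBolt;

public class EsBoltConfigSketch {
    public static EsBolt buildBolt() {
        // Per-bolt settings mirroring the ones mentioned above
        Map<String, String> esConf = new HashMap<>();
        esConf.put("es.storm.bolt.write.ack", "true");   // tuples are acked only after Elasticsearch confirms the write
        esConf.put("es.batch.write.refresh", "false");   // do not refresh the index after each bulk request
        return new EsBolt("docs/doc", esConf);           // "docs/doc" is a placeholder target
    }
}
```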

Regards,

Ferran

My understanding of Storm is that in this case the bolt in question just exerts backpressure on the bolts and spouts feeding it data. Are you seeing any dropped writes or other problems during these times? If so, it might make sense to increase your parallelism to account for peak processing times, for example as in the sketch below.
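As a rough sketch (component names and numbers are purely illustrative, and the spout wiring is omitted), bumping the parallelism hint on the bolt would look something like this:

```java
import org.apache.storm.topology.TopologyBuilder;
import org.elasticsearch.storm.EsBolt;

public class ParallelismSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setBolt("es-bolt", new EsBolt("docs/doc"), 4)  // 4 executors instead of 1
               .setNumTasks(8)                                 // headroom to rebalance up to 8 later
               .shuffleGrouping("docs");                       // "docs" spout declared elsewhere
    }
}
```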

Thanks for your answer. Yes, I am seeing dropped writes: I cannot find documents that I expect to be in Elasticsearch. I have the feeling that backpressure is not being applied, but I don't know how to verify whether it is. Do you know if I can see that in the Storm UI?
I will try increasing the parallelism. However, increasing the parallelism will also increase the load on the Elasticsearch nodes, and I think that in my case it would be better to rely on backpressure and "recover" after the peak times.

Let me be more specific: what happens when the number of retries (es.batch.write.retry.count) has been exhausted? Does the bolt raise an exception?

@ferran.munoz it depends on a couple of factors, primarily whether or not you have es.storm.bolt.write.ack enabled. If it is not enabled, the flush is still attempted, but any records that cannot be stored in Elasticsearch are dropped on the floor. If acks are enabled, the bolt keeps a working set of the tuples it is trying to insert, and on failure to complete the write (i.e. once the retries are exhausted) the tuples that were not successfully written are marked as failed to the Storm OutputCollector. All other tuples that were successfully written are acked. In the event of a fatal exception in the output, all tuples being tracked are signaled as failed to the output collector, and the fatal exception is raised from the bolt.
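To make that concrete, here is a very rough, hypothetical sketch of that ack/fail pattern written as a plain Storm bolt. This is not EsBolt's actual code: bulkIndex and the batch size are placeholders, and it only mirrors the behavior described above for the case where write acks are enabled.

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified bolt illustrating the ack/fail semantics described above.
public class AckingIndexBoltSketch extends BaseRichBolt {
    private OutputCollector collector;
    private final List<Tuple> inflight = new ArrayList<>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        inflight.add(input);          // keep a working set of tuples being indexed
        if (inflight.size() >= 50) {  // pretend 50 is the configured batch size
            flush();
        }
    }

    private void flush() {
        try {
            // bulkIndex(...) stands in for the real bulk write plus retries;
            // it returns the tuples whose documents could not be stored.
            List<Tuple> rejected = bulkIndex(inflight);
            for (Tuple t : inflight) {
                if (rejected.contains(t)) {
                    collector.fail(t);   // retries exhausted -> tuple reported as failed
                } else {
                    collector.ack(t);    // document stored -> tuple acked
                }
            }
        } catch (RuntimeException fatal) {
            inflight.forEach(collector::fail);  // fatal error -> fail everything tracked
            throw fatal;                        // and let the exception escape the bolt
        } finally {
            inflight.clear();
        }
    }

    private List<Tuple> bulkIndex(List<Tuple> batch) {
        // placeholder: real code would build and send a bulk request for the batch
        return new ArrayList<>();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this sketch emits nothing downstream
    }
}
```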

