Problem with elapsed plugin

Hi,

Just hit a problem with the Elapsed plugin, referring to the data below, not sure why it reported the elapsed_end_without_match error though the start event is there. I have checked that the unique_field is the same for the start and end event.
background:
for this data source, the elapsed start/end events come in in sequence, so it's always like: start, end, start, end...
patterns like start start, end, end is not possible.
And the unique_field used here is not really unique across different start/end pairs, i.e. the unique_field for the current start/end pair could be the same as the previous pair.

June 13th 2016, 18:46:02.797 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:46:02.750 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:59.219 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:59.172 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:50.705 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:50.658 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:49.111 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:49.064 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:47.049 beats_input_codec_plain_applied, selection_Finish, elapsed_end_without_start
June 13th 2016, 18:45:47.002 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:46.752 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:46.705 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:38.315 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:38.268 beats_input_codec_plain_applied, selection_Start
June 13th 2016, 18:45:37.050 beats_input_codec_plain_applied, selection_Finish, elapsed, elapsed_match
June 13th 2016, 18:45:37.003 beats_input_codec_plain_applied, selection_Start

Thank you!

I configured multiple hosts in the elasticsearch output plugin hosts option, could this be the cause?
In logstash document, it stats > If given an array it will load balance requests across the hosts specified in the hosts parameter., how does this load balance work? periodically switching among the hosts, or randomly pick on host and stick to it, switch only if problem happens?

I do not think that multiple hosts are the problem. I am using the elapsed filter plugin with a single elasticsearch node and have noticed the same problem as you.

While it works, for the majority of time, sometimes I get an elapsed_end_without_start tag even though the start event is present.

One more problem I noticed is that if I do not add the periodic_flush setting the timeout setting never kicks in.

The event is never marked as expired and therefore no elapsed_expired_error tag is generated (unless I add the periodic_flush setting as mentioned above.

Is this a bug or the way it should works? I am asking because the periodic_flush setting is marked as optional.

pantheo, you're right, the problem persists even we changed the configuration to one server only rather than load balancing.
Can someone enlighten us on this problem? It's a big problem to us as we use this plugin quite extensively.

Zpp, judging from the plugin's discussion section in github, I do not think we will get an answer any time soon.
However, I have overcome this problem, short of, by incorporating the elapsed_end_without_start tag to what I am doing with the elapsed_match tag.
For example, I have actions that some may end and others may not. Every action though has a Request Begin section, just like in your example. What I am doing with the elapsed plugin is to check whether an action has a matching Request End section.
Well I am considering actions that also have the elapsed_end_without_start tag along with the ones with the elapsed_match tag in my visualizations.
It is not perfect, but for now it gets the job done.

hi pantheo, thanks for sharing. I changed the indexer startup parameter (-w 1) to force indexer to only use single thread for filtering and output. This will slow down the indexing rate, but solves my problem.

Hi Zpp,

thanks for sharing your solution. I will try it to our infrastructure too and see if the problem solves.

Regards

Hi all,

Just to let you now that today I came across this very same problem. After following the recommendation to configure the workers number to "1" (and, thus, do not multithread the processing) everything went fine.

A source with almost 1 million request-response pairs: With 2 workers, around 10k requests didn't get its elapsed time calculated. With a single worker all of them were OK.

1 Like