I would like to do something similar to the first example of this doc. My problem is that the two logs are indexed almost one after the other. Could this lead to an error when querying for the first log?
I actually got a first error:
No mapping found for [@timestamp] in order to sort on
I understood that it failed to sort the results because they didn't have a @timestamp property, which is weird because when I run the query in Kibana, the results do have a @timestamp property. After adding the parameter enable_sort => false, I get this new error:
[2018-03-29T16:31:02,539][ERROR][logstash.filters.ruby ] Ruby exception occurred: no implicit conversion to rational from nil
This happens because I am trying to use a property that should have been set by the Elasticsearch query.
Can you tell me if the two logs being so close to one another could lead to this sort of error?
The square-brackets field reference is unique to Logstash, but this error message looks like it is coming from Elasticsearch; did you add a sort parameter to the Elasticsearch plugin? If so, you may need to specify it without the square brackets.
It's also possible that your query hit an index where the @timestamp was not yet defined; what does your elasticsearch filter configuration look like (be sure to redact passwords)?
Maybe not these specific errors, but it definitely could be problematic. Logstash allows outputs to be batched, and the Elasticsearch output takes advantage of the bulk APIs to batch up inserts into fewer requests; if the start event and the end event fall into the same batch, it's possible that the start event won't exist yet in Elasticsearch when the filter is run for the end event.
did you add a sort parameter to the Elasticsearch plugin?
I did not.
what does your elasticsearch filter configuration look like (be sure to redact passwords)?
I didn't set any password, as I haven't configured authentication in Elasticsearch yet.
Concerning the Logstash batches, could I force a batch to be sent before the end event? Or could I delay the parsing of the end event to be sure that the start event already exists in Elasticsearch?
Once an event is written to Elasticsearch, a refresh has to complete before the document is available for searching. By default this can take up to one second, so the document may not be found even if the events are in different batches.
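To illustrate the refresh delay described above, here is a minimal Python sketch of a lookup that retries until the document becomes searchable. It assumes the lookup runs in a client outside Logstash. `find_with_retry`, `search_fn`, and the default values are hypothetical names chosen for illustration and are not part of any Elasticsearch or Logstash API.

```python
import time
from typing import Callable, Optional


def find_with_retry(
    search_fn: Callable[[], Optional[dict]],
    attempts: int = 5,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call search_fn until it returns a document or the attempts run out.

    search_fn is a hypothetical callable that performs the actual search
    and returns the matching document, or None if nothing was found yet.
    """
    for attempt in range(attempts):
        doc = search_fn()
        if doc is not None:
            return doc
        # Wait before the next try, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

With the default refresh behaviour, a total wait slightly longer than one second should usually be enough, but the right values depend on the index settings.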
Is there an option to delay the parsing of end events by one second?
I managed to get something working by using the elapsed filter. But the problem persists if I use more than one worker for the pipeline, so that could be problematic when scaling up...
No, you cannot delay the processing of a single event, because the batch has to be acknowledged as a whole. Since all related events need to be processed in a single thread, this type of processing typically does not scale well.
A more scalable option might be to instead have an external process/script that periodically performs searches against indexed data and updates events that are not yet complete. This would allow Logstash to work without restraints and is likely to scale much better.
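As a rough sketch of what the core of such an external script could look like, the Python function below pairs start and end events that have already been fetched and computes their durations. The field names `task_id` and `type`, and the function name `compute_durations`, are assumptions made for illustration; the search that fetches the events and the update that writes the result back are left to whichever Elasticsearch client you use.

```python
from datetime import datetime
from typing import Dict, List


def compute_durations(events: List[dict]) -> Dict[str, float]:
    """Pair start and end events by task_id and return durations in seconds.

    Each event is assumed (hypothetically) to carry "task_id", "type"
    ("start" or "end") and an ISO 8601 "@timestamp" without a zone suffix.
    """
    starts: Dict[str, datetime] = {}
    ends: Dict[str, datetime] = {}
    for event in events:
        timestamp = datetime.fromisoformat(event["@timestamp"])
        if event["type"] == "start":
            starts[event["task_id"]] = timestamp
        elif event["type"] == "end":
            ends[event["task_id"]] = timestamp
    # Only tasks with both a start and an end event are complete.
    return {
        task_id: (ends[task_id] - starts[task_id]).total_seconds()
        for task_id in ends
        if task_id in starts
    }
```

Run periodically, this leaves incomplete tasks alone until their matching event has been indexed and refreshed, so event order and batching in Logstash no longer matter.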