Comparing 2 sources of log input (using fuzzy? hash? term?)

noelyim · September 14, 2015, 2:56pm

I need to compare 2 log files from 2 different hosts (eventually 2 sources of breadcrumbs). How can I do this with Logstash and Elasticsearch?

Mark_Harwood · September 14, 2015, 3:03pm

What sort of comparison?

equals vs not equals?
line diffs?
lines on a time graph showing log volumes by time period?

noelyim · September 14, 2015, 3:34pm

Comparing XML data, maybe on one or more fields.
An equals comparison.
I don't need line diffs because both log files will be growing.
I don't care about the volumes.
For example,
log A:
12345
...
12346
...
12347

log B:
12345
...
...
12347

I want to use Logstash + Elasticsearch to identify that log B is missing data (field1 = 12346).

Thanks a lot

Mark_Harwood · September 14, 2015, 3:38pm

Can you give a brief example of the 2 inputs and your ideal response?
I'm now clearer on what format your data comes in, but not on what sort of comparison you are looking for.

Mark_Harwood · September 14, 2015, 4:10pm

So you want to know the instant that two continually updated files become inconsistent?
How are you handling the timing of when these comparisons will run?

If a simple "not equal" comparison is required on stable sets then computation of a hash would make sense for efficient comparison.
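As a rough illustration of the hash idea, here is a minimal Python sketch. It assumes each log has already been reduced to a stable list of values (the sample values from earlier in the thread); `digest_of_lines` is a hypothetical helper name, not part of Logstash or Elasticsearch.

```python
import hashlib


def digest_of_lines(lines):
    """Return one SHA-256 hex digest for a stable (no longer changing) list of lines."""
    h = hashlib.sha256()
    for line in lines:
        h.update(line.strip().encode("utf-8"))
        h.update(b"\n")  # separator so ["ab", "c"] hashes differently from ["a", "bc"]
    return h.hexdigest()


# Sample values taken from the example logs in this thread
log_a = ["12345", "12346", "12347"]
log_b = ["12345", "12347"]

same = digest_of_lines(log_a) == digest_of_lines(log_b)
print("equal" if same else "not equal")  # prints "not equal"
```

Comparing the two digests only answers "equal or not"; it does not say which value is missing, and it is sensitive to line order.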

noelyim · September 14, 2015, 4:30pm

True. The timing may be off. Both log files are in XML format.
So, something like this: pick a value from log1 and find the same value in log2
-if found, pick the next value from log1 and find it in log2; if it matches, the two logs are in sync
-if not found, the two logs are out of sync
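A minimal Python sketch of the check described above, done in memory rather than with one search per value. It assumes the field1 values have already been extracted from both logs; `missing_values` is a hypothetical name used only for illustration.

```python
def missing_values(values_a, values_b):
    """Return values seen in log A that never appear in log B, in log A's order."""
    seen_in_b = set(values_b)  # one pass over B, then constant-time lookups
    return [v for v in values_a if v not in seen_in_b]


# Sample values taken from the example logs in this thread
log_a = ["12345", "12346", "12347"]
log_b = ["12345", "12347"]

missing = missing_values(log_a, log_b)
print(missing)  # ['12346']
print("in sync" if not missing else "out of sync")  # prints "out of sync"
```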

Do you have an example? How does a hash 'not equal' comparison work?
Should I have 2 Logstash instances, both feeding Elasticsearch?

Thanks

Mark_Harwood · September 14, 2015, 4:59pm

Your approach would be too slow if you are talking about running searches: they involve disk seeks, which are hardware operations that are expensive regardless of what software you use.
It always makes sense to reduce the number of seeks. If you can process each set of data as a stream and compute a single value (number of docs? size of all docs? hash of contents?), then you can compare these 2 values much more cheaply.
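To make the streaming idea concrete, here is a small Python sketch that reads each source once and reduces it to a (count, total size, combined hash) summary. The XOR combination is an assumption chosen here so that record order does not matter; `summarize` is a hypothetical helper, not an Elasticsearch or Logstash feature.

```python
import hashlib


def summarize(records):
    """Single pass over a stream: record count, total bytes, order-insensitive digest."""
    count = 0
    total_bytes = 0
    combined = 0
    for record in records:
        data = record.encode("utf-8")
        count += 1
        total_bytes += len(data)
        # XOR of per-record hashes gives the same result in any record order
        combined ^= int.from_bytes(hashlib.sha256(data).digest(), "big")
    return count, total_bytes, combined


# Sample values taken from the example logs in this thread
summary_a = summarize(iter(["12345", "12346", "12347"]))
summary_b = summarize(iter(["12345", "12347"]))

print(summary_a[:2], summary_b[:2])  # (3, 15) (2, 10)
print("consistent" if summary_a == summary_b else "inconsistent")  # prints "inconsistent"
```

One caveat with this particular combination: a record that appears twice cancels itself out of the XOR, so the count and size are kept alongside the hash rather than relying on the hash alone.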
