Combining information from different blocks of the same logfile

Hello,

I am building a Dashboard with some information I retrieve from different devices but I've encountered a problem I don't think I can solve with the parsing of information in Logstash.

The info in logs I am retrieving is either INFO or ERROR information, but for now I am focusing on the INFO lines.

To make sure I only get the information I need, I used the multiline filter in Filebeat, and I am creating lines of information like this:

type1:

INFO - TEXT - Sat Aug 18 17:53:45 CEST 2018

{
"wsName": "newFile",
"Connection type": "WIFI: \"WifiName\"",
"RAM available": "32.78%",
"CPU usage": "19.72%",
"Internal storage available (MB)": 99999.99,
"External storage available (MB)": 99999.99,
"Connectivity": "Is available: true. Is connected: true. Type connectivity: WIFI. Wifi signal level: 4 out of 5"
}

type2:

INFO - TEXT - Sat Aug 18 17:53:45 CEST 2018
Task1/Sub_task1 Result: 10

I am using the info I get from both type1 and type2 lines, but I need to combine the info somehow, since the type2 lines fully depend on the connection data retrieved in the previous block.
That is, I need to link the results of the tasks with the WiFi connection that was used while performing them.

And from here I am a bit lost, since the information is split across different lines, but I would like to put in a single chart the number of Result: 10 entries I get depending on the quality of the Wifi signal. Is that possible? Could I link the lines somehow using the timestamp?
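To illustrate, here is a minimal sketch of the correlation I am after (Python, just for explanation; it assumes each JSON payload has already been collapsed to a single line by the multiline filter, and the sample field names are simplified from my logs):

```python
import json
import re

def correlate(lines):
    """Attach each 'Result: N' line to the most recently seen connection block."""
    pairs = []
    last_conn = None
    for line in lines:
        line = line.strip()
        if line.startswith("{"):
            # type1 line: remember the connection info
            last_conn = json.loads(line)
        else:
            # type2 line: pair the result with the last connection seen
            m = re.search(r"Result:\s*(\d+)", line)
            if m and last_conn is not None:
                pairs.append({"result": int(m.group(1)),
                              "connection": last_conn.get("Connection type")})
    return pairs

sample = [
    '{"Connection type": "WIFI: MyWifi", "wsName": "newFile"}',
    "Task1/Sub_task1 Result: 10",
]
# correlate(sample) -> [{"result": 10, "connection": "WIFI: MyWifi"}]
```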

Thanks in advance!

Hm. This is tricky. It sounds like what you're really looking for is the equivalent of a join (from the relational DB world). Elastic doesn't really do joins.

You might be able to get somewhere with bucket aggregations.

It doesn't look like your two log messages have any kind of natural key that you could use to ingest them into the same document, either. (I'd see if maybe you could add such an element into your logs.) Time would be an inaccurate key, but it may be the best thing you've got.

What I'd try, I think, is to solve this problem at write time.

For instance (and this is just an example, probably not a very good one), you could bucket all logs into one document per minute or something like that. So, you'd write your documents with an id that's something like: 2018-08-18-17-53, and then update that as docs come in for the same minute. You'd still miss correlations, if your type1 and corresponding type2 came in across different minute boundaries, but it'd give you an approximation. Ideally, your logs would have a real, natural key that you could use to form associations between different, but related log messages.
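A rough sketch of that minute-bucket idea (plain Python standing in for the Elasticsearch upsert; the id format and field names are just illustrative assumptions):

```python
from datetime import datetime

def minute_bucket_id(header):
    """Derive a per-minute document id like '2018-08-18-17-53' from a log header
    timestamp such as 'Sat Aug 18 17:53:45 CEST 2018'."""
    parts = header.split()
    # drop the timezone token, which strptime's %Z handles unreliably
    ts = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return ts.strftime("%Y-%m-%d-%H-%M")

docs = {}

def merge(doc_id, fields):
    # upsert: type1 and type2 info for the same minute lands in one document
    docs.setdefault(doc_id, {}).update(fields)

merge(minute_bucket_id("Sat Aug 18 17:53:45 CEST 2018"),
      {"wifi_signal": "4 out of 5"})
merge(minute_bucket_id("Sat Aug 18 17:53:45 CEST 2018"),
      {"result": 10})
```

In Elasticsearch itself this would map to indexing with an explicit `_id` and using update-with-upsert, so later messages in the same minute merge into the existing document.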


Thanks for the answer, Chris. Yes, it is a tricky problem indeed. I guess it's obvious I come from an Oracle SQL world (I've worked mostly with ETL tools like PowerCenter, ODI and QlikView), so it was difficult for me not to think about joins ^^.

What I thought is that I could add an extra field to each record in which I generate a code based on the timestamp (so I don't use the timestamp itself) and try to link the lines through that field. Also, since the Wifi connection changes sometimes, I also thought of analysing the different Wifi connections over time vs. the Results.
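Something like this is what I had in mind for that timestamp-derived code (just a sketch; the minute granularity is an arbitrary choice):

```python
from datetime import datetime

def timestamp_key(ts, granularity_s=60):
    """Derive a coarse join key from a datetime, so that type1 and type2
    lines logged close together end up sharing the same value."""
    epoch = int(ts.timestamp())
    # floor the epoch to the chosen granularity
    return str(epoch - epoch % granularity_s)

# lines logged within the same minute get the same key
a = timestamp_key(datetime(2018, 8, 18, 17, 53, 45))
b = timestamp_key(datetime(2018, 8, 18, 17, 53, 12))
# a == b -> True
```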

I'll also take a look at the Bucket Aggregations to see if it fits my purpose!

Thank you again 🙂

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.