Grok extraction of field based on regex match on Index Name of event in question possible?


(Andrew) #1

This is a general enquiry. I have read multiple articles around how you can forcefully add a year to an existing date/timestamp that does not contain the year in the input message.

What I'm looking to achieve is the extraction of the year from the index name, which can then be used to populate the date in the timestamp field and bring consistency to the timestamp format across my systems. (In particular, bringing my Ubuntu log set in line with FortiGate and CentOS 7 audit logs.)

The benefits to be realized, as I see them, are:

  1. Timestamp of the event will be associated with the event accurately and allow for consistent visualization across multiple devices.
  2. Archived indexes, which may span multiple years, will (independent sanity check required here) contain the appropriate timestamp if historical data needs to be queried at a later point in time.
  3. Avoids adding arbitrary year fields to the date, potentially introducing inconsistent data and ultimately resulting in skewed output. (bad stuff in, bad stuff out).
  4. Hopefully avoids the need to use a Ruby filter to do some abstract extraction of data, if existing filters could be used for this purpose (such as grok?).

Basically, I have a timestamp in the following format: "Jan 14 10:01:01, Jan 14 10:01:01". I'll look at resolving the duplication issue separately. In the meantime, it is missing the year.

The index name in question is linintfilebeat-2017.01.13. I'm looking for a way to extract "2017" from the index name and store it in a field called "year", then insert the value of "year" after "Jan 14" in the timestamp.
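
A minimal sketch of that logic in Python, purely to illustrate the idea (it is not a Logstash filter). It assumes the index name always ends in -YYYY.MM.DD, and the function names year_from_index and add_year are hypothetical. Rather than splicing the year into the string, it parses the timestamp together with the year into a full date.

```python
import re
from datetime import datetime

# Assumption: index names end in -YYYY.MM.DD, e.g. linintfilebeat-2017.01.13
INDEX_DATE = re.compile(r"-(\d{4})\.\d{2}\.\d{2}$")

def year_from_index(index_name):
    """Return the four-digit year embedded in the index name, or None."""
    match = INDEX_DATE.search(index_name)
    return match.group(1) if match else None

def add_year(timestamp, year):
    """Parse a 'Jan 14 10:01:01' style timestamp using the supplied year."""
    return datetime.strptime(f"{year} {timestamp}", "%Y %b %d %H:%M:%S")

year = year_from_index("linintfilebeat-2017.01.13")
full = add_year("Jan 14 10:01:01", year)
print(year, full.isoformat())
```

Returning None for a name that does not match leaves it to the caller to decide what to do with events from unexpected indexes.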

The way I see it, with this approach the date/time will remain accurate at the turnover of the day, week, month and year.

Please let me know if there is an inaccuracy in my thought process here, and whether this is possible or not. I've read over the filter types a few times and I don't see a method that can extract a value from an index name. I may have completely misinterpreted what I have read or possibly overlooked an option. Singing out to the wider community for assistance here.

Hugely appreciated.

Cheers,
Andrew


(Andrew Cholakian) #2

If I understand your question correctly, you want to take the year out of the index name and get that into the documents you already have.

The best way I can think of is to use the elasticsearch input with docinfo enabled, so that docinfo_fields pulls metadata such as the index name into each event, then reindex. Does that make sense?
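
To make the idea concrete, here is a rough Python sketch of the per-document step such a reindex would perform. It does not call Logstash or Elasticsearch: the hits list only imitates the _index/_id/_source shape of search results, and the field names timestamp and year and the function with_year are assumptions for illustration.

```python
import re

def with_year(hit):
    """Return (index, doc_id, new_source) with the index year added."""
    # Assumption: the index name carries a -YYYY.MM.DD suffix.
    match = re.search(r"-(\d{4})\.\d{2}\.\d{2}$", hit["_index"])
    source = dict(hit["_source"])  # copy so the original hit is untouched
    if match:
        year = match.group(1)
        # Assumption: timestamp looks like 'Jan 14 10:01:01' (month day time).
        month, day, clock = source["timestamp"].split()
        source["year"] = year
        source["timestamp"] = f"{month} {day} {year} {clock}"
    return hit["_index"], hit["_id"], source

# Hypothetical hit, shaped like an Elasticsearch search result.
hits = [
    {"_index": "linintfilebeat-2017.01.13", "_id": "1",
     "_source": {"timestamp": "Jan 14 10:01:01"}},
]
for hit in hits:
    print(with_year(hit))
```

Documents whose index name does not match the pattern pass through unchanged, so they can be inspected separately rather than being given a guessed year.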


(Andrew) #3

Yes, that sounds like what I need. I was looking at Logstash inputs but didn't consider using the Elasticsearch input. I'll give that a try and report back. :)

