Hi,
I could use some help. I need to parse a specific log format. It looks like this:
{"properties":{"columns":["time","v0"],"db":"default","vs":"/Common/Test_10+/Common/BaDOS","metric":"base.conns","sample_rate":"1000"},"values":[
[1536828341000,0],
[1536828343000,0],
[1536828344000,0],
[1536828345000,0],
[1536828346000,0]
Every second a line is added. I need the fields columns, db, vs, metric, sample_rate, and the values. The first entry in each values pair is the Unix timestamp (in milliseconds). The server only appends new value lines to the log. So any help on how to achieve this would be great.
Are these files on disk? Are you using Beats or Logstash to send these files to Elasticsearch? Can you provide an example of the data (as is, or as you want it to be) sent to Elasticsearch?
Every second a line is added.
Are you trying to process the lines out of the JSON array? If so, it would be much easier to process actual JSON with a reasonably sized values array.
Yes, these files are on disk. I tried using Filebeat, but it's not indexing the fields. Yes, the JSON array is extended every second with a new timestamp and value. What I get at the moment is that the file is read line by line, so the field names are missing. What do you mean by a reasonably sized values array? How do I configure that?
Thanks
Once you can get fully formed JSON as the payload, you have much better options for processing it. For example, in Logstash you will want to look at the split filter (Split filter plugin | Logstash Reference [8.11] | Elastic) to create a new event for each entry in the values array (i.e. full JSON/Elasticsearch documents). You may also want to look at the ingest node's Foreach processor (Foreach processor | Elasticsearch Guide [8.11] | Elastic) to accomplish the same, though that would likely require some custom scripting. Either way will take a bit of data wrangling to shape the documents into something meaningful to index. (In other words, there is probably no easy button for this type of log file.)
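Here is a minimal Logstash sketch of the split filter approach, assuming the fully formed JSON arrives as a single event in the message field; the metric_time and metric_value field names are just placeholders I picked:

filter {
  # Parse the whole JSON payload into fields (properties, values, ...)
  json {
    source => "message"
  }
  # Clone the event once per [timestamp, value] pair in the values array
  split {
    field => "values"
  }
  # After the split, "values" holds a single two-element pair
  mutate {
    add_field => {
      "metric_time"  => "%{[values][0]}"
      "metric_value" => "%{[values][1]}"
    }
  }
  # add_field produces strings, so convert the value in a second mutate
  mutate {
    convert => { "metric_value" => "float" }
  }
  # The first entry is epoch milliseconds; use it as the event timestamp
  date {
    match => ["metric_time", "UNIX_MS"]
  }
}

Each resulting event then carries the properties fields (db, vs, metric, sample_rate) plus one timestamped value, which is a reasonable document shape to index.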
What do you mean by a reasonably sized values array? How do I configure that?
You said that the values array is added to every second. I assume that at some point the system logging the event will stop adding to the current JSON and start a new JSON structure. If that happens every minute, then you will have 60 entries in your values array (which is very reasonable), and the event can be sent every minute. However, if the server keeps adding new lines every second for a whole day, then the size of the values array becomes unreasonable, and the delay before you have fully formed JSON to process is also an unreasonable one day. How often a new JSON structure is started (and how many values end up in the array) is a property of the server generating the logs.
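For illustration, a fully formed version of your sample (shortened to three values) would look like this; the closing brackets are what make the payload parseable as a whole, assuming your server eventually writes them:

{"properties":{"columns":["time","v0"],"db":"default","vs":"/Common/Test_10+/Common/BaDOS","metric":"base.conns","sample_rate":"1000"},"values":[
[1536828341000,0],
[1536828343000,0],
[1536828344000,0]
]}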