Indexing through logstash - varying columns in the input

(Deepu Sundar) #1

Hi Team,

We built a solution leveraging the ELK stack to parse AWS billing files and generate reports from them. Currently we index all the columns in the log file as-is. But we found that the set of columns may change in the future: columns can be added or dropped depending on how users manage resource tag names in AWS.

How can we handle this in ES, or even at the Logstash level, so that the indexing process does not break when the input file's columns change?

I would appreciate your help if anybody has come across a similar case and resolved it.

(Vincent) #2

I think ES supports that. Did you run into a specific issue?

(Deepu Sundar) #3

Hi, thank you for your reply. Before indexing the files, I first imported a template into ES with the exact list of columns in the input file. In the Logstash configuration I listed the same columns in the filter:

filter {
  csv {
    columns => ["InvoiceID","PayerAccountId", ...]
    separator => ","
  }
}
I see that when the file's column list differs from the one in the configuration, some records are not indexed properly. Am I missing anything?

(Deepu Sundar) #4

Basically, what I am trying to figure out is a way, in ES or through Logstash, to map the actual column names to the values dynamically (assuming the first row of the file is the column header).

That way we don't mess up the order when columns are added to or dropped from the input files, and indexing should process the file correctly.
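One option worth checking (a sketch based on the Logstash csv filter's documented options, not verified against your Logstash version) is `autodetect_column_names`, which takes the column names from the first row of the file instead of a hard-coded list:

filter {
  csv {
    separator => ","
    autodetect_column_names => true  # read column names from the header row
    skip_header => true              # don't index the header row as data
  }
}

Note that because header detection is stateful, this generally requires running the pipeline with a single worker (`pipeline.workers: 1`) so that the header row is processed before the data rows.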

(Vincent) #5

Hi Deepu,

I have very limited experience with Logstash, but from my Elasticsearch experience, this is doable.

For example, you can have a Python script that reads the file, parses each line into a dictionary, and sends that to Elasticsearch. Elasticsearch will handle the columns for you.

So if a column is missing from your data, Elasticsearch simply won't store a value for it, but the other columns in the row still get their values (much like other NoSQL stores handle it). If there's an extra column in the data, Elasticsearch can create a dynamic mapping for it, meaning it will try to identify the field type and create the field for you.
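The idea above can be sketched with Python's standard-library `csv.DictReader`, which keys every row by the header names, so added or dropped columns never shift values into the wrong field. The function name `rows_to_actions` and the index name `billing` are just illustrations, not anything from the thread:

```python
import csv
import io
import json

def rows_to_actions(csv_text, index_name):
    """Parse CSV text (header row first) into Elasticsearch bulk actions.

    Each row becomes a dict keyed by the header names, so the mapping
    from column name to value survives column additions and removals.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Drop empty cells so Elasticsearch leaves those fields unset
        doc = {k: v for k, v in row.items() if v not in (None, "")}
        yield {"index": {"_index": index_name}}
        yield doc

sample = "InvoiceID,PayerAccountId,UserTag\n123,456,\n789,012,teamA\n"
actions = list(rows_to_actions(sample, "billing"))

# The actions could then be sent to the _bulk endpoint
# (newline-delimited JSON, one object per line):
bulk_body = "\n".join(json.dumps(a) for a in actions) + "\n"
```

Since the first sample row has no `UserTag` value, its document simply omits that field, and Elasticsearch would dynamically map `UserTag` when the second row arrives.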

I know Logstash supports CSV, but I'm not sure whether it has a feature like this. Another question: why do your CSV columns change? The AWS billing CSV should have a fixed format.

(Deepu Sundar) #6

Hi Vincent,

Thank you so much for the reply. The columns in the billing file can be customized; for example, we can also add user tags to the DBR file. But we will know when the order or the number of columns is altered.

(Vincent) #7

Got it. Yeah, then I think it's not a problem for Elasticsearch or Logstash. Good luck!

(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.