Indexing through logstash - varying columns in the input

(Deepu Sundar) #1

Hi Team,

We built a solution leveraging the ELK stack to parse AWS billing files and generate reports from them. Currently we index all the columns in the log file as-is. But we found that the set of columns may change in the future: columns can be added or dropped depending on how users manage resource tag names in AWS.

How can we handle this in ES, or even at the Logstash level, so that the indexing process does not break when the input file's columns change?

I would appreciate your help if anybody has come across a similar case and resolved it.

(Vincent) #2

I think ES supports that. Did you run into a specific issue?

(Deepu Sundar) #3

Hi, thank you for your reply. Before indexing the files, I first imported a template into ES with the exact list of columns in the input file. In the Logstash configuration I listed the same columns in the filter:

filter {
  csv {
    columns => ["InvoiceID","PayerAccountId", ...]
    separator => ","
  }
}
I see that when the file's column list differs from the one in the configuration, some records are not indexed properly. Am I missing anything?

(Deepu Sundar) #4

Basically, what I am trying to figure out is a way, in ES or through Logstash, to map the actual column names to the values dynamically (assuming the first row of the file is the column header).

That way we don't mess up the order when columns are added to or dropped from the input files, and indexing should process the file correctly.
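One option worth checking (a sketch based on the Logstash csv filter's documented options, not verified against your Logstash version) is `autodetect_column_names`, which takes the column names from the first row of the file instead of a hard-coded list:

filter {
  csv {
    separator => ","
    autodetect_column_names => true  # read column names from the header row
    skip_header => true              # don't index the header row as data
  }
}

Note that because header detection is stateful, this generally requires running the pipeline with a single worker (`pipeline.workers: 1`) so that the header row is processed before the data rows.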

(Vincent) #5

Hi Deepu,

I have very limited experience with Logstash, but from my Elasticsearch experience, this is doable.

For example, you can have a Python script that reads the file, parses each line into a dictionary, and sends that to Elasticsearch. Elasticsearch will handle the columns for you.

So if a column is missing from your data, Elasticsearch simply won't store a value for it, but the other columns in the row still get their values (much like other NoSQL stores handle it). If there's an extra column in the data, Elasticsearch can create a dynamic mapping for it, meaning it will try to identify the field type and create the field for you.
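The idea above can be sketched with Python's standard-library `csv.DictReader`, which keys every row by the header names, so added or dropped columns never shift values into the wrong field. The function name `rows_to_actions` and the index name `billing` are just illustrations, not anything from the thread:

```python
import csv
import io
import json

def rows_to_actions(csv_text, index_name):
    """Parse CSV text (header row first) into Elasticsearch bulk actions.

    Each row becomes a dict keyed by the header names, so the mapping
    from column name to value survives column additions and removals.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Drop empty cells so Elasticsearch leaves those fields unset
        doc = {k: v for k, v in row.items() if v not in (None, "")}
        yield {"index": {"_index": index_name}}
        yield doc

sample = "InvoiceID,PayerAccountId,UserTag\n123,456,\n789,012,teamA\n"
actions = list(rows_to_actions(sample, "billing"))

# The actions could then be sent to the _bulk endpoint
# (newline-delimited JSON, one object per line):
bulk_body = "\n".join(json.dumps(a) for a in actions) + "\n"
```

Since the first sample row has no `UserTag` value, its document simply omits that field, and Elasticsearch would dynamically map `UserTag` when the second row arrives.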

I know Logstash supports CSV, but I'm not sure whether it has a feature like this. Another question: why do your CSV columns change? The AWS billing CSV should have a fixed format.

(Deepu Sundar) #6

Hi Vincent,

Thank you so much for the reply. The columns in the billing file can be customized; for example, we can also add user tags to the DBR file. But we will know when the order or the number of columns is altered.

(Vincent) #7

Got it. Yeah, then I think it's not a problem for Elasticsearch or Logstash. Good luck!

(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.