CSV filter plugin and changing column headers

I have a set of CSV files which have similar but not identical headers. I import these using automatic header detection and numeric autodetection to set the correct data types.

Depending on the process that generates them, some columns may not match between files and may carry different headers.

Furthermore, some columns only start receiving values a few rows into the file. Here's a simplified, illustrative example:
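Timestamp,ProcessName,WorkingSetSize
2020-01-17 15:58:01,llm.exe,
2020-01-17 15:58:02,llm.exe,
2020-01-17 15:58:03,llm.exe,10540032
2020-01-17 15:58:04,llm.exe,10551296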

I am able to capture partial sets of these files into Elasticsearch, but I also get many errors like this one in the process:

[2020-01-17T16:01:19,784][DEBUG][o.e.a.b.TransportShardBulkAction] [AMIENS] [llmlogs-2020.01-000001][0] failed to execute bulk item (index) index {[llmlogs-2020.01-000001][_doc][6wEFtG8B9YgiWpSw2Rkn], source[n/a, actual length: [10.9kb], max length: 2kb]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [WorkingSetSize] of type [float] in document with id '6wEFtG8B9YgiWpSw2Rkn'. Preview of field's value: 'WorkingSetSize'
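I assume dynamic mapping typed that field as a numeric from earlier rows, i.e. GET llmlogs-2020.01-000001/_mapping contains something like:

"WorkingSetSize": {
  "type": "float"
}

so a row carrying the literal header text can no longer be indexed into it.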

It would appear the automatic column detection is also attempting to parse the header row as a regular value. Do you have any advice on why this may be happening?

Do you have pipeline.workers set to 1?

Yes, I've had that set to 1 all throughout development. It was mentioned this is necessary for column header auto-detection, since with multiple workers the header row is not guaranteed to be processed first.

Here's my logstash config:

- pipeline.id: pipelineLLM
  pipeline.workers: 1
  pipeline.batch.size: 1
  path.config: "/depot/YAGERTools/ELK/7_4_2/logstash-7_4_2/config/pipelineLLM.config"
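
The filter section of pipelineLLM.config is essentially this (a simplified sketch; input and output sections omitted):

filter {
  csv {
    # take column names from the first row of each file
    autodetect_column_names => true
    # don't emit fields for columns that are empty on a given row
    skip_empty_columns => true
  }
}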
