We receive CSV exports of error data (log data) from our customer and want to import them into our own Elasticsearch instance for analysis.
We use Logstash to import the CSV files into Elasticsearch. Each CSV file may have different columns or a different column order, depending on the data the customer sends us.
To be able to deal with "any" provided data, I use the csv filter plugin with automatic detection of the column/field names.
As I understand it, Logstash remembers the detected column names for as long as the pipeline is running, so it needs to be restarted before switching over to the next CSV file with a different layout. Is that correct?
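For reference, the pipeline currently looks roughly like this. This is a minimal sketch, not our exact configuration: the file path, separator, hosts, and index name are placeholders.

```
# Sketch only: path, separator, hosts and index are placeholder values.
input {
  file {
    path => "/data/import/*.csv"
    mode => "read"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    # take the column names from the first line of the file
    autodetect_column_names => true
    separator => ","
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "customer-errors"
  }
}
```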
My idea is to implement the following workflow for the CSV import:
- place the CSV file in a configured path
- start Logstash
- let Logstash process the file and wait until it is done
- shut Logstash down once it is done
- remove the CSV file from the configured path
If I only have one main pipeline, is there a way to make Logstash shut down automatically at the end of the file? The file is fixed; nothing will be appended later. If such an option exists, I could automate the process: a job checks for new CSV files, spawns a Logstash container for each one, and removes the file once processing is done.
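The automation job I have in mind could be sketched like this (a Python sketch under my assumptions; the watch directory and the Logstash/container command are placeholders, and `process_csv_files` is just an illustrative name):

```python
import subprocess
from pathlib import Path

def process_csv_files(watch_dir: str, logstash_cmd: list[str]) -> list[str]:
    """Run one Logstash (container) process per CSV file found in
    watch_dir, wait for it to finish, then remove the file.
    Returns the names of the files that were processed."""
    processed = []
    for csv_file in sorted(Path(watch_dir).glob("*.csv")):
        # Spawn the import process for this single file and block
        # until it exits, i.e. until Logstash has shut down again.
        result = subprocess.run(logstash_cmd + [str(csv_file)])
        if result.returncode == 0:
            csv_file.unlink()  # remove the successfully imported file
            processed.append(csv_file.name)
    return processed
```

In the real job, `logstash_cmd` would be something like a `docker run ...` invocation; the sketch only shows the check-process-remove loop, which depends on Logstash actually exiting when the file is done.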