When I process the first CSV message:
country,city,name,address
USA,NYC,John,TestStreet
It processes the message correctly:
country => USA,
city => NYC,
name => John,
address => TestStreet
But then when I process a second CSV message:
country,city,address
Canada,Vancouver,GreenStreet
It messes up the message by reusing the columns detected from the first message:
country => Canada
city => Vancouver
name => GreenStreet
I have already set pipeline.workers: 1 in my logstash.yml. It seems to be caching the columns it detected in the first message and does not try to autodetect the columns again for new incoming messages.
Thanks, I already set pipeline.workers to 1 and it is uncommented, but it does not work. The ingestion keeps using the columns it detected the very first time and does not refresh them for each new message.
With autodetect_column_names the code eats the first line to use as column names and never changes them. See this thread for a way to support multiple CSV formats.
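To make that behaviour concrete, here is a small Python sketch. It is not Logstash code and does not use any Logstash API: StickyHeaderParser is a made-up class that only imitates "take the first line as column names and never change them", and parse_with_own_header is a hypothetical alternative that assumes each message carries its own header line followed by one data line.

```python
import csv
import io


class StickyHeaderParser:
    """Hypothetical stand-in for the behaviour described above: the first
    line seen becomes the column list and is reused for every later line."""

    def __init__(self):
        self.columns = None

    def parse(self, line):
        values = next(csv.reader([line]))
        if self.columns is None:
            # The first line is consumed as the header and produces no record.
            self.columns = values
            return None
        # Every later line is mapped positionally onto the remembered columns.
        return dict(zip(self.columns, values))


def parse_with_own_header(message):
    """Assumes one message = header line + one data line, so the column
    names are re-read for every message."""
    rows = list(csv.DictReader(io.StringIO(message)))
    return rows[0] if rows else None


sticky = StickyHeaderParser()
sticky.parse("country,city,name,address")
first = sticky.parse("USA,NYC,John,TestStreet")
sticky.parse("country,city,address")  # second header is treated as data
second = sticky.parse("Canada,Vancouver,GreenStreet")

fixed = parse_with_own_header("country,city,address\nCanada,Vancouver,GreenStreet")

print(second)  # GreenStreet ends up under "name"
print(fixed)   # GreenStreet ends up under "address"
```

With the sticky parser the third value of the second message lands under name, which matches the output in the question. Re-reading the header per message puts it under address, but that only works if every message really includes its own header line.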