CSV file re-reading/re-parsing issue (Logstash 7.10)

I am trying to re-read a CSV file from the beginning, with sincedb_path set to '/dev/null'. Logstash is started with --config.reload.automatic. When the file is re-read, all entries are blank. This is not consistent: sometimes the data is read, sometimes not. To trigger re-parsing, I just open csvread.config and add/remove spaces at the end.

Command line: bin/logstash -f csvread.config --config.reload.automatic

Configuration file:

input {
  file {
    path => "/home/whoami/result6.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  } 
}

filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
  mutate {
    add_field => { "out_timestamp" => "%{@timestamp}" }
  }
  mutate {
    rename => {
      "active" => "ACTIVE"
      "state_name" => "STATE"
    }
  }
  mutate {
    update => { }
  }
  ruby {
    code => 'event.set("OTH_MAPPING",[])'
  }
  prune {
    whitelist_names => ["out_timestamp", "^ACTIVE$", "^STATE$", "^OTH_MAPPING$"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "csv_read"
  }
}

result6.csv:
sno,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
2,column info,75,5175,5038,62,88,5190,5040,62,35
1,column info 1,18666,921906,895949,7291,20954,925401,897147,7300,28

Sometimes the CSV is read as below, which is the issue:

{
      "OTH_MAPPING" => [],
    "out_timestamp" => "2021-06-24T12:22:36.555Z"
}
{
      "OTH_MAPPING" => [],
    "out_timestamp" => "2010-06-24T12:22:36.555Z"
}
{
      "OTH_MAPPING" => [],
    "out_timestamp" => "2010-06-24T12:22:36.556Z"
}

Ideally, it should have been read as:

{
        "ACTIVE" => "75",
      "OTH_MAPPING" => [],
    "out_timestamp" => "2010-06-24T12:22:18.434Z",
         "STATE" => "column0"
}
{
        "ACTIVE" => "32",
      "OTH_MAPPING" => [],
    "out_timestamp" => "2010-06-24T12:22:18.435Z",
         "STATE" => "Column1"
}

What are your settings for pipeline.workers and pipeline.ordered?

pipeline.workers is not set.
pipeline.ordered was set to auto in one case and not set in the other scenario; it is not working in either case.

The documentation says that pipeline.workers must be set to 1 for autodetect_column_names to work.
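A minimal sketch of running with a single worker, assuming the same config file and paths as above (pipeline.workers can equivalently be set in config/logstash.yml or per pipeline in config/pipelines.yml):

```shell
# Force a single pipeline worker so the header row is processed
# before any data rows, as autodetect_column_names requires:
bin/logstash -f csvread.config --config.reload.automatic --pipeline.workers 1
```

If a single worker is too limiting, another option is to skip autodetection entirely and list the column names explicitly via the csv filter's `columns` setting (with `skip_header => true`), which does not carry the single-worker requirement.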

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.