Can the csv filter only be used for one file? (Multiple files mess up fields/columns)

Hi,

I'm having a big problem with the csv filter. I have four .csv files, each with a separate .conf.

When I run those configs with stdout output they are fine. When I run Logstash with only one of them active, the data gets imported fine as well.

However, as soon as I have two or more .conf files using the csv filter, it messes up the data.

E.g. Index1 looks OK but index2 has some fields from index1 and some column names have become field values etc. It is very strange.

I also noticed that both of those indexes have the reverse DNS lookup fields from my netflow.conf.

It looks like Logstash is doing something strange somewhere, or something is wrong with the way I have Logstash set up. How can I fix this?

Index1 csv headers

Date	Duration	KBytes	Service	Label	APN	Code	Lat	Lon	CN0	Message	TxKBytes	RxKBytes

Index2 csv headers

Date	Block	Message	Code	Lat	Lon	CN0

index1 json

{
  "_index": "data-2017.09",
  "_type": "data",
  "_id": "AV5vIhbX0t3riYLIY8_W",
  "_version": 1,
  "_score": null,
  "_source": {
    "TxKBytes": "73876",
    "Message": "Power supply was turned off",
    "RxKBytes": "487211",
    "CNO": "67.8",
    "Label": "Default",
    "Service": "Standard",
    "Data": "2017-09-01 13:05:05",
    "Duration": "20:03:46",
    "Lon": "lonvalue",
    "type": "data",
    "Code": "errorcode",
    "path": "/home/test/Desktop/test/test2/data/data.csv",
    "netflow": {
      "ipv4_src_host": "%{[netflow][ipv4_src_addr]}",
      "ipv4_dst_host": "%{[netflow][ipv4_dst_addr]}"
    },
    "@timestamp": "2017-09-11T04:10:58.551Z",
    "KBytes": "1",
    "@version": "1",
    "host": "ELK-test",
    "Lat": "latvalue",
    "APN": "myapn"
  },
  "fields": {
    "@timestamp": [
      1505103058551
    ]
  },
  "sort": [
    1505103058551
  ]
}

index2 json

{
  "_index": "event-2017.09",
  "_type": "event",
  "_id": "AV5vIhYL0t3riYLIY89s",
  "_version": 1,
  "_score": null,
  "_source": {
    "Label": "latvalue",
    "Service": "errorcode",
    "Data": "2017-09-10 11:38:23",
    "Duration": "ADE",
    "type": "event",
    "Code": "CNOvalue",
    "path": "/home/test/Desktop/test/test2/event/event.csv",
    "netflow": {
      "ipv4_src_host": "%{[netflow][ipv4_src_addr]}",
      "ipv4_dst_host": "%{[netflow][ipv4_dst_addr]}"
    },
    "@timestamp": "2017-09-11T04:10:58.354Z",
    "KBytes": "Notice: Status (Signal).",
    "@version": "1",
    "host": "ELK-test",
    "APN": "lonvalue"
  },
  "fields": {
    "@timestamp": [
      1505103058354
    ]
  },
  "sort": [
    1505103058354
  ]
}

As you can see, the second index has values that A) should not be there and B) are in the wrong field. Both also have the netflow fields, which should not be there either.

Index1 conf

input {
  file {
    type => "data"
    path => "/home/test/Desktop/test/test2/data/data.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}


filter {
  csv {
    columns => ["Data","Duration","KBytes","Service","Label","APN","Code","Lat","Lon","CNO","Message","TxKBytes","RxKBytes"]
  }
  date {
    match => [ "Date", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message", "Date"]
  }
}

output {
  if [type] == "data" {
    elasticsearch {
      hosts => localhost
      index => "data-%{+YYYY.MM}"
    }
  }
}

Index2 conf

input {
  file {
    type => "event"
    path => "/home/test/Desktop/test/test2/event/event.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}


filter {
  csv {
    columns => ["Date","Block","Message","Code","Lat","Lon","CNO"]
  }
  date {
    match => [ "Date", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message", "Date"]
  }
}

output {
  if [type] == "event" {
    elasticsearch {
      hosts => localhost
      index => "event-%{+YYYY.MM}"
    }
  }
}

Instead of splitting the conf files per index, please split them by section, like this:
1_input.conf
2_filter.conf
3_output.conf

Add appropriate sections of each index in each conf file.
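
A minimal sketch of that layout, assuming all three files sit in the same config directory. The file names come from the suggestion above and the paths and type values from the original configs; the filter bodies are left as placeholders.

```
# 1_input.conf: one file input per CSV, each with its own type
input {
  file {
    type => "data"
    path => "/home/test/Desktop/test/test2/data/data.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
  file {
    type => "event"
    path => "/home/test/Desktop/test/test2/event/event.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}

# 2_filter.conf: the filters for every index, each guarded by a conditional
filter {
  if [type] == "data" {
    # csv, date and mutate filters for data.csv go here
  } else if [type] == "event" {
    # csv, date and mutate filters for event.csv go here
  }
}

# 3_output.conf: one output per index, selected by type
output {
  if [type] == "data" {
    elasticsearch {
      hosts => localhost
      index => "data-%{+YYYY.MM}"
    }
  } else if [type] == "event" {
    elasticsearch {
      hosts => localhost
      index => "event-%{+YYYY.MM}"
    }
  }
}
```

Keeping all filters in one file does not change how Logstash behaves, but it makes it easier to see that every filter needs its own conditional.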

Thanks, but I'm not sure I understand. That sounds very different from what I've seen/read so far. Do you have any examples?

My understanding is that you could use the type field to separate config files.

In my use case, I had to do that, will share more.
On mobile right now, will get back in a couple of hours.

Thank you.

Maybe @magnusbaeck could also chime in as to why the [type] filter apparently cannot be used to separate configs?

Please refer to the above link.

Type can still be used.
Magnus, of course, could solve it in a second.

You can create a separate file input for each file type and assign a tag there. Then use conditionals based on this tag to select the appropriate csv filter.
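
As a rough sketch of the tag-based variant, assuming the two CSV files from the original post. The tag names `data_csv` and `event_csv` are made up for illustration, and the first column of the data file is assumed to be called `Date`, as in the CSV header.

```
input {
  file {
    path => "/home/test/Desktop/test/test2/data/data.csv"
    tags => ["data_csv"]      # hypothetical tag name
  }
  file {
    path => "/home/test/Desktop/test/test2/event/event.csv"
    tags => ["event_csv"]     # hypothetical tag name
  }
}

filter {
  # Each csv filter only sees events carrying the matching tag
  if "data_csv" in [tags] {
    csv {
      columns => ["Date","Duration","KBytes","Service","Label","APN","Code","Lat","Lon","CNO","Message","TxKBytes","RxKBytes"]
    }
  } else if "event_csv" in [tags] {
    csv {
      columns => ["Date","Block","Message","Code","Lat","Lon","CNO"]
    }
  }
}
```

The same tags can then be tested in the output section to pick the index.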

In your case, add an if [type] conditional to the filter section of both confs. I think that would take care of it.

How is that any different from the input type => and the output if [type] == "data" that I already have there?

input
Tag => something

filter
as is

output
If "something" in [tags]

Should work? Should I keep type in input and output?

Apart from separating the outputs based on the input, you have to separate the filters as well, like this:

filter {
    if [type] == "event" {
        # do stuff
    }
    else if [type] == "data" {
        # do other stuff
    }
}

Having different configurations in different files has no effect on filter segregation since Logstash just concatenates them in one big config internally.
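
Filling in the placeholders with the filters from the original post gives something like the sketch below. It assumes the first column of both files is named `Date`, as in the CSV headers; the original data config named it `Data`, which the date filter and the remove_field setting would not match. The filter in netflow.conf presumably needs the same kind of guard, but its contents are not shown in this thread.

```
filter {
  # Parse each file with its own column list
  if [type] == "data" {
    csv {
      columns => ["Date","Duration","KBytes","Service","Label","APN","Code","Lat","Lon","CNO","Message","TxKBytes","RxKBytes"]
    }
  } else if [type] == "event" {
    csv {
      columns => ["Date","Block","Message","Code","Lat","Lon","CNO"]
    }
  }

  # Steps shared by both CSV types, still guarded so that
  # events from other inputs (e.g. netflow) are left alone
  if [type] in ["data", "event"] {
    date {
      match => [ "Date", "yyyy-MM-dd HH:mm:ss" ]
      timezone => "UTC"
      target => "@timestamp"
    }
    mutate {
      remove_field => ["message", "Date"]
    }
  }
}
```

Because every filter is wrapped in a conditional, this block behaves the same whether it lives in one file or is spread over several.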

It is the same, but you should use conditionals in your filter block too.

Seems this is doing the trick :)

Will test further.

Though I'm seeing problems with converting gps locations to an integer. Logstash is cutting off everything after a comma (,) or dot (.). Is there any way to avoid this?
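
The thread does not show how the conversion is done. If it uses the mutate filter's convert option with "integer", losing the fractional part is expected, and converting to "float" instead should keep it. A minimal sketch under that assumption, using the Lat and Lon column names from the configs above:

```
filter {
  if [type] in ["data", "event"] {
    # Only needed if the values use a decimal comma, e.g. "59,91"
    mutate {
      gsub => [
        "Lat", ",", ".",
        "Lon", ",", "."
      ]
    }
    # Separate mutate block so the conversion runs after the substitution
    mutate {
      convert => {
        "Lat" => "float"
        "Lon" => "float"
      }
    }
  }
}
```

The two mutate blocks are kept apart because, within a single mutate filter, convert is applied before gsub.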

Please don't ping people that aren't part of the thread like that.

Magnus volunteers his time here :)

I'm sorry, I won't do it again. I appreciate all the work people put in here trying to help others.
