Duplicate entries while using mutate

Hi there!
I'm having an issue (intermittent, I think) when using mutate and split. This is my Logstash config:

input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["topic"]
  }
}

filter {
  mutate {
    split => ["message", ","]
    add_field => { "field-a" => "%{[message][0]}" }
  }
  mutate {
    split => ["message", ","]
    add_field => { "field-b" => "%{[message][1]}" }
  }
  mutate {
    split => ["message", ","]
    add_field => { "field-c" => "%{[message][2]}" }
  }
  mutate {
    split => ["message", ","]
    add_field => { "field-d" => "%{[message][3]}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "index"
  }
}

And at index time I'm getting the following entries with duplicated values:

"hits" : [
    {
    "_index" : "index",
    "_type" : "_doc",
    "_id" : "wvnd3mwB1Aphc7auPmoX",
    "_score" : 1.0,
    "_source" : {
      "field-c" : [
        "11",
        "11",
        "11"
      ],
      "field-b" : [
        "00211",
        "00211",
        "00211"
      ],
      "@timestamp" : "2019-08-29T19:32:16.745Z",
      "@version" : "1",
      "message" : [
        "sgit",
        "00211",
        "11",
        "pendiente"
      ],
      "field-a" : [
        "sgit",
        "sgit",
        "sgit"
      ],
      "field-d" : [
        "pendiente",
        "pendiente",
        "pendiente"
      ]
    }
  }

I don't know why it triplicates the values for each field. Note that the message array itself has only one value per field.

I also changed my Logstash filter to this, with the same results:

filter {
   mutate {
     add_field => { "field-a" => "%{[message][0]}"}
   }
   mutate {
     add_field => { "field-b" => "%{[message][1]}"}
   }
   mutate {
     add_field => { "field-c" => "%{[message][2]}"}
   }
   mutate {
     add_field => { "field-d" => "%{[message][3]}"}
   }
}

What does the original [message] field look like? (Save a copy of it in another field.)
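One way to save that copy is with the mutate filter's copy option, placed before any split so the raw value survives (the target field name here is just an example):

```
filter {
  # keep the raw message before any later mutate modifies it
  mutate {
    copy => { "message" => "[original_message]" }
  }
}
```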

I would have written that as

mutate {
    split => { "message" => "," }
    add_field => {
        "field-a" => "%{[message][0]}"
        "field-b" => "%{[message][1]}"
        "field-c" => "%{[message][2]}"
        "field-d" => "%{[message][3]}"
    }
}

Yes, at first I wrote the filter exactly as you suggested, but I got the same results. As you can see in the previous output, the original message field is preserved and has the following data:

    "message" : [
    "sgit",
    "00211",
    "11",
    "pendiente"
  ]

I think I may have found the problem, but I need your opinion. I had three config files similar to the one I posted. Each file consumes a different topic from a Kafka queue, then applies the same filter, and finally indexes the output to a different index.

When I was working with just one config file the filters worked well (without duplicating values), but when I added the other two files, the filters added extra values per field, as I showed above.

Right now I'm using just one file, with tags to route each input to the correct output.
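For context, a single-file setup like that typically tags each input and branches on the tag in the output; a minimal sketch (topic names, tags, and index names here are placeholders, not my real config):

```
input {
  kafka { bootstrap_servers => "localhost:9092" topics => ["topic-a"] tags => ["a"] }
  kafka { bootstrap_servers => "localhost:9092" topics => ["topic-b"] tags => ["b"] }
}

output {
  # route each tagged event to its own index
  if "a" in [tags] {
    elasticsearch { hosts => ["localhost:9200"] index => "index-a" }
  } else if "b" in [tags] {
    elasticsearch { hosts => ["localhost:9200"] index => "index-b" }
  }
}
```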

Is this a recommended configuration, or should I use separate pipelines? And if so, why?

No, that is the message field after it has been split, but that no longer matters.

If you have three configuration files then they are concatenated and events go through all three. So you effectively have

mutate {
    split => { "message" => "," }
}
mutate {
    add_field => {
        "field-a" => "%{[message][0]}"
        "field-b" => "%{[message][1]}"
        "field-c" => "%{[message][2]}"
        "field-d" => "%{[message][3]}"
    }
}
mutate {
    add_field => {
        "field-a" => "%{[message][0]}"
        "field-b" => "%{[message][1]}"
        "field-c" => "%{[message][2]}"
        "field-d" => "%{[message][3]}"
    }
}
mutate {
    add_field => {
        "field-a" => "%{[message][0]}"
        "field-b" => "%{[message][1]}"
        "field-c" => "%{[message][2]}"
        "field-d" => "%{[message][3]}"
    }
}

which certainly will result in field-* being arrays with three entries. If each configuration file is self-contained then yes, you should run it in its own pipeline.
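With the multiple-pipelines approach, a pipelines.yml along these lines keeps each file isolated, so events from one topic never pass through another file's filters (pipeline ids and paths below are placeholders):

```yaml
# pipelines.yml: each entry runs as its own pipeline with its own config
- pipeline.id: topic-a
  path.config: "/etc/logstash/conf.d/topic-a.conf"
- pipeline.id: topic-b
  path.config: "/etc/logstash/conf.d/topic-b.conf"
- pipeline.id: topic-c
  path.config: "/etc/logstash/conf.d/topic-c.conf"
```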

Thank you so much! I didn't know that the config files would be concatenated!

:+1::+1:
