How to exclude duplicates in the aggregate filter?

Hello!

I'm fetching some data from a DB with multiple associated tables.

My query:

SELECT `item`.`id`, `item`.`name`, `colour`.`name` AS `colour`, `material`.`name` AS `material` FROM `item`
LEFT JOIN `item_has_colour` AS `ic` ON `ic`.`item` = `item`.`id`
LEFT JOIN `colour` ON `colour`.`id` = `ic`.`colour`
LEFT JOIN `item_has_material` AS `im` ON `im`.`item` = `item`.`id`
LEFT JOIN `material` ON `material`.`id` = `im`.`material`
ORDER BY `item`.`id`

It produces the following dataset:

mysql> SELECT `item`.`id`, `item`.`name`, `colour`.`name` AS `colour`, `material`.`name` AS `material` FROM `item` LEFT JOIN `item_has_colour` AS `ic` ON `ic`.`item` = `item`.`id` LEFT JOIN `colour` ON `colour`.`id` = `ic`.`colour` LEFT JOIN `item_has_material` AS `im` ON `im`.`item` = `item`.`id` LEFT JOIN `material` ON `material`.`id` = `im`.`material` ORDER BY `item`.`id`;
+----+-------+--------+----------+
| id | name  | colour | material |
+----+-------+--------+----------+
|  1 | cat   | red    | ceramic  |
|  1 | cat   | green  | ceramic  |
|  1 | cat   | yellow | ceramic  |
|  1 | cat   | red    | plastic  |
|  1 | cat   | green  | plastic  |
|  1 | cat   | yellow | plastic  |
|  2 | dog   | red    | ceramic  |
|  2 | dog   | yellow | ceramic  |
|  3 | mouse | red    | ceramic  |
|  3 | mouse | blue   | ceramic  |
|  3 | mouse | red    | plastic  |
|  3 | mouse | blue   | plastic  |
+----+-------+--------+----------+
12 rows in set (0,00 sec)

So all possible combinations are present: each item's colours are cross-joined with its materials (for cat, 3 colours × 2 materials = 6 rows).

Of all this redundant data I need only the set of colours and the set of materials for each item.

But obviously, when I aggregate all this with the following filter:

filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['name'] = event.get('name')
      map['colours'] ||= []
      map['colours'] << {'colour' => event.get('colour')}
      map['materials'] ||= []
      map['materials'] << {'material' => event.get('material')}
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}

I'm getting a whole lot of duplicates, because every joined row appends both its colour and its material:

 {
          "name" => "cat",
    "@timestamp" => 2018-07-20T15:27:37.393Z,
      "@version" => "1",
       "colours" => [
        [0] {
            "colour" => "red"
        },
        [1] {
            "colour" => "green"
        },
        [2] {
            "colour" => "yellow"
        },
        [3] {
            "colour" => "red"
        },
        [4] {
            "colour" => "green"
        },
        [5] {
            "colour" => "yellow"
        }
    ],
            "id" => 1,
     "materials" => [
        [0] {
            "material" => "ceramic"
        },
        [1] {
            "material" => "ceramic"
        },
        [2] {
            "material" => "ceramic"
        },
        [3] {
            "material" => "plastic"
        },
        [4] {
            "material" => "plastic"
        },
        [5] {
            "material" => "plastic"
        }
    ]
}

I tried to get rid of the duplicates with a conditional "not in" check, but I simply can't get the syntax right.

h = { "colour" => event.get("colour") }
if ! a.include?(h)
    a << h
end

It works!

The only remark: the double quotes in your example have to be replaced with single ones, since the code option itself is wrapped in double quotes.
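
For reference, the filter now looks roughly like this, with the include? guard applied to both arrays (note the single quotes inside the double-quoted code option):

filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['name'] = event.get('name')
      map['colours'] ||= []
      c = { 'colour' => event.get('colour') }
      map['colours'] << c unless map['colours'].include?(c)
      map['materials'] ||= []
      m = { 'material' => event.get('material') }
      map['materials'] << m unless map['materials'].include?(m)
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}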

Thank you for the help!

By the way, what language is this?

It is Ruby.

For the code option you can use either

code => '
     map["id"] = event.get("id")
'

or

code => "
     map['id'] = event.get('id')
"

I lean towards the former; the documentation tends to use the latter.

Hi,
I am using the following filter code, where I am getting duplicate 'Specialty' values:

code => "
  map['SpecilaityList'] ||= []
  map['SpecilaityList'] << {
    'Specialty' => event.get('Specialty')
  }
  event.cancel()
"

Can you please assist with the Ruby part to exclude duplicate values? I am getting syntax errors with my attempts.

  map['SpecilaityList'] ||= []
  sl = {'Specialty' => event.get('Specialty')}
  if ! map['SpecilaityList'].include?(sl)
    map['SpecilaityList'] << sl
  end
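
If you prefer a one-liner, Ruby's Array#| (set union) only adds elements that are not already in the array, so this sketch is equivalent:

  map['SpecilaityList'] ||= []
  map['SpecilaityList'] |= [ { 'Specialty' => event.get('Specialty') } ]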

Also, you might want to use specialty instead of Specialty, since field names here are lowercase by default, AFAIK.

It worked. Thank you very much.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.