Hello!
I am trying to aggregate some data from a database with Logstash.
My data in the database looks like this:
+----+----------------+---------------------+----------------------+----------------------+
| # | product_id | product_name | property_name | property_value |
+----+----------------+---------------------+----------------------+----------------------+
| 1 | 100 | pc | colour | black |
+----+----------------+---------------------+----------------------+----------------------+
| 2 | 100 | pc | colour | silver |
+----+----------------+---------------------+----------------------+----------------------+
| 3 | 100 | pc | ram | 16Gb |
+----+----------------+---------------------+----------------------+----------------------+
| 4 | 100 | pc | hdd | 200Gb |
+----+----------------+---------------------+----------------------+----------------------+
| 5 | 101 | printer | colour | black |
+----+----------------+---------------------+----------------------+----------------------+
| 6 | 101 | printer | features | wifi |
+----+----------------+---------------------+----------------------+----------------------+
| 7 | 101 | printer | features | scanner |
+----+----------------+---------------------+----------------------+----------------------+
| 8 | 101 | printer | type | mate |
+----+----------------+---------------------+----------------------+----------------------+
| 9 | 102 | laptop | features | wifi 5Ghz |
+----+----------------+---------------------+----------------------+----------------------+
| 10 | 102 | laptop | colour | white |
+----+----------------+---------------------+----------------------+----------------------+
| 11 | 102 | laptop | hdd | 512Gb |
+----+----------------+---------------------+----------------------+----------------------+
I want to aggregate data by product_id, property_name in the following way:
[
  {
    "id": 100,
    "name": "pc",
    "properties": {
      "colour": [
        "black",
        "silver"
      ],
      "hdd": [
        "200Gb"
      ],
      "ram": [
        "16Gb"
      ]
    }
  },
  {
    "id": 101,
    "name": "printer",
    "properties": {
      "features": [
        "wifi",
        "scanner"
      ],
      "colour": [
        "black"
      ],
      "type": [
        "mate"
      ]
    }
  },
  {
    "id": 102,
    "name": "laptop",
    "properties": {
      "features": [
        "wifi 5Ghz"
      ],
      "colour": [
        "white"
      ],
      "hdd": [
        "512Gb"
      ]
    }
  }
]
To do this, I am using the aggregate filter plugin. I have read example #4 in the docs and this topic.
Here is the filter part of my logstash.conf:
filter {
  aggregate {
    task_id => "%{product_id}"
    code => "
      map['product_id'] = event.get('product_id')
      map['product_name'] = event.get('product_name')
      map['properties'] ||= {}
      map[event.get('property_name')] ||= []
      map[event.get('property_name')] << event.get('property_value')
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}
The output of filtering is:
{"product_id":100, "product_name":"pc", "properties":[], "colour":["black","silver"], "ram":["16Gb"], "hdd":["200Gb"]}
{"product_id":101, "product_name":"printer", "properties":[], "colour":["black"], "type":["mate"], "features":["wifi","scanner"]}
{"product_id":102, "product_name":"laptop", "properties":[], "colour":["white"],"hdd":["512Gb"], "features":["wifi 5Ghz"]}
But I need the properties (like "colour", "ram", "hdd") to be nested inside the "properties" field instead of sitting at the top level.
To do this, I tried:
map['properties'] ||= []
map['properties'] << {
map[event.get('property_name')] ||= []
map[event.get('property_name')] << event.get('property_value')
}
But that doesn't work.
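As far as I can tell, the nesting I'm after would look roughly like this in plain Ruby, outside Logstash. This is only a sketch: the `rows` array and the `aggregate_rows` helper are hypothetical stand-ins for the incoming events and for the per-task map the aggregate filter keeps, not Logstash APIs.

```ruby
require 'json'

# Hypothetical helper: groups flat rows by product_id and nests each
# property list under a 'properties' hash (one map per product, similar
# in spirit to the aggregate filter's per-task map).
def aggregate_rows(rows)
  maps = {}
  rows.each do |row|
    map = (maps[row['product_id']] ||= {})
    map['id'] = row['product_id']
    map['name'] = row['product_name']
    map['properties'] ||= {}
    # Index into the nested hash, not into the top-level map.
    map['properties'][row['property_name']] ||= []
    map['properties'][row['property_name']] << row['property_value']
  end
  maps.values
end

# Hypothetical sample input: a subset of the table above.
rows = [
  { 'product_id' => 100, 'product_name' => 'pc', 'property_name' => 'colour', 'property_value' => 'black' },
  { 'product_id' => 100, 'product_name' => 'pc', 'property_name' => 'colour', 'property_value' => 'silver' },
  { 'product_id' => 100, 'product_name' => 'pc', 'property_name' => 'ram', 'property_value' => '16Gb' },
  { 'product_id' => 100, 'product_name' => 'pc', 'property_name' => 'hdd', 'property_value' => '200Gb' },
  { 'product_id' => 101, 'product_name' => 'printer', 'property_name' => 'colour', 'property_value' => 'black' },
  { 'product_id' => 101, 'product_name' => 'printer', 'property_name' => 'features', 'property_value' => 'wifi' }
]

result = aggregate_rows(rows)
puts JSON.pretty_generate(result)
```

If the same idea carries over to the filter, I would expect the two lines in the code option to become map['properties'][event.get('property_name')] ||= [] followed by map['properties'][event.get('property_name')] << event.get('property_value'), but I have not confirmed that in Logstash.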
I'm not familiar with the Ruby syntax used in the code option. Any idea how to put the properties (like "colour", "ram", "hdd") inside "properties"? Am I missing something?
Thank you!