Logstash Elasticsearch Lookup

Hello,
I am doing a lookup from Logstash into Elasticsearch before loading my data. I am running into an issue when the data in the log file is in array format. For example, below is my log line:

{"id":1652437414971,"body":{"data":[{"product":"6274c6408085c1476cbd1023"}]},"function":"saveOrder","type":"REQ"}

I am parsing the above JSON and then using the code below in the Logstash config file:

if [function] == "saveOrder" and [type] == "REQ" {
  elasticsearch {
    hosts => ["https://*.*.*.*:****"]
    index => "products-search"
    user => "*********"
    password => "*********"
    query_template => "/etc/logstash/conf.d/query-categories.json"
    fields => {
      "category" => "categoryName"
      "subCategory" => "subCategoryName"
    }
  }
}

My query_template is below:

{"size": 1,"query":{"match":{"productId": "%{[body][data][product]}"}}}

However, there are no hits. The Logstash output is below:

{
          "body" => {
        "data" => [
            [0] {
                "product" => "6274c6408085c1476cbd1023"
            }
        ]
    },
      "function" => "saveOrder",
          "type" => "REQ",
            "id" => 1652437414971
}

Now if I remove the square array brackets ( [ ] ) from the data field in the log line and change it to the following:

{"id":1652437414971,"body":{"data":{"product":"6274c6408085c1476cbd1023"}},"function":"saveOrder","type":"REQ"}


The query returns the data correctly in logstash output:

{
           "function" => "saveOrder",
       "categoryName" => "ELECTRONICS",
               "body" => {
        "data" => {
            "product" => "6274c6408085c1476cbd1023"
        }
    },
                   "type" => "REQ",
    "subCategoryName" => "Mobile",
                 "id" => 1652437414971
}

Where do I need to make the relevant changes (logstash conf or query_template) to take care of the array?

Hi,

How about {"size": 1,"query":{"match":{"productId": "%{[body][data][0][product]}"}}} ?

Thanks for the response @Tomo_M. However, that works only for the first element of the array. What if I have multiple (n) elements, like below?

{"id":1652437414971,"body":{"data":[{"product":"6274c6408085c1476cbd1023"},{"product":"627356f8c9af61419a1718b8"}]},"function":"saveOrder","type":"REQ"}

Any ideas?
Thanks.

It depends on the output you want for multiple products. Could you show it as JSON?

@Tomo_M Below is the output I am getting currently for the log line with 2 products in the array with query_template {"size": 1,"query":{"match":{"productId": "%{[body][data][0][product]}"}}}

{
                 "id" => 1652437414971,
               "type" => "REQ",
               "body" => {
        "data" => [
            [0] {
                "product" => "6274c6408085c1476cbd1023"
            },
            [1] {
                "product" => "627356f8c9af61419a1718b8"
            }
        ]
    },
         "@timestamp" => 2022-05-29T04:08:56.947Z,
           "function" => "saveOrder",
       "categoryName" => "ELECTRONICS",
    "subCategoryName" => "Mobile"
}

The product with id 6274c6408085c1476cbd1023 (array element [0]) is correctly coming back as Electronics and Mobile. However, the product with id 627356f8c9af61419a1718b8 (array element [1]) belongs to a different category/subCategory.
So I am hoping to get output along the lines below:

{
                 "id" => 1652437414971,
               "type" => "REQ",
               "body" => {
        "data" => [
            [0] {
                "product" => "6274c6408085c1476cbd1023",
                "categoryName" => "ELECTRONICS",
             "subCategoryName" => "Mobile"
            },
            [1] {
                "product" => "627356f8c9af61419a1718b8",
                "categoryName" => "Health and Beauty",
             "subCategoryName" => "Lipstick"
            }
        ]
    },
         "@timestamp" => 2022-05-29T04:08:56.947Z,
           "function" => "saveOrder"
}

Thanks!

Hmm. So you are planning to use the nested field type in the Elasticsearch output.

I have no idea how to achieve that in Logstash. As there is no "for" statement in the Logstash config, you may need a ruby script, but I don't know how to combine the Elasticsearch filter plugin with it.
Of course, if the number of products in each data array is limited, simply repeating the Elasticsearch filter plugin by copy and paste is possible, but not elegant.
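
For a small, fixed number of products, the copy-and-paste approach might look roughly like the sketch below. The per-element template file names (query-categories-0.json and query-categories-1.json) are hypothetical: each would hold the same query with the array index hard-coded, e.g. "%{[body][data][0][product]}" in the first and "%{[body][data][1][product]}" in the second. Writing the results to nested targets such as [body][data][0][categoryName], and testing for [body][data][1] in a conditional, both assume that field references with array indexes are accepted in those places, so this is worth testing before relying on it.

```
filter {
  if [function] == "saveOrder" and [type] == "REQ" {
    # Lookup for the first array element (index 0).
    elasticsearch {
      hosts => ["https://*.*.*.*:****"]
      index => "products-search"
      user => "*********"
      password => "*********"
      # Hypothetical template using "%{[body][data][0][product]}"
      query_template => "/etc/logstash/conf.d/query-categories-0.json"
      fields => {
        "category" => "[body][data][0][categoryName]"
        "subCategory" => "[body][data][0][subCategoryName]"
      }
    }

    # Lookup for the second element, only when it exists (assumed syntax).
    if [body][data][1] {
      elasticsearch {
        hosts => ["https://*.*.*.*:****"]
        index => "products-search"
        user => "*********"
        password => "*********"
        # Hypothetical template using "%{[body][data][1][product]}"
        query_template => "/etc/logstash/conf.d/query-categories-1.json"
        fields => {
          "category" => "[body][data][1][categoryName]"
          "subCategory" => "[body][data][1][subCategoryName]"
        }
      }
    }
  }
}
```

Every additional array position needs another block and another template file, which is why this only suits arrays with a known upper bound.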

How about using the split filter plugin? Then you will get a separate event for each product in the data section. The data structure becomes flattened and more flexible for subsequent analysis.

@Tomo_M yes, I was also thinking along the same lines. However, if I use the split filter plugin on the data field, I get the message below:

[WARN ] 2022-05-29 19:38:47.605 [[main]>worker0] split - Only String and Array types are splittable. field:data is of type = NilClass

{
                 "id" => 1652437414971,
         "@timestamp" => 2022-05-29T12:51:55.782Z,
               "type" => "REQ",
               "body" => {
        "data" => [
            [0] {
                "product" => "6274c6408085c1476cbd1023"
            },
            [1] {
                "product" => "627356f8c9af61419a1718b8"
            }
        ]
    },
           "@version" => "1",
       "categoryName" => "ELECTRONICS",
               "tags" => [
        [0] "_split_type_failure"
    ],
    "subCategoryName" => "Mobile",
           "function" => "saveOrder"
}

My log line:

{"id":1652437414971,"body":{"data":[{"product":"6274c6408085c1476cbd1023"},{"product":"627356f8c9af61419a1718b8"}]},"function":"saveOrder","type":"REQ"}

Logstash config:

input {
  file {
    path => "/home/user/test.log"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "json"
    codec => "json"
  }
}


filter {
  if [function] == "saveOrder" and [type] == "REQ" {
    split { field => "data" }
    elasticsearch {
      hosts => ["https://*.*.*.*:****"]
      index => "products-search"
      user => "*********"
      password => "*********"
      query_template => "/etc/logstash/conf.d/query-categories.json"
      fields => {
        "category" => "categoryName"
        "subCategory" => "subCategoryName"
      }
    }
  }
}

output {
        stdout {
          codec => rubydebug
        }

}

I am not sure what I am doing wrong. Any ideas?

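That's because the array you want to split is the nested field body > data, not a top-level data field. The split filter looks for a top-level field named data, finds nothing (hence NilClass in the warning), and tags the event with _split_type_failure. Nested fields must be given as a field reference.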

Try:

split {field => "[body][data]"}
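
Putting this together, the filter section might look like the sketch below. It assumes the query template goes back to the original placeholder "%{[body][data][product]}" (without an array index), because after the split each event carries a single object in [body][data] rather than an array. The thread does not show the final template, so treat that detail as an assumption.

```
filter {
  if [function] == "saveOrder" and [type] == "REQ" {
    # Emit one event per element of the body.data array.
    split { field => "[body][data]" }

    # Each event now holds one product in [body][data][product], so the
    # template is assumed to use "%{[body][data][product]}".
    elasticsearch {
      hosts => ["https://*.*.*.*:****"]
      index => "products-search"
      user => "*********"
      password => "*********"
      query_template => "/etc/logstash/conf.d/query-categories.json"
      fields => {
        "category" => "categoryName"
        "subCategory" => "subCategoryName"
      }
    }
  }
}
```

With this layout each product becomes its own event with its own top-level categoryName and subCategoryName, instead of one event with the lookup results nested inside the array.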

@Tomo_M Thanks a lot for your help, it is working as expected now.
