Issues with searching data in array

Hi Team,

I am having issues searching data in Kibana; it is not working as expected. The situation is as follows:

  1. The logs being collected are firewall logs.

  2. There are two fields, action and matches.action.

  3. action specifies the final action taken by the firewall, and matches.action specifies the preliminary action taken. Each can hold one or more of the values (log, allow, drop, challenge, simulate).

  4. Since I don't want two fields for the same kind of data, I merged matches.action into action, which makes the action field an array.

  5. When I filter on a value, documents I don't want are also shown.

E.g., when searching for "drop" traffic, "simulate" traffic is shown as well.
Refer to the image below.

The data in the field looks like this:

screenshot

I would be grateful if anyone could help me with this.

Regards
Karthik.

When you say "I merged matches.action to action which makes the field action an array", did you use the nested data type for that?

Hey Krishna,

Thank you for your help.

I didn't use any nested datatype; it's just a text datatype, and after the simple merge operation it should now be an array.

Could you please guide me on what to do? Feel free to ask for any more information you need from my end.

Thanks
Karthik. K

If you have an array of objects under one entity, it is recommended to map the data as nested. Could you please show what your mapping looks like for the "action" field?

Hi Krishna,

My index mapping for the field action is as follows.

"action" : {
      "type" : "text",
      "fields" : {
        "key" : {
          "type" : "keyword"
        }
      }
    }

I'm using the Logstash http input plugin to ingest data into Elasticsearch from a Python script. A Logstash conditional does the merge operation, as follows:

if [action] == [matches][action] {
  mutate {
    remove_field => "[matches][action]"
  }
}
else {
  mutate {
    merge => { "action" => "[matches][action]" }
    remove_field => "[matches][action]"
  }
}

Could you please advise what changes I need to make to the mapping and the Logstash config? I'm new to this.

The goal is for the action field to be queryable and usable for generating visualizations.

Thanks
Karthik.K

Your Logstash config looks good, but once array data enters Elasticsearch, it is converted into a flat structure, meaning the relationship between the array's elements is lost. That is why you're getting hits for the matches.action value when you're interested in the action value, and vice versa.
Try denormalising your data before you index it into Elasticsearch, rather than keeping it as an array.
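To make the symptom concrete, here is a minimal Python sketch (not Elasticsearch itself) of how a search behaves on a multi-valued field: a document matches when any one of its values matches, and the whole document, with all its values, comes back. The sample documents and the `search` helper are made up for illustration.

```python
# Hypothetical sample documents after the merge: "action" holds one or more values.
docs = [
    {"id": 1, "action": ["drop"]},
    {"id": 2, "action": ["simulate", "drop"]},
    {"id": 3, "action": ["allow"]},
]


def search(docs, field, term):
    """Return the ids of documents where any value of the field equals the term."""
    return [d["id"] for d in docs if term in d[field]]


# Document 2 matches "drop" even though it also carries "simulate".
matching_ids = search(docs, "action", "drop")
print(matching_ids)
```

Document 2 is a legitimate hit for "drop", and because the whole document is displayed, its "simulate" value appears in the results too.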
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this, you should use the nested datatype, where you can define the mapping for your use case as below.
E.g., create a mapping for your index where the final action (action) maps to action.second and the preliminary action (matches.action) maps to action.first:

PUT my_index
{
  "mappings": {
    "properties": {
      "action": {
        "type": "nested" 
      }
    }
  }
}

When you start indexing, you could tweak your Logstash config file so that it indexes your data in the following form:

PUT my_index/_doc/1
{
  "action" : [
    {
      "first" : "firewall_sampleAction",
      "second" :  "log"
    },
    {
      "first" : "firewall_sampleAction2",
      "second" :  "drop"
    }
  ]
}

So now your queries could be framed as:

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "action",
      "query": {
        "bool": {
          "must": [
            { "match": { "action.first": "firewall_sampleAction2" } }
          ]
        }
      }
    }
  }
}
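The difference between the flattened and the nested behaviour can be sketched in plain Python. This is only a simulation of the matching logic, not Elasticsearch code; the helpers `flatten`, `flat_match` and `nested_match` are hypothetical names, and the data is the pair of objects from the example document.

```python
doc = {
    "action": [
        {"first": "firewall_sampleAction", "second": "log"},
        {"first": "firewall_sampleAction2", "second": "drop"},
    ]
}


def flatten(doc):
    """Mimic a non-nested object array: one multi-valued field per sub-key."""
    flat = {}
    for obj in doc["action"]:
        for key, value in obj.items():
            flat.setdefault(f"action.{key}", []).append(value)
    return flat


def flat_match(doc, conditions):
    """True if each condition matches some value, whichever object it came from."""
    flat = flatten(doc)
    return all(value in flat.get(field, []) for field, value in conditions.items())


def nested_match(doc, conditions):
    """True only if one single object satisfies all conditions together."""
    return any(
        all(obj.get(field.split(".", 1)[1]) == value for field, value in conditions.items())
        for obj in doc["action"]
    )


# "firewall_sampleAction" was paired with "log", never with "drop".
wrong_pair = {"action.first": "firewall_sampleAction", "action.second": "drop"}
print(flat_match(doc, wrong_pair))    # the pairing is lost, so this matches
print(nested_match(doc, wrong_pair))  # each object is checked on its own
```

In the flattened form the two values are found in separate objects and the document still matches; in the nested form both conditions have to hold inside the same object.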

Note: if your end goal is also to create visualisations/dashboards using this data, then you may want to avoid nested datatypes, as Kibana does not yet support them. In that case, try denormalising your data.
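One possible way to denormalise, since the events already come from a Python script, is to emit one flat document per action value before sending them on. This is a rough sketch under assumptions, not a tested pipeline: the `action_stage` field, the `src_ip` field and the helper names are hypothetical, and it assumes each event carries `action` and `matches.action` as a string or a list of strings.

```python
def as_list(value):
    """Wrap a single value in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def denormalise(event):
    """Split one firewall event into one flat document per action value."""
    # Copy every field except the two action fields being split out.
    base = {k: v for k, v in event.items() if k not in ("action", "matches")}
    other_matches = {k: v for k, v in event.get("matches", {}).items() if k != "action"}
    if other_matches:
        base["matches"] = other_matches

    stages = (
        ("final", as_list(event.get("action"))),
        ("preliminary", as_list(event.get("matches", {}).get("action"))),
    )
    docs = []
    for stage, values in stages:
        for value in values:
            # Each output document holds exactly one action value.
            docs.append({**base, "action": value, "action_stage": stage})
    return docs


# Hypothetical event: src_ip is an invented field for illustration.
event = {"src_ip": "10.0.0.1", "action": "drop", "matches": {"action": ["simulate", "log"]}}
for flat_doc in denormalise(event):
    print(flat_doc)
```

Each document then has a single-valued action, so a filter such as action "drop" with action_stage "final" no longer pulls in "simulate" events. The trade-off is that one firewall event becomes several documents, which needs to be accounted for in count-based visualisations.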

I have a question here. Can a nested data type hold objects with the same key name and different values?

PUT my_index/_doc/1
{
  "actions" : [
    {
      "actiontaken" : "allow"
    },
    {
      "actiontaken" : "drop"
    }
  ]
}

where allow and drop refer to the two actions taken by the firewall?

I don't think that will work as expected. When you use that field in queries by accessing it as "actions.actiontaken", it wouldn't know which value to consider.

OK, understood. Could you please help me with a few lines of Logstash config showing how I can add data to a nested data type?

Hi Krishna,

Could you please help with the above request?

Regards
Karthik.K