Some Kibana SIEM features not working with arrays

Hi All,

To use the Elastic SIEM features with our own data, we decided to be ECS compliant for both the values (IP, ...) and the enrichment values (AS name, ...). We do the enrichment ourselves with our own software and intelligence. Because we can have multiple enrichment values, we decided to keep things simple and store enrichment information as arrays. A single value is also stored as an array: whatever the number of values, the field is always an array. This is much easier for us to manage in our Python code.

We have been impressed by how easily we can use the Elastic SIEM features just by complying with the ECS schema. However, we are facing some issues with how we enrich our events with arrays. As far as I know, all features work with arrays except for some specific fields. For example, the network -> flows feature complains about:

String cannot represent an array value: [Akamai International B.V.]

for the field destination.as.organization.name, which leads to a broken visualization.

This is what our field looks like:

"destination": {
  "as": {
    "organization": {
      "name": [
        "Akamai International B.V."
      ]
    },
    "number": [
      20940
    ]
  }
}

We are using Elastic Stack 7.7.1.

It looks like the feature expects the field to contain a string rather than an array. Fair enough, I would say, but since arrays work with all the other features, I would expect them to work here as well.

Would it be worth opening a GitHub ticket to discuss further?

Thanks!

Hey @obuez !

Thanks for your post. Glad to hear that you have found SIEM + ECS easy to use.

Indeed, one of the cool things about Elasticsearch is that any field can contain 0 or more values of matching types. The error you specified in your post is coming from GraphQL expecting as.organization.name to be a string - this expectation is based off the ECS field reference.

You're right in pointing out that many features do work with arrays. The reason this autonomous system field does not expect one is that it is meant to uniquely identify each network. So destination.as is a nested field; if you wanted to record several values, you could do so as follows (pseudocode):

destination.as: [{ organization: { name: <NAME_1> }, number: <NUMBER_1> }, { organization: { name: <NAME_2> }, number: <NUMBER_2> }]

However, it looks like we may not be expecting destination.as to be an array in the flow you pointed out. It could certainly be worth opening a ticket to discuss that further!

I hope that this helps! The various links I referenced go into further detail on this.

Best,
Yara


@obuez hope you had a good weekend! A colleague of mine (@Frank_Hassanabad) made a great observation: looking at the mappings, destination.as is not in fact a nested field. It is nested in the sense that it is an object containing sub-fields, but it is not mapped with the nested field type, and those are two different things. So it seems our flows are correct in their assumptions.

Our UI does not support the use of nested fields in aggregations (used in the visualizations).

Our mapping is as follows:

"as" : {
  "properties" : {
    "number" : {
      "type" : "long"
    },
    "organization" : {
      "properties" : {
        "name" : {
          "type" : "keyword",
          "fields" : {
            "text" : {
              "type" : "text",
              "norms" : false
            }
          },
          "ignore_above" : 1024
        }
      }
    }
  }
}
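
To make the object-versus-nested distinction concrete, here is a rough Python sketch of how, as far as I understand it, a plain object mapping like the one above treats an array of objects: every leaf is indexed as a flat, multi-valued field, so the link between a name and its number is not preserved and each leaf field still holds several values. This only mimics the behaviour and is not Elasticsearch code; the function name `flatten_object_array`, the second organization name, and the AS number 64500 are made up for illustration.

```python
from collections import defaultdict


def flatten_object_array(prefix, objects):
    """Mimic how a non-nested object field indexes an array of objects.

    Each leaf value is appended to a flat list keyed by its dotted path,
    so values from different objects end up side by side.
    """
    flat = defaultdict(list)

    def walk(path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}", child)
        elif isinstance(value, list):
            for item in value:
                walk(path, item)
        else:
            flat[path].append(value)

    for obj in objects:
        walk(prefix, obj)
    return dict(flat)


# Hypothetical second entry; only the Akamai values come from this thread.
as_entries = [
    {"organization": {"name": "Akamai International B.V."}, "number": 20940},
    {"organization": {"name": "Example Net"}, "number": 64500},
]

flattened = flatten_object_array("destination.as", as_entries)
for field, values in flattened.items():
    print(field, values)
```

With a nested field type each object would instead be kept as its own unit, which is the case the UI aggregations do not support.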

@obuez,

Would it be worth opening a GitHub ticket to discuss further?

Yes, it would be worth opening a Kibana ticket if you want to keep most of your data structures as arrays. Attach your mapping and describe anywhere you see bugs. The clearer, more concise, and more specific you can be about your use cases, the better the chances that others in the community who see the same problem or want the same use case will :+1: it, which increases its priority within our backlogs.

At the moment, timeline is the most compliant with arrays because it turns everything it sees into an array. However, not every feature written for SIEM follows the rule that anything and everything could be an array, so from time to time something is inevitably not going to work.

This is tricky: Elasticsearch allows anything to be an array, but in practice developers writing code might miss that aspect or expect something never to be an array. Likewise, they might expect something to always be an array when sometimes it is not, but this is a lot less likely.

Practically, how well is your solution going to hold up if it uses arrays for everything? Since we do not explicitly test every piece of ECS as an array, you will probably encounter bugs after upgrading and using new features, either from our side or from other parts of the product, because the product isn't always tested with the expectation that everything is an array.

Moving forward, it might be better to keep as arrays only the things that look like they should be arrays, and to store anything you don't need as an array as a single value.
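
Since your enrichment code is in Python, one way to do this could be a small normalization step just before indexing. The sketch below is only an illustration under my own assumptions: the helper name `unwrap_singletons` and the `keep_as_array` parameter are hypothetical, and which fields belong in that set is for you to decide from your data.

```python
def unwrap_singletons(value, keep_as_array=frozenset(), path=""):
    """Replace one-element lists with their single value, recursively.

    Fields whose dotted path is listed in keep_as_array stay lists even
    when they hold a single value. Lists with 0 or 2+ items are untouched.
    """
    if isinstance(value, dict):
        return {
            key: unwrap_singletons(
                child, keep_as_array, f"{path}.{key}" if path else key
            )
            for key, child in value.items()
        }
    if isinstance(value, list):
        items = [unwrap_singletons(item, keep_as_array, path) for item in value]
        if len(items) == 1 and path not in keep_as_array:
            return items[0]
        return items
    return value


# The event from the original post, with every value wrapped in a list.
event = {
    "destination": {
        "as": {
            "organization": {"name": ["Akamai International B.V."]},
            "number": [20940],
        }
    }
}

normalized = unwrap_singletons(event)
print(normalized)
```

Fields with several values are left as arrays, so you would still need to decide what to do with those for the features that expect a single value.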

How do you find this information out? How do you tell how likely a given part of ECS is to be well tested as an array? I would say that loading a few Beats such as auditbeat, metricbeat, etc., running a few modules, and then seeing what looks like an array directly in the data is your best practical way to make a choice today, if you decide to change how you are utilizing things.
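
If you would rather not eyeball the data, a small script can list which fields show up as arrays. This is a minimal sketch that assumes you have already pulled some sample documents (for example the `_source` of a few Beats events) into a list of Python dicts; fetching them is not shown, and the helper name `array_field_counts` and the sample documents are made up for illustration.

```python
from collections import Counter


def array_field_counts(documents):
    """Count, per dotted field path, how many times the value is a list."""
    counts = Counter()

    def walk(path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            counts[path] += 1
            # Arrays of objects can contain further arrays underneath.
            for item in value:
                if isinstance(item, dict):
                    walk(path, item)

    for doc in documents:
        walk("", doc)
    return counts


# Hypothetical sample documents standing in for real Beats events.
samples = [
    {"host": {"ip": ["10.0.0.1", "fe80::1"], "name": "web-1"}, "tags": ["prod"]},
    {"host": {"ip": ["10.0.0.2"], "name": "web-2"}},
]

counts = array_field_counts(samples)
for field, seen in counts.most_common():
    print(f"{field}: array in {seen} sample value(s)")
```

Fields that never appear in the output were always single values in your sample, which makes them reasonable candidates to store as single values in your own events too.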

This doesn't mean we won't add exhaustive test cases and scenarios with everything being an array in the future to make more parts of our code more resilient, just that this is the most practical way to keep your current solution operating with 100% usability of our security applications.
