Filtering nested Data


#1

Given is the following Data Structure:

BSEG
[
{ "DMBTR": 402.4, "KOART": "D" },
{ "DMBTR": 356.46, "KOART": "S" }
]

In Kibana we are able to SUM the both DMBTR Fields with SUM BSEG.DMBTR.

But i can't figure out, how to SUM only the Lines where KOART="D". I get always the SUM off all Fields.


(swarmee.net) #2

Hey open7,

The default behaviour for elasticsearch is to store data in a "flat" index. Which means under the covers your data looks like this

"DMBTR": [ 402.4, 356.46 ],
"KOART": ["D" , "S"] 

Basically while the source document has a nested structure - its lost that structure in elasticsearch. So if you try to limit your search to only KOART=D basically what its doing is finding all documents that have at least one "D" and then adding up all the DMBTRs for those documents (there is no linkage between DMBTR and KOART).

If its easy / appropriate you could "spilt" your events in logstash - so you get one event for each BSEG. Or some other transforms in your ingestion process. Maybe that would work for you.

Else you can go down the nested index path. There is a bit of a learning curve with nested indexes - and Kibana can not query the nested components "easily" without custom filters or using vega queries. It all depends on the complexity of your events and how you want to search them.

I did a presentation on complex event modelling in elastic search previously. If you take a look at the presentation here particularly slides 8 and 9 - you will see a pretty clear demonstration of how the default mapping works and what nested mapping provides.


(Werner Kroiss) #3

Thank you for sharing your insights (like the tip to set the "default_operator of AND" and the presentation. To summarize my conclusions:

  1. For searching we can keep the raw data in the nested format.
  2. For analysis we need to split it

For our example this means:
1 index holding all financial documents (for searching)
n Indices for each KOART and each account


(swarmee.net) #4

Heya Fantast7,

Firstly, the absolutely easiest thing to do would be to try to format / transform your input data to look like this (basically put the field values in the field name);

"BSEG.KOART.D" : 402.4,
"BSEG.KOART.S" : 356.46,
I would do this if you can. These fields could be added additionally to the original fields (same data with different structure in the same index).

Second, if you 'split ' a event by a repeating group in your data into multiple sub events you can send all the split events to a single index (no need to have a separate index for each KOART and account). This option seems pretty easy - consider setting best compression in your index settings which should save you some space. E.g.

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "index.codec": "**best_compression**",
    "index.refresh_interval" : "1s"
  },

Thirdly, If you are running elastic 6.2 (which includes vega) and you are willing to learn how to use nested mappings and nested queries and vega-lite, consider duplicating the BSEG field and storing it in both nested and non nested structure in the same index. E.g.

{
  "BSEG": [
    {
      "DMBTR": 402.4,
      "KOART": "D"
    },
    {
      "DMBTR": 356.46,
      "KOART": "S"
    }
  ], "BSEGnested": [
    {
      "DMBTR": 402.4,
      "KOART": "D"
    },
    {
      "DMBTR": 356.46,
      "KOART": "S"
    }
  ]
}

This will means that you need to set a index _mapping before you post any data into the index. Basically just set the "type" of the "BSEGnested" field to be "nested".

"BSEGnested": {
  "type": "nested",
  "properties": {

Once you have done that you can create vega-lite queries against the nested documents. Normal visualisations will not be able be to be run against the nested field - however you will still have the flatten ones to query against.

There are a few options - the third one requires a lot more effort than the first and second one.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.