Vega Sankey sorted (ordered composite aggs)

Fabio-sama · September 2, 2019, 4:11pm

Hi there.

I'd like to create a Vega Sankey visualization that is correctly sorted.
Specifically, I'm building a Sankey of traffic from_ip -> to_ip.

The problem is that the most frequent IPs shown in the Sankey are not correct. Comparing them with the ones that appear in a table, the real top ones are not shown in the Sankey.

After some research I found out that the composite aggregation (used by the Sankey) does not sort its buckets by doc_count, since that would be way too heavy computationally.
Indeed, when I run in Dev Tools the same aggregation used in the Vega url to build the Sankey, the results are not sorted by doc_count at all.

Now, what would be a proper solution to this problem (apart from not using a Vega Sankey)?

I also tried simply getting the top results with a sized query, but that didn't solve the problem. Any ideas?

Here's my query:

GET my_index*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "from_ip"
          }
        }
      ]
    }
  },
  "aggs": {
    "table": {
      "composite": {
        "size": 50,
        "sources": [
          {
            "stk1": {
              "terms": {
                "field": "from_ip.ip",
                "order": "desc"
              }
            }
          },
          {
            "stk2": {
              "terms": {
                "field": "to_ip.ip",
                "order": "desc"
              }
            }
          }
        ]
      }
    }
  }
}
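
A note on why a size limit alone may not help: a composite aggregation returns its buckets in the order of their keys (descending here, because of "order": "desc"), so "size": 50 yields the first 50 key combinations rather than the 50 most frequent ones. One possible workaround, sketched below in Python under stated assumptions, is to page through all buckets by passing each response's after_key back as "after" in the next request, and only rank them afterwards. Both fetch_page and fake_fetch_page are hypothetical helpers, not part of any Elasticsearch client, and the buckets are made-up stand-ins.

```python
# Made-up buckets standing in for a full composite result, in key order.
ALL_BUCKETS = [
    {"key": {"stk1": "a", "stk2": "x"}, "doc_count": 3},
    {"key": {"stk1": "b", "stk2": "x"}, "doc_count": 9},
    {"key": {"stk1": "c", "stk2": "x"}, "doc_count": 5},
]


def fake_fetch_page(after, size=2):
    """Hypothetical stand-in for one search request.

    A real version would send the composite aggregation with
    "after": after (omitted on the first page) and return
    response["aggregations"]["table"].
    """
    keys = [b["key"] for b in ALL_BUCKETS]
    start = 0 if after is None else keys.index(after) + 1
    chunk = ALL_BUCKETS[start:start + size]
    page = {"buckets": chunk}
    if chunk:
        # Only non-empty pages carry an after_key.
        page["after_key"] = chunk[-1]["key"]
    return page


def collect_all_buckets(fetch_page):
    """Follow after_key from page to page until no key is returned."""
    buckets, after = [], None
    while True:
        page = fetch_page(after)
        buckets.extend(page["buckets"])
        after = page.get("after_key")
        if after is None:
            return buckets


all_buckets = collect_all_buckets(fake_fetch_page)
print(len(all_buckets))  # 3
```

With many distinct from_ip / to_ip pairs this means fetching every page before anything can be ranked, which may be expensive.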

Fabio-sama · September 3, 2019, 7:49am

To reproduce the issue on a 7.3 Kibana instance, you can ingest some fake data into a composite_test index by running the following in the Dev Tools section:

PUT composite_test/_bulk?refresh
{"index":{"_id":1}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":2}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":3}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":4}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":5}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":6}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":7}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":8}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":9}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":10}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":11}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":12}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":13}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":14}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":15}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":16}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":17}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":18}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":19}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":20}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":21}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":22}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":23}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":24}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":25}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":26}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":27}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":28}}
{"from_ip":"from_A","to_ip":"to_B"}

Then run the following composite aggregation:

GET composite_test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "from_ip"
          }
        }
      ]
    }
  },
  "aggs": {
    "table": {
      "composite": {
        "size": 50,
        "sources": [
          {
            "stk1": {
              "terms": {
                "field": "from_ip.keyword"
              }
            }
          },
          {
            "stk2": {
              "terms": {
                "field": "to_ip.keyword"
              }
            }
          }
        ]
      }
    }
  }
}

You will most likely get a response like the following:

"aggregations" : {
    "table" : {
      "after_key" : {
        "stk1" : "from_C",
        "stk2" : "to_B"
      },
      "buckets" : [
        {
          "key" : {
            "stk1" : "from_A",
            "stk2" : "to_B"
          },
          "doc_count" : 13
        },
        {
          "key" : {
            "stk1" : "from_B",
            "stk2" : "to_B"
          },
          "doc_count" : 5
        },
        {
          "key" : {
            "stk1" : "from_C",
            "stk2" : "to_B"
          },
          "doc_count" : 10
        }
      ]
    }
  }

As you can see, the buckets are not sorted by doc_count, which is what I'd like to obtain.
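
To make the desired ordering concrete, here is a minimal Python sketch that re-sorts the three buckets from the response above by doc_count on the client side. The helper name top_buckets is made up for illustration. In a Vega spec the same re-sorting would presumably be done with a data transform rather than in Python, and it only ranks the buckets the aggregation actually returned.

```python
# Buckets copied from the composite response above (key order, not count order).
buckets = [
    {"key": {"stk1": "from_A", "stk2": "to_B"}, "doc_count": 13},
    {"key": {"stk1": "from_B", "stk2": "to_B"}, "doc_count": 5},
    {"key": {"stk1": "from_C", "stk2": "to_B"}, "doc_count": 10},
]


def top_buckets(buckets, n):
    """Return the n buckets with the highest doc_count, largest first."""
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)[:n]


top = top_buckets(buckets, 2)
print([(b["key"]["stk1"], b["doc_count"]) for b in top])
# [('from_A', 13), ('from_C', 10)]
```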

system · October 1, 2019, 7:50am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
