Vega Sankey sorted (ordered composite aggs)

Hi there.

I'd like to create a Vega Sankey visualization that is correctly sorted.
Specifically, I'm building one that shows traffic from_ip -> to_ip.

The problem is that the most frequent IPs shown in the Sankey are not correct. Comparing them with the ones in a table, the real top ones are not shown in the Sankey.

After some research I found out that the composite aggregation (used by the Sankey) does not sort its buckets by doc_count, since that would be far too heavy to compute.
Indeed, when I run in Dev Tools the same aggs used in the Vega url to build the Sankey, the results are not sorted by doc_count at all.

Now, what might be a proper solution to this problem (apart from not using the Vega Sankey)?

Also, I tried simply getting the top results using a sized query, but it didn't solve the problem. Ideas?

Here's my query:

GET my_index*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "from_ip"
          }
        }
      ]
    }
  },
  "aggs": {
    "table": {
      "composite": {
        "size": 50,
        "sources": [
          {
            "stk1": {
              "terms": {
                "field": "from_ip.ip",
                "order" : "desc" 
              }
            }
          },
          {
            "stk2": {
              "terms": {
                "field": "to_ip.ip",
                "order" : "desc" 
              }
            }
          }
        ]
      }
    }
  }
}

To reproduce the issue on a 7.3 Kibana instance, you can ingest some fake data into a composite_test index by running the following in the Dev Tools section:

PUT composite_test/_bulk?refresh
{"index":{"_id":1}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":2}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":3}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":4}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":5}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":6}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":7}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":8}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":9}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":10}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":11}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":12}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":13}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":14}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":15}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":16}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":17}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":18}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":19}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":20}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":21}}
{"from_ip":"from_B","to_ip":"to_B"}
{"index":{"_id":22}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":23}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":24}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":25}}
{"from_ip":"from_A","to_ip":"to_B"}
{"index":{"_id":26}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":27}}
{"from_ip":"from_C","to_ip":"to_B"}
{"index":{"_id":28}}
{"from_ip":"from_A","to_ip":"to_B"}

Then run the following composite aggregation:

GET composite_test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "from_ip"
          }
        }
      ]
    }
  },
  "aggs": {
    "table": {
      "composite": {
        "size": 50,
        "sources": [
          {
            "stk1": {
              "terms": {
                "field": "from_ip.keyword"
              }
            }
          },
          {
            "stk2": {
              "terms": {
                "field": "to_ip.keyword"
              }
            }
          }
        ]
      }
    }
  }
}

You will most likely get a response like the following:

"aggregations" : {
    "table" : {
      "after_key" : {
        "stk1" : "from_C",
        "stk2" : "to_B"
      },
      "buckets" : [
        {
          "key" : {
            "stk1" : "from_A",
            "stk2" : "to_B"
          },
          "doc_count" : 13
        },
        {
          "key" : {
            "stk1" : "from_B",
            "stk2" : "to_B"
          },
          "doc_count" : 5
        },
        {
          "key" : {
            "stk1" : "from_C",
            "stk2" : "to_B"
          },
          "doc_count" : 10
        }
      ]
    }
  }

As you can see, the buckets come back in key order rather than sorted by doc_count, and a doc_count ordering is what I'd like to obtain.
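
To make the ordering I'm after concrete, here is a minimal client-side sketch in Python. It assumes the buckets from the sample response have already been fetched; `top_links` is a hypothetical helper name, not part of Elasticsearch or Vega. It is only an illustration of the desired result, not a fix inside the aggregation itself, and on a real index it would only be correct once every page of the composite aggregation has been collected (following `after_key`).

```python
# Buckets copied from the sample response; the composite aggregation
# returns them in key order, not by doc_count.
buckets = [
    {"key": {"stk1": "from_A", "stk2": "to_B"}, "doc_count": 13},
    {"key": {"stk1": "from_B", "stk2": "to_B"}, "doc_count": 5},
    {"key": {"stk1": "from_C", "stk2": "to_B"}, "doc_count": 10},
]


def top_links(buckets, n):
    """Return the n buckets with the highest doc_count, largest first.

    Hypothetical helper: assumes `buckets` already holds every bucket
    (all composite pages), otherwise the top n may be incomplete.
    """
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)[:n]


for bucket in top_links(buckets, 3):
    key = bucket["key"]
    print(key["stk1"], "->", key["stk2"], bucket["doc_count"])
```

With the sample data this lists from_A first, then from_C, then from_B, which is the order I would expect the Sankey to use.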
