Join Style Queries in Kibana

Hi,
Can someone explain to me how I can formulate a query that:

  • Matches documents where the message field contains "string 1" or "string 2"
  • Where the transaction_id is the same for all documents

Thx
D

Elasticsearch, and therefore Kibana, does not support joins, so there is no automatic way of doing this. Often you need to change how you model your data. If you can describe your data in greater detail someone might be able to provide guidance or even point to some workarounds.

I'll need to come back on this @Christian_Dahlqvist. In the meantime could you show me some examples of the kind of thing you're suggesting?

@Christian_Dahlqvist I've looked at this team's data and it is minimally formatted. Everything they're interested in is contained in the message field.

So, a kibana query they might run could look something like this:

level:"ERROR" AND message: "trans-id: xxxxxxxxxxxx" AND ( message: "Some State Info 1" OR message:"Some State Info 2")

The issue with this query, aside from the fact they're sending largely unformatted data, is that they don't know the transaction id at the time when they come to run the query.

How can they structure their data so that they could run a query that returns the two types of log output where the value of trans-id is the same for all returned docs?

I do not know. Maybe someone else have suggestions.

@Christian_Dahlqvist Is there anyone in your team who can shed additional light on this?

I do not work for Elastic so can not forward within their organisation.

I have a hard time understanding from the question what the desired output is to be honest. So you say the "trans-id" should be the same for all returned docs? But what if there are multiple different trans-id that are appearing among multiple documents? Should they all be returned or just a random set of them?

So let's look at the following example documents (and for the sake of this example let's assume they all match the rest of the query, i.e. are log ERROR and have either of the messages in them you're filtering for):

ID trans-id
1 t1
2 t1
3 t2
4 t2
5 t3
6 null
7 null

Which of the above documents (by ID) would you expect should that query return?

Cheers,
Tim

@timroes There are multiple trans-ids. Let me show this a different way:

Each row below is a separate document:

message                 trans-id
--------------------------------
Order Complete          1000
In-Progress             1001
In-Progress             1002
On-Hold                 1001

What we want to do is search for documents containing the phrases "In-Progress" or "On-Hold" where there is a common trans-id. Does that clarify things for you?

Unfortunately does not clarify it for me.

But let's use you're example above and add a couple of more data to it:

message                 trans-id
--------------------------------
Order Complete          1000
In-Progress             1001
In-Progress             1002
On-Hold                 1001
On-Hold                 1002
Order-Complete          1003
On-Hold                 1003

If we have those documents, would you expect your search to return the last 6 documents (i.e. all that have at least ONE message with on-hold/in-progress and ANY message with different trans-id, or do the different trans-id documents ALL need to have an in-progress/on-hold message (so especially should those two documents with 1003 be returned or not)?

In either way there is no easy query language syntax for those, since to figure out how many documents are having the same trans-id you need to run an aggregation on that field. The result of an aggregation is never documents, but only specific "buckets", so you could only return buckets with specific values (like the trans-id), but not raw documents. Also to filter out buckets that have only one document in it (i.e. there are NOT multiple documents with the same trans-id) you'll need to use the bucket selector aggregation, which is currently not naitively supported in Kibana anywhere. So the only way you can achieve those results somehow would be the Vega visualization, which allows you to craft a raw query, or just using a raw query in general (e.g. in the Dev Tools in Kibana) to retrieve the result.

If you tell me if a specific information from those documents (like the trans-id) is enough for you and you don't need to see the full documents, I can help you craft an example how such a query could look.

Cheers,
Tim

Thanks @timroes. So what I would like is to find only trans-ids for which there are log lines with 'On-Hold' or 'In-Progress'. Which, based on your reply, sounds like some kind of aggregation may work.

The query that you'll need (and can send via the Dev Tools > Console in Kibana) looks around the following:

GET your_index/_search
{
  "size": 0,
  "query": {
    // .. You can copy that part from using the query you've got (without the trans-id in it)
    // and using the "inspector" in the Discover Menu and then Requests, to see the actual
    // request send, and copy the "query" part from there.
  },
  "aggs": {
    "trans_ids": {
      "terms": {
        "field": "trans_id",
        "size": 100
      },
      "aggs": {
        "filtered": {
          "bucket_selector": {
            "buckets_path": {
              "transCount": "_count"
            },
            "script": "params.transCount > 1"
          }
        }
      }
    }
  }
}

This should return you a response that contains under aggregations.trans_ids.buckets only the transaction ids, that match the criteria you described.

Since bucket_selector is currently not yet implemented in Kibana, you can't easily build a visualization based on that. The only way you currently have is using the Vega visualization with that query to put it into a visualization.

Cheers,
Tim

Thank you for this @timroes, very interesting :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.