Can I use SQL EXCEPT to compare fields present in two indexes?

Hello! I have two indexes that I'd like to compare:

  • winlogbeat-*
  • winlogbeat_rollup


winlogbeat_rollup is the rollup of winlogbeat-* and contains its historical data. What I'm looking to do is to compare these two indexes using the common field, "host.name". I'd like this comparison query to return the host.name values that are present in winlogbeat_rollup (historical data) but are not present in winlogbeat-*. The idea is to eventually use a time constraint on winlogbeat-* so that the query returns host names that I'm no longer receiving records for.


I figured the easiest way to do this was with the following SQL query:

POST _sql?format=txt
{
    "query" : """
    SELECT host.name.terms.value as Server FROM "winlogbeat_rollup"
    EXCEPT
    SELECT host.name as Server FROM "winlogbeat-*"
    """
}

However this returns the following parsing error:
line 4:3: mismatched input 'SELECT' expecting {<EOF>, ',', 'FULL', 'GROUP', 'HAVING', 'INNER', 'JOIN', 'LEFT', 'LIMIT', 'NATURAL', 'ORDER', 'PIVOT', 'RIGHT', 'WHERE', LIMIT_ESC}

Is the SQL Except supported in Kibana? If not, is there another way I can form a query that will provide me with the results I'm looking for?

It appears that EXCEPT is not supported. There is a limitations page in the ES SQL docs, but that doesn't call EXCEPT out specifically, but I did check the supported terms in the lexer code and EXCEPT is nowhere to be found.

I was going to suggest trying NOT EXISTS, LEFT JOIN, or NOT IN instead, but it appears those are not supported yet either. For example I attempted to run a query with NOT IN and got this error:

{
  "type" : "parsing_exception",
  "reason" : "line 3:10: IN query not supported yet"
}

I think we've discovered two problems:

  1. If EXCEPT is not supported, it should have a more descriptive error message like the one above
  2. The aforementioned "limitations" docs should have more info on what terms aren't supported

I've reached out to the team in charge of the ES SQL engine, I'll reply here when I hear back.

Here is a high-level example of a workaround to achieve an EXCEPT-like result using rollups:

PUT rollup1/_doc/1
{
  "rollup": true,
  "host": {
    "name": "host1"
  }
}
PUT rollup2/_doc/1
{
  "rollup": true,
  "host": {
    "name": "host2"
  }
}
PUT raw1/_doc/1
{
  "host": {
    "name": "host1"
  }
}
PUT raw2/_doc/2
{
  "host": {
    "name": "host3"
  }
}
POST /_sql?format=txt&pretty
{
  "query": """
  SELECT host.name FROM (
    SELECT
      host.name,
      MAX(CASE WHEN rollup THEN 1 ELSE 0 END) in_rollup,
      MAX(CASE WHEN NOT NVL(rollup, false) THEN 1 ELSE 0 END) in_original
    FROM "rollup*,raw*"
    GROUP BY host.name
  )
  WHERE in_rollup = 1 AND in_original != 1
  """
}

It results in the following:

   host.name   
---------------
host2      

Notice the "rollup": true field in the rollup indexes that makes it possible to figure out in the query where is the hostname coming from (the _index meta field is not supported).


One extra gotcha for your particular use case: You have host.name.terms.value (rollup index) and host.name (logbeat index). The index mapping in this case won’t be compatible ( host.name is an object in one vs a keyword/text in the other). One way to solve this would be with multiple queries and re-alias, but since we cannot do join or any set operations between subqueries the only alternative I see is runtime fields. Remapping host.name.terms.value and host.name to the host_name runtime field in both indexes.

I hope that helps!

That makes sense. I've had some issues though when I've attempted mapping the runtime values. I followed the examples found here. These were my first attempts:

PUT temptest-serverlist
{
  "mappings": {
    "dynamic": "runtime",
    "properties": {
      "host_name": {
        "type": "keyword",
        "script": {
            "source": "doc['host.name'].value"
        }
      }
    }
  }
}

PUT temptest-serverlist
{
  "mappings": {
    "runtime": {
      "host_name": {
        "type": "keyword",
        "script": {
            "source": "doc['host.name'].value"
        }
      }
    }
  }
}

These both gives "index already exists" error as I'm trying to map a runtime field in an existing index and this structure seems to only be for only new indexes. I wasn't able to find a concrete answer within the documentation to map a runtime field within an existing index but this is what I tried:

PUT /temptest-serverlist/_mappings
{
  "properties": {
    "host_name": {
      "type": "keyword",
      "script":{
        "source": "doc['host.name'].value"
      }
    }
  }
}

This returns an error as "script" is not a supported parameter of "properties." I tried "runtime" in place of it but that's not a supported parameter of _mappings. Do you know how to add a runtime field to an existing index?

Thanks for the help you've already provided!

Sure, you were really close, you'll want to do this:

PUT /temptest-serverlist/_mapping
{
  "runtime": {
    "host_name": {
      "type": "keyword",
      "script":{
        "source": "emit(doc['host.name'].value)"
      }
    }
  }
}
  1. You're not changing the mapping's properties field, you're changing the runtime field
  2. You need to use an emit in your painless script (source)
  3. You don't need to use dynamic mapping, but you can if you want to

You're welcome!