How to do such select in Kibana (or ES)?

I'd like to select all companies that are new in 2018 (were not selling in 2017). In SQL I would to following:

SELECT * FROM table WHERE order_date BETWEEN '2018-01-01' AND '2018-12-31' AND company NOT IN (SELECT company from tabela3 where order_date BETWEEN '2017-01-01' AND '2017-12-31')

I know that we have no 'IN' operator in ES but maybe there's another way to do that?
Regards,
G.

In order to select the companies in Kibana you would need to filter company documents based on their own properties, you can't use documents in another index or query like you can in SQL.

If you have a ton of order documents then maybe you could update your company documents with a "2017_orders", "2018_orders", "2019_order" counter that you increment each time they place an order. Then you could filter on companies with 2017_orders:0 AND 2018_orders:>0

If you don't have a ton of order documents then you could include a nested document for each order in the company documents. This will give you more flexibility in querying, like if you want to filter by arbitrary time ranges:

# create the index
PUT /customers
{
  "mappings": {
    "_doc": {
      "properties": {
        "customer_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "orders": {
          "type": "nested",
          "properties": {
            "date": {
              "type": "date"
            },
            "amount_cents": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}

# index some test data
POST /customers/_doc
{
  "customer_name": "foo",
  "orders": [
    { "date": "2018-02-01" }
  ]
}
POST /customers/_doc
{
  "customer_name": "bar",
  "orders": [
    { "date": "2017-12-01" }
  ]
}

# query to find the customer names
POST /customers/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "orders",
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "orders.date": {
                  "gte": "2018-01-01",
                  "lte": "2018-12-31"
                }
              }
            }
          ],
          "must_not": [
            {
              "range": {
                "orders.date": {
                  "gte": "2017-01-01",
                  "lte": "2017-12-31"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "company_names": {
      "terms": {
        "field": "customer_name.keyword",
        "size": 10
      }
    }
  }
}

From there you would have to use a custom filter by clicking the "Edit Query DSL" button inside a new filter and paste the "nested" bit from the query in there:

image

Thank you very much for this suggestion. Maybe I'm wrong but both queries will not give me what I want.

  • If I mark documents with year marker then doing select I will just receive orders from 2017 or from 2018 (I will not know if the company that placed order in 2017 is new or not).
  • looks like second query is choosing documents with order date 2018-01-01, 2018-12-31 and not 2017-01-01, 2017-12-31. But we do not know if those companies with order date 2018-X-X are new or not.
    But maybe I'm taking ES nested query in wrong way.

I believe you are right Gosforth. A couple things come to mind.

  1. Create an entity index for each company. You could have a nested field for each year, and would update the year field (ex. 2018) with the latest order. Then you can search and filter and only the company documents that match your year queries would come back. This would be a very small index.

  2. Write for example a python query to get a list of companies from year 1 and put in an array. Then query companies from year 2 into another array. Then loop through year 1 to see if not in year 2 array.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.