How to get a count of new users comparing two date ranges

Hello everyone!

So I'm having trouble with a query that I have working on SQL (Impala) but can't get to work on ElasticSearch.

On Impala I have the following table:

trx_date customer_id commerce_id
20181201 1 123
20181202 2 123
20181203 3 123
20181215 1 123
20181216 2 123
20181217 3 123
20181225 1 123
20181226 4 123
20181227 5 123

The querys that work on Impala are the following:

The first one will count the number of unique customers that bought something between a certain date (2018-12-01 and 2018-12-24):

SELECT COUNT(DISTINCT consumer_id) FROM table
WHERE trx_date BETWEEN 20181201 AND 20181224
AND commerce_id = 123 ;

Result would be:

3 (customers: 1, 2, 3)

Then I can query for the new unique users between a new date (2018-12-25 and 2018-12-31) and compare them to the old users (query above):

WITH tmp_01 AS (
    SELECT DISTINCT consumer_id FROM             
    table
    WHERE trx_date BETWEEN 20181201 AND 20181224
    AND commerce_id = 123
), tmp_25 AS (
    SELECT DISTINCT consumer_id FROM 
    table
    WHERE trx_date BETWEEN 20181225 AND 20181231
    AND commerce_id = 123
)
SELECT
    count(*)
FROM tmp_25
LEFT JOIN tmp_01 ON tmp_25.consumer_id = tmp_01.consumer_id
WHERE tmp_01.consumer_id IS NULL ;

Result would be:

2 (customers: 4, 5)

Now onto ElasticSearch!

I have the following documents:

{
    "trx_date": "December 1st 2018, 12:30:25.000",
    "consumer_id": "1",
    "commerce_id": 123
},
{
    "trx_date": "December 2nd 2018, 12:30:25.000",
    "consumer_id": "2",
    "commerce_id": 123
},
{
    "trx_date": "December 3rd 2018, 12:30:25.000",
    "consumer_id": "3",
    "commerce_id": 123
},
{
    "trx_date": "December 15th 2018, 12:30:25.000",
    "consumer_id": "1",
    "commerce_id": 123
},
{
    "trx_date": "December 16th 2018, 12:30:25.000",
    "consumer_id": "2",
    "commerce_id": 123
},
{
    "trx_date": "December 17th 2018, 12:30:25.000",
    "consumer_id": "3",
    "commerce_id": 123
},
{
    "trx_date": "December 25th 2018, 12:30:25.000",
    "consumer_id": "1",
    "commerce_id": 123
},
{
    "trx_date": "December 26th 2018, 12:30:25.000",
    "consumer_id": "4",
    "commerce_id": 123
},
{
    "trx_date": "December 27th 2018, 12:30:25.000",
    "consumer_id": "5",
    "commerce_id": 123
}

And I would like to have the same two results that I posted above.

The query I have so far is:

{
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "time.keyword",
        "size": 10,
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "3": {
          "date_range": {
            "field": "trx_date",
            "ranges": [
              {
                "from": "2018-12-25T00:00:00.000",
                "to": "2018-12-31T23:59:59.999"
              }
            ]
          },
          "aggs": {
            "1": {
              "cardinality": {
                "field": "consumer_id"
              }
            }
          }
        }
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    "created",
    "date"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        },
        {
          "match_phrase": {
            "pos_id": {
              "commerce_id": "15017353"
            }
          }
        },
        {
          "range": {
            "trx_date": {
              "gte": 1543640400000,
              "lte": 1545713999999,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "filter": [],
      "should": [],
      "must_not": []
    }
  }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.