Hello everyone!
So I'm having trouble with a query that I have working on SQL (Impala) but can't get to work on ElasticSearch.
On Impala I have the following table:
trx_date | customer_id | commerce_id |
---|---|---|
20181201 | 1 | 123 |
20181202 | 2 | 123 |
20181203 | 3 | 123 |
20181215 | 1 | 123 |
20181216 | 2 | 123 |
20181217 | 3 | 123 |
20181225 | 1 | 123 |
20181226 | 4 | 123 |
20181227 | 5 | 123 |
The querys that work on Impala are the following:
The first one will count the number of unique customers that bought something between a certain date (2018-12-01 and 2018-12-24):
SELECT COUNT(DISTINCT consumer_id) FROM table
WHERE trx_date BETWEEN 20181201 AND 20181224
AND commerce_id = 123 ;
Result would be:
3 (customers: 1, 2, 3)
Then I can query for the new unique users between a new date (2018-12-25 and 2018-12-31) and compare them to the old users (query above):
WITH tmp_01 AS (
SELECT DISTINCT consumer_id FROM
table
WHERE trx_date BETWEEN 20181201 AND 20181224
AND commerce_id = 123
), tmp_25 AS (
SELECT DISTINCT consumer_id FROM
table
WHERE trx_date BETWEEN 20181225 AND 20181231
AND commerce_id = 123
)
SELECT
count(*)
FROM tmp_25
LEFT JOIN tmp_01 ON tmp_25.consumer_id = tmp_01.consumer_id
WHERE tmp_01.consumer_id IS NULL ;
Result would be:
2 (customers: 4, 5)
Now onto ElasticSearch!
I have the following documents:
{
"trx_date": "December 1st 2018, 12:30:25.000",
"consumer_id": "1",
"commerce_id": 123
},
{
"trx_date": "December 2nd 2018, 12:30:25.000",
"consumer_id": "2",
"commerce_id": 123
},
{
"trx_date": "December 3rd 2018, 12:30:25.000",
"consumer_id": "3",
"commerce_id": 123
},
{
"trx_date": "December 15th 2018, 12:30:25.000",
"consumer_id": "1",
"commerce_id": 123
},
{
"trx_date": "December 16th 2018, 12:30:25.000",
"consumer_id": "2",
"commerce_id": 123
},
{
"trx_date": "December 17th 2018, 12:30:25.000",
"consumer_id": "3",
"commerce_id": 123
},
{
"trx_date": "December 25th 2018, 12:30:25.000",
"consumer_id": "1",
"commerce_id": 123
},
{
"trx_date": "December 26th 2018, 12:30:25.000",
"consumer_id": "4",
"commerce_id": 123
},
{
"trx_date": "December 27th 2018, 12:30:25.000",
"consumer_id": "5",
"commerce_id": 123
}
And I would like to have the same two results that I posted above.
The query I have so far is:
{
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"terms": {
"field": "time.keyword",
"size": 10,
"order": {
"_term": "asc"
}
},
"aggs": {
"3": {
"date_range": {
"field": "trx_date",
"ranges": [
{
"from": "2018-12-25T00:00:00.000",
"to": "2018-12-31T23:59:59.999"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "consumer_id"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"created",
"date"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
},
{
"match_phrase": {
"pos_id": {
"commerce_id": "15017353"
}
}
},
{
"range": {
"trx_date": {
"gte": 1543640400000,
"lte": 1545713999999,
"format": "epoch_millis"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
}