Hey,
We have a 20-node Elasticsearch cluster in production that holds a
few TBs of data and indexes millions of documents a day.
We use Elasticsearch for analytics, and the main thing we're
interested in is counting unique users.
We started in production with Elasticsearch 0.9.X, when there was
no cardinality aggregation, so we had to use the document
structure shown below.
Most of our queries look for the unique user count for a given date
range and specific segments.
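A minimal sketch of such a query, written as a Python dict builder. It mirrors the nested/filtered structure of the aggregation example further down and assumes one document per user, so the total hit count of the response would be the unique user count. The function name `build_unique_users_query` and the use of a `terms` filter to select segments are illustrative assumptions, not necessarily what runs in production.

```python
def build_unique_users_query(date_from, date_to, segments):
    """Build a count-only search body (hypothetical helper).

    Matches users that have at least one nested event inside the date
    range carrying any of the given segments.
    """
    return {
        "size": 0,  # no hits needed, only the total count
        "query": {
            "nested": {
                "path": "events",
                "query": {
                    "filtered": {
                        "query": {"match_all": {}},
                        "filter": {
                            "bool": {
                                "must": [
                                    {"range": {"events.event_time": {
                                        "from": date_from,
                                        "to": date_to,
                                        "include_lower": True,
                                        "include_upper": True,
                                    }}},
                                    # Assumption: segments are selected with a terms filter.
                                    {"terms": {"events.segments.segment": list(segments)}},
                                ]
                            }
                        },
                    }
                },
            }
        },
    }


body = build_unique_users_query("2014-11-17", "2014-11-24", ["ALICE", "BOB"])
```

With this shape, the unique user count would be read from `hits.total` in the search response.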
Some of our analytic UI screens require executing hundreds of queries in
parallel and one even requires thousands of queries.
When migrating to 1.4, we hoped to start using aggregations,
but even with doc_values enabled, aggregations take
minutes to complete.
We're running on c3.8xlarge EC2 instances with 60GB RAM, of which 30GB is
allocated to the ES heap.
We have 6 indexes, each with 20 shards and 2 replicas.
Each aggregation/query is performed against a single index (see aggregation
example below).
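For a sense of scale, here is a back-of-the-envelope count of the shard layout from the figures above. It assumes "20 shards" means 20 primary shards per index and that shard copies are spread evenly over the nodes, which may not hold exactly in practice.

```python
# Figures taken from the cluster description above.
nodes = 20
indexes = 6
primaries_per_index = 20
replicas_per_primary = 2

# Each primary has 2 replicas, so 3 copies of every shard exist.
copies_per_index = primaries_per_index * (1 + replicas_per_primary)
total_shard_copies = indexes * copies_per_index

# Assumption: copies are balanced evenly across nodes.
shard_copies_per_node = total_shard_copies / nodes

# A query against a single index has to visit one copy of each of its shards.
shards_visited_per_query = primaries_per_index

print(copies_per_index, total_shard_copies, shard_copies_per_node)
```

So a single-index request fans out to 20 shards, and a screen that fires hundreds of such requests in parallel multiplies that fan-out accordingly.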
Has anyone dealt with such use cases?
Thanks!
Document structure:
{
  "user": {
    "_ttl": {
      "enabled": true
    },
    "properties": {
      "events": {
        "type": "nested",
        "properties": {
          "event_time": {
            "type": "date",
            "format": "dateOptionalTime",
            "doc_values": true
          },
          "segments": {
            "properties": {
              "segment": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          }
        }
      }
    }
  }
}
Example document:
{
  "_index": "...",
  "_type": "...",
  "_id": "...",
  "_version": 1,
  "_score": 1,
  "_source": {
    "events": [
      {
        "event_time": "2014-11-03",
        "segments": [
          {
            "segment": "ALICE"
          },
          {
            "segment": "BOB"
          }
        ]
      },
      {
        "event_time": "2014-11-04",
        "segments": [
          {
            "segment": "RON"
          },
          {
            "segment": "YULA"
          }
        ]
      }
    ]
  }
}
Aggregation example:
{
  "size": 0,
  "query": {
    "nested": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "events.event_time": {
                      "from": "2014-11-17",
                      "to": "2014-11-24",
                      "include_lower": true,
                      "include_upper": true
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "path": "events"
    }
  },
  "aggregations": {
    "nested": {
      "nested": {
        "path": "events"
      },
      "aggregations": {
        "segments": {
          "terms": {
            "field": "events.segments.segment",
            "size": 0
          },
          "aggregations": {
            "uu": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}
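For reference, a small sketch of how the unique user counts can be read out of the response to the aggregation above: the `doc_count` of the `uu` reverse_nested bucket is the number of parent (user) documents per segment. The helper name `unique_users_per_segment` and the sample response values are made up for illustration; only the aggregation names (`nested`, `segments`, `uu`) come from the request.

```python
def unique_users_per_segment(response):
    """Map each segment to its unique user count (hypothetical helper).

    Expects a parsed search response (a dict) for the aggregation request
    above and reads the reverse_nested doc_count of every terms bucket.
    """
    buckets = response["aggregations"]["nested"]["segments"]["buckets"]
    return {bucket["key"]: bucket["uu"]["doc_count"] for bucket in buckets}


# Hypothetical response fragment; the numbers are invented.
sample_response = {
    "aggregations": {
        "nested": {
            "doc_count": 5,
            "segments": {
                "buckets": [
                    # doc_count counts nested events, uu.doc_count counts users
                    {"key": "ALICE", "doc_count": 3, "uu": {"doc_count": 2}},
                    {"key": "BOB", "doc_count": 2, "uu": {"doc_count": 2}},
                ]
            },
        }
    }
}

counts = unique_users_per_segment(sample_response)
print(counts)  # {'ALICE': 2, 'BOB': 2}
```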