Hi,
I am building an analytics engine for documents and store a log entry for
each hit.
I need a histogram of "hits" but also of "visits". Visits can be deduced
from a session_id: multiple hits share the same session_id.
The mapping looks like:
{
  "docHit" : {
    "properties" : {
      "doc_id" : {"type" : "long", "index" : "not_analyzed"},
      "section_id" : {"type" : "string", "index" : "not_analyzed"},
      ...
    }
  }
}
So I can return a histogram of "hits" for a particular document:
{
  "query": {
    "term": {
      "doc_id": 444
    }
  },
  "facets": {
    "hits_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      }
    }
  }
}
But how do I do the same for "visits"?
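To make the goal concrete, here is a minimal Python sketch of the result I am after, computed client-side over raw hit records. The sample records are made up, and I am assuming the "timestamp" field holds ISO 8601 strings; the function name visits_per_day is only for illustration.

```python
from collections import defaultdict
from datetime import datetime


def visits_per_day(hits):
    """Count distinct session ids per calendar day."""
    sessions_by_day = defaultdict(set)
    for hit in hits:
        # Assumption: timestamps are ISO 8601 strings.
        day = datetime.fromisoformat(hit["timestamp"]).date().isoformat()
        sessions_by_day[day].add(hit["session_uid"])
    # One bucket per day, each holding the number of distinct sessions.
    return {day: len(ids) for day, ids in sorted(sessions_by_day.items())}


# Made-up sample records, for illustration only.
sample_hits = [
    {"doc_id": 444, "session_uid": "A", "timestamp": "2013-01-01T09:00:00"},
    {"doc_id": 444, "session_uid": "A", "timestamp": "2013-01-01T09:05:00"},
    {"doc_id": 444, "session_uid": "B", "timestamp": "2013-01-01T10:00:00"},
    {"doc_id": 444, "session_uid": "A", "timestamp": "2013-01-02T08:00:00"},
]

print(visits_per_day(sample_hits))
```

Ideally the same per-day distinct count would come back from the search itself instead of being computed over every hit on the client.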
If I wanted to get total visits for a document I could try:
{
  "query": {
    "term": {
      "doc_id": 444
    }
  },
  "facets": {
    "visits": {
      "terms": {
        "field": "session_uid",
        "size": 0
      }
    }
  }
}
Which would return:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.0,
    "hits": []
  },
  "facets": {
    "visits": {
      "_type": "terms",
      "missing": 0,
      "total": 6,
      "other": 0,
      "terms": [{
        "term": "26A1473FFBF2CC5E3A8FCC9BF2240241",
        "count": 4
      }, {
        "term": "3EC91387409740A1429676BB2A9CE02D",
        "count": 2
      }]
    }
  }
}
But that means returning ALL terms just to compute the length of
facets.visits.terms, which would be stupidly slow once a document has many
sessions.
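For reference, the client-side workaround would look roughly like the Python sketch below. It assumes the response has already been parsed into a dict, uses a trimmed copy of the response shown earlier, and count_visits is a hypothetical helper name.

```python
def count_visits(response, facet_name="visits"):
    """Return the number of distinct terms in a terms facet."""
    facet = response["facets"][facet_name]
    # Every distinct session id must be present in the response
    # for this count to be correct, hence the need to return all terms.
    return len(facet["terms"])


# Trimmed copy of the response shown earlier in this post.
response = {
    "hits": {"total": 6},
    "facets": {
        "visits": {
            "_type": "terms",
            "missing": 0,
            "total": 6,
            "other": 0,
            "terms": [
                {"term": "26A1473FFBF2CC5E3A8FCC9BF2240241", "count": 4},
                {"term": "3EC91387409740A1429676BB2A9CE02D", "count": 2},
            ],
        }
    },
}

print(count_visits(response))  # 2 visits for 6 hits
```

The payload grows with the number of sessions, which is exactly what I would like to avoid.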
Is there a straightforward way to tackle this use case?
Thanks!
Alex