Find documents with multiple occurrences within a time span

adammichalik · February 6, 2018, 10:16am

I have documents with a following structure:

{
  "timestamp": "2018-01-01T12:34:56.000Z",
  "message": "Some message",
  "idNumber": 12345678
}

The message field holds various messages, there are multiple messages per idNumber.
I would like to find the idNumbers for which there are two or more documents with message equal to "Hello" within one hour. How can I achieve that (if it's possible at all)?

Mark_Harwood · February 6, 2018, 4:29pm

This feels a little hacky but works on a small set of data:

PUT test
{
  "settings": {
	"number_of_replicas": 0,
	"number_of_shards": 1
  },
  "mappings": {
	"doc":{
	  "properties": {
		"date":{
		  "type":"date",
		  "format":"yyyy-MM-dd HH:mm"
		},
		"id":{
		  "type":"keyword"
		},
		"message":{
		  "type":"text"
		}
	  }
	}
  }
}
POST test/doc/_bulk
{"index":{}}
{"date":"2018-01-01 01:01", "id":"bad","message":"hello"}
{"index":{}}
{"date":"2018-01-01 01:02", "id":"bad","message":"hello"}
{"index":{}}
{"date":"2018-01-01 01:02", "id":"good","message":"hello"}
{"index":{}}
{"date":"2018-01-01 03:02", "id":"good","message":"hello"}



GET test/_search?q=hello
{
  "size": 0,
  "aggs": {
	"terms": {
	  "terms": {
		"min_doc_count": 2,
		"shard_min_doc_count": 1,
		"script": {
		  "source": """
 doc['date'].value.getHourOfDay().toString() + "-"+
 doc['date'].value.getDayOfMonth().toString()+ "-"+ 
 doc['date'].value.getMonthOfYear().toString()+ "-"+
 doc['date'].value.getYear().toString()+ "-"+
 doc["id"].value
"""
		}
	  }
	}
  }
}

I'm sure there's a better way of formatting the date but essentially I combine the ID and a string representation of an hour-level bucket. Obviously this doesn't spot 2 events close to an hour boundary e.g. 12:59 and 13:01 but might be good enough. If you have many unique IDs and they are spread across multiple shards then this will not scale and you need to think about indexing approaches that bring related data closer together.

adammichalik · February 7, 2018, 1:37pm

Thanks, Mark. This is for reporting purposes, so speed is not so much of a requirement and the results can be approximate as long as I can get a good sample of occurrences. I'll try that out.

system · March 7, 2018, 1:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Could I query all docs where there are multiple occurrences of my term? Elasticsearch	3	1641	February 11, 2019
Time-based searches near a certain timestamp Elasticsearch	1	287	July 6, 2017
Query term aggregation to count occurence Elasticsearch	1	416	July 5, 2017
Multiple documents same id find difference Elasticsearch	2	410	April 12, 2018
Performing between operation on documents in Elasticsearch Elasticsearch	2	592	July 5, 2017

Find documents with multiple occurrences within a time span

Related topics