Could I query all docs where there are multiple occurrences of my term?

tchung · January 7, 2019, 10:22pm

I want to use Elasticsearch to index event data and as events come in, I want to append to an "events" array in the document. Say it ends up looking like this:

{
  "user":"abc",
  "events": [
    {
      "action": "foo",
      "date": "2015-01-01"
    },
    {
      "action": "foo",
      "date": "2016-01-01"
    },
    {
      "action": "bar",
      "date": "2017-01-01"
    }
  ]
}

And another doc looks like:

{
  "user":"def",
  "events": [
    {
      "action": "foo",
      "date": "2017-01-01"
    },
    {
      "action": "baz",
      "date": "2018-01-01"
    }
  ]
}

I get that I can query/filter for all users/docs with action=foo but could I:

find all the documents where action=foo has occurred, say, exactly 2 times?
find all the documents where action=foo has occurred 2 times between some start and end date.

Thanks in advance.

Mark_Harwood · January 11, 2019, 11:08am

To avoid "cross matching" of the dates for foo and bar you'll need to use the nested data type for the events.

I've not yet figured out how to trim results to those with 2 different events but this should at least avoid confusing matches on the dates of your "foo" events with those of "bar" events::

DELETE test
PUT test
{
  "mappings": {
	"_doc": {
	  "properties": {
		"user":{
		  "type":"keyword"
		},
		"events": {
		  "type": "nested",
		  "properties": {
			"action": {
			  "type": "keyword"
			},
			"date":{
			  "type":"date"
			}
		  }
		}
	  }
	}
  }
}
PUT test/_doc/1
{
  "user":"abc",
  "events": [
	{
	  "action": "foo",
	  "date": "2015-01-01"
	},
	{
	  "action": "foo",
	  "date": "2016-01-01"
	},
	{
	  "action": "bar",
	  "date": "2017-01-01"
	}
  ]
}
PUT test/_doc/2
{
  "user":"abc",
  "events": [
	{
	  "action": "foo",
	  "date": "2015-01-01"
	},
	{
	  "action": "bar",
	  "date": "2016-01-01"
	},
	{
	  "action": "foo",
	  "date": "2017-01-01"
	}
  ]
}
GET test/_search
{
  "query": {
	"bool": {
	  "must": [
		{
		  "nested": {
			"path": "events",
			"query": {
			  "bool": {
				"must": [
				  {
					"match":{
					  "events.action":"foo"
					}
				  },
				  {
					"range": {
					  "events.date": {
						"gte": "2013-01-01",
						"lte": "2016-12-31"
					  }
					}
				  }
				]
			  }
			},
			"inner_hits": {}
		  }
		}
	  ]
	}
  },
  "stored_fields": [
	"user"
  ]
}

The use of inner_hits prevents you from seeing non-matching events. Your client would need to trim results that only matched one event,

Having said that, what can be hard to match at query time with searches/aggregations can be trivial at index time using an update script - see entity centric indexing
Useful attributes like an "outage" range field can be derived by a script recognising the 2 event types of interest and storing a derived value when those events come in as an update.

tchung · January 14, 2019, 7:39pm

Thanks @Mark_Harwood. I watched the video about entity centric indexing and it looks good and I will try that approach.

I think I can index individual events as they occur (the "doc" will be the event) into an "events" index. To find users that have done "foo exactly twice" (like in my original post), I can periodically update a secondary "user" index and a user doc, that has a "didFooTwice" boolean. Then a simple bool query can get me what I want.

system · February 11, 2019, 7:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Displaying documents in result multiple times with sorting Elasticsearch	2	462	February 23, 2018
Find documents with multiple occurrences within a time span Elasticsearch	3	767	March 7, 2018
Indexing and querying large text documents with multiple term occurences Elasticsearch	2	425	November 1, 2019
Output recurring events multiple times in a range query Elasticsearch	1	572	March 13, 2018
Elasticsearch query to search word occurred as n times in document Elasticsearch	1	466	May 9, 2018

Could I query all docs where there are multiple occurrences of my term?

Related topics