Search for availability of resources by dates


(Rick) #1

Hi,

I have a database with resources that have 2 different types of availability on certain days.

The data looks like this (only the relevant part):

{name: "resource_1", available_on: ["2015-12-12", "2015-12-13", "2015-12-15"], probably_available_on: ["2015-12-18", "2015-12-20", "2015-12-24"]}

I want to search for resources that are available on at least X% of days in a certain date range. The search should return hits ordered by most 'available_on' and 'probably_available_on' (the first one with a higher boost).

I managed to do that just with the 'available_on' days (Example for the range 2015-12-11 to 2015-12-15):

{
  "query": {
    "bool": {
      "should": [
        { "term": { "available_on": "2015-12-11" } },
        { "term": { "available_on": "2015-12-12" } },
        { "term": { "available_on": "2015-12-13" } },
        { "term": { "available_on": "2015-12-14" } },
        { "term": { "available_on": "2015-12-15" } }
      ],
      "minimum_should_match": "70%"
    }
  }
}

  • Is this the correct way to do it, or can that be achieved with something like a range filter?
  • How do I add the 'probably_available_on' field with a lower boost?
  • Is it possible to retrieve the actual number of matching 'available_on' and 'probably_available_on' days for every hit with the search result?

Have a nice day,
Rick


(Mark Harwood) #2

Doesn't look like a bad way.

Here's the answer to your other questions on boosting etc:

POST test/doc
{
	"name":"mark",
	"availability":["monday","tuesday"],
	"possible_availability":["friday"]

}
GET test/doc/_search
{
   "query": {
	  "bool": {
		 "should": [
			{
			   "bool": {
				  "should": [
					 {
						"term": {
						   "possible_availability": "monday",
						   "_name": "monday"
						}
					 },
					 {
						"term": {
						   "availability": "monday",
						   "_name": "monday",
						   "boost":2
						}
					 }
				  ]
			   }
			},
			{
			   "bool": {
				  "should": [
					 {
						"term": {
						   "possible_availability": "tuesday",
						   "_name": "tuesday"
						}
					 },
					 {
						"term": {
						   "availability": "tuesday",
						   "_name": "tuesday",
						   "boost":2
						}
					 }
				  ]
			   }
			},            
			{
			   "bool": {
				  "should": [
					 {
						"term": {
						   "possible_availability": "wednesday",
						   "_name": "wednesday"
						}
					 },
					 {
						"term": {
						   "availability": "wednesday",
						   "_name": "wednesday",
						   "boost":2
						}
					 }
				  ]
			   }
			}                        
		 ],
		 "minimum_number_should_match": 2
	  }
   }
}

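The default ranking heuristics, such as IDF (how rare a term is), are still in effect here, so it would also make sense to wrap each term query in a constant_score query [1]. That way every matching day contributes a fixed score regardless of how rare that date is in the index.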
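As a rough sketch of that idea applied to the date fields from the original question, the request body could be generated rather than written by hand. The function name, parameter names, the default boosts and the "field:day" naming convention for "_name" are all assumptions made for illustration. Note that, as in the query above, a day counts towards minimum_should_match if it appears in either field.

```python
from datetime import date, timedelta


def availability_query(start, end, min_match="70%", sure_boost=2.0, maybe_boost=1.0):
    """Build a bool query body with one pair of constant_score clauses per day.

    Hypothetical helper: field names come from the original question,
    everything else (defaults, naming scheme) is an assumption.
    """
    n_days = (end - start).days + 1
    days = [(start + timedelta(days=i)).isoformat() for i in range(n_days)]

    def clause(field, day, boost):
        # constant_score ignores IDF and scores every match with `boost`.
        return {
            "constant_score": {
                "filter": {"term": {field: day}},
                "boost": boost,
                # Distinct name per field and day, so hits can be told apart.
                "_name": f"{field}:{day}",
            }
        }

    per_day = [
        {
            "bool": {
                "should": [
                    clause("available_on", day, sure_boost),
                    clause("probably_available_on", day, maybe_boost),
                ]
            }
        }
        for day in days
    ]
    return {"query": {"bool": {"should": per_day, "minimum_should_match": min_match}}}


body = availability_query(date(2015, 12, 11), date(2015, 12, 15))
```

The resulting dict can be serialised with json.dumps and sent as the search request body.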

The "_name" set on each query clause is metadata that is echoed back in the matched_queries list of each hit, showing which clauses matched.
For fancier ranking, you may want to look at the function_score query [2].
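To get the number of matching days per field out of each hit, one option is to count the entries in matched_queries on the client side. The sketch below assumes every clause was given a distinct name of the form "field:day" (for example "_name": "available_on:2015-12-12"); that convention and the function name are hypothetical. If both clauses for a day share one name, as in the query above, the two fields cannot be told apart.

```python
from collections import Counter


def count_matched_days(hit):
    """Count matched days per field from one hit's matched_queries list.

    Assumes names follow the hypothetical "field:day" convention.
    """
    counts = Counter()
    for name in hit.get("matched_queries", []):
        field, _, _day = name.partition(":")
        counts[field] += 1
    return dict(counts)


# Hypothetical hit, shaped like one entry of response["hits"]["hits"].
example_hit = {
    "_id": "1",
    "matched_queries": [
        "available_on:2015-12-12",
        "available_on:2015-12-13",
        "probably_available_on:2015-12-18",
    ],
}
print(count_matched_days(example_hit))
# prints {'available_on': 2, 'probably_available_on': 1}
```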

Cheers
Mark

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_using_function_score


(Rick) #3

Thanks a lot! That's exactly what I need.

