Cannot search tags with multi_match phrase [Elasticsearch 5.1.2]


(Lorenzo) #1

Hi everybody,

I have an index containing topics of a forum.

I match the phrase entered by the user with multi_match against the fields title, body and tags.
Users can choose either "search any words" or "search the phrase"; with "search any words" everything works fine.

If the user chooses "search the phrase", I use a multi_match of type "phrase", but the search fails.

In the Elasticsearch logs I found:

field "tags" was indexed without position data; cannot run PhraseQuery

When I created the index I used this mapping:
"mappings" : {
  "topics" : {
    "date_detection" : false,
    "dynamic_templates" : [
      {
        "generic_not_analyzed" : {
          "match" : ".+",
          "unmatch" : "(title)|(body)|(clean_body)",
          "match_pattern" : "regex",
          "mapping" : {
            "index": "not_analyzed"
          }
        }
      }
    ],
    "properties" : {
      "created": {"type": "date", "format": "dateTimeNoMillis"},
      "lastupdate": {"type": "date", "format": "dateTimeNoMillis"},
      "ts_author_oldest_post": {"type": "date", "format": "dateTimeNoMillis"},
      "ts_sort_dashboard": {"type": "date", "format": "dateTimeNoMillis"},
      "tags" : {"type" : "string", "index" : "not_analyzed"},
      "gallery": {
        "properties": {
          "approved": {"type": "date", "format": "dateTimeNoMillis"},
          "thumbs_update": {"type": "date", "format": "dateTimeNoMillis"}
        }
      },
      "template": {
        "properties": {
          "approved": {"type": "date", "format": "dateTimeNoMillis"}
        }
      },
      "posts" : {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "submitted" : {"type": "date", "format": "dateTimeNoMillis"},
          "body" : {"type": "string", "analyzer": "answ_html_default"}
        }
      },
      "followers" : {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "id": {"type": "integer"},
          "level": {"type": "string"},
          "name": {"type": "string"},
          "read": {"type": "boolean"},
          "date_read": {"type": "date", "format": "dateTimeNoMillis"},
          "date_follow": {"type": "date", "format": "dateTimeNoMillis"}
        }
      }
    }
  }
}

Can someone help me?


(Mark Harwood) #2

I suspect it has something to do with the tags field being "not_analyzed", but I was unable to reproduce it here. Can you share a problem query and maybe a doc with the irrelevant fields removed?


(Lorenzo) #3

I tried removing "not_analyzed", but then the logs showed:

Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.


(Mark Harwood) #4

Let's back up a little.
Do you really need to perform phrase queries on a tags field?
Would you ever need to search for docs where the order/proximity of tags in a doc is important?


(Lorenzo) #5

Tags in my documents consist of one or more words.

Example: "error", "generic", "generic error"

When a user searches for "generic error" and chooses "search the phrase", I want to find all documents whose tags, title or body contain "generic error".

Following your remark, and since I was already using a custom analyzer, I tried another custom analyzer with a keyword tokenizer, used only when the user chooses "search the phrase".

There are no errors, but the results aren't what I expected.

I think my approach is wrong...

Do you have any ideas?


(Mark Harwood) #6

And presumably if they search for "error" you also want to match docs tagged with "generic error"?


(Lorenzo) #7

Yes, exactly.


(Mark Harwood) #8

So we are talking about support for free-text search, analyzers, phrase queries etc on this tags field.
But also I expect you want to have a structured, "not analyzed" form of the tag for analytics e.g. bar charts in Kibana.
These 2 goals are achieved by defining the mapping of the tags field to support both operations using separate logical fields in the index. An example of search and analytics below:

DELETE test

PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1
    }
  },
  "mappings": {
    "log": {
      "properties": {
        "tags": {
          "type": "keyword",
          "fields": {
            "4search": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

POST test/log
{
  "tags": ["generic error"]
}

GET test/log/_search
{
  "query": {
    "multi_match": {
      "query": "error",
      "fields": ["tags.4search"]
    }
  }
}

GET test/log/_search
{
  "query": {
    "multi_match": {
      "query": "generic error",
      "fields": ["tags.4search"],
      "type": "phrase"
    }
  }
}

GET test/log/_search
{
  "aggs": {
    "tagTypes": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
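
Applied to the original forum index, the same idea might look like the request body below. This is only a sketch under a few assumptions: the tags field has been given a "4search" text sub-field as in the test mapping, existing documents have been reindexed so that the sub-field is populated, and title and body are analyzed text fields, as the dynamic template's "unmatch" pattern suggests. The index name is not given in the thread, so the body would be sent to the _search endpoint of whichever index holds the topics.

```json
{
  "query": {
    "multi_match": {
      "query": "generic error",
      "type": "phrase",
      "fields": ["title", "body", "tags.4search"]
    }
  }
}
```

For the "search any words" option, dropping the "type" line falls back to the default multi_match behaviour, while aggregations would keep using the plain "tags" keyword field.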

(Lorenzo) #9

Thank you very much!!!

I'll implement it right away.

