Cannot search tags with multi_match phrase [Elasticsearch 5.1.2]

Hi everybody,

I have an index containing topics of a forum.

I match the text entered by users with a multi_match query on the fields title, body and tags.
Users can choose either "search any words" or "search the phrase"; with "search any words" everything works fine.

If the user chooses "search the phrase", I use a multi_match query of type "phrase", but the search fails.

In the Elasticsearch logs I found:

field "tags" was indexed without position data; cannot run PhraseQuery
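A toy Python model (not Lucene or Elasticsearch code) of what this message means: a phrase of two or more terms is checked by comparing term positions, and a field indexed without positions, as keyword/not_analyzed fields are, has nothing to compare. The tokenizers and the value gap are rough, hypothetical stand-ins. The toy passes the query terms in explicitly; whether the real query text is split into several terms for tags depends on the analyzer applied to it, which the thread doesn't show.

```python
import re
from collections import defaultdict


class PhraseQueryError(Exception):
    """Stands in for 'field ... was indexed without position data'."""


def keyword_tokenize(value):
    # not_analyzed / keyword: the whole value is indexed as a single term
    return [value]


def word_tokenize(value):
    # Rough, hypothetical stand-in for a tokenizing analyzer
    return re.findall(r"\w+", value.lower())


def build_index(docs, tokenize, with_positions):
    """Toy postings: term -> {doc_id: [positions]}. With with_positions=False
    the position lists stay empty, as for a field indexed without positions."""
    postings = defaultdict(dict)
    for doc_id, values in docs.items():
        pos = 0
        for value in values:
            for token in tokenize(value):
                plist = postings[token].setdefault(doc_id, [])
                if with_positions:
                    plist.append(pos)
                pos += 1
            pos += 100  # gap between values, like position_increment_gap
    return postings, with_positions


def phrase_query(index, terms):
    postings, has_positions = index
    if len(terms) == 1:
        # One term: a plain term lookup, positions are not needed
        return set(postings.get(terms[0], {}))
    if not has_positions:
        raise PhraseQueryError("field was indexed without position data")
    hits = set()
    for doc_id, starts in postings.get(terms[0], {}).items():
        # Match if term i sits exactly i positions after the first term
        if any(all(start + i in postings.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms[1:], start=1))
               for start in starts):
            hits.add(doc_id)
    return hits


docs = {1: ["generic error"], 2: ["error", "generic"]}
tags_keyword = build_index(docs, keyword_tokenize, with_positions=False)
tags_text = build_index(docs, word_tokenize, with_positions=True)

print(phrase_query(tags_text, ["generic", "error"]))  # {1}
print(phrase_query(tags_keyword, ["generic error"]))  # {1}
try:
    phrase_query(tags_keyword, ["generic", "error"])
except PhraseQueryError as exc:
    print("error:", exc)
```

Doc 2 has both words, but in separate values, so the positional check (and the gap) keeps it out of the phrase results.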

When I created the index I used this mapping:
"mappings" : {
  "topics" : {
    "date_detection" : false,
    "dynamic_templates" : [
      {
        "generic_not_analyzed" : {
          "match" : ".+",
          "unmatch" : "(title)|(body)|(clean_body)",
          "match_pattern" : "regex",
          "mapping" : {
            "index": "not_analyzed"
          }
        }
      }
    ],
    "properties" : {
      "created": {"type": "date", "format": "dateTimeNoMillis"},
      "lastupdate": {"type": "date", "format": "dateTimeNoMillis"},
      "ts_author_oldest_post": {"type": "date", "format": "dateTimeNoMillis"},
      "ts_sort_dashboard": {"type": "date", "format": "dateTimeNoMillis"},
      "tags" : {"type" : "string", "index" : "not_analyzed"},
      "gallery": {
        "properties": {
          "approved": {"type": "date", "format": "dateTimeNoMillis"},
          "thumbs_update": {"type": "date", "format": "dateTimeNoMillis"}
        }
      },
      "template": {
        "properties": {
          "approved": {"type": "date", "format": "dateTimeNoMillis"}
        }
      },
      "posts" : {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "submitted" : {"type": "date", "format": "dateTimeNoMillis"},
          "body" : {"type": "string", "analyzer": "answ_html_default"}
        }
      },
      "followers" : {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "id": {"type": "integer"},
          "level": {"type": "string"},
          "name": {"type": "string"},
          "read": {"type": "boolean"},
          "date_read": {"type": "date", "format": "dateTimeNoMillis"},
          "date_follow": {"type": "date", "format": "dateTimeNoMillis"}
        }
      }
    }
  }
}
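A small Python sketch of which field names the generic_not_analyzed dynamic template selects. It assumes match/unmatch test the field's name as a whole-string regex (like Java's String.matches) and only applies to fields not mapped explicitly; tags is mapped explicitly as not_analyzed anyway. The names "subtitle" and "category" are hypothetical.

```python
import re

# Regexes copied from the "generic_not_analyzed" dynamic template
MATCH = r".+"
UNMATCH = r"(title)|(body)|(clean_body)"


def template_applies(field_name):
    """True if a new, unmapped field with this name would get
    "index": "not_analyzed" (assumes whole-name regex matching)."""
    return (re.fullmatch(MATCH, field_name) is not None
            and re.fullmatch(UNMATCH, field_name) is None)


for name in ["title", "body", "clean_body", "subtitle", "category"]:
    verdict = "not_analyzed" if template_applies(name) else "default mapping"
    print(f"{name}: {verdict}")
```

Under that assumption, every dynamically added field except exactly title, body and clean_body would also be indexed without positions, so a phrase query on it would hit the same error.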

Can someone help me?

I suspect it has something to do with the tags field being "not_analyzed", but I was unable to reproduce it here. Can you share a problem query and maybe a doc with the irrelevant fields removed?

I tried removing "not_analyzed", but then the logs said:

Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
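A toy Python model (not Elasticsearch internals) of what this second message is about: aggregations and sorting need a doc -> values view, while a text field's index is term -> docs. Building the reverse view in memory ("uninverting") is, in spirit, what fielddata does, and even when enabled it buckets individual words rather than whole tags. The analyzer is a hypothetical stand-in.

```python
import re
from collections import Counter, defaultdict


def analyze(text):
    # Hypothetical stand-in for the analyzer of a "text" field
    return re.findall(r"\w+", text.lower())


docs = {1: ["generic error"], 2: ["error"]}

# Inverted index on the text field: term -> set of doc ids
inverted = defaultdict(set)
for doc_id, tags in docs.items():
    for tag in tags:
        for term in analyze(tag):
            inverted[term].add(doc_id)

# "Uninverting": doc id -> terms, rebuilt in memory from the inverted index
uninverted = defaultdict(set)
for term, doc_ids in inverted.items():
    for doc_id in doc_ids:
        uninverted[doc_id].add(term)

# A terms aggregation over the uninverted text field counts single words...
text_buckets = Counter(t for terms in uninverted.values() for t in terms)
# ...while a keyword field keeps each whole tag as its own bucket
keyword_buckets = Counter(tag for tags in docs.values() for tag in set(tags))

print(sorted(text_buckets.items()))     # [('error', 2), ('generic', 1)]
print(sorted(keyword_buckets.items()))  # [('error', 1), ('generic error', 1)]
```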

Let's back up a little.
Do you really need to perform phrase queries on a tags field?
Would you ever need to search for docs where the order/proximity of tags in a doc are important?

Tags in my documents are made up of one or more words.

Example: "error", "generic", "generic error"

When a user searches for "generic error" with "search the phrase", I want to find all documents whose tags, title or body contain "generic error".

Following your remarks, and since I was already using a custom analyzer, I tried another custom analyzer with only a keyword tokenizer, used only when the user chooses "search the phrase".

There are no errors now, but the results aren't what I expected.

I think my approach is wrong...

Do you have any ideas?

And presumably if they search for "error" you also want to match docs tagged with "generic error"?

Yes, exactly.
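A toy Python sketch of why a keyword-tokenizer analyzer can't meet this requirement. It assumes the custom analyzer was just the keyword tokenizer (its definition isn't in the thread); the word analyzer is a hypothetical stand-in for a tokenizing analyzer such as standard, and phrase matching is simplified to "tokens appear consecutively".

```python
import re


def keyword_analyzer(text):
    # Keyword tokenizer: the entire input becomes one token
    return [text]


def word_analyzer(text):
    # Hypothetical stand-in for a tokenizing analyzer such as "standard"
    return re.findall(r"\w+", text.lower())


def phrase_hit(indexed_tokens, query_tokens):
    """True if query_tokens occur consecutively in indexed_tokens."""
    n = len(query_tokens)
    return n > 0 and any(indexed_tokens[i:i + n] == query_tokens
                         for i in range(len(indexed_tokens) - n + 1))


title = word_analyzer("A generic error occurred")  # title/body are tokenized
tag_kw = keyword_analyzer("generic error")         # tag indexed as one term
tag_text = word_analyzer("generic error")          # tag indexed as words

checks = {
    "title, keyword-analyzed phrase": phrase_hit(title, keyword_analyzer("generic error")),
    "title, word-analyzed phrase": phrase_hit(title, word_analyzer("generic error")),
    "keyword tag, query 'error'": phrase_hit(tag_kw, word_analyzer("error")),
    "text tag, query 'error'": phrase_hit(tag_text, word_analyzer("error")),
}
for label, hit in checks.items():
    print(f"{label}: {hit}")
```

The keyword tokenizer turns the whole query into one token that never equals a single word of the title, and a tag indexed as one term never contains "error" on its own; only tokenized fields satisfy both cases.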

So we are talking about supporting free-text search, analyzers, phrase queries etc. on this tags field.
But I also expect you want a structured, "not analyzed" form of the tag for analytics, e.g. bar charts in Kibana.
Both goals can be met by mapping the tags field so that it supports both operations through separate logical fields in the index. An example of search and analytics below:

DELETE test
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1
    }
  },
  "mappings": {
    "log": {
      "properties": {
        "tags": {
          "type": "keyword",
          "fields": {
            "4search": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}
POST test/log?refresh
{
  "tags": ["generic error"]
}

GET test/log/_search
{
  "query": {
    "multi_match": {
      "query": "error",
      "fields": ["tags.4search"]
    }
  }
}
GET test/log/_search
{
  "query": {
    "multi_match": {
      "query": "generic error",
      "fields": ["tags.4search"],
      "type": "phrase"
    }
  }
}
GET test/log/_search
{
  "aggs": {
    "tagTypes": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
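A toy Python model (not Elasticsearch code) of what this multi-field mapping provides and how it fits the original goal of a phrase search over title, body and tags: "tags" keeps each whole value for aggregations, "tags.4search" keeps analyzed words for matching. The analyzer and documents are hypothetical, and multi_match of type phrase is simplified to "the phrase occurs in at least one listed field" (scoring ignored). In the real query you would presumably list "title", "body" and "tags.4search" in "fields".

```python
import re
from collections import Counter


def analyze(text):
    # Hypothetical stand-in for the analyzer of the "text" subfield
    return re.findall(r"\w+", text.lower())


def index_doc(doc):
    """'tags' keeps whole values; 'tags.4search' keeps tokens per value
    (one token list per value, so phrases never span two tags)."""
    fields = {
        "tags": list(doc["tags"]),
        "tags.4search": [analyze(t) for t in doc["tags"]],
    }
    for name in ("title", "body"):
        if name in doc:
            fields[name] = [analyze(doc[name])]
    return fields


def has_phrase(values, phrase_tokens):
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for tokens in values
               for i in range(len(tokens) - n + 1))


def multi_match_phrase(indexed_docs, query, fields):
    # A doc matches if the phrase occurs in at least one of the fields
    q = analyze(query)
    return [doc_id for doc_id, f in indexed_docs.items()
            if any(has_phrase(f.get(field, []), q) for field in fields)]


def terms_agg(indexed_docs):
    # Doc count per exact keyword value, like a terms aggregation on "tags"
    return Counter(tag for f in indexed_docs.values() for tag in set(f["tags"]))


docs = {
    1: {"tags": ["generic error"], "title": "Login fails"},
    2: {"tags": ["error"], "title": "Strange generic error on startup"},
    3: {"tags": ["generic", "error"], "title": "Question"},
}
indexed = {doc_id: index_doc(d) for doc_id, d in docs.items()}
fields = ["title", "body", "tags.4search"]

print(multi_match_phrase(indexed, "generic error", fields))    # [1, 2]
print(multi_match_phrase(indexed, "error", ["tags.4search"]))  # [1, 2, 3]
print(sorted(terms_agg(indexed).items()))
```

Doc 3 carries both words as separate tags, so it matches "error" but not the phrase "generic error", while the aggregation still reports "generic error" as its own bucket.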

Thank you very much!!!

I'll implement it right away.
