The term(s) filter and the standard analyzer

jpblair · April 19, 2016, 2:50pm

I have a document type that contains an array of objects about a person, one of which is their name. For example:

"creators":[
	{
		"id":"john.doe",
		"name":"John Doe"
	},
	{
		"id":"jane.smith",
		"name":"Jane Smith"
	}
]

I want to have a filter that limits search results based on user input matching creators.name. Currently, I'm using the term filter. However, the standard analyzer only creates tokens for (to use the first example) "John" and "Doe", but NOT "John Doe", so my current method only works for "John" or "Doe", but not "John Doe". I considered using the terms filter instead and parsing the input based on whitespace, but then that would also match "John Smith", which is not ideal.
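To make the mismatch concrete, here is a rough Python sketch of what happens at index time and at filter time. It is only a stand-in: the real standard analyzer uses Unicode text segmentation, while this version just splits on non-word characters. It also lowercases, as the standard analyzer does, so the indexed tokens come out as "john" and "doe". The function names are hypothetical and exist only for illustration.

```python
import re

def simulate_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: split into word tokens, then lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def term_matches(indexed_tokens, term):
    """A term filter does not analyze its input; it compares it verbatim to indexed tokens."""
    return term in indexed_tokens

tokens = simulate_standard_analyzer("John Doe")
print(tokens)                            # ['john', 'doe']
print(term_matches(tokens, "john"))      # True
print(term_matches(tokens, "John Doe"))  # False: no such single token was indexed

# The tokens of every creators.name value in one document end up in the same field,
# so splitting the input on whitespace and using a terms filter gives a false positive.
doc_tokens = set(simulate_standard_analyzer("John Doe")) | set(simulate_standard_analyzer("Jane Smith"))
false_positive = all(t in doc_tokens for t in simulate_standard_analyzer("John Smith"))
print(false_positive)                    # True
```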

Is there a different way I should be querying, or do I need to use a different analyzer that would also treat "John Doe" as a token? And would these approaches properly address people with more than 2 names? Thanks.

cbuescher · April 19, 2016, 3:55pm

Hi,

there are a few options here, depending on what else you want to do with that field. First, if you require exact matching of the name field, the field should probably be indexed with "index" : "not_analyzed". That way you can use a 'term' query/filter and it will only match if the input is exactly the indexed value. If, on top of that, you want some analysis like lowercasing but no tokenization, you can roll your own analyzer, for example using the KeywordTokenizer and then adding a lowercase filter on top.
If you also need the name in creators.name to be full-text searchable (e.g. find all Smith persons), you can index the same field in multiple ways, using the fields parameter when setting up the mappings.
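As a sketch of what such a setup might look like, the Python snippet below builds an index-creation body with a custom keyword-plus-lowercase analyzer and a multi-field on creators.name. It assumes the 2.x-era mapping syntax ("string" with "not_analyzed"); later versions use different field types. The analyzer name `lowercase_keyword`, the sub-field names `raw` and `lower`, and the type name `document` are all made up for illustration.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical analyzer name: whole value as one token, lowercased.
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "document": {  # hypothetical type name
            "properties": {
                "creators": {
                    "properties": {
                        "id": {"type": "string", "index": "not_analyzed"},
                        "name": {
                            # Main field: analyzed with the default (standard) analyzer.
                            "type": "string",
                            "fields": {
                                # Exact value, untouched.
                                "raw": {"type": "string", "index": "not_analyzed"},
                                # Exact value, but case-insensitive.
                                "lower": {"type": "string", "analyzer": "lowercase_keyword"},
                            },
                        },
                    }
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With a mapping along these lines, creators.name stays full-text searchable while creators.name.raw and creators.name.lower hold the whole name as a single token.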

Hope this helps.

jpblair · April 19, 2016, 5:38pm

Thanks for your reply. I still want the user to be able to provide just one of the names, so I wouldn't want to only have the exact match. I guess that's where the fields parameter could come in. Basically what would be ideal is for it to be tokenized as ["John","Doe","John Doe"] (or perhaps just the lowercased versions of those), but perhaps that's not possible without going under the hood to create my own analyzer/create multiple mappings.

cbuescher · April 20, 2016, 8:27am

Okay, so you could try indexing both an analyzed and a not_analyzed version of the name using multi-fields and then query on both fields. Exact matches should then already rank higher in your results, since they match on both fields. You could also try boosting the not_analyzed field if that alone doesn't help.
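A minimal sketch of such a two-field query, written as a Python function that builds the request body. It assumes a not_analyzed sub-field called creators.name.raw exists in the mapping; that name, the helper `build_name_query`, and the default boost of 2.0 are illustrative choices, not anything prescribed.

```python
import json

def build_name_query(user_input, exact_boost=2.0):
    """Build a bool query that matches single names and favors exact full-name matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Analyzed field: matches "John", "Doe", or either word of "John Doe".
                    {"match": {"creators.name": user_input}},
                    # Assumed not_analyzed sub-field: matches only the exact full name.
                    {"term": {"creators.name.raw": {"value": user_input, "boost": exact_boost}}},
                ]
            }
        }
    }

query_body = build_name_query("John Doe")
print(json.dumps(query_body, indent=2))
```

Because the bool query has only should clauses, a document needs to match at least one of them, and documents matching both score higher.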

jpblair · April 25, 2016, 9:22pm

I saw another example of this where the "match phrase" query was used. Perhaps this would work? The only problem is that this is part of a filter, rather than a query.
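For what it's worth, here is a sketch of how a match_phrase query might be placed in filter context. It assumes Elasticsearch 2.x or later, where the bool query has a filter clause that accepts any query; on 1.x, wrapping the query in a query filter would presumably be the equivalent. The helper name `build_phrase_filter` is made up.

```python
import json

def build_phrase_filter(user_input):
    """Wrap a match_phrase query in a bool filter clause so it filters without scoring."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # The input is analyzed, and its tokens must appear in order in the field.
                    {"match_phrase": {"creators.name": user_input}}
                ]
            }
        }
    }

phrase_body = build_phrase_filter("John Doe")
print(json.dumps(phrase_body, indent=2))
```

Since match_phrase analyzes its input, a single name such as "Doe" would still match through the same filter, while "John Smith" would not match a document whose creators are "John Doe" and "Jane Smith" unless those words appear next to each other in the indexed field.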
