I want to have a filter that limits search results based on user input matching creators.name. Currently, I'm using the term filter. However, the standard analyzer only creates tokens for (to use the first example) "John" and "Doe", but NOT "John Doe", so my current method only works for "John" or "Doe", but not "John Doe". I considered using the terms filter instead and parsing the input based on whitespace, but then that would also match "John Smith", which is not ideal.
Is there a different way I should be querying, or do I need to use a different analyzer that would also treat "John Doe" as a token? And would these approaches properly address people with more than 2 names? Thanks.
There are a few options here, depending on what else you want to do with that field. First, if you require exact matching of the name field, the field should probably be indexed with "index" : "not_analyzed". That way you can use a 'term' query/filter and it will only match if the value is exact. If, on top of that, you want some analysis such as lowercasing but no tokenization, you can roll your own analyzer, for example using the KeywordTokenizer and adding a lowercase filter on top.
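To make the two variants above concrete, here is a rough sketch of the request bodies, written as Python dicts so they can be dumped to JSON. It assumes the older `string` / `not_analyzed` mapping syntax used in this thread; the type name `document` and the analyzer name `lowercase_keyword` are made up for illustration, and the nesting of `creators.name` is guessed from the question.

```python
import json

# Variant 1: index the name as a single untouched token.
# "document" is a hypothetical type name; adjust to your own mapping.
exact_mapping = {
    "mappings": {
        "document": {
            "properties": {
                "creators": {
                    "properties": {
                        "name": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
}

# Variant 2: a custom analyzer that keeps the whole value as one token
# (keyword tokenizer) but lowercases it. "lowercase_keyword" is a
# hypothetical name chosen for this sketch.
lowercase_keyword_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def exact_name_filter(name):
    """Build a term filter that matches only the complete, exact name."""
    return {"term": {"creators.name": name}}


if __name__ == "__main__":
    print(json.dumps(exact_mapping, indent=2))
    print(json.dumps(lowercase_keyword_settings, indent=2))
    print(json.dumps(exact_name_filter("John Doe"), indent=2))
```

With variant 2 the term filter does not analyze its input, so the user input would need to be lowercased on the client side before building the filter.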
If you also need the name in creator.name to be full-text searchable (e.g. find all Smith persons), you can index the same field in multiple ways, using the fields parameter when setting up the mappings.
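A multi-field mapping for this might look roughly like the sketch below. It again assumes the older `string` mapping syntax; the sub-field name `raw` is only a common convention picked for the example, not something prescribed.

```python
import json

# The main field is analyzed (standard analyzer by default), so single
# names such as "Smith" stay searchable. The hypothetical "raw" sub-field
# stores the same value as one untouched token for exact matching.
name_multi_field = {
    "type": "string",
    "fields": {
        "raw": {"type": "string", "index": "not_analyzed"}
    },
}

# Where the field would sit inside the mapping properties (the
# "creators" nesting is guessed from the question).
creators_properties = {
    "creators": {
        "properties": {
            "name": name_multi_field
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(creators_properties, indent=2))
```

The analyzed version is then addressed as `creators.name` and the exact version as `creators.name.raw`.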
Thanks for your reply. I still want the user to be able to provide just one of the names, so I wouldn't want to only have the exact match. I guess that's where the fields parameter could come in. Basically what would be ideal is for it to be tokenized as ["John","Doe","John Doe"] (or perhaps just the lowercased versions of those), but perhaps that's not possible without going under the hood to create my own analyzer/create multiple mappings.
Okay, so you could try indexing both an analyzed and a non-analyzed version of the name using multi-fields and then query on both fields. That should already rank exact matches higher in your results. You could also try boosting the not-analyzed field if that alone doesn't help.
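Something along these lines could express "query on both fields and boost the exact one". It is only a sketch: the `raw` sub-field name and the default boost of 2.0 are assumptions for illustration, not values from this thread.

```python
import json


def build_name_query(user_input, exact_boost=2.0):
    """Combine a full-text match with a boosted exact match.

    A document matching either clause is returned; one whose complete
    name equals the input also matches the term clause and scores higher.
    """
    return {
        "bool": {
            "should": [
                # Analyzed field: matches documents containing any of the names.
                {"match": {"creators.name": user_input}},
                # Not-analyzed sub-field ("raw" is a hypothetical name):
                # matches only when the whole value is identical.
                {
                    "term": {
                        "creators.name.raw": {
                            "value": user_input,
                            "boost": exact_boost,
                        }
                    }
                },
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps({"query": build_name_query("John Doe")}, indent=2))
```

Note that this affects ranking only: "John Smith" would still be returned for the input "John Doe", just lower in the results.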
I saw another example of this where the "match phrase" query was used. Perhaps this would work? The only problem is that this is part of a filter, rather than a query.
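For what it's worth, a phrase match can be used in filter context. The sketch below shows two shapes, and which one applies depends on the Elasticsearch version, so treat both as assumptions to check against your release: older versions had a `query` filter that wraps a query inside a `filtered` query, while later versions accept queries directly in the `filter` clause of a `bool` query. The function names are made up for this example.

```python
import json


def phrase_filter_legacy(name):
    """Older style: wrap match_phrase in a `query` filter inside `filtered`."""
    return {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "query": {"match_phrase": {"creators.name": name}}
            },
        }
    }


def phrase_filter_bool(name):
    """Later style: put match_phrase in the `filter` clause of a bool query."""
    return {
        "bool": {
            "filter": [
                {"match_phrase": {"creators.name": name}}
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps({"query": phrase_filter_legacy("John Doe")}, indent=2))
    print(json.dumps({"query": phrase_filter_bool("John Doe")}, indent=2))
```

Because the input is analyzed and the tokens must appear next to each other in order, "John Doe" would not match "John Smith", and a single name like "Doe" still works. A phrase of two tokens would also match a longer name such as "John Doe Smith".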