Best way to index human names?

rdate · September 8, 2021, 12:30am

Hello! I am building an index that includes personal names and after a bunch of searching around I haven't been able to find the best way to index these names to allow for the large amount of variation in name construction.

For example, take the name "J. R. R. Tolkien". For the index field, I am using a text field with a custom analyzer that filters out any periods, lowercases, and tokenizes on whitespace. If the source text is "J. R. R. Tolkien", I end up with the tokens ["j", "r", "r", "tolkien"] in my index. However, the user's query text could realistically be (ignoring case) "j.r.r. tolkien", "j. r. r. tolkien", "jrr tolkien", "j r r tolkien", or just "tolkien", that is, with or without spaces and punctuation. With the tokens above, I get poor results from a match query when the spaces between initials are omitted (e.g. "jrr tolkien"): the query analyzes to the single token "jrr", but each initial is a separate token in the index. It seems like I would want both versions on my name field somehow?
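To make "both versions" concrete, here is a rough plain-Python sketch of what I have in mind. It is not Elasticsearch code: `analyze` only imitates the analyzer described above, and `index_tokens` and `matches_all` are made-up helper names for illustration. The assumption is that each run of two or more single-letter initials also gets indexed as one merged token, and that a query matches when all of its tokens are present.

```python
def analyze(text: str) -> list[str]:
    """Imitate the analyzer described above: drop periods, lowercase, split on whitespace."""
    return text.replace(".", "").lower().split()


def index_tokens(name: str) -> list[str]:
    """Return the analyzed tokens plus one merged token per run of 2+ initials.

    Hypothetical helper: "J. R. R. Tolkien" yields the usual tokens and "jrr".
    """
    tokens = analyze(name)
    merged = []
    run = []
    # The empty-string sentinel flushes a run of initials at the end of the name.
    for tok in tokens + [""]:
        if len(tok) == 1 and tok.isalpha():
            run.append(tok)
        else:
            if len(run) > 1:
                merged.append("".join(run))
            run = []
    return tokens + merged


def matches_all(query: str, name: str) -> bool:
    """Toy stand-in for a match query requiring every query token to be present."""
    indexed = set(index_tokens(name))
    return all(tok in indexed for tok in analyze(query))


if __name__ == "__main__":
    name = "J. R. R. Tolkien"
    print(index_tokens(name))
    for query in ["j.r.r. tolkien", "j. r. r. tolkien", "jrr tolkien",
                  "j r r tolkien", "tolkien"]:
        print(query, matches_all(query, name))
```

In this toy model all five query variants match, because "jrr" now exists next to the separate initials. Whether the merged form should live in the same field or in a separate sub-field is exactly the part I am unsure about.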

What is the best way to handle cases like this? Do I need to create a separate field for each variation? I do have some millions of documents, so it's also not realistic to create each variation by hand.

