I am looking for synonym dictionaries of person names that I can use with
the Elasticsearch synonym analyser.
For example, dictionaries that map "Ted" to "Edward", and "Bill" to "William".
I am curious to know what others are using.
So far I have found these two possible sources:
some noise/variation in the names recorded for each person
The input is of this form:
personID  recorded_name
========  =============
1         Rob
1         Robert
1         Bob
2         Dave
2         David
2         Alice
...
The output is a weighted graph of name <-> variant pairs, e.g. Robert == Bob with a
strong confidence rating.
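A minimal Python sketch of how such a graph could be built from input rows like those above. The post does not say how weights or confidence ratings are calculated, so this assumes an edge's weight is the number of distinct persons recorded under both names, and its confidence is the Jaccard overlap of the two names' person sets; `build_name_graph` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def build_name_graph(records):
    """Build a weighted name <-> variant graph from (person_id, recorded_name) rows.

    Assumption (not from the original post): weight = number of distinct persons
    recorded under both names; confidence = Jaccard overlap of their person sets.
    Returns {(name_a, name_b): (shared_persons, confidence)} with name_a < name_b.
    """
    persons_by_name = defaultdict(set)  # lowercased name -> person IDs using it
    names_by_person = defaultdict(set)  # person ID -> lowercased names recorded
    for person_id, name in records:
        key = name.strip().lower()  # normalise case, as a lowercase filter would
        persons_by_name[key].add(person_id)
        names_by_person[person_id].add(key)

    # Count, for each unordered name pair, how many persons carry both names.
    shared_counts = defaultdict(int)
    for names in names_by_person.values():
        for a, b in combinations(sorted(names), 2):
            shared_counts[(a, b)] += 1

    graph = {}
    for (a, b), shared in shared_counts.items():
        union = len(persons_by_name[a] | persons_by_name[b])
        graph[(a, b)] = (shared, shared / union)
    return graph


records = [(1, "Rob"), (1, "Robert"), (1, "Bob"),
           (2, "Dave"), (2, "David"), (2, "Alice")]
graph = build_name_graph(records)
print(graph[("bob", "robert")])  # (1, 1.0)
```

With only the sample rows, every pair gets weight 1, including the noisy Dave <-> Alice edge. Only across many persons do genuine variants such as Robert <-> Bob accumulate larger shared counts than one-off noise, so a minimum shared count is a reasonable filter.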
Using this, I learn not just genuine name variants but also typos, e.g. that "Janes" is
more likely to be "James" than "Jane" (a common typo because the M and N keys are
adjacent on the keyboard).
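One way such a graph could be turned into rules for the Elasticsearch synonym analyser, sketched in Python under stated assumptions: for each name, pick the neighbour sharing the most persons with it, but only among names recorded more often overall, so a typo like "janes" points at the common spelling. The per-name frequencies (`name_counts`), the `min_shared` threshold and both helper names are hypothetical, and the counts below are invented purely for illustration. The output uses the Solr explicit-mapping syntax (`variant => canonical`), which the Elasticsearch `synonym` token filter reads by default (e.g. from a file given in `synonyms_path`).

```python
def best_canonical(name, graph, name_counts, min_shared=2):
    """Return the most likely canonical form of `name`, or None.

    graph maps a sorted (name_a, name_b) pair to (shared_persons, confidence).
    Assumptions: candidates must be recorded more often than `name`, and are
    ranked by shared person count; weak edges below `min_shared` are ignored.
    """
    best, best_shared = None, 0
    for (a, b), (shared, _confidence) in graph.items():
        if name not in (a, b):
            continue
        other = b if a == name else a
        if name_counts.get(other, 0) <= name_counts.get(name, 0):
            continue  # only map towards more common spellings
        if shared >= min_shared and shared > best_shared:
            best, best_shared = other, shared
    return best


def synonym_rules(graph, name_counts, min_shared=2):
    """Emit Solr-format explicit mappings such as 'janes => james'."""
    rules = []
    for name in sorted(name_counts):
        target = best_canonical(name, graph, name_counts, min_shared)
        if target is not None:
            rules.append(f"{name} => {target}")
    return rules


# Invented counts, for illustration only.
graph = {("james", "janes"): (40, 0.30),
         ("jane", "janes"): (5, 0.02),
         ("jane", "james"): (1, 0.001)}
name_counts = {"james": 5000, "jane": 3000, "janes": 60}
print(synonym_rules(graph, name_counts))  # ['janes => james']
```

Mapping one way (`=>`) rather than listing equivalents (`janes, james`) keeps a typo from dragging other spellings into a query for the correct name.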
On Thursday, January 29, 2015 at 5:28:33 AM UTC, David Kemp wrote: