Hi,
First, what I am trying to achieve: I have a list of all the valid city
names and a list of contacts with a city field, among others. I need to
check that the city in every contact is valid and, when it is not, run an
approximate search to find the closest valid city matching the name in the
contact.
What I would do in a relational database is dump the cities in one
table and the contacts in another, then join them to find all the contacts
with an invalid city. But in such a database I would not be able to run
approximate searches with tools like the fuzzy queries provided by
elasticsearch. So my question is: can I do this in elasticsearch, and how?
There are a lot of cities, so including them in the query would result in a
really big query. Since I will need to do the same thing for other
attributes where the reference lists include millions of values, I hope
there is another way xD
From what I understand there is no way to really "join" two indices, but how
would you do that if you had to?
I have no constraints, so I can change my schema as needed if it helps me
achieve my goal.
What you can do to implement this is create two indices: one
containing your contacts and one containing the city names.
When you need to validate a contact's city name, or you just want to run an
approximate city name search, you can use the fuzzy like this query.
Just put the city name in the "like_text" parameter and execute the
query on the index that contains the city names.
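
As a rough sketch of the lookup described above, the search body could be
built like this in Python. The field name "name", the index name "cities"
and the helper build_city_lookup are assumptions made for illustration;
only the fuzzy_like_this query and its "like_text" parameter come from the
reply itself.

```python
import json


def build_city_lookup(city_name, field="name", max_terms=12):
    """Build a fuzzy_like_this search body for one contact's city value.

    `field` is a hypothetical field holding the city name in the city index.
    """
    return {
        "size": 1,  # only the closest match is needed
        "query": {
            "fuzzy_like_this": {
                "fields": [field],
                "like_text": city_name,
                "max_query_terms": max_terms,
            }
        },
    }


body = build_city_lookup("Sna Francisco")
# Send this JSON to the _search endpoint of the city index,
# for example /cities/_search (index name assumed).
print(json.dumps(body, indent=2))
```

If the top hit's name equals the contact's city, the city is valid;
otherwise the top hit is a candidate correction.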
Warning: fuzzy queries are in general among the more expensive
queries to execute in the Lucene version ES uses at the moment (3.6).
How expensive is difficult to say. The best way is just to try it out in
your environment: create an index that contains all unique
city names and run some queries. The upcoming Lucene 4 can execute fuzzy
queries much faster than Lucene 3.x.
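
To try this out, one possible way to prepare the test index is to
deduplicate the city names and build a bulk indexing payload from them.
This is only a sketch: the index name "cities", the type name "city", the
field "name" and both helper functions are assumptions, and the
case-insensitive deduplication is a choice made here, not something the
thread prescribes.

```python
import json


def unique_city_names(raw_names):
    """Trim whitespace, drop empty values, dedupe case-insensitively.

    The first spelling seen for each name is the one that is kept.
    """
    seen = {}
    for name in raw_names:
        cleaned = name.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen[key] = cleaned
    return list(seen.values())


def bulk_payload(names, index="cities", doc_type="city"):
    """Build a newline-delimited bulk body: one action line, one document line."""
    lines = []
    for name in names:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps({"name": name}))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


cities = unique_city_names(["Paris", " paris", "Lyon", "", "Lyon "])
payload = bulk_payload(cities)
print(payload)
```

The payload can then be posted to the _bulk endpoint, after which a few
fuzzy queries against the index give a feel for the cost on your own data.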