I'm new to search and doing some research, looking at either ES or Solr.
Let's say I want to create an address database to verify addresses
worldwide, with street name, city, zipcode, etc.
Each of these fields needs to be searchable. For example, you can
search on zipcode to bring up all matching cities, or vice versa.
I also want to separate the countries so I don't need to search through
the US data when looking up a UK address. This concept is easy to
grasp in MySQL etc. by defining country=US ... but in ES, would the US be
one index? Or would street, city, zipcode + US each be their own
index?
How many indexes do you need, and how do you count this? And can you
separate the data like this to avoid unnecessary searching?
If the database before import has 50 million rows, how many records
will this end up as in ES if each row contains, for example, street name,
city and zipcode?
Do you think ES could search 50 million records/rows in under 0.5
seconds? Maybe in a cloud environment? Or do you need a really big
cluster for this?
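For context before the replies: in ES, each row from the source database typically becomes one JSON document, with all of its fields together in a single index, rather than one index per field. A minimal, hypothetical sketch of what one address row might look like (the index name "addresses", the field names and values are made up, and the exact URL format varies between ES versions):

curl -XPUT 'localhost:9200/addresses/_doc/1' -H 'Content-Type: application/json' -d '
{
  "street": "10 Downing Street",
  "city": "London",
  "zipcode": "SW1A 2AA",
  "country": "UK"
}'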
You are not in the SQL world anymore. Just forget what you know about SQL searches.
That said, you can now ask yourself: why should I separate countries?
If country is a field of your address document, just add country=US when you search for US addresses.
If you have 50m rows and each row turns into one document, you will have 50m docs in ES.
You can easily create 1m or 2m docs on your laptop and see how much disk space they take...
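To make the "just add country=US" idea concrete, here is a rough sketch of a search restricted to one country with a term filter, reusing the hypothetical addresses index from above. The bool/filter form shown here depends on your ES version, and with default dynamic mappings the not-analyzed country.keyword subfield is the safer target for an exact term match. Clauses in filter context can be cached and reused across queries:

curl -XGET 'localhost:9200/addresses/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must":   [ { "match": { "city": "London" } } ],
      "filter": [ { "term": { "country.keyword": "UK" } } ]
    }
  }
}'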
My initial thought was that separating countries would make search
faster.
The USA is really huge, around 40 million records, while a smaller
country might be at most 1 million records. So searching the non-US
lists would be faster than having to go through the same database as
the US data.
But maybe it does not work this way, and having everything in the same
database won't slow it down?
The address verification is always country specific.
You can definitely separate to an index per country, and it will be faster to
search. But it won't be faster by much if you use a filter on the country
field, thanks to how filters work and the fact that they are nicely cached. It's
really up to you.
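For comparison, the index-per-country layout just means targeting a country-specific index in the request path instead of filtering inside one big index. A sketch with a hypothetical per-country index name (addresses-uk):

curl -XGET 'localhost:9200/addresses-uk/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "city": "London" } }
}'

Either way the query itself stays the same; the trade-off is largely operational (one big index with a cached filter versus many smaller indexes to manage).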