Hi all,
I am new to Elasticsearch (ES) and am trying to set up the ES schema for an existing real estate application that currently uses SQL for structured data search.
I am looking for advice on, or validation of, my approach to modelling and indexing data that is originally stored in a relational DB. Search will mostly be on structured data, although keyword search will be added later as a secondary feature.
So here are my assumptions:
1. Since most queries are location-specific, I am considering routing documents to shards based on the "city" field. That way, all searches for houses in, e.g., New York would hit a single shard, which should improve performance for the majority of queries. Queries that are not city-specific would of course still need to query all shards. Even so, this seems better than routing on the house id, which is an auto-increment value and therefore effectively random with respect to location.

2. If it weren't for point 1, we would use parent-child to model the relationship between a realtor and his houses. But as I understand it, this would require a realtor and all his houses to reside on the same shard, which conflicts with routing by city. So we are thinking of using a *flat model*, where each listing also holds the realtor's information (especially the searchable fields, e.g. to return only houses from realtors who are paying members).

3. The challenge here is that we would need to update all of a realtor's houses every time some of his data is modified. Is there an easy way to apply the same modification to all his indexed houses in one go? Or do we have to retrieve all his houses from the relational DB and then queue them for reindexing?
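A minimal sketch of the routing idea in point 1, assuming a recent Elasticsearch version (no mapping types; older releases may expect the bulk metadata key `_routing` instead of `routing`). The index name `listings`, the listing fields and the `routing_key` normalisation are hypothetical. The `toy_shard_for` helper uses crc32 as a stand-in for ES's internal murmur3 hash; it only illustrates that equal routing values always land on the same shard.

```python
import json
import zlib
from urllib.parse import quote


def routing_key(city: str) -> str:
    # Hypothetical normalisation so "New York" and " new york" route identically.
    return city.strip().lower()


def toy_shard_for(routing_value: str, num_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch uses murmur3 internally, not crc32.
    # Only the property "same routing value -> same shard" is illustrated.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def bulk_index_lines(index: str, listings: list) -> str:
    """Build a bulk API (NDJSON) body that indexes each listing with routing=city."""
    lines = []
    for listing in listings:
        action = {"index": {"_index": index,
                            "_id": str(listing["id"]),
                            "routing": routing_key(listing["city"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(listing))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def city_search_path(index: str, city: str) -> str:
    # Passing the same routing value on search limits it to that city's shard.
    # The query itself must still filter on city, because other cities can
    # hash to the same shard.
    return f"/{index}/_search?routing={quote(routing_key(city))}"


listings = [
    {"id": 1, "city": "New York", "price": 500000},
    {"id": 2, "city": " new york", "price": 750000},
    {"id": 3, "city": "Boston", "price": 400000},
]
print(bulk_index_lines("listings", listings), end="")
print(city_search_path("listings", "New York"))
```

One trade-off worth keeping in mind: if a few cities hold most of the listings, routing by city makes those shards much larger than the rest. Newer versions also offer a `routing_partition_size` index setting that spreads one routing value over several shards.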
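The flat model in point 2 can be sketched as copying the realtor's searchable fields into an object inside each listing, then filtering on them with dotted field paths. This is a sketch under assumptions: the field names (`realtor`, `is_paying_member`, `city`) are hypothetical, `city` is assumed to be mapped as a keyword field holding the normalised value, and the `bool`/`filter` syntax assumes ES 2.x or later (older versions used the `filtered` query).

```python
def flatten_listing(house: dict, realtor: dict) -> dict:
    # Copy only the realtor fields that searches filter or display on
    # into the listing document; other realtor columns stay in the DB.
    return {
        **house,
        "realtor": {
            "id": realtor["id"],
            "name": realtor["name"],
            "is_paying_member": realtor["is_paying_member"],
        },
    }


def paying_members_in_city_query(city: str) -> dict:
    # Filter context: exact matches, no scoring needed for structured search.
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"city": city}},
                    {"term": {"realtor.is_paying_member": True}},
                ]
            }
        }
    }


doc = flatten_listing(
    {"id": 7, "city": "new york", "price": 650000},
    {"id": 3, "name": "Jane Doe", "is_paying_member": True, "phone": "555-0100"},
)
query = paying_members_in_city_query("new york")
```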
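For question 3, here is a sketch of two possible approaches, assuming the hypothetical `listings` index and `realtor` sub-object from above. Option A builds bulk partial updates from house ids and cities fetched from the relational DB; each update must carry the same routing value used at index time, or ES will look for the document on the wrong shard. Option B builds a body for the `_update_by_query` API, which exists only in newer releases (roughly 2.3 onward); the `painless` language and the `source` key assume a recent version.

```python
import json


def realtor_fields(realtor: dict) -> dict:
    # Hypothetical: the same subset of realtor fields copied into each listing.
    return {"id": realtor["id"], "name": realtor["name"],
            "is_paying_member": realtor["is_paying_member"]}


def bulk_realtor_update_lines(index: str, houses: list, realtor: dict) -> str:
    """Option A: one partial update per house, ids and cities taken from the DB."""
    patch = {"doc": {"realtor": realtor_fields(realtor)}}
    lines = []
    for house in houses:
        # Routing must match the value used when the listing was indexed.
        meta = {"update": {"_index": index,
                           "_id": str(house["id"]),
                           "routing": house["city"].strip().lower()}}
        lines.append(json.dumps(meta))
        lines.append(json.dumps(patch))
    return "\n".join(lines) + "\n"


def update_by_query_body(realtor: dict) -> dict:
    """Option B: let ES find the realtor's listings itself (POST /listings/_update_by_query)."""
    return {
        "query": {"term": {"realtor.id": realtor["id"]}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.realtor = params.realtor",
            "params": {"realtor": realtor_fields(realtor)},
        },
    }


realtor = {"id": 3, "name": "Jane Doe", "is_paying_member": False}
houses = [{"id": 7, "city": "New York"}, {"id": 9, "city": "Boston"}]
print(bulk_realtor_update_lines("listings", houses, realtor), end="")
print(json.dumps(update_by_query_body(realtor)))
```

Either way, ES rewrites each affected document internally, since indexed documents are immutable; what the two options change is whether the DB or ES itself decides which listings belong to the realtor.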
I would be grateful for any comments on assumptions 1 and 2, as well as any hints on question 3.
thanks in advance