Advice on schema - data modelling

Hi all,

I am new to ES and am currently trying to set up the ES schema for an
existing real estate application that currently uses SQL for structured data
search.

I am looking for advice on, and validation of, my approach to modelling and
indexing the data, which is originally stored in a relational DB. Search will
mostly be on structured data, although keyword search will be added as a
secondary feature.

So here are my assumptions:

  1. Since most of the queries are location-specific, I am considering using
    sharding with routing based on the "city" field. That way all searches for
    houses in e.g. New York would retrieve data from a single shard, leading to
    better performance for the majority of the queries. Queries that are not
    city-specific would of course need to hit all shards. Still, this seems
    better than routing on the house id, which is just an autoincrement value
    and so is unrelated to how the data is queried.

  2. If it weren't for my point above, we would be using parent-child to
    model the relationship between a realtor and his houses. But I understand
    this would mean a realtor and all his houses would need to reside on the
    same shard, which conflicts with routing by city. So we are thinking of
    using a *flat model* where each listing also holds the information of the
    realtor (especially the searchable fields - e.g. only return houses from
    realtors that are paying members).

  3. The challenge here is that we would need to update all the houses of a
    specific realtor every time some of his data is modified. Is there an easy
    way to apply the same modification to all his indexed houses in one go? Or
    do we have to retrieve all his houses from the relational DB and then
    queue them so that they are reindexed?
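
To make the three points above more concrete, here is a rough Python sketch of
what I have in mind. It only builds request paths and a bulk body and does not
talk to a cluster. The index name "listings", the field names, and the
"/_doc/" path form are assumptions of mine for illustration and may need
adjusting for the ES version in use.

```python
import json
from urllib.parse import urlencode

INDEX = "listings"  # hypothetical index name


def flatten_listing(house, realtor):
    """Point 2: copy the searchable realtor fields onto each listing."""
    return {
        "id": house["id"],
        "city": house["city"],
        "price": house["price"],
        "realtor_id": realtor["id"],
        "realtor_name": realtor["name"],
        "realtor_paying_member": realtor["paying_member"],
    }


def index_path(doc):
    """Point 1: index a listing with its city as the routing value."""
    return f"/{INDEX}/_doc/{doc['id']}?" + urlencode({"routing": doc["city"]})


def search_path(city=None):
    """City searches pass the same routing value; others query all shards."""
    base = f"/{INDEX}/_search"
    return base + "?" + urlencode({"routing": city}) if city else base


def bulk_reindex_body(houses, realtor):
    """Point 3 (fallback): newline-delimited bulk body that re-indexes every
    house of one realtor after his data changed in the relational DB."""
    lines = []
    for house in houses:
        doc = flatten_listing(house, realtor)
        action = {"index": {"_index": INDEX, "_id": doc["id"],
                            "routing": doc["city"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

With this, every change to a realtor would mean loading his houses from SQL,
calling bulk_reindex_body(houses, realtor) and posting the result to the bulk
endpoint, which is the part I am hoping to avoid.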

I would be grateful for any comments on my assumptions 1 & 2, as well as any
hints on my question 3.

Thanks in advance