Multiple databases / tables (of similar schema) for one Elasticsearch index?


#1

We want to create integrated search of similar databases / tables with similar fields and schema, e.g. United States real estate listings search of multiple real estate properties databases such as these:

  1. http://n3.datasn.io/data/api/v1/n3_chennan/real_estate_houses/main/list/?app=html-bunker&manifest=row
  2. http://n3.datasn.io/data/api/v1/n3a2/real_estate_homes_for_sale/main/list/?app=html-bunker&manifest=row
  3. http://n3.datasn.io/data/api/v1/n3_chennan/real_estate_agents_in_us/main/list/?app=html-bunker&manifest=row

Each of the 3 data sets above is a MySQL database query, joined by multiple tables.

You can see each of them has similar fields but also different fields. We are not familiar with Elasticsearch yet and we are thinking about switching to it but we are not sure if it can do this? Integrated search of multiple databases with similar fields?


(Mark Walkom) #2

That should be ok, you do want to reduce the amount of differences to the minimal.

Otherwise just put them in their own index.


#3

Thanks for the reply.

If they are in their own index, can they be all searched at once?

I mean, to the user, there's only one search interface. If I have 1000 such indices as we might very well reach 1000 databases, could this be an issue?


(Mark Walkom) #4

Yes, but then you can search multiple indices at once as well.

Potentially, primarily because you would have a lot of shards and that takes resources to maintain.


#5

Thanks for the tips. However, the schemas of all the databases are all different, with different tables and fields, is it still possible in this regard?

So with one search, for instance, a search of "python" and salary range from "100,000" to "150,000" against 1000 job openings databases, we want it to search through ALL the fields and ALL the tables of ALL the databases by the keyword "python" but only through salary from and to fields by the provided salary range "100,000" to "150,000", and return results from ALL the 1000 databases that satisfy both criteria.

Is this possible?