Good practice for distributing data across indexes

Hello everyone,

Here is a brief description of the situation.

We have approximately 5,000 customers who use a tool connected to a Postgres database. Each customer manipulates data in a database of their own. Each database contains approximately 30 tables, and the schema of every database is identical. So we have this:

db_customer_1

  • table_1
  • table_2
  • ...
  • table_30

...

db_customer_5000

  • table_1
  • ...

We would like to index all this data in Elasticsearch, but we cannot decide how to structure our indexes.

First proposal:
We put all the data from table_1 of every database into one index called index_table_1, with an identifier on each document that tells us which customer it belongs to. That gives about 30 indexes.
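For illustration, here is a minimal sketch of what the mapping for such a shared index could look like, assuming a recent Elasticsearch version; customer_id is the field from our model, the other fields are just placeholders:

```
PUT index_table_1
{
  "mappings": {
    "properties": {
      // the field identifying the customer, kept as keyword for exact matching
      "customer_id": { "type": "keyword" },
      // placeholder fields, only here to show the shape of the mapping
      "created_at": { "type": "date" },
      "label": { "type": "text" }
    }
  }
}
```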

Second solution:
We put the data of each table for each customer into its own index, named for example index_customer_1_table_1. That makes approximately 150,000 indexes (5,000 customers × 30 tables).

We are afraid that the first solution will cause performance issues because, even though the data follows the same model, it does not all belong to the same customer.

With the second solution, we are afraid that the growing number of indexes could become a problem in the medium term.

Can you help us?

Thank you

The number of indices in option 2 will be an immediate problem, so I would recommend option 1.
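As a side note, if you still want a per-customer view on top of option 1, a filtered alias per customer is a common pattern; a rough sketch, where the alias name and filter value are only illustrative:

```
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index_table_1",
        "alias": "customer_1_table_1",
        // each customer queries its alias and only sees its own documents
        "filter": { "term": { "customer_id": "1" } }
      }
    }
  ]
}
```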

Thank you for your reply.
Is there a specific mapping, or something else, that I can put on the customer_id field of my documents, like an index in a DBMS, so that my aggregations are as efficient as possible? They will always start with match: {customer_id: ****, ...}.
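The usual pattern is to map customer_id as a keyword and restrict on it with a term query inside a bool filter rather than a match query: filter clauses skip scoring and can be cached across requests. A minimal sketch; the customer value and the status field in the aggregation are illustrative placeholders:

```
GET index_table_1/_search
{
  "size": 0,
  "query": {
    "bool": {
      // non-scoring, cacheable clause that narrows to one customer
      "filter": [
        { "term": { "customer_id": "1234" } }
      ]
    }
  },
  "aggs": {
    // example aggregation over the filtered documents
    "by_status": {
      "terms": { "field": "status" }
    }
  }
}
```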

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.