Single Index vs. Multiple Indices

Hello,

Which is better to use, a single index or multiple indices?
Which of the two gives better search performance?

Note that if I store the data in multiple indices, each request will search only one of them (chosen based on the request).

Thanks in advance.

That is an impossible question to answer, as it depends a lot on the use case. In my opinion you will need to provide a lot more detail if you want any useful feedback. The best way to find out, however, is to test with real data and real queries.

I have multiple clients and I need to store their data separately, meaning that all search requests will be client-dependent.
What else do I need to provide you with?

How many clients are you expecting? Do you have control over the data format?

I expect around 10-20 clients.
I need to save posts which have the following:
Id, Title (string), Description (string), Url (string), Tags (list), Date, Terms (list), Type (string), Importance (int).

Please note that I'm only using a single Elasticsearch node (not a cluster of nodes).
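
For reference, the fields listed above could be written as an Elasticsearch mapping roughly like this. It is only a sketch: the field types (text vs. keyword, the date type) and the helper name `build_post_mapping` are assumptions, not something stated in this thread, and the body uses the typeless mapping format of Elasticsearch 7 and later. List-valued fields such as Tags and Terms need no special type, since any field can hold an array of values.

```python
import json


def build_post_mapping():
    """Return a hypothetical index body describing the post fields.

    All field types below are assumptions chosen for illustration.
    """
    return {
        "mappings": {
            "properties": {
                "Id": {"type": "keyword"},
                "Title": {"type": "text"},        # full-text searchable
                "Description": {"type": "text"},  # full-text searchable
                "Url": {"type": "keyword"},       # exact-match only
                "Tags": {"type": "keyword"},      # may hold a list of values
                "Date": {"type": "date"},
                "Terms": {"type": "keyword"},     # may hold a list of values
                "Type": {"type": "keyword"},
                "Importance": {"type": "integer"},
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_post_mapping(), indent=2))
```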

Look, I found this on the website:
TIP: In order to reduce the number of indices and avoid large and sprawling mappings, consider storing data with similar structure in the same index rather than splitting into separate indices based on where the data comes from. It is important to find a good balance between the number of indices and the mapping size for each individual index.

So, probably it's better to have a single index. Is that correct?
That way, I'll save the client as a property of the post.
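
With the client saved as a property of the post, every search would need to be restricted to one client. A minimal sketch of such a request body follows, assuming a hypothetical `Client` field mapped as `keyword` and a full-text search on `Title`; the function name is also made up for illustration. The client restriction sits in the `filter` clause of a `bool` query, so it narrows the results without affecting relevance scoring.

```python
import json


def build_client_search(client_id, text):
    """Build a search body limited to a single client's posts.

    "Client" is a hypothetical field name, assumed to be a keyword field.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text match on the post title.
                "must": [{"match": {"Title": text}}],
                # Non-scoring exact match restricting results to one client.
                "filter": [{"term": {"Client": client_id}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_client_search("client-a", "release notes"), indent=2))
```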

IMO it's a no-brainer: store each client in a separate index. Since you'll never (or rarely) need to query across customers, putting them into separate indices will boost query performance as you scale.

If you want more info about this, try reading up on how Lucene inverted indices work. In a single index, all customers' terms and text are indexed together. With multiple indices, you're only searching over one customer's data at a time.

If you do not expect the number of clients to grow significantly beyond that, an index per customer could work, but I would recommend reducing the number of primary shards to one for each of them (unless you have very large data volumes). You do want to avoid having a lot of small shards, as this can be very inefficient. The approach of adding a field that identifies the customer and storing them all in a single index would also work, as long as you can control the data format. This would be the preferred option if you were expecting the number of clients to grow significantly.
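
The index-per-customer option with one primary shard might look something like the sketch below. The `posts-` prefix, the helper names, and the replica count of 0 are assumptions for illustration; zero replicas is chosen only because a replica cannot be allocated on a single-node setup. The name is lowercased because Elasticsearch requires lowercase index names.

```python
import json


def client_index_name(client_id, prefix="posts-"):
    """Derive a per-client index name (the prefix is a hypothetical choice)."""
    return f"{prefix}{client_id}".lower()


def single_shard_index_body(replicas=0):
    """Index settings with one primary shard, as recommended above."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                # Assumption: no replicas, since only one node is available.
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    print(client_index_name("Acme"))
    print(json.dumps(single_shard_index_body(), indent=2))
```

Each client would then get its own index created with this body, and a search request would target only the index whose name matches the requesting client.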

I see, actually I can't control the number of clients (they increase with time).
The data isn't too large, so if I go with separate indices, I will probably use one shard as you've advised.
Concerning the other option (using a single index), what do you mean by controlling the data format?
