How is performance affected by distributing data over multiple indices?


(narinder.izap) #1

Hi there,
We are going to have billions of records in ES, so we plan to
distribute the data over multiple indices. I want to know how this
will affect performance, and which of these options will perform
better:

Single Index with data distributed in multiple types

OR

Multiple Indices

For example:
Multiple Indices:

Index 1:
"user": all data related to users.

Index 2:
"media": all media related to users (photos, videos, etc.), linked by
ID to the user documents stored in Index 1.

Index 3:
"comments": all comments related to the users and media stored in the
two indices above.

Single Index

"master_index"

type: user, media, comments


(Drew Raines) #2

Narinder Kaur wrote:

We are going to have billions of records in ES, so we plan to
distribute the data over multiple indices. I want to know how this
will affect performance, and which of these options will perform
better:

Single Index with data distributed in multiple types

OR

Multiple Indices

I would go for a single index with types. That will give you more
flexibility in relating and retrieving the data.

There's no inherent limit to the size of an index. It's just an
abstraction on top of one or more shards. You'll want to tune the
number of shards based on how big your docs are, how many nodes you
have, etc.
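
As a rough sketch of that tuning, here is one way to turn document count, average document size and node count into a starting shard count. The helper names, the target shard size and the "at least one shard per node" rule are all assumptions for illustration, not an official formula; only the `number_of_shards` and `number_of_replicas` setting names come from Elasticsearch itself.

```python
def estimate_primary_shards(doc_count, avg_doc_bytes, target_shard_bytes, node_count):
    """Back-of-envelope estimate (hypothetical helper, not an ES API).

    Picks enough primary shards to keep each one under target_shard_bytes,
    and at least one shard per node so every node can hold data.
    """
    total_bytes = doc_count * avg_doc_bytes
    # Integer ceiling division avoids float rounding on very large totals.
    by_size = (total_bytes + target_shard_bytes - 1) // target_shard_bytes
    return max(by_size, node_count, 1)


def index_settings(shards, replicas=1):
    """Build an index-creation body with the chosen shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Assumed figures: 2 billion docs of ~1 KB each, a 40 GB target per shard, 10 nodes.
shards = estimate_primary_shards(2_000_000_000, 1_000, 40_000_000_000, 10)
body = index_settings(shards)
```

Treat the result only as a starting point and measure with your own data.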

-Drew


(Karussell) #3

It also depends on how often you'll update the index and how quickly
updated docs should show up in searches.

If you frequently update a lot of data and need it to show up in
(near) real time, then you should prefer smaller indices.
Using multiple indices also gives you the flexibility (at the cost of
more work) to split indices not only by type but also by date (e.g.
for comments).
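
To illustrate splitting by type and date, here is a minimal sketch of how an application might pick index names. The `type-YYYY.MM` naming scheme, the choice of monthly buckets and the function names are assumptions made up for this example. Elasticsearch does accept a comma-separated list of index names in a search request, so the list from `indices_for_range` could be joined and searched together.

```python
from datetime import date

# Assumption: only comments grow fast enough to be worth splitting by month.
TIME_SPLIT_TYPES = {"comments"}


def index_for(doc_type, created):
    """Return the index a document should be written to."""
    if doc_type in TIME_SPLIT_TYPES:
        # One index per type per month, e.g. "comments-2011.10".
        return f"{doc_type}-{created:%Y.%m}"
    return doc_type


def indices_for_range(doc_type, start, end):
    """Return every index that can hold documents created in [start, end]."""
    if doc_type not in TIME_SPLIT_TYPES:
        return [doc_type]
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{doc_type}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Search only the months that can contain matching comments.
targets = ",".join(indices_for_range("comments", date(2011, 11, 15), date(2012, 1, 3)))
```

The extra work mentioned above is visible here: the application has to route writes and pick the right indices for each search.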

Peter.

On 28 Oct., 20:52, Drew Raines d...@raines.me wrote:

I would go for single index with types. That will give you more
flexibility in relating & retrieving the data.

There's no inherent limit to the size of an index. It's just an
abstraction on top of one or more shards. You'll want to tune the
number of shards based on how big your docs are, how many nodes,
etc.

-Drew

