Any suggestions about indexes?

We are investigating how to create indices for a new ES system. From 6.x, ES only supports one type per index, which means we need to create many indices. Does that mean newer versions of ES perform better with many shards and can handle a very large number of indices, such as hundreds or thousands, with many shards in total?

We plan to set up our new system on ES 6.x. The system stores entity metadata for different apps in ES and then searches it.
All apps share about 21 common properties and similar data, with only one differing property called "appOwnData". It is an object type, and each app can define its "appOwnData" as it needs, so each app's "appOwnData" will be different. Each app may have about 2-10 properties in "appOwnData", and there may be 4-10 apps.

myEntity: {
  myId,
  myname,
  ...
  appOwnData
}
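
For illustration, two documents from different apps might look like this (the "appOwnData" field names below are invented, just to show that each app's shape differs):

```json
{"myId": "1", "myname": "alpha", "appOwnData": {"dueDate": "2018-01-01", "priority": 3}}
{"myId": "2", "myname": "beta", "appOwnData": {"color": "red", "sizes": ["S", "M"]}}
```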

The most common searches query something in one app within one org. There are also cases that search several apps in one org, or several apps across all orgs.

We also plan to create a new index each year, such as 2017_app1, 2017_app2, 2018_app1, 2018_app2, so over time there will be many indices.

Because of "appOwnData", it seems better to create one index per app instead of putting all apps into one big index.

Assume the ES cluster can be given enough nodes to meet the requirements.

I am concerned about performance with so many indices. Any suggestions?

In early versions of Elasticsearch (1.x), fields with the same name could have different mappings for different types, even within the same index. Subsequent releases enforced stronger checks on mappings, and now a single field must have a single mapping across all types within an index. When you consider whether to store data whose structure at least partially differs in the same index, there are two factors to consider: mapping conflicts and sparse fields.

If the different structures are likely to just result in sparse fields, i.e. fields that are defined in only a small subset of documents within an index, note that sparse fields used to inflate the size on disk, but this has been dramatically improved in Elasticsearch 6.x.

Mapping conflicts are trickier to deal with, but if you have been using any recent version of Elasticsearch you should not have this problem, as the single-mapping rule has been enforced for quite some time. One way around it is to enforce a naming convention on properties, based on a prefix or suffix, that determines the mapping of the field: fields that follow the naming convention get the corresponding mapping, while all other fields get a default mapping, e.g. keyword.
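
As a sketch of that naming-convention idea, a dynamic template in the mapping can assign types by suffix. The index name and the `_num` suffix below are just examples, not something your apps already use:

```json
PUT entities
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "own_data_numbers": {
            "path_match": "appOwnData.*_num",
            "mapping": { "type": "long" }
          }
        },
        {
          "own_data_default": {
            "path_match": "appOwnData.*",
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}
```

With this in place, any app can index whatever it likes under "appOwnData": fields ending in `_num` are mapped as `long`, and any other string field falls back to `keyword`, so two apps never conflict on the same field name.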

As far as I know there are no dramatic improvements here in version 6.x. Each shard still comes with some overhead, as described in this blog post, which makes having lots of small shards and indices inefficient. I believe the general recommendation is still to keep the number of shards per node in the hundreds rather than the thousands, although a higher or lower number may apply to your use case, as this is only a general guideline.

I would recommend that you avoid creating an index per app and time frame and try to consolidate them into as few indices as possible.
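
With a consolidated index, the common case of searching one app in one org becomes a filtered query. Assuming each document carries `app` and `org` fields (field and index names here are hypothetical), it could look like:

```json
GET entities/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "app": "app1" } },
        { "term": { "org": "org42" } }
      ]
    }
  }
}
```

Searching several apps in one org is then just a `terms` filter on `app`, rather than a search across separate indices.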

