How to split a multi-typed index to prepare for an ES upgrade (where multiple types are deprecated)

We are currently running a cluster with ES 2.3.2 that has one large index with the following properties:

  • 762 GB (366 million docs)
  • 25 data nodes; 3 master nodes; 3 client nodes
  • 23 shards / 1 replica

This one index has 20+ types, each with a few common and many unique fields. I am redesigning the cluster with the following goals:

  1. Remove multiple types in an index so that we can upgrade ES. Though multi-types are supported in v5, we want to do the work to prep for v6 now.
  2. Break up the large index into more manageable smaller indexes

I have set up a new identical cluster. I modified the indexing so that I have one index per type. I allocated a shard count based on the relative size of the data with a minimum of 2, and a max of 5 shards. After indexing all of our data into this new cluster, I am finding that the same query against the new cluster is slower than against the old cluster.
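For illustration, here is a minimal sketch of a size-proportional shard allocation with the 2-to-5 clamp described above. The exact formula used is not stated in this post, so scaling against the largest type is an assumption, and the type names and sizes in the usage note are hypothetical.

```python
def allocate_shards(sizes_gb, min_shards=2, max_shards=5):
    """Return {type_name: shard_count}, proportional to size and clamped.

    Assumption: the largest type gets max_shards and every other type is
    scaled linearly against it, then clamped into [min_shards, max_shards].
    """
    largest = max(sizes_gb.values())
    plan = {}
    for name, size in sizes_gb.items():
        scaled = round(max_shards * size / largest)
        plan[name] = max(min_shards, min(max_shards, scaled))
    return plan


def total_primaries(plan):
    """Sum of primary shards across all per-type indexes."""
    return sum(plan.values())
```

Calling `allocate_shards({"big": 154.0, "mid": 100.0, "tiny": 0.004})` gives the largest type 5 shards and the smallest 2, and `total_primaries` shows how quickly the primary count grows as types are added.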

I figured this was due to the explosion in shard count (23 primaries before, 78 now). To test that, I closed all but one index (one with 2 shards), then ran a test targeting a single type against both my old monolithic index and the new single-type index, using a homebrew tool to run requests in parallel and parse out the "took" value. With "size: 0", the new cluster is faster. When I return 7 or 8 records, the two are at parity. It goes downhill from there: our default query, which returns 30 records, is about twice as slow. I am guessing this is because there are fewer threads to do the actual retrieval in the smaller index with 2 shards than in the large one with 23.
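For anyone wanting to reproduce this kind of comparison, a rough sketch of a parallel "took" collector is below. This is not the homebrew tool itself. The helper names (`es_search`, `collect_took`, `summarize`) and the base URL, index name, and query body you would pass in are all hypothetical; the only thing taken from Elasticsearch is the `took` field (milliseconds) in the search response.

```python
import json
import statistics
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def es_search(base_url, index, body):
    """POST a search body to <base_url>/<index>/_search and return parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_took(run_query, n_requests=100, workers=8):
    """Call run_query() n_requests times across a thread pool.

    run_query is any zero-argument callable returning a parsed search
    response; only its "took" value is kept.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_query) for _ in range(n_requests)]
        return [f.result()["took"] for f in futures]


def summarize(took_ms):
    """Median and worst-case server-side time, in milliseconds."""
    ordered = sorted(took_ms)
    return {"median": statistics.median(ordered), "max": ordered[-1]}
```

Running the same query body with "size" set to 0, 8, and 30 against each cluster, and comparing the `summarize` output, should show where the fetch phase starts to dominate.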

What is the recommendation for moving away from multi-typed indexes when the following is true:

  • There are many types
  • The types have very different mappings
  • Size per type varies hugely, from 4 MB to 154 GB

I am currently contemplating putting them all in one type with one massive mapping (I don't think there are any fields with the same name but different mappings), but that seems really ugly.
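Before merging everything into one mapping, the "same name, different mapping" assumption can be checked mechanically. The sketch below assumes the input is the `mappings` section of a get-mapping response, shaped as `{type_name: {"properties": {...}}}`, and it compares only the `type` of each leaf field, not analyzers or other settings. The function names are hypothetical.

```python
def flatten(properties, prefix=""):
    """Flatten nested "properties" into {dotted.path: field_type}."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            flat.update(flatten(spec["properties"], path + "."))
        else:
            flat[path] = spec.get("type")
    return flat


def find_conflicts(mappings):
    """Return {field_path: {field_type: [type_names]}} for conflicting fields."""
    seen = {}
    for type_name, mapping in mappings.items():
        fields = flatten(mapping.get("properties", {}))
        for path, field_type in fields.items():
            seen.setdefault(path, {}).setdefault(field_type, []).append(type_name)
    # A field conflicts when it appears with more than one distinct type.
    return {path: types for path, types in seen.items() if len(types) > 1}
```

An empty result means no field name is declared with two different types across the document types, so a single merged mapping is at least possible on that front.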

Any suggestions welcome,

Thanks,
~john

Hey John T,

Sorry I didn't see this until just now. Please check out this article for recommendations on how to reindex your indices into single-type ones. You can do as you say, with one huge mapping under a single top-level type and nested subtypes beneath it, but since your current types hold very different data, it's probably better to create multiple indices.

Here's the article:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/removal-of-types.html#removal-of-types

Here's a webinar where I walk you through different strategies for splitting your multiple types:
https://www.elastic.co/webinars/upgrading-your-elastic-stack
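As a rough illustration of the multiple-indices route, the sketch below builds one `_reindex` request body per type, each filtering the source index by `type` and writing to its own destination index. It only constructs the JSON bodies and sends nothing. The helper name, the index and type names in the usage note, and the naming scheme (destination index named after the type, lowercased) are all assumptions for the example.

```python
import json


def build_reindex_bodies(source_index, type_names, dest_prefix=""):
    """Return one _reindex request body per type.

    Assumption: each destination index is named <dest_prefix><type>,
    lowercased because index names must be lowercase.
    """
    bodies = []
    for type_name in type_names:
        bodies.append({
            "source": {"index": source_index, "type": type_name},
            "dest": {"index": f"{dest_prefix}{type_name}".lower()},
        })
    return bodies


def to_json_lines(bodies):
    """Serialize each body on its own line, ready to POST to _reindex."""
    return [json.dumps(body, sort_keys=True) for body in bodies]
```

For example, `build_reindex_bodies("monolith", ["User", "Order"], dest_prefix="v2-")` yields bodies targeting `v2-user` and `v2-order`. Each destination index should be created with its own mapping and shard count before the reindex runs.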