Best practices for mappings (if they don't differ)

Hi Everyone

I'm new to Elasticsearch and have to maintain a running Elasticsearch instance. While analyzing the indices and the load process, I found that the solution has 4 different indices with 4 different kinds of mappings (one mapping per index).

So far so good. I found out that each load of documents creates a new mapping (per dynamic URL), so the solution ended up having a lot of mappings (more than 100 and growing) looking exactly the same.

  • Is there any good reason to build such a design?
  • Can you imagine a scenario where such a design could make sense?

Thanks for your advice and your time.

Best regards
Matt

P.S. At the moment I'm deciding how to redesign the data load process. I think the best scenario would be to have one mapping for all data in an index, especially when the mappings do not differ from each other. This would make reindexing after schema changes more maintainable.

Under the covers, types are simply a field called "_type" which holds the type name. So if all the documents in your index share the same mapping but just have different types, it's no different from putting them all in one type and using some kind of discriminator field (e.g. "category_type") to differentiate between them.

This is a use-case where multiple types in one index are fine, since they essentially share the same mapping.
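To make the discriminator idea concrete, here is a minimal Python sketch of how a loader could fold the old type name into a field before indexing into one shared type. The function name, the single type name "doc", and the sample document are assumptions for illustration only; "category_type" is just the example field name from above.

```python
def to_single_type(type_name, doc, single_type="doc", field="category_type"):
    """Return (type, body) for indexing into one shared type.

    The old type name is kept in a discriminator field so documents
    can still be filtered by it. All names here are hypothetical.
    """
    body = dict(doc)         # copy so the caller's dict is not mutated
    body[field] = type_name  # what used to be the type becomes a plain field
    return single_type, body


# Hypothetical document that used to be posted to myindex/type-0001/...
target_type, body = to_single_type("type-0001", {"title": "example"})
print(target_type, body)
```

With something like this, every load posts to the same type, and a filter on "category_type" takes over the role the per-load type used to play.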

It gets increasingly worse if the various types in an index do not share the same mapping. Since, under the covers, they actually do share the same mapping, what you end up with is an index that has very "sparse" coverage. Some of the documents use one part of the mapping, while other documents use a different part. This sparsity creates performance problems for Lucene, as well as a larger on-disk footprint.
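As a rough illustration of what "sparse" means here (not a measurement of anything Lucene does internally), the following Python sketch computes, for each field in the merged field set, the fraction of documents that actually contain it. The documents and field names are made up.

```python
def field_coverage(docs):
    """Fraction of documents that contain each field of the merged field set."""
    fields = sorted({name for doc in docs for name in doc})
    return {name: sum(name in doc for doc in docs) / len(docs) for name in fields}


# Two hypothetical types with disjoint fields living in the same index.
mixed = [
    {"title": "a", "price": 1},
    {"title": "b", "price": 2},
    {"username": "c", "email": "d"},
    {"username": "e", "email": "f"},
]
# Two hypothetical types that share the same fields.
uniform = [
    {"title": "a", "price": 1},
    {"title": "b", "price": 2},
]

print(field_coverage(mixed))
print(field_coverage(uniform))
```

In the mixed case every field is present in only half of the documents, while in the uniform case every field is present in all of them, which is the situation described in this thread.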

You can read more about this in detail here: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html

So basically, your use-case is fine since they all share the same mapping. But you're also right that it isn't much different from putting them into a single type. At that point, it's up to you which you think is easier to manage.

Hi Zachary

Thanks for diving into my problem.

Under the covers, types are simply a field called "_type" which holds the type name.

That's where my problem of understanding comes from. The situation is this: the "application" which feeds Elasticsearch uses the POST command like this:

  • POST myindex/type-0001/AnyID {...}
  • POST myindex/type-0002/AnyID {...}
  • POST myindex/type-...../AnyID {...}

This creates a new mapping for each day (load). If I call "GET myindex/_mapping/_all" I can see the mapping list like this:

"mappings": {
  "type-0001": { "properties": { ... } },
  "type-0002": { "properties": { ... } },
  "type-000n": { "properties": { ... } }
}

As I mentioned in my last post, they all look the same. I assume they're all created dynamically, not explicitly.
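To check "they all look the same" rather than eyeballing more than 100 mappings, something like the following Python sketch could group the types by their mapping definition. It assumes the mapping response has already been loaded into a dict of the shape {"myindex": {"mappings": {...}}}; the function name and the sample fields are made up for illustration.

```python
import json


def distinct_mappings(mapping_response, index):
    """Group type names by their mapping definition.

    Returns a sorted list of groups; a single group means every type
    has an identical mapping.
    """
    groups = {}
    for type_name, mapping in mapping_response[index]["mappings"].items():
        key = json.dumps(mapping, sort_keys=True)  # canonical form for comparison
        groups.setdefault(key, []).append(type_name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical response; the field names are invented.
sample = {
    "myindex": {
        "mappings": {
            "type-0001": {"properties": {"title": {"type": "string"}}},
            "type-0002": {"properties": {"title": {"type": "string"}}},
            "type-0003": {"properties": {"title": {"type": "string"},
                                         "extra": {"type": "long"}}},
        }
    }
}
print(distinct_mappings(sample, "myindex"))
```

If the result contains exactly one group, the mappings really are identical.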

I've read these two links:
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html
https://www.elastic.co/blog/found-elasticsearch-mapping-introduction

... but I can't find a way to influence this kind of dynamic mapping, e.g. if I want to specify a field as "not_analyzed". I tried it with dynamic templates, but this seems to work only for new fields and only if you always refer to the same type.

Do you know a solution for influencing the creation of new mappings?

Thanks for your time and patience.

Best regards
Matt

Yeah, that's a little confusing. Even though the type is internally "just a field", externally they each obtain their own mapping. This is sorta a historical legacy due to how mappings used to work, where different mappings inside an index could be wildly different (and conflicting).

This isn't the case anymore, so it's less of a useful distinction to have each type with their own mapping.

That's partially why I suggested that, if they are all essentially identical, you just group them into a single type instead of dynamically creating type-0001 through type-000n.

but I can't find a way to influence this kind of dynamic mapping, e.g. if I want to specify a field as "not_analyzed". I tried it with dynamic templates, but this seems to work only for new fields and only if you always refer to the same type.

To answer your question, you can solve this with a Default Mapping. This mapping is applied as the default for all types in an index. So if they all share the same set of fields, you can set up a default to be applied regardless of the type.
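As a rough sketch of what such a default mapping could look like, the Python below builds a request body with a "_default_" entry that marks the listed string fields as not_analyzed. It assumes the 1.x/2.x-era mapping syntax ("type": "string", "index": "not_analyzed"); the helper name and the field name "status" are hypothetical, and the body would be sent when creating the index (e.g. PUT myindex).

```python
import json


def default_mapping_body(not_analyzed_fields):
    """Build an index-creation body whose _default_ mapping applies to every type."""
    properties = {
        name: {"type": "string", "index": "not_analyzed"}  # pre-5.x string syntax
        for name in not_analyzed_fields
    }
    return {"mappings": {"_default_": {"properties": properties}}}


# "status" is a made-up field name for illustration.
body = default_mapping_body(["status"])
print(json.dumps(body, indent=2))
```

As far as I know, the default mapping is only used as the base for types created after it is in place, so mappings that already exist (type-0001 and so on) would not change retroactively.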

But I'd encourage you to consolidate into a single type... it'll be easier to manage, and make ES/Lucene happier in the long run (less work for your master node to do, etc)

Hi Zachary

Thanks for your explanations and samples. I forgot to mention the ES version I work with: 1.7. This version seems to be a little bit outdated, since some major changes came with version 2.x.

Best regards
Matt