Types and Indices. One to one?

totallymike · May 22, 2015, 7:06pm

Hi there. I've been digging deeper into Elasticsearch lately, and I'm wondering about the ramifications of multiple types in a single index.

For some background, I got into Elasticsearch because a number of Ruby on Rails projects I've worked on use it. It seems to be idiomatic for the various Ruby gems to split up each document type into its own index.

However, that limits the ability to do parent-child queries and so on, since parent and child documents have to live in the same index.

What's the story behind indices and types? Should they stay split up unless one needs to reason about the relationships between documents? Should they all go in a single index just for fun? Does it matter tremendously one way or another?

My instinct tells me to keep them separated. I would suspect (without any data to back it up) that ES has an easier time of just about everything (searching, indexing, storage, partitioning) when document types are broken out into different indices.

loren · May 23, 2015, 6:47pm

I think your instincts are correct. As someone who has blown up a production site because I did not split two types into two indexes, my default thinking is 1:1 unless there is some good reason to do otherwise. This is especially true if it makes sense for fields to have the same name across types.

Calculations like IDF apply to a field across the entire index, regardless of type or other filtered fields that you may be using to logically partition your data.
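To make this concrete, here is a toy Python sketch. It is not Elasticsearch code; it only models how a per-field statistic computed over the whole index behaves. It assumes Lucene's classic TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and it ignores real-world details such as per-shard statistics and deleted documents. The `Doc` class and the `idf` and `search_type` helpers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Doc:
    doc_type: str  # the document's mapping type, e.g. "type1"
    my_text: str   # the field name shared by both types


def idf(index: List[Doc], term: str) -> float:
    """Classic Lucene TF-IDF idf, computed over EVERY document in the index."""
    num_docs = len(index)
    doc_freq = sum(1 for d in index if term in d.my_text.split())
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def search_type(index: List[Doc], doc_type: str, term: str) -> List[Tuple[Doc, float]]:
    """Match one type only, but weight the term with index-wide statistics."""
    weight = idf(index, term)  # the type restriction plays no part here
    return [(d, weight) for d in index
            if d.doc_type == doc_type and term in d.my_text.split()]


# An index holding only type1: one of ten documents mentions "apple".
type1_only = [Doc("type1", "apple pie")] + [Doc("type1", "banana bread")] * 9
# The same index after adding 90 type2 documents that all mention "apple".
mixed = type1_only + [Doc("type2", "apple juice")] * 90

before = search_type(type1_only, "type1", "apple")
after = search_type(mixed, "type1", "apple")
print(len(before), len(after))                      # prints 1 1
print(f"{before[0][1]:.3f} -> {after[0][1]:.3f}")   # prints 2.609 -> 1.083
```

A search restricted to type1 returns the same single document in both cases. However, that document's idf weight drops from about 2.609 to about 1.083 once type2 documents containing the same term share the index.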

Example: I had a reasonably sized index (~1M documents) with one type, type1, containing a field called my_text. Queries, aggregations, and so on were super fast, and lots of end users hit pages that relied on these queries.

Then I added about a billion documents of type type2, also containing a field called my_text with the same mapping. I thought that was clever because I could then do some reasoning about my_text across the types.

What happened instead was that my original queries involving my_text on type1 got slow enough to back up the application server layer. Then I ran out of heap.

From a relevancy point of view, you can inadvertently change my_text search results on type1 by adding documents of type2, as the IDF across the index for that field changes. Ditto for suggestions.

Finally, for time-based indexes like you'd get with Logstash, it's easier to maintain different storage/replica/retention policies when you have the types split into separate physical indexes.
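Here is a minimal sketch of the retention part of that idea. It assumes a hypothetical Logstash-style naming scheme with one daily index series per type (`type1-YYYY.MM.DD`) and a made-up retention period for each type. It only decides which index names have expired. Actually deleting them, for example through the delete index API, is left out.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List

# Hypothetical retention policy: how many days each type's daily indices are kept.
RETENTION_DAYS: Dict[str, int] = {"type1": 30, "type2": 7}


def index_name(doc_type: str, day: date) -> str:
    """Build a daily index name for one type, e.g. 'type1-2015.05.23' (assumed scheme)."""
    return f"{doc_type}-{day:%Y.%m.%d}"


def expired_indices(names: List[str], today: date, retention: Dict[str, int]) -> List[str]:
    """Return the index names whose age exceeds their type's retention period."""
    expired = []
    for name in names:
        # Split 'type1-2015.05.23' into ('type1', '2015.05.23'); the date part has no '-'.
        doc_type, _, stamp = name.rpartition("-")
        if doc_type not in retention:
            continue  # no policy for this series, so never select it for deletion
        day = datetime.strptime(stamp, "%Y.%m.%d").date()
        if (today - day).days > retention[doc_type]:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2015, 5, 23)
    names = [
        index_name(doc_type, today - timedelta(days=age))
        for doc_type in ("type1", "type2")
        for age in (0, 10, 40)
    ]
    print(expired_indices(names, today, RETENTION_DAYS))
    # prints ['type1-2015.04.13', 'type2-2015.05.13', 'type2-2015.04.13']
```

Because each type has its own physical index series, the 10-day-old type2 index can be dropped while the type1 index of the same day is kept. Replica counts or storage settings could be varied per series in the same way.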

warkolm · May 24, 2015, 2:04am

If an index is analogous to a DB, then a type is analogous to a table. Keeping things separate is good data hygiene.
However if you have lots of different types it may make sense to normalise them and group similar things.

totallymike · May 24, 2015, 7:43pm

Thank you both. That's more or less how I felt. I appreciate having a little more weight behind those instincts.

roytmana · May 29, 2015, 7:54pm

I thought that at some point (when field naming was refactored for multi-fields and other scenarios) Lucene field names were changed to include the type name, so that no collisions would occur across types. I guess I was wrong. Could anyone comment on it?
