More indices vs. more types

Hi all,

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

My application sorts topics into namespaces. Most of the time I need to
search all topics that belong to a namespace. My first implementation was
to have an index for each topic, doing a multi-index search for a
namespace. It works, but would the other approach behave differently, and how? I use
faceting a lot.
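
To make the comparison concrete, here is roughly what the two layouts look like from the client side; index, type and field names are made up, so treat it as a sketch only:

# current layout: one index per topic, multi-index search across a namespace
curl -XGET 'localhost:9200/topic-a,topic-b,topic-c/_search' -d '{
  "query":  { "query_string": { "query": "something" } },
  "facets": { "tags": { "terms": { "field": "tag" } } }
}'

# alternative layout: one index per namespace, one document type per topic
curl -XGET 'localhost:9200/namespace-1/topic-a,topic-b,topic-c/_search' -d '{
  "query":  { "query_string": { "query": "something" } },
  "facets": { "tags": { "terms": { "field": "tag" } } }
}'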

Some numbers: A topic contains 10^2-10^6 documents. A namespace consists of
10-30 topics. Usually I have ~20 different namespaces per application
instance.

Any help is appreciated. Kind regards,

Christian

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?
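
Closing and reopening an index is a single call each way; a sketch with a made-up index name:

# free up the resources held by a topic index that is no longer searched
curl -XPOST 'localhost:9200/topic-a/_close'

# bring it back when it is needed again
curl -XPOST 'localhost:9200/topic-a/_open'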

On 18.05.2012 at 20:53, Eric Jain wrote:

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know. Does anybody else? Regards,

Christian

On Fri, May 18, 2012 at 12:39 PM, Christian Aust
christian.aust@software-consultant.net wrote:

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Right (assuming the indexes are all on one machine or have few documents).

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know if there is a general answer to that question. If queries
are run on a single namespace (and there are enough documents in each
namespace), having one index (or perhaps shard) per namespace seems
like the way to go.

I don't think elasticsearch has issues handling a few hundred indexes
or an index with a few dozen types, but there's no way around doing your
own performance testing...
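
A rough sketch of the two options I mean (all index, type and field names are made up):

# one index per namespace: a search only ever touches that index's shards
curl -XGET 'localhost:9200/namespace-1/_search' -d '{
  "query": { "query_string": { "query": "something" } }
}'

# or one shared index with custom routing, so each namespace lands on a single
# shard; a filter is still needed because other namespaces may hash to the same shard
curl -XPOST 'localhost:9200/topics/topic-a/1?routing=namespace-1' -d '{
  "namespace": "namespace-1",
  "body": "..."
}'
curl -XGET 'localhost:9200/topics/_search?routing=namespace-1' -d '{
  "query": {
    "filtered": {
      "query":  { "query_string": { "query": "something" } },
      "filter": { "term": { "namespace": "namespace-1" } }
    }
  }
}'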

ES has trouble handling a large number of constantly open indexes --
don't do it!

I had something in production with a couple of hundred indexes and it
pretty much died every
2 weeks because of memory issues. Fortunately we were able to convert
those indexes to types and haven't had "big" issues with ES since.

Shay told me six months ago that it is much better to use a few
indexes with types and aliases than many actual indexes - I'm sure
that advice still applies.
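
The pattern looks something like this (index and field names are made up, so only a sketch): one shared index, with a filtered and routed alias per "virtual index":

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": {
        "index":   "topics",
        "alias":   "namespace-1",
        "routing": "namespace-1",
        "filter":  { "term": { "namespace": "namespace-1" } }
    } }
  ]
}'

# clients then search the alias as if it were a dedicated index
curl -XGET 'localhost:9200/namespace-1/_search' -d '{
  "query": { "query_string": { "query": "something" } }
}'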

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

On May 19, 12:36 am, Eric Jain eric.j...@gmail.com wrote:

On Fri, May 18, 2012 at 12:39 PM, Christian Aust

christian.a...@software-consultant.net wrote:

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Right (assuming the indexes are all on one machine or have few documents).

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know if there is a general answer to that question. If queries
are run on a single namespace (and there are enough documents in each
namespace), having one index (or perhaps shard) per namespace seems
like the way to go.

I don't think elasticsearch has issues handling a few hundred indexes
or an index with a few dozen types, but there's no way around doing your
own performance testing...

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

I'm curious as to why you get occasional shard failures. We've been
making heavy use of ES for over 2 years now, and I never need to touch
my boxes. They just keep running.

Are you using virtual servers or your own boxes? What environment, EC2
or hosted? How much memory, CPU etc?

clint

Hi clint,

I'm running two dedicated Dell servers with Xeon L5520 CPUs and 72 GB of RAM.
The index that is failing is a very active one with many thousands of
writes per day.
What are you using to store your indices? I'm using the local
filesystem.

Here is the exception I'm getting; to me it seems like the file
pointer is wrong.

[2012-05-17 00:00:06,989][WARN ][index.shard.service      ] [Spot] [classifieds][0] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [classifieds][0] Refresh failed
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:789)
    at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:419)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:706)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3651)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
    at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:764)
    ... 5 more
[2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
    at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4086)
    at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4040)
    at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:354)
    at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.merge(ConcurrentMergeSchedulerProvider.java:104)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2746)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2740)
    at org.elasticsearch.index.engine.robin.RobinEngine.maybeMerge(RobinEngine.java:963)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:750)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

On May 19, 1:51 pm, Clinton Gormley cl...@traveljury.com wrote:

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

I'm curious as to why you get occasional shard failures. We've been
making heavy use of ES for over 2 years now, and I never need to touch
my boxes. They just keep running.

Are you using virtual servers or your own boxes? What environment, EC2
or hosted? How much memory, CPU etc?

clint

Hi Christian,

It also depends on the number of shards and replicas you have
configured per-index.

I have no idea what the absolute limit on the total number of shards
is (I suppose it also depends on your hardware), but I think that
having many indices would slow your searches down, because each shard
is a separate Lucene index. Indexing operations, however, should get
faster.

So if you have a lot of documents to index, having many indices (and
thus shards) should help. If not, I would stick with multiple types.

Although, as Eric said, you need to test to be sure.
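
If you do create an index per namespace, the shard and replica counts are set when the index is created, e.g. (index name made up):

curl -XPUT 'localhost:9200/namespace-1' -d '{
  "settings": {
    "number_of_shards":   1,
    "number_of_replicas": 1
  }
}'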

On May 18, 22:39, Christian Aust <christian.a...@software-
consultant.net> wrote:

On 18.05.2012 at 20:53, Eric Jain wrote:

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know. Does anybody else? Regards,

Christian

Hiya

I'm running two dedicated Dell servers with Xeon L5520 CPUs and 72 GB of RAM.
The index that is failing is a very active one with many thousands of
writes per day.
What are you using to store your indices? I'm using the local
filesystem.

[2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)

I'm wondering if you are running into an open file limit, or running
out of inodes. Merging increases the number of filehandles considerably
(but temporarily).

Have a look in /var/log/messages or syslog to see if there is anything
there, and try raising your ulimit -n. Hopefully that'll help.
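
Something along these lines (assuming the process runs as a user called "elasticsearch"; adjust to your setup):

# limit in the current shell, and how many files the ES process actually has open
ulimit -n
ls /proc/$(pgrep -f elasticsearch | head -1)/fd | wc -l

# raise it permanently in /etc/security/limits.conf, then restart the node:
#   elasticsearch  soft  nofile  65535
#   elasticsearch  hard  nofile  65535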

clint