More indices vs. more types

Hi all,

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

My application sorts topics into namespaces. Most of the time I need to
search all topics that belong to a namespace. My first implementation was
to have an index for each topic, doing a multi-index search for a
namespace. It works, but would the other approach behave differently, and how? I use
faceting a lot.
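
To make the comparison concrete, here is roughly what the two layouts look like from the client side; index, type and field names are made up, so treat it as a sketch only:

# current layout: one index per topic, multi-index search across a namespace
curl -XGET 'localhost:9200/topic-a,topic-b,topic-c/_search' -d '{
  "query":  { "query_string": { "query": "something" } },
  "facets": { "tags": { "terms": { "field": "tag" } } }
}'

# alternative layout: one index per namespace, one document type per topic
curl -XGET 'localhost:9200/namespace-1/topic-a,topic-b,topic-c/_search' -d '{
  "query":  { "query_string": { "query": "something" } },
  "facets": { "tags": { "terms": { "field": "tag" } } }
}'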

Some numbers: A topic contains 10^2-10^6 documents. A namespace consists of
10-30 topics. Usually I have ~20 different namespaces per application
instance.

Any help is appreciated. Kind regards,

Christian

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?
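
Closing and reopening an index is a single call each way; a sketch with a made-up index name:

# free up the resources held by a topic index that is no longer searched
curl -XPOST 'localhost:9200/topic-a/_close'

# bring it back when it is needed again
curl -XPOST 'localhost:9200/topic-a/_open'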

On 18.05.2012 at 20:53, Eric Jain wrote:

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know. Does anybody else? Regards,

Christian

On Fri, May 18, 2012 at 12:39 PM, Christian Aust
christian.aust@software-consultant.net wrote:

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Right (assuming the indexes are all on one machine or have few documents).

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know if there is a general answer to that question. If queries
are run on a single namespace (and there are enough documents in each
namespace), having one index (or perhaps shard) per namespace seems
like the way to go.

I don't think elasticsearch has issues handling a few hundred indexes
or an index with a few dozen types, but there's no way around doing your
own performance testing...
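
A rough sketch of the two options I mean (all index, type and field names are made up):

# one index per namespace: a search only ever touches that index's shards
curl -XGET 'localhost:9200/namespace-1/_search' -d '{
  "query": { "query_string": { "query": "something" } }
}'

# or one shared index with custom routing, so each namespace lands on a single
# shard; a filter is still needed because other namespaces may hash to the same shard
curl -XPOST 'localhost:9200/topics/topic-a/1?routing=namespace-1' -d '{
  "namespace": "namespace-1",
  "body": "..."
}'
curl -XGET 'localhost:9200/topics/_search?routing=namespace-1' -d '{
  "query": {
    "filtered": {
      "query":  { "query_string": { "query": "something" } },
      "filter": { "term": { "namespace": "namespace-1" } }
    }
  }
}'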

ES has trouble handling a large number of constantly open indexes --
don't do it!

I had something in production with a couple of hundred indexes and it
pretty much died every
2 weeks because of memory issues. Fortunately we were able to convert
those indexes to types and haven't had "big" issues with ES since.

Shay told me six months ago that it is much better to use a few
indexes with types and aliases than many actual indexes - I'm sure
that advice still applies.
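
The pattern looks something like this (index and field names are made up, so only a sketch): one shared index, with a filtered and routed alias per "virtual index":

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": {
        "index":   "topics",
        "alias":   "namespace-1",
        "routing": "namespace-1",
        "filter":  { "term": { "namespace": "namespace-1" } }
    } }
  ]
}'

# clients then search the alias as if it were a dedicated index
curl -XGET 'localhost:9200/namespace-1/_search' -d '{
  "query": { "query_string": { "query": "something" } }
}'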

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

On May 19, 12:36 am, Eric Jain eric.j...@gmail.com wrote:

On Fri, May 18, 2012 at 12:39 PM, Christian Aust

christian.a...@software-consultant.net wrote:

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Right (assuming the indexes are all on one machine or have few documents).

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know if there is a general answer to that question. If queries
are run on a single namespace (and there are enough documents in each
namespace), having one index (or perhaps shard) per namespace seems
like the way to go.

I don't think elasticsearch has issues handling a few hundred indexes
or an index with a few dozen types, but there's no way around doing your
own performance testing...

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

I'm curious as to why you get occasional shard failures. We've been
making heavy use of ES for over 2 years now, and I never need to touch
my boxes. They just keep running.

Are you using virtual servers or your own boxes? What environment, EC2
or hosted? How much memory, CPU etc?

clint

Hi clint,

I'm running two dedicated Dell servers with Xeon L5520 CPUs and 72 GB of RAM.
The index that is failing is a very active one with many thousands of
writes per day.
What are you using to store your indices? I'm using the local
filesystem.

Here is the exception I'm getting; to me it seems like the file
pointer is wrong.

[2012-05-17 00:00:06,989][WARN ][index.shard.service      ] [Spot] [classifieds][0] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [classifieds][0] Refresh failed
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:789)
    at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:419)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:706)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3651)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
    at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:764)
    ... 5 more
[2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
    at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4086)
    at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4040)
    at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:354)
    at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.merge(ConcurrentMergeSchedulerProvider.java:104)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2746)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2740)
    at org.elasticsearch.index.engine.robin.RobinEngine.maybeMerge(RobinEngine.java:963)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:750)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

On May 19, 1:51 pm, Clinton Gormley cl...@traveljury.com wrote:

You should also expect occasional shard failures resulting in
inconsistencies, but that can be easily mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every one or two months.

I'm curious as to why you get occasional shard failures. We've been
making heavy use of ES for over 2 years now, and I never need to touch
my boxes. They just keep running.

Are you using virtual servers or your own boxes? What environment, EC2
or hosted? How much memory, CPU etc?

clint

Hi Christian,

It also depends on the number of shards and replicas you have
configured per-index.

I have no idea what the absolute limit on the total number of shards
is (I suppose it also depends on your hardware), but I think that
having many indices would slow your searches down, because each shard
is a separate Lucene index. Indexing operations, however, should get
faster.

So if you have a lot of documents to index, having many indices (and
thus shards) should help. If not, I would stick with multiple types.

Although, as Eric said, you need to test to be sure.
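
If you do create an index per namespace, the shard and replica counts are set when the index is created, e.g. (index name made up):

curl -XPUT 'localhost:9200/namespace-1' -d '{
  "settings": {
    "number_of_shards":   1,
    "number_of_replicas": 1
  }
}'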

On May 18, 22:39, Christian Aust <christian.a...@software-
consultant.net> wrote:

On 18.05.2012 at 20:53, Eric Jain wrote:

On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:

How is having multiple indices with just one document type different from
having one index with multiple document types? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know. Does anybody else? Regards,

Christian

Hiya

I'm running two dedicated Dell servers with Xeon L5520 CPUs and 72 GB of RAM.
The index that is failing is a very active one with many thousands of
writes per day.
What are you using to store your indices? I'm using the local
filesystem.

[2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)

I'm wondering if you are running into an open file limit, or running
out of inodes. Merging increases the number of filehandles considerably
(but temporarily).

Have a look in /var/log/messages or syslog to see if there is anything
there, and try raising your ulimit -n. Hopefully that'll help.
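
Something along these lines (assuming the process runs as a user called "elasticsearch"; adjust to your setup):

# limit in the current shell, and how many files the ES process actually has open
ulimit -n
ls /proc/$(pgrep -f elasticsearch | head -1)/fd | wc -l

# raise it permanently in /etc/security/limits.conf, then restart the node:
#   elasticsearch  soft  nofile  65535
#   elasticsearch  hard  nofile  65535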

clint