Inserts get slower when the index becomes large


(Barak Yaish) #1

Hello,

I'm using elasticsearch-0.15.0-SNAPSHOT. I would like to check how ES
performs when the number of documents is about 300 million. Each document
contains 5 numeric and string fields and an array of elements with 2 fields
each. I've started a node with
"-Xmx15g -Xms15g -Des.index.storage.type=memory" (available memory on the machine is 32G)
and am using the Java bulk API for insertions (bulk size is 100000). I
noticed that after 100M docs were indexed, the insert operations became
slower and slower. I guess this is expected, but would a duration of 30
seconds or more be normal?
No exceptions are seen on the server or client side. Will adding more
machines make the indexing faster? Are there any other configs I missed? Is
there a way to determine where most of the indexing time is spent?

BTW, BulkResponse.getTookInMillis() always returns 0.

Thanks.


(Shay Banon) #2

On Wednesday, January 26, 2011 at 12:12 AM, barak wrote:

Hello,

I'm using elasticsearch-0.15.0-SNAPSHOT. I would like to check how ES
performs when the number of documents is about 300 million. Each document
contains 5 numeric and string fields and an array of elements with 2 fields
each. I've started a node with
"-Xmx15g -Xms15g -Des.index.storage.type=memory" (available memory on the machine is 32G)

Can you try with the default index storage (FS) and see how it goes? How big does the index get with 100M docs? (You can get that from the index status command.)

and am using the Java bulk API for insertions (bulk size is 100000). I

Can you try lowering the bulk size? (It really depends on the size of each doc, which I can't tell.) Adding more indexing threads to the mix will also help indexing time.
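Shay's two suggestions (smaller bulks and more concurrent indexing threads) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Elasticsearch client API: `BulkSink` is a hypothetical placeholder for whatever code builds and executes one bulk request, and the batch size and thread count in `main` are illustrative values, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelBulkIndexer {

    /** Hypothetical stand-in for the code that builds and executes one bulk request. */
    interface BulkSink {
        void send(List<String> batch) throws Exception;
    }

    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    /**
     * Sends every batch through a fixed pool of `threads` workers and waits for all
     * of them. A failure in any batch surfaces here as an ExecutionException.
     */
    static void indexAll(List<String> docs, int batchSize, int threads, BulkSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    return null; // returning a value makes this a Callable, which may throw
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // blocks until this batch is done
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            docs.add("{\"id\":" + i + "}");
        }
        AtomicInteger sent = new AtomicInteger();
        // 25,000 docs in batches of 5,000, sent by 4 concurrent threads.
        indexAll(docs, 5_000, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println("sent " + sent.get() + " docs"); // sent 25000 docs
    }
}
```

Keeping each batch small bounds how much work a single request carries, while the fixed pool caps how many requests are in flight at once.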

noticed that after 100M docs were indexed, the insert operations became
slower and slower. I guess this is expected, but would a duration of 30
seconds or more be normal?
No exceptions are seen on the server or client side. Will adding more
machines make the indexing faster?

Yes, adding more machines will help indexing speed, since the shards, and with them the indexing work, will be spread across more machines. Note that if you currently have a single machine and you add a second one, the replicas will be allocated on it, and indexing might actually be a bit slower (since each operation also needs to be performed on the replica).
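The replica effect Shay describes can be checked with back-of-the-envelope arithmetic. The sketch assumes shards are spread perfectly evenly and that a replica is never placed on the same node as its primary (so on a single node the replica stays unallocated); `perNodeWrites` is a hypothetical helper, not part of Elasticsearch. With one replica, going from one node to two leaves the number of document writes per node unchanged, because the new node takes on the replica copies; per-node work only drops once there are more nodes than copies of each shard.

```java
final class ReplicaLoad {

    /**
     * Rough number of document writes each node performs when `docs` documents
     * are indexed with `replicas` replicas on `nodes` nodes. Assumes perfectly
     * even shard placement and at most one copy of a shard per node.
     */
    static double perNodeWrites(long docs, int replicas, int nodes) {
        if (nodes <= 0 || replicas < 0) {
            throw new IllegalArgumentException("nodes must be > 0, replicas >= 0");
        }
        int allocatedCopies = Math.min(1 + replicas, nodes); // unallocated replicas do no work
        return (double) docs * allocatedCopies / nodes;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;
        System.out.println((long) perNodeWrites(docs, 1, 1)); // 100000000 (replica unassigned)
        System.out.println((long) perNodeWrites(docs, 1, 2)); // 100000000 (second node holds replicas)
        System.out.println((long) perNodeWrites(docs, 1, 4)); // 50000000
    }
}
```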

Any other configs I missed? Is there a
way to determine where most of the indexing time is spent?

You can try increasing index.refresh_interval (it defaults to 1s); this can help when indexing. There are other low-level Lucene settings we can try, but let's first try what I suggested above (use FS).

BTW, BulkResponse.getTookInMillis() always returns 0.

Strange, I will check.
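While `getTookInMillis()` reports 0, bulk latency can be measured on the client instead. A minimal sketch: `BulkCall` is a hypothetical placeholder for the code that executes one bulk request and blocks until the response arrives. The measured time includes network transfer and client-side serialization, so it is an upper bound on the time the server spent.

```java
import java.util.concurrent.TimeUnit;

final class BulkTimer {

    /** Hypothetical stand-in for the call that sends one bulk request and waits for it. */
    interface BulkCall {
        void run() throws Exception;
    }

    /** Client-side timing of one bulk request. */
    static final class Timing {
        final long elapsedMillis;
        final int docs;

        Timing(long elapsedMillis, int docs) {
            this.elapsedMillis = elapsedMillis;
            this.docs = docs;
        }

        double docsPerSecond() {
            return elapsedMillis == 0 ? Double.POSITIVE_INFINITY : docs * 1000.0 / elapsedMillis;
        }
    }

    /** Measures wall-clock time around one bulk call with the monotonic nanoTime clock. */
    static Timing time(int docs, BulkCall call) throws Exception {
        long start = System.nanoTime();
        call.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timing(elapsed, docs);
    }

    public static void main(String[] args) throws Exception {
        Timing t = time(10_000, () -> Thread.sleep(50)); // simulated bulk request
        System.out.printf("%d docs in %d ms (%.0f docs/s)%n",
                t.docs, t.elapsedMillis, t.docsPerSecond());
    }
}
```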

Thanks.


(Barak Yaish) #3

When I used the memory index storage, the index size for 100M docs was
54.3gb (res.getIndices().get("myindex").getStoreSize()).

I started the indexing again on 2 machines (old data was deleted),
with the fs index storage (same bulk size). After indexing 95M docs, the
index size is 77.2gb. Indexing speed is a bit faster than before; I'll
check how bulk size and the number of threads affect it.
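The two store sizes can be turned into a per-document figure, which makes the 300M-document target easier to reason about. This sketch assumes "gb" means 2^30 bytes, and the linear projection is naive: the two runs differ in storage type and node count, and the thread does not say whether the reported store size includes replica copies. `bytesPerDoc` and `projectGb` are hypothetical helpers.

```java
final class IndexSizeMath {

    private static final double BYTES_PER_GB = 1024.0 * 1024 * 1024; // assumes "gb" = GiB

    /** Average store bytes per document for an index of the given size. */
    static double bytesPerDoc(double sizeGb, long docs) {
        return sizeGb * BYTES_PER_GB / docs;
    }

    /** Naive linear projection of index size (in GB) for a larger document count. */
    static double projectGb(double bytesPerDoc, long docs) {
        return bytesPerDoc * docs / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        double memory = bytesPerDoc(54.3, 100_000_000L); // memory store, 100M docs
        double fs = bytesPerDoc(77.2, 95_000_000L);      // fs store, 95M docs, 2 nodes
        System.out.printf("memory: %.0f bytes/doc, fs: %.0f bytes/doc%n", memory, fs);
        // prints: memory: 583 bytes/doc, fs: 873 bytes/doc
        System.out.printf("300M docs at the fs rate: about %.0f GB%n", projectGb(fs, 300_000_000L));
        // prints: 300M docs at the fs rate: about 244 GB
    }
}
```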

On Jan 26, 12:18 am, Shay Banon shay.ba...@elasticsearch.com wrote:



(Barak Yaish) #4

From the thread dumps on the two machines, it looks like only one
machine is working (the one that booted last). Is this expected? I
used bin/elasticsearch -f -Xmx15g -Xms15g to boot both machines. On
the first machine, jstack shows only waiting threads, like these:

"New I/O client worker #1-12" daemon prio=10 tid=0x000000005f9de800
nid=0x1971 runnable [0x000000004051d000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)
- locked <0x00002aaab8dfc7c8> (a sun.nio.ch.Util$1)
- locked <0x00002aaab8dfc7e0> (a java.util.Collections
$UnmodifiableSet)
- locked <0x00002aaab8dfb7c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:
38)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:
164)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:
108)
at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"New I/O server boss #2 ([id: 0x35e80f3a, /0:0:0:0:0:0:0:0:9200])"
daemon prio=10 tid=0x000000005eafa000 nid=0x1960 runnable
[0x00000000426af000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)
- locked <0x00002aaab8e02130> (a sun.nio.ch.Util$1)
- locked <0x00002aaab8e02118> (a java.util.Collections
$UnmodifiableSet)
- locked <0x00002aaab8dfb838> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at
org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink
$Boss.run(NioServerSocketPipelineSink.java:241)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:
108)
at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

while on the second machine there are threads doing Lucene work:

"myindex[Cutthroat][tp]-pool-1-thread-103" daemon prio=10
tid=0x000000004b1b0000 nid=0x1050 runnable [0x0000000042a7b000]
java.lang.Thread.State: RUNNABLE
at
org.apache.lucene.document.AbstractField.isTokenized(AbstractField.java:
133)
at
org.apache.lucene.analysis.Analyzer.getOffsetGap(Analyzer.java:133)
at
org.elasticsearch.index.analysis.NamedAnalyzer.getOffsetGap(NamedAnalyzer.java:
89)
at
org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAnalyzer.java:
66)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:
201)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:
246)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:
826)
at
org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:
802)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1998)
at
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:
438)
- locked <0x00002aaab9b8ee10> (a java.lang.Object)
at
org.elasticsearch.index.engine.robin.RobinEngine.bulk(RobinEngine.java:
223)
at
org.elasticsearch.index.shard.service.InternalIndexShard.bulk(InternalIndexShard.java:
257)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:
237)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:
182)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:
175)
at org.elasticsearch.transport.netty.MessageChannelHandler
$3.run(MessageChannelHandler.java:195)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

This is my client:

    client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("dev1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("dev2", 9300));

Am I doing something wrong?
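A single jstack dump shows one instant; taking many samples and counting the top frames of running threads gives a rough answer to the earlier question of where indexing time goes. The sketch below does this inside its own JVM with `Thread.getAllStackTraces()`; for a running node, the same counting could be applied to a series of jstack dumps. As the dumps above show, network threads blocked in the native `epollWait` call still report RUNNABLE, so the sketch skips threads whose top frame is a native method (an assumption that also hides genuinely busy native code). `spin` and `StopFlag` are only a hypothetical demo workload.

```java
import java.util.HashMap;
import java.util.Map;

final class StackSampler {

    /** Demo-only stop signal for the busy worker. */
    static final class StopFlag {
        volatile boolean stop;
    }

    /** A busy loop standing in for an indexing thread in this demo. */
    static long spin(StopFlag flag) {
        long n = 0;
        while (!flag.stop) {
            n++;
        }
        return n;
    }

    /** Samples all thread stacks `samples` times and counts the top frame of RUNNABLE threads. */
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] frames = e.getValue();
                // Skip non-running threads, empty stacks, and native top frames (e.g. epollWait).
                if (e.getKey().getState() != Thread.State.RUNNABLE
                        || frames.length == 0 || frames[0].isNativeMethod()) {
                    continue;
                }
                String top = frames[0].getClassName() + "." + frames[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> spin(flag), "busy-worker");
        worker.start();
        Map<String, Integer> counts = sampleTopFrames(20, 10);
        flag.stop = true;
        worker.join();
        counts.forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```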

On Jan 26, 4:21 pm, barak barak.ya...@gmail.com wrote:



(Shay Banon) #5

Have the two machines discovered each other? You can see in the logs whether they did or not.
On Wednesday, January 26, 2011 at 4:35 PM, barak wrote:



(vpunski) #6

Hi,
I can confirm this behaviour on two clusters we have (9 and 6 load-balanced
machines, simple local storage, machines with 8G RAM).
Using a simpler non-bulk scenario with 20 parallel clients, the indexing time
reaches 500 ms after about 5M documents (10 ms at the start).
From the OS perspective, strace shows hundreds of IO operations on the file
system (stats, open, read, delete).
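To track a slowdown like the one described here (about 10 ms at the start, about 500 ms after roughly 5M documents) over a long run, latencies can be averaged per fixed-size window. `LatencyWindows` is a hypothetical helper and the window size is arbitrary; its methods are synchronized so that all parallel client threads can share one instance.

```java
import java.util.ArrayList;
import java.util.List;

final class LatencyWindows {

    private final int windowSize;
    private final List<Double> windowAverages = new ArrayList<>();
    private long sumMillis;
    private int inWindow;

    LatencyWindows(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be > 0");
        }
        this.windowSize = windowSize;
    }

    /** Records one request latency; every windowSize requests closes a window. */
    synchronized void record(long latencyMillis) {
        sumMillis += latencyMillis;
        if (++inWindow == windowSize) {
            windowAverages.add((double) sumMillis / windowSize);
            sumMillis = 0;
            inWindow = 0;
        }
    }

    /** Average latency of each completed window, oldest first. */
    synchronized List<Double> averages() {
        return new ArrayList<>(windowAverages);
    }

    /** Latest window average divided by the first one (1.0 means no slowdown). */
    synchronized double slowdown() {
        if (windowAverages.isEmpty()) {
            return Double.NaN;
        }
        return windowAverages.get(windowAverages.size() - 1) / windowAverages.get(0);
    }

    public static void main(String[] args) {
        LatencyWindows w = new LatencyWindows(1_000);
        // Simulated run: 1,000 fast requests followed by 1,000 slow ones.
        for (int i = 0; i < 1_000; i++) w.record(10);
        for (int i = 0; i < 1_000; i++) w.record(500);
        System.out.println(w.averages() + " slowdown x" + w.slowdown()); // [10.0, 500.0] slowdown x50.0
    }
}
```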

On Wed, Jan 26, 2011 at 4:36 PM, Shay Banon shay.banon@elasticsearch.com wrote:



(Barak Yaish) #7

They did. On the first node:

[Dorma] added {[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]],}, reason: zen-disco-receive(from node[[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]]])

on the second:

[2011-01-26 09:27:26,743][INFO ][cluster.service ] [Cutthroat] detected_master [Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]], added {[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]],}, reason: zen-disco-receive(from [[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]]])

In addition, queries against the first node return results:

[admin@dev3 apache]$ curl -XGET 'http://nodeA:9200/myindex/record/_search?q=hits:630'
{"took":6301,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":95460,"max_s...

On Jan 26, 4:36 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Have the two machines discovered each one? You can see in the logs if they did or not.

On Wednesday, January 26, 2011 at 4:35 PM, barak wrote:

From the thread dumps on the two machines it looks like only one
machine is working (the one that booted last), is this expected? I
used bin/elasticsearch -f -Xmx15g -Xms15g to boot both machines. On
the first machine jstack show only waiting threads, like these:

"New I/O client worker #1-12" daemon prio=10 tid=0x000000005f9de800
nid=0x1971 runnable [0x000000004051d000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)

  • locked <0x00002aaab8dfc7c8> (a sun.nio.ch.Util$1)
  • locked <0x00002aaab8dfc7e0> (a java.util.Collections
    $UnmodifiableSet)
  • locked <0x00002aaab8dfb7c0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(Selec torUtil.java:

at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.j ava:
164)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenami ngRunnable.java:
108)
at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerR unnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"New I/O server boss #2 ([id: 0x35e80f3a, /0:0:0:0:0:0:0:0:9200])"
daemon prio=10 tid=0x000000005eafa000 nid=0x1960 runnable
[0x00000000426af000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)

  • locked <0x00002aaab8e02130> (a sun.nio.ch.Util$1)
  • locked <0x00002aaab8e02118> (a java.util.Collections
    $UnmodifiableSet)
  • locked <0x00002aaab8dfb838> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSi nk
    $Boss.run(NioServerSocketPipelineSink.java:241)
    at
    org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenami ngRunnable.java:

at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerR unnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

while on the second machine there threads doing Lucene stuff:

"myindex[Cutthroat][tp]-pool-1-thread-103" daemon prio=10 tid=0x000000004b1b0000 nid=0x1050 runnable [0x0000000042a7b000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.lucene.document.AbstractField.isTokenized(AbstractField.java:133)
	at org.apache.lucene.analysis.Analyzer.getOffsetGap(Analyzer.java:133)
	at org.elasticsearch.index.analysis.NamedAnalyzer.getOffsetGap(NamedAnalyzer.java:89)
	at org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAnalyzer.java:66)
	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201)
	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:826)
	at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:802)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1998)
	at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:438)
	- locked <0x00002aaab9b8ee10> (a java.lang.Object)
	at org.elasticsearch.index.engine.robin.RobinEngine.bulk(RobinEngine.java:
	at org.elasticsearch.index.shard.service.InternalIndexShard.bulk(InternalIndexShard.java:257)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:237)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:182)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:175)
	at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:195)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

This is my client:

	client = new TransportClient()
	        .addTransportAddress(new InetSocketTransportAddress("dev1", 9300))
	        .addTransportAddress(new InetSocketTransportAddress("dev2", 9300));

Am I doing something wrong?

On Jan 26, 4:21 pm, barak barak.ya...@gmail.com wrote:

When I used the memory index storage, the index size for 100M docs was
54.3gb (res.getIndices().get("myindex").getStoreSize()).

I started the indexing again on 2 machines (old data was deleted),
with the fs index storage (same bulk size). After indexing 95M docs, the index
size is 77.2gb. Indexing speed is a bit faster than before; I'll check
how the bulk size and the number of threads affect it.
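Since the plan is to vary bulk size and thread count, here is a minimal, client-agnostic sketch of splitting one large document list into smaller bulk batches and sending them from a fixed thread pool. `sendBulk` is a hypothetical placeholder for whatever builds and executes a single bulk request with the Java client; the batch size (5,000) and thread count (4) in `main` are illustrative values, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelBulkSketch {

    // Split docs into consecutive batches of at most batchSize items each.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    // Send every batch through sendBulk using a fixed pool of worker threads.
    // sendBulk is a hypothetical stand-in for "build and execute one bulk request".
    static <T> void indexInParallel(List<T> docs, int batchSize, int threads,
                                    Consumer<List<T>> sendBulk) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> sendBulk.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the batch and rethrows any failure it hit
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            docs.add(i);
        }
        AtomicInteger sent = new AtomicInteger();
        indexInParallel(docs, 5_000, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println("documents sent: " + sent.get());
    }
}
```

Each batch is an independent task, so `threads` and `batchSize` can be tuned separately while watching throughput.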

On Jan 26, 12:18 am, Shay Banon shay.ba...@elasticsearch.com wrote:

On Wednesday, January 26, 2011 at 12:12 AM, barak wrote:

Hello,

I'm using elasticsearch-0.15.0-SNAPSHOT. I would like to check how ES
performs when the number of documents is about 300 million. Each document
contains 5 numeric and string fields and an array of 2-field elements.
I've started a node with "-Xmx15g -Xms15g -Des.index.storage.type=memory"
(available memory on the machine is 32G)

Can you try with the default index storage (FS) and see how it goes? How big does the index get with 100M docs (you can get that from the index status command)?

and using the Java bulk API for insertions (bulk size is 100000). I

Can you try lowering the bulk size? (It really depends on the size of each doc, which I can't tell.) Adding more threads to the mix will help indexing time.

noticed that after 100M docs were indexed, the insert operation became
slower and slower. I guess this is expected, but would a duration of 30
seconds or more be normal?
No exceptions are seen on the server or client side. Will adding more
machines make the indexing faster?

Yes, adding more machines to the mix will help indexing speed, since the work will be distributed across more shards. Note that if you have a single machine now and you add another one, the replicas will be allocated on it, and indexing might actually be a bit slower (since it needs to be performed on the replica as well).
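The replica effect can be made concrete with rough arithmetic. This is a minimal sketch that assumes 1 configured replica, shards spread evenly across nodes, and that a replica is never placed on the same node as its primary (so on a single node it stays unassigned). It ignores network, merge, and refresh costs entirely.

```java
public class ReplicaWorkSketch {

    // Replica copies that can actually be allocated: a replica never shares a node
    // with its primary, so at most (nodes - 1) replica copies per shard are active.
    static int activeReplicas(int configuredReplicas, int nodes) {
        return Math.max(0, Math.min(configuredReplicas, nodes - 1));
    }

    // Average number of per-document index operations each node performs,
    // assuming primaries and replicas are spread evenly across nodes.
    static double opsPerNode(long docs, int configuredReplicas, int nodes) {
        long totalOps = docs * (1L + activeReplicas(configuredReplicas, nodes));
        return (double) totalOps / nodes;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;
        for (int nodes = 1; nodes <= 4; nodes++) {
            System.out.printf("nodes=%d  index ops per node=%,.0f%n",
                    nodes, opsPerNode(docs, 1, nodes));
        }
    }
}
```

Under these assumptions, going from one node to two leaves each node's share of index operations unchanged (the replica writes cancel out the split) while adding a replication round trip, which is consistent with indexing getting a bit slower; the per-node share only drops from three nodes on.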

Any other configs I missed? Is there a
way to determine where most of the indexing time is spent?

You can try increasing index.refresh_interval (defaults to 1s); this can help when indexing. There are other low-level Lucene configuration options we can try, but let's first try what I suggested above (use FS).

BTW, BulkResponse.getTookInMillis() always returns 0.

Strange, I will check.

Thanks.
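Until `getTookInMillis()` reports real values, one workaround is to time each bulk call on the client and watch throughput over the last few batches, which shows when and how fast the slowdown sets in. This is a minimal sketch: the `Runnable` passed to `timeBatch` is a hypothetical placeholder for executing one bulk request, and the measured time includes network transfer, not just server-side indexing.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkTimingSketch {

    // One measured batch: number of documents and wall-clock duration in nanoseconds.
    static final class Sample {
        final int docs;
        final long nanos;

        Sample(int docs, long nanos) {
            this.docs = docs;
            this.nanos = nanos;
        }
    }

    private final List<Sample> samples = new ArrayList<>();

    // Record a batch whose duration was measured elsewhere.
    void record(int docs, long nanos) {
        samples.add(new Sample(docs, nanos));
    }

    // Time a (hypothetical) bulk call on the client side and record it.
    void timeBatch(int docs, Runnable bulkCall) {
        long start = System.nanoTime();
        bulkCall.run();
        record(docs, System.nanoTime() - start);
    }

    // Documents per second over the most recent `window` batches (0 if nothing measured).
    double recentDocsPerSecond(int window) {
        int from = Math.max(0, samples.size() - window);
        long docs = 0;
        long nanos = 0;
        for (Sample s : samples.subList(from, samples.size())) {
            docs += s.docs;
            nanos += s.nanos;
        }
        return nanos == 0 ? 0.0 : docs / (nanos / 1e9);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BulkTimingSketch timer = new BulkTimingSketch();
        for (int i = 1; i <= 5; i++) {
            long delayMillis = 10L * i; // simulated batches that get slower over time
            timer.timeBatch(1_000, () -> sleepQuietly(delayMillis));
            System.out.printf("batch %d: %.0f docs/s over last 3 batches%n",
                    i, timer.recentDocsPerSecond(3));
        }
    }
}
```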


(Shay Banon) #8

If it's there, then it's working. If you see data being created in the second node's data directory, then your thread dump simply did not catch it while it was doing anything.
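One way to follow the data-directory check is to sum the size of everything under the second node's data directory, wait, and sum again. This is a minimal sketch using only `java.nio.file`; the default path `data` is a hypothetical placeholder for the node's actual data location. Because segment merges delete old files, the total can also briefly shrink while the node is busy, so any change between samples suggests activity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class DataDirGrowthSketch {

    // Total size in bytes of all regular files under dir, recursively.
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(DataDirGrowthSketch::sizeOrZero)
                    .sum();
        }
    }

    // Files can vanish between listing and sizing (e.g. after a merge); count them as 0 bytes.
    private static long sizeOrZero(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass the node's real data directory as the first argument.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "data");
        long before = directorySize(dataDir);
        Thread.sleep(10_000);
        long after = directorySize(dataDir);
        System.out.printf("before=%,d bytes, after=%,d bytes, change=%,d bytes%n",
                before, after, after - before);
    }
}
```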
On Wednesday, January 26, 2011 at 5:05 PM, barak wrote:

They did, on the first node:

[Dorma] added {[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]],}, reason: zen-disco-receive(from node[[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]]])

on the second:

[2011-01-26 09:27:26,743][INFO ][cluster.service ] [Cutthroat] detected_master [Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]], added {[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]],}, reason: zen-disco-receive(from [[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]]])

In addition, queries against the first node return results:

[admin@dev3 apache]$ curl -XGET 'http://nodeA:9200/myindex/record/_search?q=hits:630'
{"took":6301,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":95460,"max_s...

On Jan 26, 4:36 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Have the two machines discovered each other? You can see in the logs whether they did or not.
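Checking the logs for discovery can be scripted. This is a minimal sketch that pulls node names out of the `added {[NodeName]` fragment visible in the `zen-disco-receive` log lines quoted in this thread; the regular expression is written against those two lines only and is an assumption about the log format, which may differ in other versions or logging configurations.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiscoveryLogSketch {

    // Matches "added {[NodeName]" and captures NodeName (format assumed from this thread's logs).
    private static final Pattern ADDED = Pattern.compile("added \\{\\[([^\\]]+)\\]");

    // Names of nodes reported as added to the cluster, in order of first appearance.
    static Set<String> addedNodes(List<String> logLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = ADDED.matcher(line);
            while (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // With no argument, fall back to shortened copies of the two log lines quoted above.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "[Dorma] added {[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]],}, reason: zen-disco-receive",
                    "[Cutthroat] detected_master [Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]], added {[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]],}, reason: zen-disco-receive");
        System.out.println("nodes seen joining: " + addedNodes(lines));
    }
}
```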

On Wednesday, January 26, 2011 at 4:35 PM, barak wrote:

From the thread dumps on the two machines, it looks like only one
machine is working (the one that booted last). Is this expected? I
used bin/elasticsearch -f -Xmx15g -Xms15g to boot both machines. On
the first machine, jstack shows only waiting threads, like these:

"New I/O client worker #1-12" daemon prio=10 tid=0x000000005f9de800 nid=0x1971 runnable [0x000000004051d000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00002aaab8dfc7c8> (a sun.nio.ch.Util$1)
	- locked <0x00002aaab8dfc7e0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00002aaab8dfb7c0> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:164)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)



(Barak Yaish) #9

Another question: if I break the index into multiple indices, can
you estimate how that would affect indexing and search?

On Jan 26, 5:07 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

If it's there, then it's working. If you see data being created in the second node's data directory, then your thread dump simply did not catch it while it was doing anything.

On Wednesday, January 26, 2011 at 5:05 PM, barak wrote:

They did, on the first node:

[Dorma] added {[Cutthroat][ExY4anwvS32rCx9zvleMYg][inet[/
10.80.0.23:9300]],}, reason: zen-disco-receive(from node[[Cutthroat]
[ExY4anwvS32rCx9zvleMYg][inet[/10.80.0.23:9300]]])

on the second:

[2011-01-26 09:27:26,743][INFO ][cluster.service ]
[Cutthroat] detected_master [Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/
10.80.0.22:9300]], added {[Dorma][dCGXAqcITouRFxsoMDDY7A][inet[/
10.80.0.22:9300]],}, reason: zen-disco-receive(from [[Dorma]
[dCGXAqcITouRFxsoMDDY7A][inet[/10.80.0.22:9300]]])

In addition, queries against the first node return results:

[admin@dev3 apache]$ curl -XGET 'http://nodeA:9200/myindex/record/
_search?q=hits:630'
{"took":6301,"timed_out":false,"_shards":{"total":5,"successful":
5,"failed":0},"hits":{"total":95460,"max_s...

On Jan 26, 4:36 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Have the two machines discovered each one? You can see in the logs if they did or not.

On Wednesday, January 26, 2011 at 4:35 PM, barak wrote:

From the thread dumps on the two machines it looks like only one
machine is working (the one that booted last), is this expected? I
used bin/elasticsearch -f -Xmx15g -Xms15g to boot both machines. On
the first machine jstack show only waiting threads, like these:

"New I/O client worker #1-12" daemon prio=10 tid=0x000000005f9de800
nid=0x1971 runnable [0x000000004051d000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)

  • locked <0x00002aaab8dfc7c8> (a sun.nio.ch.Util$1)
  • locked <0x00002aaab8dfc7e0> (a java.util.Collections
    $UnmodifiableSet)
  • locked <0x00002aaab8dfb7c0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(Selec torUtil.java:

at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.j ava:
164)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenami ngRunnable.java:
108)
at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerR unnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"New I/O server boss #2 ([id: 0x35e80f3a, /0:0:0:0:0:0:0:0:9200])"
daemon prio=10 tid=0x000000005eafa000 nid=0x1960 runnable
[0x00000000426af000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
210)
at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
69)

  • locked <0x00002aaab8e02130> (a sun.nio.ch.Util$1)
  • locked <0x00002aaab8e02118> (a java.util.Collections
    $UnmodifiableSet)
  • locked <0x00002aaab8dfb838> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSi nk
    $Boss.run(NioServerSocketPipelineSink.java:241)
    at
    org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenami ngRunnable.java:

at
org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerR unnable.java:
46)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

while on the second machine there threads doing Lucene stuff:

"myindex[Cutthroat][tp]-pool-1-thread-103" daemon prio=10
tid=0x000000004b1b0000 nid=0x1050 runnable [0x0000000042a7b000]
java.lang.Thread.State: RUNNABLE
at
org.apache.lucene.document.AbstractField.isTokenized(AbstractField.java:
133)
at
org.apache.lucene.analysis.Analyzer.getOffsetGap(Analyzer.java:133)
at
org.elasticsearch.index.analysis.NamedAnalyzer.getOffsetGap(NamedAnalyzer.j ava:
89)
at
org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAn alyzer.java:
66)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerFie ld.java:
201)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocField ProcessorPerThread.java:
246)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java :
826)
at
org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:
802)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1998)
at
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.jav a:
438)

  • locked <0x00002aaab9b8ee10> (a java.lang.Object)
    at
    org.elasticsearch.index.engine.robin.RobinEngine.bulk(RobinEngine.java:

at
org.elasticsearch.index.shard.service.InternalIndexShard.bulk(InternalIndex Shard.java:
257)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnRepl ica(TransportShardBulkAction.java:
237)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOpera tionAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplication OperationAction.java:
182)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOpera tionAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplication OperationAction.java:
175)
at org.elasticsearch.transport.netty.MessageChannelHandler
$3.run(MessageChannelHandler.java:195)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

This is my client: client = new
TransportClient().addTransportAddress( new
InetSocketTransportAddress( "dev1", 9300 ) ).addTransportAddress( new
InetSocketTransportAddress( "dev2", 9300 ) );

Is it me doing something wrong?

On Jan 26, 4:21 pm, barak barak.ya...@gmail.com wrote:

When I used the memory index storage, index size of 100M docs was
54.3gb (res.getIndices().get( "myindex" ).getStoreSize()).

I started the indexing again on 2 machines ( old data was deleted ),
with the fs index storage (same bulk). After indexing 95M docs index
size is 77.2gb. Indexing speed is a bit faster then before, I'll check
how bulk size and number of threads will affect.

On Jan 26, 12:18 am, Shay Banon shay.ba...@elasticsearch.com wrote:

On Wednesday, January 26, 2011 at 12:12 AM, barak wrote:

Hello,

I'm using elasticsearch-0.15.0-SNAPSHOT. I would like to check how ES
performs when number of documents is about 300 millions. Each document
contains 5 numeric and string fields and array of 2 fields elements.
I've started a node with "-Xmx15g -Xms15g -
Des.index.storage.type=memory" (available memory on machine is 32G)

Can you try with default index storage (FS) and see how it goes? How big is the index size get with 100M docs (you get that from the index status command).

and using java bulk api for insertions (bulk size is 100000). I

Can you try and lower the bulk size? (It really depends on the size of each doc, which I can't tell). Adding more threads to the mix will help indexing time.

noticed that after 100M docs indexed the insert operation become
slower and slower, I guess this is expected, but would duration of 30
and more seconds be normal?
No exceptions seen in server or client side. Is adding more machines
will make the indexing faster?

Yes, adding more machines to the mix will help indexing speed, since the work will be distributed between more shards. Note that if you have a single machine now,

...

read more »


(Shay Banon) #10

To increase indexing TPS, you can increase the number of shards (and make sure you have enough machines to make use of them). The number of indices is not really relevant (you can have 1 index with 10 shards, or 10 indices with 1 shard each).
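The point that indexing parallelism comes from the total number of shards, not the number of indices, can be illustrated by counting distinct write targets. This is a minimal sketch under two stated simplifications: the client spreads documents over indices round-robin (a hypothetical choice), and `String.hashCode()` modulo the shard count stands in for Elasticsearch's real routing hash. It counts primaries only.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShardLayoutSketch {

    // Stand-in routing: String.hashCode() is NOT the hash Elasticsearch uses.
    static int shardFor(String id, int shardsPerIndex) {
        return Math.floorMod(id.hashCode(), shardsPerIndex);
    }

    // Documents per "index/shard" target when docs are spread round-robin over
    // `indices` indices and then routed to one of `shardsPerIndex` shards.
    static Map<String, Integer> distribute(int docs, int indices, int shardsPerIndex) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < docs; i++) {
            String id = "doc-" + i;
            int index = i % indices; // hypothetical client-side round-robin
            int shard = shardFor(id, shardsPerIndex);
            counts.merge("index" + index + "/shard" + shard, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int docs = 100_000;
        Map<String, Integer> oneIndex = distribute(docs, 1, 10);
        Map<String, Integer> tenIndices = distribute(docs, 10, 1);
        System.out.println("1 index x 10 shards -> " + oneIndex.size() + " targets");
        System.out.println("10 indices x 1 shard -> " + tenIndices.size() + " targets");
    }
}
```

Both layouts yield 10 primary write targets, which is the sense in which they are equivalent for indexing concurrency; how those targets map onto machines is what determines the actual speedup.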
On Wednesday, January 26, 2011 at 11:39 PM, barak wrote:

Another question: if I break the index into multiple indices, can
you estimate how that would affect indexing and search?



(system) #11