Have the two machines discovered each other? You can see in the logs whether they did.
On Wednesday, January 26, 2011 at 4:35 PM, barak wrote:
From the thread dumps on the two machines it looks like only one
machine is working (the one that booted last); is this expected? I
used bin/elasticsearch -f -Xmx15g -Xms15g to boot both machines. On
the first machine jstack shows only waiting threads, like these:
"New I/O client worker #1-12" daemon prio=10 tid=0x000000005f9de800 nid=0x1971 runnable [0x000000004051d000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaab8dfc7c8> (a sun.nio.ch.Util$1)
        - locked <0x00002aaab8dfc7e0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaab8dfb7c0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:38)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:164)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
"New I/O server boss #2 ([id: 0x35e80f3a, /0:0:0:0:0:0:0:0:9200])" daemon prio=10 tid=0x000000005eafa000 nid=0x1960 runnable [0x00000000426af000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaab8e02130> (a sun.nio.ch.Util$1)
        - locked <0x00002aaab8e02118> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaab8dfb838> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:241)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
while on the second machine there are threads doing Lucene stuff:
"myindex[Cutthroat][tp]-pool-1-thread-103" daemon prio=10 tid=0x000000004b1b0000 nid=0x1050 runnable [0x0000000042a7b000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.document.AbstractField.isTokenized(AbstractField.java:133)
        at org.apache.lucene.analysis.Analyzer.getOffsetGap(Analyzer.java:133)
        at org.elasticsearch.index.analysis.NamedAnalyzer.getOffsetGap(NamedAnalyzer.java:89)
        at org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAnalyzer.java:66)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:826)
        at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1998)
        at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:438)
        - locked <0x00002aaab9b8ee10> (a java.lang.Object)
        at org.elasticsearch.index.engine.robin.RobinEngine.bulk(RobinEngine.java:)
        at org.elasticsearch.index.shard.service.InternalIndexShard.bulk(InternalIndexShard.java:257)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:237)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:182)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:175)
        at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:195)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
This is my client:

    client = new TransportClient()
        .addTransportAddress( new InetSocketTransportAddress( "dev1", 9300 ) )
        .addTransportAddress( new InetSocketTransportAddress( "dev2", 9300 ) );

Am I doing something wrong?
On Jan 26, 4:21 pm, barak barak.ya...@gmail.com wrote:
When I used the memory index storage, the index size at 100M docs was
54.3gb (res.getIndices().get( "myindex" ).getStoreSize()).
I started the indexing again on 2 machines (old data was deleted),
with the fs index storage (same bulk). After indexing 95M docs the index
size is 77.2gb. Indexing speed is a bit faster than before; I'll check
how bulk size and number of threads affect it.
On Jan 26, 12:18 am, Shay Banon shay.ba...@elasticsearch.com wrote:
On Wednesday, January 26, 2011 at 12:12 AM, barak wrote:
Hello,
I'm using elasticsearch-0.15.0-SNAPSHOT. I would like to check how ES
performs when the number of documents is about 300 million. Each document
contains 5 numeric and string fields and an array of 2-field elements.
I've started a node with "-Xmx15g -Xms15g -
Des.index.storage.type=memory" (available memory on the machine is 32G)
Can you try with the default index storage (FS) and see how it goes? How big
does the index get with 100M docs? (You can get that from the index status
command.)
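For reference, a minimal sketch of what that looks like in elasticsearch.yml, assuming the 0.15-era index.store.type setting (leaving it unset also gives the FS default):

```yaml
index:
  store:
    type: fs    # filesystem storage, the default; was set to memory via -Des.index.storage.type=memory
```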
and using the java bulk api for insertions (bulk size is 100000). I
Can you try and lower the bulk size? (It really depends on the size of each
doc, which I can't tell.) Adding more threads to the mix will help indexing
time.
noticed that after 100M docs were indexed the insert operations became
slower and slower. I guess this is expected, but would durations of 30
seconds or more be normal?
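To illustrate the bulk-size suggestion above, here is a minimal client-side sketch of splitting the document stream into smaller bulks. The BulkChunker name and the 5,000 bulk size are illustrative assumptions, not recommendations, and the real client.prepareBulk()/execute() calls are only indicated in comments since they need a live cluster:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BulkChunker {
    // Split a list of documents into bulks of at most bulkSize items each.
    // Each sublist would then be sent as one bulk request, roughly:
    //   BulkRequestBuilder bulk = client.prepareBulk();
    //   for (T doc : chunk) { bulk.add(/* index request for doc */); }
    //   bulk.execute().actionGet();
    public static <T> List<List<T>> chunk(List<T> docs, int bulkSize) {
        List<List<T>> bulks = new ArrayList<List<T>>();
        for (int i = 0; i < docs.size(); i += bulkSize) {
            bulks.add(docs.subList(i, Math.min(i + bulkSize, docs.size())));
        }
        return bulks;
    }

    public static void main(String[] args) {
        // 100,000 dummy docs split into bulks of 5,000 -> 20 bulk requests.
        List<Integer> docs = Collections.nCopies(100000, 1);
        System.out.println(chunk(docs, 5000).size()); // prints 20
    }
}
```

Running several such loops on separate threads, each with its own chunk range, is one way to add the extra indexing threads mentioned above.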
No exceptions seen on the server or client side. Will adding more machines
make the indexing faster?
Yes, adding more machines to the mix will help indexing speed, since the
work will be distributed between more shards. Note that if you have a single
machine now and you add another one, then the replicas will be allocated on
it, and indexing might actually be a bit slower (since it needs to be
performed on the replicas as well).
Any other configs I missed? Is there a
way to determine where most of the indexing time is spent?
You can try increasing index.refresh_interval (defaults to 1s); this
can help when indexing. There are other low-level Lucene configurations
we can try, but let's first try what I suggested above (use FS).
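A sketch of that setting in elasticsearch.yml; the 30s value is only an illustrative assumption (a value of -1 disables periodic refresh entirely, which can also help during a pure bulk load):

```yaml
index:
  refresh_interval: 30s   # default is 1s; a larger interval reduces refresh overhead while indexing
```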
BTW, BulkResponse.getTookInMillis() always returns 0.
Strange, I will check.
Thanks.