Answers inline.
Regarding the slow I/O: when I analyzed the creation of the Lucene index
files, I saw that they are created without any special flags (such as no
buffering or write-through). This means we pay the cost twice: when we
write a file, the data is cached in Windows' Cache Manager, which consumes
a lot of memory (memory that is then not available to the application
itself), but when we read the file back we don't actually read it from that
cache, which makes the operation slow. Any ideas?
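One thing I'm considering trying, though I haven't verified it on this setup: forcing the Lucene directory implementation so that reads are served from the OS page cache via memory mapping, e.g. in elasticsearch.yml:

index.store.type: mmapfs

mmapfs maps the index files into the process address space, so reads hit the page cache instead of going through plain file I/O (as with niofs/simplefs). Whether that actually helps on Windows Server here is something I would still need to measure.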
On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
Shooting in the dark here, but here it goes:
- Do you have anything else running on the system? For example, AVs are
known to cause slowdowns for such services, and other I/O- or memory-heavy
services could cause thrashing or just a general slowdown
No, nothing else is running on that machine. Initially it was working fast;
it got slower as the amount of data in the index grew. Also, is there a
way to increase the buffer size for the Lucene index files (.tim, .doc, and
.pos) from 8K to something much bigger?
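As far as I can tell there is no direct knob for Lucene's per-file write buffer, but the amount of data buffered in the heap before segments are flushed to disk can be raised; something like this (example values only, not tested here):

indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 64mb
index.translog.flush_threshold_size: 512mb

A larger index buffer should mean fewer, larger segment writes, which might also show up in ProcMon as bigger sequential writes.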
- What JVM version are you running this with?
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
OS_NAME="Windows"
OS_VERSION="5.2"
OS_ARCH="amd64"
- If you changed any of the default settings for merge factors etc - can
you revert that and try again?
Tried that before; same behavior.
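For reference, what I would revert to (assuming the tiered merge policy defaults, and guessing this is a single spinning disk) is roughly:

index.merge.policy.max_merge_at_once: 10
index.merge.policy.segments_per_tier: 10
index.merge.scheduler.max_thread_count: 1

The first two are the documented defaults for the tiered policy; max_thread_count: 1 only makes sense if the index really lives on a spinning disk.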
- Can you try with embedded=false and see if it makes a difference?
Tried that before; same behavior.
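One more thing I plan to look at (based on the docs, not measured yet): ES 1.x throttles merge I/O via the store throttle (20mb/s by default, if I read it correctly), which could become the bottleneck once the segments get large. Raising or disabling it would look roughly like:

indices.store.throttle.max_bytes_per_sec: 200mb
# or, to disable throttling entirely:
# indices.store.throttle.type: none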
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/
On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <ikess...@gmail.com> wrote:
Hi,
I have configured a single-node ES with Logstash 1.4.0 (8GB memory), using
the following configuration:
index.number_of_shards: 7
number_of_replicas: 0
refresh_interval: -1
translog.flush_threshold_ops: 100000
merge.policy.merge_factor: 30
codec.bloom.load: false
min_shard_index_buffer_size: 12m
compound_format : true
indices.fielddata.cache.size: 15%
indices.fielddata.cache.expire: 5m
indices.cache.filter.size: 15%
indices.cache.filter.expire: 5m
Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
OS: 64-bit Windows Server 2012 R2
My raw data is a CSV file and I use grok as a filter to parse it, with the
output configuration (elasticsearch { embedded => true flush_size =>
100000 idle_flush_time => 30 }).
Raw data size is about 100GB of events per day, which ES ingests into
one index (with 7 shards).
At the beginning the inserts were fast, however after a while it got
extremely slow: 1.5K docs in 8K seconds.
Currently the index has around 140 million docs with a size of 55GB.
When I analyzed the writes to disk with ProcMon, I saw that the process
writes in an interleaved manner to three kinds of files (.tim, .doc, and
.pos) in 4K and 8K chunks, instead of batching the writes into some
reasonably large size.
Appreciate the help.
All the best,
Yitzhak