Slow Bulk Insert

Hi Radu,

Thanks for the reply, this was extremely interesting. Regarding the slow
indexing: I'm running this locally on my development machine, which has 4GB
of RAM with 1GB allocated to Elasticsearch, and as you said I can see a
high amount of I/O and CPU usage. I was just testing stuff before trying
it out on the actual server.

So the server has 128GB of RAM, so I should allocate 64GB to Elasticsearch,
but how much should I allocate for index_buffer_size?
Would it be ideal to allocate, say, min_index_buffer_size 10% and max_index_buffer_size 50%?
Or would it be ideal to set index_buffer_size to something like 50%?
Or, at the other extreme, to set indices.memory.min_shard_index_buffer_size to 10%
(which would imply a total usage of roughly 50%)?
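
For reference, these are the settings I'm asking about, as they would
appear in elasticsearch.yml (the values here are only placeholders for the
question above, not a recommendation):

indices.memory.index_buffer_size: 50%
indices.memory.min_index_buffer_size: 96mb
indices.memory.max_index_buffer_size: 2gb
indices.memory.min_shard_index_buffer_size: 12mb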

Also, as regards bulk updating, do you suggest I turn off refresh_interval,
run the bulk insert, turn it back on, and then run an optimize with
max_num_segments=5?
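
Roughly this sequence, sketched with curl (assuming a local node and an
index named companies):

curl -XPUT 'http://localhost:9200/companies/_settings' -d '{"index": {"refresh_interval": "-1"}}'
# ... run the bulk inserts ...
curl -XPUT 'http://localhost:9200/companies/_settings' -d '{"index": {"refresh_interval": "1s"}}'
curl -XPOST 'http://localhost:9200/companies/_optimize?max_num_segments=5'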

As regards segments_per_tier, what is being referred to by "tier"? And
what would be the ideal number to maximise insert speed?

Also, once bulk inserting has been completed, can I then re-tweak these
settings to increase search speed instead of insert speed?

On Thursday, 31 January 2013 08:39:52 UTC+1, Radu Gheorghe wrote:

Hello Shawn,

The first thing I'd do is monitor and see what the bottleneck is. Initial
suspects are CPU and I/O (which also includes high CPU usage from I/O
waits). It also depends on how many documents are already in there: if you
want your performance tests to be accurate, you should have the same
starting point.

The rule of thumb is to allocate 50% of your total RAM to ES. I'm not sure
if you already did that, because I didn't see how much RAM you have.
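
For example, if the box had 64GB, you'd give ES a heap of about 32GB. With
the standard startup scripts, one way to set this is the ES_HEAP_SIZE
environment variable (a sketch; if your version's script doesn't honor it,
ES_MIN_MEM/ES_MAX_MEM work too):

export ES_HEAP_SIZE=32g
bin/elasticsearch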

Regarding refresh_interval, make sure it's applied: the index-specific
setting will override the one that's in the configuration. So I'd suggest
you try updating those settings via the Indices Update Settings API.
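
For example, something like this (a sketch against a local node;
substitute your own index name):

curl -XPUT 'http://localhost:9200/companies/_settings' -d '{"index": {"refresh_interval": "3600s"}}'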

Some other things that might help are to increase the thresholds for the
transaction log.
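
These are the settings I mean, as they'd look in elasticsearch.yml (the
values are only illustrative):

index.translog.flush_threshold_ops: 50000
index.translog.flush_threshold_size: 500mb
index.translog.flush_threshold_period: 60m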

And to increase the index_buffer_size.
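
That one is a node-level setting, so it goes in elasticsearch.yml and
takes effect after a restart (the value is just an example):

indices.memory.index_buffer_size: 30%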

As for the merge policy, tuning it for more segments will trade some
search performance for indexing performance. But increasing the
floor_segment size is going to create more concurrent merging, especially
coupled with higher max_merge_at_once* settings. So I'd only increase
segments_per_tier.
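
For example, in elasticsearch.yml (the number is only illustrative; the
default is 10, and the right value depends on your data and hardware):

index.merge.policy.segments_per_tier: 50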

If you still have too much stress on I/O, you can try throttling merges
some more.
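
I mean the store-level throttling settings, e.g. in elasticsearch.yml (the
rate is just an example):

indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 5mb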

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Wed, Jan 30, 2013 at 5:12 PM, Shawn Ritchie <xrit...@gmail.com> wrote:

Some additional information: tweaking around, I set

index.refresh_interval: 3600s
index.merge.policy.floor_segment: 200mb
index.merge.policy.max_merge_at_once: 128
index.merge.policy.segments_per_tier: 256

This produces the following entries in the slow merge logs:

[2013-01-30 16:08:00,871][TRACE][index.merge.scheduler ] [cluster]
[companies][0] merge [_3l] starting..., merging [128] segments, [529] docs,
[108.5mb] size, into [108.5mb] estimated_size
[2013-01-30 16:08:10,066][TRACE][index.merge.scheduler ] [cluster]
[companies][0] merge [_3l] done, took [9.1s]

Maybe this could help shed some light. Basically the fastest I got it to
go is around 11 seconds per 1000 records.

On Wednesday, 30 January 2013 12:34:44 UTC+1, Shawn Ritchie wrote:

Just some additional information: I'm posting the request in the following
manner (I do not think it should be chunking my HTTP request):

// Requires: using System; using System.IO; using System.Net; using System.Text;
// "data" holds the newline-delimited bulk request body.
try
{
    string URL = "http://localhost:9200/_bulk/";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
    request.Method = "POST";
    request.Timeout = System.Threading.Timeout.Infinite;
    request.ContentType = "application/x-www-form-urlencoded";

    // Write the payload in one go, with a 100MB stream buffer.
    using (StreamWriter requestWriter = new StreamWriter(
        request.GetRequestStream(), Encoding.UTF8, 104857600))
    {
        requestWriter.Write(data);
    }

    // Read and dispose the response so the connection is released.
    using (WebResponse webResponse = request.GetResponse())
    using (StreamReader responseReader = new StreamReader(webResponse.GetResponseStream()))
    {
        responseReader.ReadToEnd();
    }
}
catch (Exception)
{
    throw;
}

On Wednesday, 30 January 2013 11:43:38 UTC+1, Shawn Ritchie wrote:

Hi,

The machine is basically my local machine, running Elasticsearch
version 0.19.11 and JRE version 7.

I do have some custom settings:
node.name: "Test"
node.master: true
node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 3
index.number_of_replicas: 0
index.refresh_interval: 1s
index.merge.policy.floor_segment: 100mb
index.merge.policy.max_merge_at_once: 100
index.merge.policy.segments_per_tier: 200
bootstrap.mlockall: true
index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 200ms
index.search.slowlog.threshold.query.info: 200ms
index.search.slowlog.threshold.query.debug: 200ms
index.search.slowlog.threshold.query.trace: 200ms
index.search.slowlog.threshold.fetch.warn: 200ms
index.search.slowlog.threshold.fetch.info: 200ms
index.search.slowlog.threshold.fetch.debug: 200ms
index.search.slowlog.threshold.fetch.trace: 200ms
monitor.jvm.gc.ParNew.warn: 1000ms
monitor.jvm.gc.ParNew.info: 700ms
monitor.jvm.gc.ParNew.debug: 400ms
monitor.jvm.gc.ConcurrentMarkSweep.warn: 1s
monitor.jvm.gc.ConcurrentMarkSweep.info: 1s
monitor.jvm.gc.ConcurrentMarkSweep.debug: 1s

Basically I'm testing everything out before I try it on the server.

Regards
Shawn

On Wednesday, 30 January 2013 11:29:10 UTC+1, Radu Gheorghe wrote:

Hello Shawn,

I don't know why that happens from what you described, but maybe some
clues will pop up if you share some more about your cluster and how you're
indexing. For example:

  • how many nodes, shards, and replicas you have, and what sort of
    hardware your nodes are running on
  • how often you're refreshing
  • other non-default settings

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Wed, Jan 30, 2013 at 12:18 PM, Shawn Ritchie <xrit...@gmail.com> wrote:

Hi guys

I'm trying to bulk insert batches of 1000 documents into Elasticsearch
using a predefined mapping, yet each bulk insert takes roughly 15-20
seconds. Any idea why?

Predefined mapping -> http://pastebin.com/j1Guxj7p
Sample bulk insert record -> http://pastebin.com/w0NmG4gD

Slow query logs aren't showing any abnormalities, and neither are the
slow merge logs. As a side note, we're trying to optimize performance, so
we have some custom settings.

Java Heap set to min/max 1GB

[2013-01-30 11:00:48,275][TRACE][index.merge.scheduler ] [Test]
[companies][2] merge [_4f] starting..., merging [100] segments, [387] docs,
[75.2mb] size, into [75.2mb] estimated_size
[2013-01-30 11:04:56,583][DEBUG][index.merge.policy ] [Test]
[companies][2] using [tiered] merge policy with
expunge_deletes_allowed[10.0], floor_segment[100mb],
max_merge_at_once[100], max_merge_at_once_explicit[30],
max_merged_segment[5gb], segments_per_tier[200.0],
reclaim_deletes_weight[2.0], async_merge[true]
[2013-01-30 11:04:56,583][DEBUG][index.merge.scheduler ] [Test]
[companies][2] using [concurrent] merge scheduler with max_thread_count[3]

Any help is greatly appreciated.
