Slow Bulk Insert

Hi guys

I'm trying to bulk insert batches of 1000 documents into Elasticsearch
using a predefined mapping, yet each bulk insert takes roughly 15-20
seconds. Any idea why?

Predefined Mapping -> http://pastebin.com/j1Guxj7p
Sample Bulk Insert Record -> http://pastebin.com/w0NmG4gD
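For context, the _bulk body is newline-delimited JSON: one action/metadata line per
document, followed by the document source, with a trailing newline at the end. A
minimal C# sketch of building such a body (the "companies" index name comes from the
logs below; the "company" type and the documentsAsJson collection are only placeholders):

var bulkBody = new System.Text.StringBuilder();
foreach (string doc in documentsAsJson) // documentsAsJson: pre-serialized JSON documents (placeholder)
{
    bulkBody.Append("{\"index\":{\"_index\":\"companies\",\"_type\":\"company\"}}\n"); // action/metadata line
    bulkBody.Append(doc).Append("\n");                                                 // document source line
}
string data = bulkBody.ToString(); // the body must end with a newline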

Slow query logs aren't showing any abnormalities, and neither are the slow
merge logs. As a side note, we're trying to optimize performance, so we have
some custom settings:

Java Heap set to min/max 1GB

[2013-01-30 11:00:48,275][TRACE][index.merge.scheduler ] [Test]
[companies][2] merge [_4f] starting..., merging [100] segments, [387] docs,
[75.2mb] size, into [75.2mb] estimated_size
[2013-01-30 11:04:56,583][DEBUG][index.merge.policy ] [Test]
[companies][2] using [tiered] merge policy with
expunge_deletes_allowed[10.0], floor_segment[100mb],
max_merge_at_once[100], max_merge_at_once_explicit[30],
max_merged_segment[5gb], segments_per_tier[200.0],
reclaim_deletes_weight[2.0], async_merge[true]
[2013-01-30 11:04:56,583][DEBUG][index.merge.scheduler ] [Test]
[companies][2] using [concurrent] merge scheduler with max_thread_count[3]

Any help is greatly appreciated.


Hello Shawn,

I don't know why that happens from what you described, but maybe some clues
will pop up if you say some more about your cluster and how you're
indexing. For example:

  • How many nodes, shards, and replicas? What sort of hardware are your nodes
    running on?
  • How often are you refreshing?
  • Any other non-default settings?

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Forgot to give you the link to the response from the bulk insert. (
http://pastebin.com/RNL8x796 )

I also tried turning off the refresh interval (-1), but this still produced the
same results.
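For reference, a minimal sketch of how the refresh interval can be changed per index
through the update settings API (the "companies" index name comes from the logs; the
value can be set back to "1s" once the bulk load is done):

using (var client = new System.Net.WebClient())
{
    // disable refresh while bulk loading; restore afterwards with "1s"
    client.UploadString("http://localhost:9200/companies/_settings", "PUT",
        "{\"index\":{\"refresh_interval\":\"-1\"}}");
}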


Hi,

The machine is basically my local machine, running Elasticsearch 0.19.11 and JRE 7.

I do have some custom settings:
node.name: "Test"
node.master: true
node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 3
index.number_of_replicas: 0
index.refresh_interval: 1s
index.merge.policy.floor_segment: 100mb
index.merge.policy.max_merge_at_once: 100
index.merge.policy.segments_per_tier: 200
bootstrap.mlockall: true
index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 200ms
index.search.slowlog.threshold.query.info: 200ms
index.search.slowlog.threshold.query.debug: 200ms
index.search.slowlog.threshold.query.trace: 200ms
index.search.slowlog.threshold.fetch.warn: 200ms
index.search.slowlog.threshold.fetch.info: 200ms
index.search.slowlog.threshold.fetch.debug: 200ms
index.search.slowlog.threshold.fetch.trace: 200ms
monitor.jvm.gc.ParNew.warn: 1000ms
monitor.jvm.gc.ParNew.info: 700ms
monitor.jvm.gc.ParNew.debug: 400ms
monitor.jvm.gc.ConcurrentMarkSweep.warn: 1s
monitor.jvm.gc.ConcurrentMarkSweep.info: 1s
monitor.jvm.gc.ConcurrentMarkSweep.debug: 1s

Basically I'm testing everything out before I try it on the server.

Regards
Shawn


Just some additional information: I'm POSTing the bulk request in the following
manner (I do not think it should be chunking my HTTP request):

try
{
    string URL = "http://localhost:9200/_bulk/";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
    request.Method = "POST";
    request.Timeout = System.Threading.Timeout.Infinite;
    request.ContentType = "application/x-www-form-urlencoded";
    //request.ContentLength = data == null ? 0 : data.Length;
    //StreamWriter requestWriter = new StreamWriter(request.GetRequestStream(), System.Text.Encoding.UTF8);

    // 104857600 is the StreamWriter buffer size (100 MB)
    StreamWriter requestWriter = new StreamWriter(request.GetRequestStream(),
        System.Text.Encoding.UTF8, 104857600);
    requestWriter.Write(data);
    requestWriter.Close();

    try
    {
        WebResponse webResponse = request.GetResponse();
        //Stream webStream = webResponse.GetResponseStream();
        //StreamReader responseReader = new StreamReader(webStream);
        //responseReader.Close();
    }
    catch (WebException)
    {
        throw;
    }
    catch (Exception)
    {
        throw;
    }
}
catch (Exception)
{
    throw;
}
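As a sketch (not part of the original code): reading the _bulk response body is worth
doing here, since it reports "took" (time spent inside Elasticsearch) and one result
per item, which helps separate ES-side time from client/transfer time:

using (WebResponse webResponse = request.GetResponse())
using (StreamReader responseReader = new StreamReader(webResponse.GetResponseStream()))
{
    string body = responseReader.ReadToEnd();
    // crude check: failed items carry an "error" field in the response
    if (body.Contains("\"error\""))
        Console.WriteLine("bulk response reported item errors");
    Console.WriteLine(body.Substring(0, Math.Min(body.Length, 200)));
}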


Some additional information: tweaking things around, I set

index.refresh_interval: 3600s
index.merge.policy.floor_segment: 200mb
index.merge.policy.max_merge_at_once: 128
index.merge.policy.segments_per_tier: 256

This produces the following entries in the slow merge log:

[2013-01-30 16:08:00,871][TRACE][index.merge.scheduler ] [cluster]
[companies][0] merge [_3l] starting..., merging [128] segments, [529] docs,
[108.5mb] size, into [108.5mb] estimated_size
[2013-01-30 16:08:10,066][TRACE][index.merge.scheduler ] [cluster]
[companies][0] merge [_3l] done, took [9.1s]

Maybe this could help shed some light. Basically, the fastest I got it to go
is around 11 seconds per 1000 records.


Hello Shawn,

The first thing I'd do is monitor and see what the bottleneck is. Initial
suspects are CPU and I/O (which also includes high CPU usage from I/O waits).
It also depends on how many documents are already in there - if you want
your performance tests to be accurate, you should have the same starting
point.

The rule of thumb is to allocate 50% of your total RAM to ES. I'm not sure
if you already did that, because I didn't see how much RAM you have.

Regarding refresh_interval, make sure it's applied: the index-specific
setting will override the one that's in the configuration. So I'd suggest
you try updating those settings via the Indices Update Settings API:

Some other things that might help are to increase the thresholds for the
transaction log:

And to increase the index_buffer_size:
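For example, something along these lines in elasticsearch.yml (the values are purely
illustrative, and the names assume the 0.19.x settings; index_buffer_size is a
node-level setting):

index.translog.flush_threshold_ops: 50000
index.translog.flush_threshold_size: 500mb
indices.memory.index_buffer_size: 20%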

As for the merge policy, tuning it for more segments will trade some search
performance for indexing performance. But increasing the floor_segment size
is going to create more concurrent merging, especially coupled with higher
max_merge_at_once* settings. So I'd only increase segments_per_tier.

If you still have too much stress on I/O, you can try throttling merges
some more:

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Hi Radu,

Thanks for the reply, this was extremely interesting. Regarding the slow
indexing, I'm running this locally on my development machine, which has 4GB
of RAM, with 1GB allocated to Elasticsearch, and as you said I can see a
high amount of I/O and CPU usage. I was just testing things out before I try
them on the actual server.

The server has 128GB of RAM, so I should allocate 64GB to Elasticsearch, but
how much should I allocate for index_buffer_size?
Would it be ideal to allocate, say, min_index_buffer_size 10% and
max_index_buffer_size 50%?
Or would it be ideal to set index_buffer_size to something like 50%?
Or, at the other extreme, set indices.memory.min_shard_index_buffer_size to 10%
(which would imply a total usage of roughly 50%)?

Also, as regards bulk updating, do you suggest I turn off the refresh_interval,
bulk insert, turn it back on, and then run an optimize with segments = 5?

As regards segments_per_tier, what is being referred to by "tier"? And what
would be the ideal number to maximise insert speed?

Also, once bulk inserting has been completed, can I then re-tweak these
settings to increase search speed instead of insert speed?


Hi Shawn,

On Thu, Jan 31, 2013 at 4:58 PM, Shawn Ritchie xritchie@gmail.com wrote:

Hi Radu,

Thanks for the reply, this was extremely interesting. Regarding the slow
indexing, I'm running this locally on my development machine, which has 4GB
of RAM, with 1GB allocated to Elasticsearch, and as you said I can see a
high amount of I/O and CPU usage. I was just testing things out before I try
them on the actual server.

If you run the indexing tests on an empty index, 1GB of RAM should be OK.
Otherwise, I'd increase the memory to 2GB. And of course your indexing
performance will decrease badly when there's not enough memory.

The server has 128GB of RAM, so I should allocate 64GB to Elasticsearch, but
how much should I allocate for index_buffer_size?
Would it be ideal to allocate, say, min_index_buffer_size 10% and
max_index_buffer_size 50%?
Or would it be ideal to set index_buffer_size to something like 50%?
Or, at the other extreme, set indices.memory.min_shard_index_buffer_size to 10%
(which would imply a total usage of roughly 50%)?

I can't suggest exact figures, but I think you'd get close to the sweet spot
when you run some performance testing on the production hardware. I'd start
with a 30GB heap for ES and an index_buffer_size of 20%, run some tests, and
see the performance impact when changing those settings up and down.

Also, as regards bulk updating, do you suggest I turn off the refresh_interval,
bulk insert, turn it back on, and then run an optimize with segments = 5?

I'd turn off refresh_interval if I weren't interested in making the new
content available for search until the whole insert operation finishes.

I'd use the optimize API only if you won't index again until the next large
indexing operation (after which you'd optimize again). That's because if
you index afterwards and some merging occurs, your caches will get
invalidated, which has a big impact on query performance.
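For reference, a minimal sketch of such an optimize call against the "companies"
index (assuming the 0.x-era parameter name):

using (var client = new System.Net.WebClient())
{
    // merge the index down to at most 5 segments per shard
    client.UploadString("http://localhost:9200/companies/_optimize?max_num_segments=5", "POST", "");
}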

As regards segments_per_tier, what is being referred to by "tier"?

Basically, tiers are categories of segments by size. Here's my
understanding of the "tiered" merging policy:

  • you have a number of very small segments that are naturally created
    during indexing. Actually, some merging is done here to ensure segments are
    bigger than "floor_segment"
  • when that number hits segments_per_tier, ES will merge some of them into
    bigger ones, which creates segments in the next "tier"
  • the process repeats until that next tier hits segments_per_tier as well.
    Then merging happens on that tier too, which creates another tier, and so on
  • it stops creating new tiers when merging on the last tier would create
    segments larger than max_merged_segment

So basically, the more segments_per_tier you have, the less merging, because
you'll end up with more small segments, since lower "tiers" will hit the
limit later.

And what would be the ideal number to maximise insert speed?

Again, unfortunately I can't recommend hard numbers, but you can get
to them while testing.

Also, once bulk inserting has been completed, can I then re-tweak these
settings to increase search speed instead of insert speed?

Yes, you can change merge settings on the fly via the Update Settings API:
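For example (a sketch only; the index name and the value are illustrative):

using (var client = new System.Net.WebClient())
{
    // once bulk indexing is done, drop segments_per_tier back down to favour search
    client.UploadString("http://localhost:9200/companies/_settings", "PUT",
        "{\"index.merge.policy.segments_per_tier\": 10}");
}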

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


I wrote a reply to this thread earlier today, but it looks like the
internet swallowed it whole. The only thing I'd add is this excellent article
by Mike McCandless about how Lucene manages merges:

Changing Bits: Visualizing Lucene's segment merges

The descriptions Mike writes are great... but the videos really make it easy
to understand merges and the tiered policy. Highly recommended.

-Zach


That's a really nice article. Thanks, Zach!
