Performance problems with large data volumes

Hi,

So I have what you might consider a large data set.

We have about 25k records in our index, taking up around 2.5 GB on disk,
spread across a little more than 4000 indices. Currently our master node
is set for 6 GB of RAM. We're seeing that after loading this data the JVM
will eventually crash, sometimes in as little as 5 minutes.

Is this not enough horsepower for this data set?

What could be tuned to resolve this?

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a26bf849-8e92-4e10-81d3-88be97bd9c43%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,
I don't think 25k documents is a large data set.
What is strange to me is the "4000 indices". How many indices do you need?

On my cluster I have: Nodes: 8, Indices: 89, Shards: 2070, Data: 4.87 TB.

When are you running OOM? Example query (or queries)? How many nodes? Some more
info please :)

Also, a 6 GB heap is not that much, but that depends on your use case.

Georgi

On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote:


Georgi,

Thanks for the quick reply!

I have 4k indices. We're creating an index per tenant. In this
environment we've created 4k tenants.

We're running out of memory just letting the loading of records run.

John

On Tuesday, November 4, 2014 10:15:15 AM UTC-5, Georgi Ivanov wrote:


So you run OOM when you index data?
If so:
How do you index the data?
Are you using BulkRequest?
Which programming language are you using?
Are you using multiple threads to index?

If you are using bulk requests, you should limit the size of each bulk.
You can also tune the bulk thread pool in ES.

In general, you are very brief in describing your problem :)

Georgi
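For reference, the bulk pool tuning mentioned above lives in elasticsearch.yml. A sketch with 1.x-era setting names and purely illustrative values; verify the names and defaults against your version's thread pool documentation before changing anything:

```
# Bulk thread pool (1.x-era names; values illustrative only)
threadpool.bulk.type: fixed        # fixed number of worker threads
threadpool.bulk.size: 8            # often tied to core count
threadpool.bulk.queue_size: 100    # pending bulk requests before rejection
```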

2014-11-04 17:05 GMT+01:00 John D. Ament john.d.ament@gmail.com:


Georgi,

I'm indexing the data through regular index requests via Java:

final IndexResponse response = esClient.client()
    .prepareIndex(indexName, type)
    .setSource(json)
    .setRefresh(true)
    .execute()
    .actionGet();

json in this case is a byte[] containing the JSON data.

The requests come in via multiple HTTP requests, but I'm not leveraging any
specific multithreading within the ES client. I hope this helps; I'm not
100% sure what information would help identify the issue.

John

On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote:


And actually, now that I'm looking at it again, I wanted to ask why I need
to use setRefresh(true).

In my case, we were not seeing index data updated quickly enough after
indexing a record; setting refresh = true fixed that for us. If there's
a way to avoid it, that might help me here.

On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote:


OK, so it is Java.

  1. You are not doing this right.
  2. You should use BulkRequest, or better, the BulkProcessor class.
  3. Do NOT call setRefresh! That forces ES to do the real indexing work on
     every request, which loads the cluster a LOT.
  4. Set the refresh interval of your index to something like 30s or 60s.

Here is a snippet of code using BulkProcessor (it will not run, because I
removed some parts, but it will give you an idea):

public class IndexFoo {
    private Connection connection = null;

    public Client client;
    Integer bulkSize = 1000;
    private CommandLine cmd;
    //BulkRequestBuilder bulkRequest;
    BulkProcessor bulkRequest;
    private String index;
    Set<String> hosts = new HashSet<String>();

    private int threads = 5;

    public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
        this.cmd = cmd;
        this.index = cmd.getOptionValue("index");
        if (cmd.hasOption("b")) {
            this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
        }
        if (cmd.hasOption("t")) {
            this.threads = Integer.valueOf(cmd.getOptionValue("t"));
        }
        if (cmd.hasOption("h")) {
            String[] hosts = cmd.getOptionValue("h").split(",");
            for (String host : hosts) {
                this.hosts.add(host);
            }
        }

        this.connectES();

        this.bulkRequest = this.getBulkProcessor();
    }

    private void processData(ResultSet rs) throws SQLException {
        while (rs.next()) {
            // index
            bulkRequest.add(client.prepareIndex(myIndex, "mytype",
                    id.toString()).setSource(mySource).request());
        } // while
        this.bulkRequest.close();
        System.out.println("Indexing done");
    }

    private BulkProcessor getBulkProcessor() {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                //System.out.println("Executing bulk #" + executionId
                //        + " " + request.numberOfActions());
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request,
                    Throwable failure) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request,
                    BulkResponse response) {
                System.out.println("Bulk #" + executionId + "/"
                        + request.numberOfActions() + " executed in "
                        + response.getTook().secondsFrac() + " sec.");
                if (response.hasFailures()) {
                    for (BulkItemResponse bulkItemResponse : response.getItems()) {
                        if (bulkItemResponse.isFailed()) {
                            System.err.println("Failure message: "
                                    + bulkItemResponse.getFailureMessage());
                        }
                    }
                    System.exit(-1);
                }
            }
        }).setConcurrentRequests(this.threads)
          .setBulkActions(this.bulkSize)
          .build();
    }

}
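On point 4, the refresh interval can be changed per index at runtime by PUTting a settings body to the index's _settings endpoint. A sketch of that body (the index name and the 30s value are illustrative; see the indices update-settings reference for the exact API shape in your version):

```
{
  "index": {
    "refresh_interval": "30s"
  }
}
```

Setting it to -1 disables periodic refresh entirely during a bulk load; remember to restore it afterwards.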

2014-11-04 17:53 GMT+01:00 John D. Ament john.d.ament@gmail.com:


Hi,

I doubt the issue is that I'm not using bulk requests. My requests come in
one at a time, not in bulk. If you can explain why bulk is required, that
would help.

I can believe that the refresh is causing the issue. I would prefer to
test that one by itself. How do I configure the refresh interval on the
index?

John

On Wednesday, November 5, 2014 3:43:37 AM UTC-5, Georgi Ivanov wrote:


Here is how to set the refresh interval:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
When you force a refresh after every document, you are putting unnecessary
load on ES.

Indexing a single document per call is completely fine, but it is also
very slow and inefficient :)
This way you are also tying up the available indexing threads in ES. You
can read the documentation about the thread pools here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
If you use bulk requests, you can index (tens of) thousands of docs per
second, depending on your hardware.

With the BulkProcessor class you can set how many threads will run, how
many documents will be sent in one bulk, etc.
It is much more efficient than indexing single documents.
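To make the batching idea concrete without an ES dependency, here is a minimal, hypothetical sketch of the core behaviour BulkProcessor provides: buffer individual actions and hand them off in one batch once the configured batch size is reached. The names MiniBulk and flusher are invented for illustration; the real class also handles concurrency, size-based flushing, and retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of BulkProcessor's batching: buffer actions, flush a whole
// batch once the threshold is hit, and allow a final partial flush.
class MiniBulk<T> {
    private final int bulkActions;            // flush threshold
    private final Consumer<List<T>> flusher;  // receives each full batch
    private final List<T> buffer = new ArrayList<>();

    MiniBulk(int bulkActions, Consumer<List<T>> flusher) {
        this.bulkActions = bulkActions;
        this.flusher = flusher;
    }

    void add(T action) {
        buffer.add(action);
        if (buffer.size() >= bulkActions) {
            flush();
        }
    }

    // Send whatever is buffered, even a partial batch (like close()).
    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Feeding 2500 documents through a batch size of 1000 produces two batches of 1000 plus a final flush of 500: 3 round-trips instead of 2500, which is where the speedup comes from.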

2014-11-05 12:53 GMT+01:00 John D. Ament john.d.ament@gmail.com:

Hi,

I doubt the issue is that I'm not using bulk requests. My requests come
in one at a time, not in bulk. If you can explain why bulk is required
that would help.

I can believe that the refresh is causing the issue. I would prefer to
test that one by itself. How do I configure the refresh interval on the
index?

John

On Wednesday, November 5, 2014 3:43:37 AM UTC-5, Georgi Ivanov wrote:

Ok .. so it is Java

  1. You are not doing this right .
  2. You should use BulkRequest or better BulkProcessor class
  3. Do NOT do setRefresh ! This way you are forcing ES to do the real
    indexing which will load the cluster a LOT
  4. Set the refresh interval of your index to something line 30s or 60s

Here is a snippet of code using BulkProcessor (it will not run , because
i removed some parts but it will give u an idea)

public class IndexFoo {
private Connection connection = null;

public Client client;
Integer bulkSize = 1000;
private CommandLine cmd;
//BulkRequestBuilder bulkRequest;
BulkProcessor bulkRequest;
private String index;
Set hosts = new HashSet();

private int threads = 5;

public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
this.cmd = cmd;
this.index = cmd.getOptionValue("index");
if (cmd.hasOption("b")) {
this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
}
if (cmd.hasOption("t")) {
this.threads = Integer.valueOf(cmd.getOptionValue("t"));
}
if (cmd.hasOption("h")) {
String[] hosts = cmd.getOptionValue("h").split(",");
for (String host : hosts) {
this.hosts.add(host);
}
}

this.connectES();

this.bulkRequest = this.getBulkProcessor();
}

private void processData(ResultSet rs) throws SQLException {
while (rs.next()) {
//index
bulkRequest.add(client.prepareIndex(myIndex, "mytype",
id.toString()).setSource(mySource).request());

}//while
this.bulkRequest.close();
System.out.println("Indexing done");

}

private BulkProcessor getBulkProcessor(){
return BulkProcessor.builder(client, new BulkProcessor.Listener() {
@Override
public void beforeBulk(long executionId, BulkRequest request) {
//System.out.println("Executing bulk #"+executionId+"
"+request.numberOfActions());
}
@Override
public void afterBulk(long executionId, BulkRequest request, Throwable
failure) {
}
@Override
public void afterBulk(long executionId, BulkRequest request, BulkResponse
response) {
System.out.println("Bulk #"+executionId+"/"+request.numberOfActions()+"
executed in "+response.getTook().secondsFrac()+" sec.");
if (response.hasFailures()) {
for (BulkItemResponse bulkItemResponse : response.getItems()) {
if (bulkItemResponse.isFailed()){
System.err.println("Failure message : "+ bulkItemResponse.
getFailureMessage());
}
}
System.exit(-1);
}
}
}).setConcurrentRequests(this.threads ).setBulkActions(this.
bulkSize).build();
}

}

2014-11-04 17:53 GMT+01:00 John D. Ament john.d...@gmail.com:

And actually now that I'm looking at it again - I wanted to ask why I
need to use setRefresh(true)?

In my case, we were not seeing index data updated quick enough upon
indexing a record. setting refresh = true was doing it for us. If there's
a way to avoid it, that might help me here?

On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote:

Georgi,

I'm indexing the data through regular index request via java

final IndexResponse response = esClient.client().prepareIndex(indexName,
type)
.setSource(json).setRefresh(tr
ue).execute().actionGet();

json in this case is a byte[] with the json data in it.

The requests come in via multiple HTTP requests, but I'm not leveraging
any specific multithreading within the ES client. I hope this helps; I'm
not 100% sure what information would help identify the problem.

John

On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote:

So you run OOM when you index data?
If so:
How do you index the data?
Are you using BulkRequest?
Which programming language are you using?
Are you using multiple threads to index?

If you are using a BulkRequest, you should limit the size of the bulk.
You can also tune the bulk request pool in ES.

In general, you are very brief in describing your problem :)

Georgi

2014-11-04 17:05 GMT+01:00 John D. Ament john.d...@gmail.com:

Georgi,

Thanks for the quick reply!

I have 4k indices. We're creating an index per tenant. In this
environment we've created 4k tenants.

We're running out of memory just letting the loading of records run.

John
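One thing worth checking here: in Elasticsearch 1.x each index defaults to 5 primary shards (plus replicas), so 4000 indices can mean on the order of 20,000+ shards, each of which is a full Lucene index with its own heap footprint. That alone can plausibly exhaust a 6 GB heap. A quick way to see the actual numbers (the host is an assumption):

```shell
# Count the shards the cluster is actually carrying
curl -s 'http://localhost:9200/_cat/shards' | wc -l

# Per-index view: shard counts, doc counts, store size
curl -s 'http://localhost:9200/_cat/indices?v'
```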


--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/cJ2Y6-KQZus/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c3125935-f4d8-4671-a9df-222433369f2b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Use index aliases: one physical index, 4000 aliases.

Jörg
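The usual shape of this (a sketch; the index, alias, and field names below are made up for illustration) is one physical index plus a filtered alias per tenant, optionally with routing so each tenant's documents stay on one shard:

```shell
curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions": [
    { "add": {
        "index":   "tenants",
        "alias":   "tenant_0042",
        "filter":  { "term": { "tenant_id": "0042" } },
        "routing": "0042"
    }}
  ]
}'
```

The application then indexes and searches through the alias exactly as it would through a dedicated index, but the cluster only has to carry one set of shards.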


How would index aliases help here?

On Wednesday, November 5, 2014 11:50:34 AM UTC-5, Jörg Prante wrote:

Use index aliases: one physical index, 4000 aliases.

Jörg


See kimchy's explanation

https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

Jörg

On Thu, Nov 6, 2014 at 7:08 PM, John D. Ament john.d.ament@gmail.com
wrote:

How would index aliases help here?
