ES takes too much time to index data

ian · August 1, 2012, 11:06am

Hi

I am using the Java API to index data, and my ES version is 0.19.8. It worked
fine until about 155,000 documents had been indexed.
After that, ES takes too much time to index data. The code is at the following
link.

https://gist.github.com/siminoushad/3225800

    IndexResponse response = null;
    IndexRequestBuilder indexRreqBuild;
    client.admin()
            .cluster()
            .health(new ClusterHealthRequest(indexName)
                    .waitForYellowStatus()).actionGet();

    XContentBuilder docBuilder = XContentFactory.jsonBuilder()
            .startObject();
    for (String key : map.keySet()) {

(The gist preview is truncated here.)

Thanks

Ivan · August 1, 2012, 4:33pm

You should use bulk indexing instead of indexing every individual
document separately. You are also refreshing the index after every
single index operation, which is time-consuming. Use bulk indexing and
disable refreshing (set the interval to -1) during batch indexing.
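
As a rough illustration of what a bulk request carries, here is a minimal sketch that builds the newline-delimited body for the REST `_bulk` endpoint, plus the settings body that turns periodic refresh off. It deliberately uses plain strings instead of the Java client so it stays self-contained. The class name `BulkBodySketch` and the index/type names in the usage note are made up for illustration, and the sketch assumes each document is already serialized as single-line JSON and that index and type names need no JSON escaping.

```java
import java.util.List;

/** Sketch only: builds request bodies as strings; it does not talk to a cluster. */
final class BulkBodySketch {

    /**
     * Builds a _bulk body: one action line followed by one source line per document.
     *
     * @param index    target index name (assumed to need no JSON escaping)
     * @param type     mapping type, which ES 0.19 requires
     * @param jsonDocs documents already serialized as single-line JSON
     */
    static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: index the following line into index/type (id auto-generated).
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type).append("\"}}\n");
            // Source line: the document itself; it must not contain raw newlines.
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    /** Settings body that disables periodic refresh (interval -1) on an index. */
    static String disableRefreshSettings() {
        return "{\"index\":{\"refresh_interval\":\"-1\"}}";
    }
}
```

For example, `BulkBodySketch.build("feeds", "feed", docs)` yields two lines per document, which would be sent in a single POST to `/_bulk`. The settings body would be sent with a PUT to the index's `_settings` endpoint before the batch, and the interval restored afterwards.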

--
Ivan

vineeth_mohan · August 3, 2012, 6:09am

Hello,

In our case the feeds arrive one at a time and we need a real-time indexing
solution, so we can't use bulk here.
We have also observed that with about 1.5 lakh (150,000) feeds in ES, it takes
around 45 minutes to index a single feed using the transport protocol on port
9300, but only about a second using the standard HTTP port 9200 with curl.

What do you think is the reason for this?

Thanks
Vineeth

Shaun_Etherton · August 3, 2012, 6:57am

I thought 9300 was for ES nodes to communicate with each other. Should you
be using 9200 instead?

--
Shaun

vineeth_mohan · August 3, 2012, 9:12am

It's the port used by the transport client.

@Shay - Please let me know what you think about this.

Thanks
Vineeth

Shaun_Etherton · August 3, 2012, 1:15pm

Oh, I see.

Thanks

Shaun

Wes_Plunk · August 9, 2012, 8:33am

I'm seeing similar results. If anyone has any suggestions, that would be
great.

dadoonet · August 9, 2012, 8:44am

Removing the refresh call should help.
Then, for bulk indexing, it's best to use the bulk API.

See: http://www.elasticsearch.org/guide/reference/java-api/bulk.html
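
When documents arrive one at a time, as described earlier in the thread, one possible compromise is to buffer them briefly and hand them over in small batches. The following is a minimal sketch of that idea, not the Elasticsearch API: `MicroBatcher`, its parameters, and the `sink` callback (a stand-in for whatever sends a bulk request) are all hypothetical names, and it assumes Java 8 or later. The clock is passed in explicitly to keep the logic easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: collects single documents and flushes them as small batches. */
final class MicroBatcher {
    private final int maxDocs;            // flush once this many documents are buffered
    private final long maxAgeMillis;      // or once the oldest buffered document is this old
    private final Consumer<List<String>> sink; // hypothetical stand-in for a bulk call
    private final List<String> buffer = new ArrayList<>();
    private long oldestMillis;

    MicroBatcher(int maxDocs, long maxAgeMillis, Consumer<List<String>> sink) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    /**
     * Buffers one document and flushes if the batch is full or too old.
     * Age is only checked when a document arrives; a real implementation
     * would also need a timer so a quiet stream still gets flushed.
     */
    synchronized void add(String jsonDoc, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestMillis = nowMillis;
        }
        buffer.add(jsonDoc);
        if (buffer.size() >= maxDocs || nowMillis - oldestMillis >= maxAgeMillis) {
            flush();
        }
    }

    /** Hands a copy of the buffered documents to the sink and clears the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a small `maxAgeMillis`, documents still become searchable soon after arrival, while the cluster sees far fewer requests than with one index call plus refresh per document.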

Hope this helps.

David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
