ES takes too much time to index data


(ian) #1

Hi

I am using the Java API to index data, and my ES version is 0.19.8. It worked
fine until about 155,000 documents had been indexed.
After that, ES takes too much time to index the data. The code is at the
following link: https://gist.github.com/3225800

Thanks


(Ivan Brusic) #2

You should use bulk indexing instead of indexing every individual
document separately. You are also refreshing the index after every
single index operation, which is time-consuming. Use bulk indexing and
disable refreshing (set the interval to -1) during batch indexing.
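
To make that concrete, here is a minimal sketch of the two request bodies involved when going through the HTTP API: the settings body that changes the refresh interval, and a bulk body with one action line and one source line per document. The class name, the index name "feeds", and the type name "feed" are hypothetical, and the sketch assumes ids and names contain no characters that need JSON escaping. It only builds strings; sending them (PUT /{index}/_settings and POST /_bulk) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds request bodies for the HTTP settings and bulk endpoints (sketch only). */
final class BulkIndexSketch {

    /** One document to index: an id plus its JSON source on a single line. */
    static final class Doc {
        final String id;
        final String jsonSource;

        Doc(String id, String jsonSource) {
            this.id = id;
            this.jsonSource = jsonSource;
        }
    }

    /** Body for PUT /{index}/_settings; "-1" disables periodic refresh. */
    static String refreshIntervalSettings(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    /** Body for POST /_bulk: an action line, then the source line, each ending in a newline. */
    static String bulkBody(String index, String type, List<Doc> docs) {
        StringBuilder sb = new StringBuilder();
        for (Doc d : docs) {
            // "_type" is used by older versions such as the 0.19.x discussed here.
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type)
              .append("\",\"_id\":\"").append(d.id).append("\"}}\n");
            sb.append(d.jsonSource).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("1", "{\"title\":\"first feed\"}"));
        docs.add(new Doc("2", "{\"title\":\"second feed\"}"));

        System.out.println(refreshIntervalSettings("-1")); // before the batch
        System.out.print(bulkBody("feeds", "feed", docs)); // the batch itself
        System.out.println(refreshIntervalSettings("1s")); // restore afterwards
    }
}
```

The last settings call puts the interval back to "1s" (the default) once the batch is done, so that new documents become searchable again without an explicit refresh.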

--
Ivan

On Wed, Aug 1, 2012 at 4:06 AM, Simi MA simi.ma@algotree.com wrote:

Hi

I am using the Java API to index data, and my ES version is 0.19.8. It worked
fine until about 155,000 documents had been indexed.
After that, ES takes too much time to index the data. The code is at the
following link:
https://gist.github.com/3225800

Thanks


(vineeth mohan) #3

Hello ,

The feeds arrive one at a time, and we are looking for a real-time
indexing solution, so we can't use bulk here.
We have also observed that with about 1.5 lakh (150,000) feeds in ES, it
takes about 45 minutes to index a single feed using the transport protocol
on port 9300, but only about a second over the standard port 9200 using curl.

What do you think is the reason for this?

Thanks
Vineeth

On Wed, Aug 1, 2012 at 10:03 PM, Ivan Brusic ivan@brusic.com wrote:

You should use bulk indexing instead of indexing every individual
document separately. You are also refreshing the index after every
single index operation, which is time-consuming. Use bulk indexing and
disable refreshing (set the interval to -1) during batch indexing.

--
Ivan


(Shaun Etherton) #4

I thought 9300 was for ES nodes to communicate with each other.
Should you be using 9200 instead?

--
Shaun

On Friday, 3 August 2012 at 15:39, Vineeth Mohan wrote:

Hello ,

The feeds arrive one at a time, and we are looking for a real-time indexing solution, so we can't use bulk here.
We have also observed that with about 1.5 lakh (150,000) feeds in ES, it takes about 45 minutes to index a single feed using the transport protocol on port 9300, but only about a second over the standard port 9200 using curl.

What do you think is the reason for this?

Thanks
Vineeth


(vineeth mohan) #5

It's the port used by the transport client:
http://www.elasticsearch.org/guide/reference/java-api/client.html

@Shay - Please let me know what you think about this.

Thanks
Vineeth

On Fri, Aug 3, 2012 at 12:27 PM, Shaun Etherton shaun.etherton@gmail.com wrote:

I thought 9300 was for ES nodes to communicate with each other.
Should you be using 9200 instead?

--
Shaun


(Shaun Etherton) #6

Oh, I see.

Thanks

Shaun

On Friday, 3 August 2012 at 18:42, Vineeth Mohan wrote:

It's the port used by the transport client: http://www.elasticsearch.org/guide/reference/java-api/client.html

@Shay - Please let me know what you think about this.

Thanks
Vineeth


(Wes Plunk) #7

I'm seeing similar results. If anyone has any suggestions, that would be
great.

On Wednesday, August 1, 2012 6:06:56 AM UTC-5, Aami wrote:

Hi

I am using the Java API to index data, and my ES version is 0.19.8. It worked
fine until about 155,000 documents had been indexed.
After that, ES takes too much time to index the data. The code is at the
following link:
https://gist.github.com/3225800

Thanks


(David Pilato) #8

Removing refresh should help.
Then, for bulk indexing, it's best to use bulk features.

See: http://www.elasticsearch.org/guide/reference/java-api/bulk.html
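
Since the feeds in this thread arrive one at a time, one way to still benefit from bulk requests is to buffer them briefly and hand them off as a batch once enough have accumulated or the oldest one has waited long enough. The following is a minimal sketch of that idea, not part of the Elasticsearch API: the class FeedBatcher, its thresholds, and the sink callback (which would send one bulk request) are all hypothetical. The current time is passed in explicitly to keep the logic easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects documents that arrive one by one and hands them off in batches. */
final class FeedBatcher {
    private final int maxBatchSize;
    private final long maxDelayMillis;
    private final Consumer<List<String>> sink; // e.g. code that sends one bulk request
    private final List<String> pending = new ArrayList<>();
    private long oldestPendingAt;

    FeedBatcher(int maxBatchSize, long maxDelayMillis, Consumer<List<String>> sink) {
        this.maxBatchSize = maxBatchSize;
        this.maxDelayMillis = maxDelayMillis;
        this.sink = sink;
    }

    /** Adds one document; flushes if the batch is full or its oldest entry is too old. */
    synchronized void add(String jsonSource, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingAt = nowMillis;
        }
        pending.add(jsonSource);
        if (pending.size() >= maxBatchSize || nowMillis - oldestPendingAt >= maxDelayMillis) {
            flush();
        }
    }

    /** Hands off whatever is pending; does nothing when the buffer is empty. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        FeedBatcher batcher = new FeedBatcher(
                3, 1000L, batch -> System.out.println("sending " + batch.size() + " docs"));
        for (int i = 0; i < 7; i++) {
            batcher.add("{\"n\":" + i + "}", System.currentTimeMillis());
        }
        batcher.flush(); // send the remainder
    }
}
```

In a real feed consumer, flush() would also need to be called from a timer, because add() only checks the delay when a new document arrives. The delay bound is the trade-off: a smaller value keeps indexing closer to real time, a larger one produces bigger batches.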

Hope this helps.

David.

On 9 August 2012 at 10:33, Wes Plunk wes@wesandemily.com wrote:

I'm seeing similar results. If anyone has any suggestions, that would be great.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

