Hi guys, I'm trying to import my data source using the _bulk API. So far I have:
- predefined mappings (with 5 different types of analysers, using edgeNGram filters)
- turned off refresh_interval
- set max_num_segments to 5
- and I'm bulk inserting in batches of 1000

Is there any other optimization I should do? With the current settings, inserting a batch of 1000 records still takes around 1-2 minutes. I have only inserted around 6 batches before halting it, so I'm not certain whether this performance will degrade over time.
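For reference, the equivalent of my setup looks roughly like the sketch below, written against the Java client (my actual importer uses the C# client; the index/type names and document source here are placeholders I made up):

    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.settings.ImmutableSettings;

    public class BulkImport {
        // Disable periodic refreshes for the duration of the import;
        // "-1" switches automatic refreshing off entirely.
        public static void disableRefresh(Client client) {
            client.admin().indices().prepareUpdateSettings("myindex")
                    .setSettings(ImmutableSettings.settingsBuilder()
                            .put("index.refresh_interval", "-1")
                            .build())
                    .execute().actionGet();
        }

        // Send one batch of documents as a single _bulk request.
        public static void indexBatch(Client client, String[] jsonDocs) {
            BulkRequestBuilder bulk = client.prepareBulk();
            for (String doc : jsonDocs) {
                bulk.add(client.prepareIndex("myindex", "mytype").setSource(doc));
            }
            BulkResponse response = bulk.execute().actionGet();
            if (response.hasFailures()) {
                System.err.println(response.buildFailureMessage());
            }
        }
    }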
Hello Shawn,

On Fri, Nov 9, 2012 at 12:38 PM, Shawn Ritchie <xritchie@gmail.com> wrote:
> Hi guys, I'm trying to import my data source using the _bulk API. So far
> I have:
> - predefined mappings (with 5 different types of analysers, using
>   edgeNGram filters)
> - turned off refresh_interval
> - set max_num_segments to 5

I'm not sure I understand this one. Do you optimize after each bulk, or?

> - and I'm bulk inserting in batches of 1000
>
> Is there any other optimization I should do? With the current settings,
> inserting a batch of 1000 records still takes around 1-2 minutes. I have
> only inserted around 6 batches before halting it, so I'm not certain
> whether this performance will degrade over time.
Maybe you already went through these, but it's worth a shot:
- How much memory did you allocate to ES out of the total RAM?
- You can disable _all if you don't need it (see the sketch after this list).
- Test to find the optimum batch size; maybe it works better with smaller batches.
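For the _all part, a minimal sketch of what the mapping change looks like through the Java API, assuming you create the index yourself (the index, type, and field names here are placeholders):

    import org.elasticsearch.client.Client;

    public class DisableAll {
        // Create the index with _all disabled in the type mapping,
        // so the catch-all field is not built during indexing.
        public static void createIndexWithoutAll(Client client) {
            String mapping = "{"
                    + "  \"mytype\": {"
                    + "    \"_all\": { \"enabled\": false },"
                    + "    \"properties\": {"
                    + "      \"title\": { \"type\": \"string\" }"
                    + "    }"
                    + "  }"
                    + "}";
            client.admin().indices().prepareCreate("myindex")
                    .addMapping("mytype", mapping)
                    .execute().actionGet();
        }
    }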
As long as you don't evaluate the BulkResponses from the _bulk requests, there is no safeguard against flooding ES, and degrading insertion performance over time will be unavoidable.

Your strategy should be: estimate the data volume of the 1000 requests in a single bulk. Issue a BulkRequest, do not wait for the response, issue more BulkRequests, then wait for the incoming BulkResponses. Limit the number of concurrent BulkRequests by waiting for the corresponding BulkResponses. Check your heap settings to make sure you can handle (number of max concurrent bulks * number of requests in a bulk). Adjust the length of a bulk request or the number of concurrent bulks until you hit the sweet spot of your configuration. This way you can balance the total volume of bulk data you are sending between the C# client and the ES cluster without flooding the system.

Shay has developed the class org.elasticsearch.action.bulk.BulkProcessor as an example of how the throughput and concurrency of bulk ingesting can be controlled by using the BulkResponses.
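A minimal sketch of wiring it up, assuming the builder-style API of that class (the batch size and concurrency numbers are just starting points to tune):

    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;

    public class BulkIngest {
        public static BulkProcessor build(Client client) {
            return BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) {
                    // called just before a bulk is executed
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request,
                                      BulkResponse response) {
                    // evaluate the response; log failures or back off here
                    if (response.hasFailures()) {
                        System.err.println(response.buildFailureMessage());
                    }
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request,
                                      Throwable failure) {
                    // a whole bulk failed, e.g. because the node was flooded
                    failure.printStackTrace();
                }
            })
            .setBulkActions(1000)     // flush after every 1000 requests
            .setConcurrentRequests(2) // at most 2 bulks in flight at once
            .build();
        }
    }

Documents are then fed to it one by one, and the processor takes care of flushing and of limiting the number of concurrent bulks by waiting for the responses.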
Cheers,
Jörg