Very slow bulk indexing

Hi,

We have an Elasticsearch cluster deployed on AWS with the following
configuration:

2 m1.large instances
Shards = 3
Replicas = 1
JVM heap = 4.5 GB

In /etc/security/limits.conf we have set nofile to 32000.
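For context, an index with those shard/replica settings would typically be created like this (the index name and host below are placeholders, not from our actual setup):

    curl -XPUT 'http://localhost:9200/my_index' -d '{
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }'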

We have 2 different indexes, each about 15 GB. Our use case requires us to
index the entire data set in one go.
After the initial indexing we don't get many updates (about 200-300 per
day); most of the requests to the cluster are searches.
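Since the data is loaded in one go and rarely updated afterwards, a commonly suggested tweak for the bulk phase is to disable the refresh interval while indexing and restore it once the load finishes. A rough sketch (index name is a placeholder):

    # turn refresh off for the duration of the bulk load
    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index": { "refresh_interval": "-1" }
    }'

    # ... run the bulk indexing ...

    # restore the default refresh interval afterwards
    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index": { "refresh_interval": "1s" }
    }'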

The first index was indexed in about 2 hours and memory was fine during the
indexing.

We are facing a problem while indexing the 2nd index: there are too many
old-generation GCs during indexing, roughly one per minute. After each
old-gen collection the heap drops below 1 GB, but it fills up again very
fast.
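To watch how often old-generation collections happen and how quickly the heap fills, jstat against the Elasticsearch JVM gives a quick picture (the PID is a placeholder):

    # print heap occupancy and GC counts/times every second
    jstat -gcutil <es-pid> 1000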

Any help is appreciated.

We captured the following heap histogram using jmap:

num     #instances       #bytes  class description
 1:        5940691    332678696  org.apache.lucene.index.SegmentNorms
 2:        8941204    286118528  java.util.HashMap$Entry
 3:        2629325     84138400  org.apache.lucene.index.FieldInfo
 4:           7462     70466928  java.util.HashMap$Entry[]
 5:         253939     37475472  char[]
 6:          32018     32550872  byte[]
 7:          25109     30609304  int[]
 8:         688280     17893328  org.elasticsearch.index.mapper.FieldMapper[]
 9:         688329     16519896  java.util.Arrays$ArrayList
10:         688280     16518720  org.elasticsearch.index.mapper.FieldMappers
11:           7972     15217024  java.lang.Object[]
12:          92755     13697816  * ConstMethodKlass
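For reference, a histogram like the one above can be captured from the running Elasticsearch process with jmap (the PID is a placeholder; adding :live forces a full GC and counts only live objects):

    jmap -histo <es-pid> | head -n 20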


The problem was a bad mapping file; fixing the mapping solved the issue.
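In case it helps others: the millions of FieldInfo/FieldMapper instances in the histogram above are consistent with a mapping that defines a very large number of fields, so dumping the mapping is an easy way to spot this (index name is a placeholder):

    # inspect the current mapping and check how many fields it defines
    curl 'http://localhost:9200/my_index/_mapping?pretty'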

Thanks
Rohit
