My caches are relatively small, so I'm wondering what is chewing up
all my old gen space. Here is some more information.
From BigDesk:
Number of documents: 390519818, Store size: 1221.1gb (1311185373569 B)
Field cache evictions: 0, Field cache size: 7.3gb, Filter cache size:
6.2gb
Merges: Current: 0, Total: 283, Took: 6.6m
Those are coming from Lucene. I've always seen a lot of them with
jmap, even in very healthy situations.
You may want to review other JVM params that control various size
ratios or when GC kicks in, etc.
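For example, on HotSpot the generation-sizing and CMS-trigger knobs look like the following (the values here are purely illustrative, not recommendations for this cluster):

```
-XX:NewRatio=3                          # old gen : young gen size ratio
-XX:SurvivorRatio=8                     # eden : survivor space ratio
-XX:+UseConcMarkSweepGC                 # the CMS collector discussed in this thread
-XX:CMSInitiatingOccupancyFraction=75   # start CMS when old gen is 75% full
-XX:+UseCMSInitiatingOccupancyOnly      # don't let the JVM adjust that threshold
```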
Thanks Otis. But why aren't these instances being collected during
garbage collection? Do we know which mechanism in elasticsearch is
holding all these references? I had assumed it was the field data
cache or the filter cache, but the stats show these caches are very
small.
Our cluster died just now, and I think it is because of this issue with
old gen being filled up with references to Term or TermInfo objects.
17 of 20 nodes were at 99.999% old gen usage, yet these nodes had an
average CPU load of only 1.0.
The stack trace from one of the nodes shows search threads being
blocked by a FieldDataLoader object: http://pastie.org/3220393
I'm guessing FieldDataLoader was having problems because of garbage
collection hell; here are the GC stats before we had to restart
(concurrent mark sweep):
So the question remains: why is my old gen filled up with Term/
TermInfo instances and not being collected, and is that the cause of my
searches being blocked? The old gen is 24GB, yet my field data cache
never grew over 7GB according to BigDesk, and the filter cache defaults
to 20% of memory.
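As a back-of-the-envelope check, the two caches reported by BigDesk account for only about half of a 24GB old gen (treating the old gen as the whole budget is a simplification):

```python
# Rough heap accounting for one node, using the numbers from this thread.
heap_gb = 24.0           # old gen size mentioned above
field_data_gb = 7.3      # field cache size reported by BigDesk
filter_cache_gb = 6.2    # filter cache size reported by BigDesk
accounted = field_data_gb + filter_cache_gb
unaccounted = heap_gb - accounted
print(f"accounted for: {accounted:.1f} GB, unexplained: {unaccounted:.1f} GB")
```

So roughly 10GB is unexplained by the caches, which is what points the suspicion at something else holding references.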
I believe sorting and facets require loading a lot of terms into
memory; could that be it?
The TermInfo instances are from Lucene's terms index.
Every 128th term is held in memory... so if you have many terms, that
can become sizable.
However: as of Lucene 3.5.0 the RAM required is substantially reduced
(LUCENE-2205). The terms are written into a more compact in-memory
format instead of a TermInfo+Term+String per term.
If upgrading is not an option then you can also set the terms index
divisor (I'm not sure how to do so through Elasticsearch); e.g. setting
it to 2 loads every 256th term instead and uses half the RAM, but then
seeking to a given term will be slower.
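The scaling is easy to sketch. The total term count below is a made-up example, and the real pre-3.5 per-entry overhead (a TermInfo + Term + String per held term) varies with term length, so this only shows how the divisor changes the count of terms held in RAM:

```python
# Sketch: how the terms index divisor scales the number of in-memory terms.
def terms_in_ram(total_terms, interval=128, divisor=1):
    # Lucene keeps every (interval * divisor)-th term of the terms index in memory.
    return total_terms // (interval * divisor)

total = 2_000_000_000  # hypothetical unique term count across segments
for d in (1, 2, 4):
    print(f"divisor={d}: ~{terms_in_ram(total, divisor=d):,} terms held in RAM")
```

Each doubling of the divisor halves the resident term count, at the cost of longer scans when seeking to a term.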
You can change the term index divisor on a live cluster using the update
settings API, but it will make search slower. Which version of
elasticsearch are you using? Lucene 3.5.0 (as Mike noted), which is part
of 0.18.5 and above, uses much less memory for the terms index.
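A minimal sketch of calling the update settings API for this, assuming a stock setup; the setting name `index.term_index_divisor`, the host, and the index name are assumptions here, so check the docs for your version before using them:

```python
# Hypothetical sketch: raise the term index divisor on a live index via the
# update settings API (PUT /{index}/_settings).
import json
from urllib.request import Request, urlopen

def build_payload(divisor):
    # Settings body; "index.term_index_divisor" is an assumed setting name.
    return json.dumps({"index": {"term_index_divisor": divisor}}).encode()

def update_divisor(host, index, divisor):
    req = Request(f"http://{host}:9200/{index}/_settings",
                  data=build_payload(divisor), method="PUT")
    req.add_header("Content-Type", "application/json")
    return urlopen(req)  # returns the HTTP response on success
```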
Thanks for the replies... Here are some results:
We were definitely over-sharding, the JVM settings needed to be tuned,
and we weren't fully taking advantage of our disks. After moving from
weekly partitions (indices) to quarterly, we had copious amounts of
RAM to give back for file caching. We then set the number of shards and
replicas to distribute evenly across the nodes. The next issue was
query speed, and we realized that our RAID0 configuration wasn't
cutting it. Now all disks are working evenly and each node is
efficient. We're in a good place now (I think), but the next
optimization we are thinking about is to use routing so that each
shard holds a time interval of data. This would make searches with
smaller date ranges hit fewer machines, and it efficiently
puts more machines to work as the search date range grows larger.
Does this seem like a good use case for routing?
If you already create an index per time interval, then you can use that
to control what timespan you search on; I think using time interval for
routing within an index might not really be needed.
On Monday, January 30, 2012 at 6:22 PM, lukeforehand wrote:
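The index-per-interval approach can be sketched as picking which quarterly indices a date-range search actually needs, so narrow ranges hit fewer shards and machines; the `logs-YYYY-qN` naming scheme below is made up for illustration:

```python
# Sketch: select the quarterly indices that cover a date range.
from datetime import date

def quarter(d):
    # Map a date to its (year, quarter-number) pair.
    return (d.year, (d.month - 1) // 3 + 1)

def indices_for_range(start, end, prefix="logs"):
    names = []
    y, q = quarter(start)
    end_y, end_q = quarter(end)
    while (y, q) <= (end_y, end_q):
        names.append(f"{prefix}-{y}-q{q}")
        q += 1
        if q == 5:
            y, q = y + 1, 1
    return names
```

A search over November 2011 to February 2012 would then target only `logs-2011-q4` and `logs-2012-q1` instead of every index.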