Memory problem

hi everyone,

we indexed 250 million docs on 6 es nodes. es's fast indexing and
relatively low resource requirements (both cpu and memory) are very
good. the average indexing time for 1000 docs is about 100ms.

we ran several performance tests, and they all passed. but when
we reindexed all the docs (after deleting and recreating the index &
mappings), it became quite slow: about 1600ms for 1000 docs on average.
so we restarted all the nodes, and the indexing time went back to 100ms.

i guess it's the high memory consumption that hurts the indexing speed,
because bigdesk shows that about 10g of virtual memory was taken by es
on every node, and it seems that it was not reclaimed by gc.

so what can i do to keep indexing fast without restarting nodes?
deleting & recreating the index won't help, as i already did that.

any suggestions?

ps, my environment:

es version: 0.19.9, with 6 shards and no replicas
jdk: 1.6u25 (64 bit)
memory: 5g (physical) + 2g (swap)
heap size: 3.5g
index file size: 60g (most fields are not stored)
JAVA_OPTS='-Xmx3500m -Xms3500m -Xmn768m -XX:PermSize=128m -XX:MaxPermSize=128m -Xss256k -XX:+DisableExplicitGC'

--

ps,

our performance tests contain several scan-type searches, each of which
may download about 2 million docs from es. we see that the es node we
fetch data from ran 2-3 full GCs during the fetch, and swap was also
used, but the contents were downloaded properly. will that affect
memory performance?

also, to fix a typo: it's 5g (not 10g) of virtual memory that was taken
by es on every node.

On Nov 1, 5:06 pm, Brian Hu crocodile...@gmail.com wrote:

[original post quoted in full, see above]

--

Hi,

From the parameters you gave, everything looks reasonable. But, as you
observed, swap is used, which kills performance, so you need to reduce the
max heap size. You also have to leave some RAM for the OS and the
filesystem buffer cache. Another option would be mlockall, to force the
complete ES process to stay in RAM.
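
To make the heap advice concrete, here is a minimal Python sketch of the arithmetic. The 50% split between heap and OS is an assumed rule of thumb, not a number from this thread, and `suggest_heap_mb` and `java_opts` are hypothetical helper names. Treat the result as a starting point to test, not a tuned value.

```python
def suggest_heap_mb(physical_mb, heap_fraction=0.5):
    """Return a heap size in MB, leaving the rest of RAM to the OS and page cache.

    heap_fraction=0.5 is an assumed rule of thumb, not a value from this thread.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(physical_mb * heap_fraction)


def java_opts(heap_mb):
    """Build the heap flags; -Xms and -Xmx are kept equal, as in the posted settings."""
    return "-Xms{0}m -Xmx{0}m".format(heap_mb)


if __name__ == "__main__":
    heap = suggest_heap_mb(5 * 1024)  # 5g physical RAM, as reported in the question
    print(java_opts(heap))
```

With 5g of physical RAM this suggests a heap well below the posted 3500m, which leaves the remainder for the filesystem cache instead of pushing the process into swap.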

Don't worry about virtual mem size, only resident mem size counts.
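
One way to compare virtual and resident size on Linux is to read `/proc/<pid>/status`. The sketch below is only an illustration: `parse_vm_fields` is a hypothetical helper, it assumes a Linux `/proc` filesystem, and the `VmSwap` line is only present on kernels that report it.

```python
import os


def parse_vm_fields(status_text):
    """Extract VmSize, VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    wanted = ("VmSize", "VmRSS", "VmSwap")
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The value part looks like "   123456 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    path = "/proc/self/status"  # replace "self" with the ES process id
    if os.path.exists(path):
        with open(path) as f:
            print(parse_vm_fields(f.read()))
```

VmRSS is the number to watch against physical RAM; a large VmSize on its own is not a problem.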

If you don't delete the index before reindexing, ES will use the existing
Lucene data, and this will take longer.

Lucene index merge operations will fill up your heap, which is expected
once they kick in. When the merging is over, it is up to the gc to clean
up the heap. The same goes for the scan search operation: over 2 million
docs, it fills up the heap pretty much, as you observed.

Best regards,

Jörg

On Thursday, November 1, 2012 10:06:07 AM UTC+1, Brian Hu wrote:

[original post quoted in full, see above]

--

Hi Brian,

My first guess is that if you clean (or remove) the index before reindexing
you will see better indexing performance.
Search performance should not be affected by the above.

And if you see swapping, then you probably don't have enough RAM. You can
consider lowering Xmx if it's very high. You can (no, you SHOULD!) also
set vm.swappiness to 0 (you probably have it set to 60 like 99% of people
out there :))
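
A quick way to check the current value is sketched below in Python. It assumes Linux, where the setting is exposed at `/proc/sys/vm/swappiness`; `read_swappiness` is a hypothetical helper name. The script only reads the value and does not change it.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_swappiness()
    if value is None:
        print("vm.swappiness is not readable on this system")
    elif value > 0:
        # Changing it needs root, e.g. `sysctl -w vm.swappiness=0`.
        print("vm.swappiness is %d; consider lowering it" % value)
    else:
        print("vm.swappiness is already 0")
```

To keep the setting across reboots, the usual place is a `vm.swappiness = 0` line in `/etc/sysctl.conf`.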

If you want to keep performance metrics for longer, see Sematext
Monitoring. We just made it real-time the other day and are adding a few
more ES TCP/HTTP-level metrics for the release later this month.

Otis


On Thursday, November 1, 2012 6:56:23 AM UTC-4, Brian Hu wrote:

[follow-up and original post quoted in full, see above]

--