I've run into an issue which is preventing me from moving forward with ES.
I've got an application where I keep 'live' documents in Elasticsearch.
Each document is a combination of data from multiple sources, which is
merged together using doc_as_upsert. Each document has a TTL which is
updated whenever new data comes in for it, so a document dies whenever no
data source has reported anything about it for a while. The number of
documents generally doesn't exceed 15,000, so it's a fairly small data set.
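(For context, the moving parts look roughly like this; a minimal sketch
against the ES 1.x APIs, where the index, type, and field names are made
up. The _ttl field has to be enabled in the mapping, and each source's
partial data goes through the update API with doc_as_upsert:)

    # Hypothetical index with per-document TTL enabled (ES 1.x mapping syntax)
    curl -XPUT 'http://localhost:9200/live_docs' -d '{
      "mappings": {
        "doc": {
          "_ttl": { "enabled": true, "default": "10m" }
        }
      }
    }'

    # Merge one source's fields into document 42, creating it if it does not exist
    curl -XPOST 'http://localhost:9200/live_docs/doc/42/_update' -d '{
      "doc": { "source_a_field": "some value" },
      "doc_as_upsert": true
    }'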
Whenever I leave this running, memory usage on the box slowly but surely
creeps up, seemingly unbounded, until there is no resident memory left.
The Java process nicely keeps within its set ES_MAX_HEAP bounds, but the
mapping from storage on disk to memory seems to be ever-increasing, even
when the number of 'live' documents drops to 0.
I was wondering if anyone has seen such a memory problem before, and
whether there are ways to debug memory usage that is unaccounted for by
the processes in 'top'.
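(A way to chase memory that 'top' doesn't attribute to any process is to
compare the per-process resident totals with the kernel's own accounting;
a generic Linux sketch, nothing ES-specific:)

    # Sum of resident memory over all processes, in MB
    ps -eo rss= | awk '{ sum += $1 } END { print sum/1024 " MB" }'

    # Kernel view: free vs. cached vs. mapped vs. slab
    free -m
    grep -E 'MemTotal|MemFree|^Cached|Mapped|Shmem|Slab' /proc/meminfo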
I forgot to mention, I'm running Elasticsearch 1.0.1 on Ubuntu 12.04 with
24GB of available RAM.
I believe you are just witnessing the OS caching files in memory. Lucene
(and therefore by extension Elasticsearch) uses a large number of files to
represent segments. TTL + updates will cause even higher file turnover
than usual.
The OS manages all of this caching and will reclaim it for other processes
when needed. Are you experiencing problems, or just witnessing memory
usage? I wouldn't be concerned unless there is an actual problem that you
are seeing.
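(If you want to watch that segment/file turnover directly, the segments
API lists the Lucene segments behind an index; a quick sketch, with a
hypothetical index name:)

    # Rerun this while the updater is running to see segments appear and disappear
    curl -s 'http://localhost:9200/live_docs/_segments?pretty'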
@Mark:
The heap is set to 2GB, using mlockall. The problem occurs with both
OpenJDK 7 and Oracle JDK 7, both on the latest versions. I have one index,
which is very small:
index: {
    primary_size_in_bytes: 37710681,
    size_in_bytes: 37710681
}
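(Figures in that shape look like output from the 1.x index status API; for
reference, roughly, with a placeholder index name:)

    curl -s 'http://localhost:9200/live_docs/_status?pretty'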
@Zachary: Our systems are set up to alert when memory is about to run out.
We use Ganglia to monitor our systems, and it reports the memory as 'used'
rather than 'cached'. I will try just letting it run until memory runs
out, though, and report back after that.
Cool, curious to see what happens. As an aside, I would recommend
downgrading to Java 1.7.0_u25. There are known, still-unresolved bugs in
the most recent Oracle JVM versions; u25 is the most recent safe version.
I don't think that's your problem, but it's a good general consideration
anyway.
-Z
Also, are there other processes running which may be causing the problem?
Does the behavior only happen when ES is running?
There are no other processes running except for ES and the program which
posts the updates. The memory constantly increases while the updater is
running, but stays flat (and none of it is ever released, no matter how
much has been used) whenever ES is idle.
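(Given the suspicion about the disk-to-memory mapping, it may be worth
looking at what the ES process actually has mapped; a sketch using
standard Linux tooling, where the pgrep pattern is an assumption about how
the process is named:)

    ES_PID=$(pgrep -f elasticsearch | head -1)   # assumes one ES-like process
    pmap -x $ES_PID | sort -n -k3 | tail -20     # 20 largest resident mappings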
As a follow-up, when the server is nearing maximum memory, the memory use
stops increasing. This would indeed support Zachary's caching theory,
although I'm still confused as to why it shows up as 'in use' memory rather
than 'cached' memory. In any case, it does not block me right now. It's
just peculiar, and I'll revive this thread once I have a better explanation.
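(One thing worth checking here: file pages that a process has mmap'd are
counted under both 'Cached' and 'Mapped' in /proc/meminfo, and some
monitoring setups fold mapped pages into 'used' rather than 'cached'. A
quick look, plain Linux:)

    grep -E 'MemTotal|MemFree|^Cached|Mapped|Shmem' /proc/meminfo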
I have experienced the same behavior when I tried to load a large amount
of data... If you clear the file system cache (here is a link to a tool:
http://www.delphitools.info/2013/11/29/flush-windows-file-cache/), the
memory drops back to the defined heap size.
However, this still looks like wrong behavior. Is there a way to limit the
shareable memory up front?
All the best,
Yitzhak
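(That tool is Windows-specific; on Linux the rough equivalent, useful for
diagnosis but not something to run routinely in production, is to flush
the page cache via /proc/sys/vm/drop_caches as root:)

    sync                                  # write dirty pages out first
    echo 3 > /proc/sys/vm/drop_caches     # drop page cache plus dentries/inodes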