Hitting some limit on ElasticSearch

Hello all, I am using ES with a Graylog2 indexer to index all my machines'
logs, which come to around 500 million messages a day. The problem manifests
itself when the Graylog server stops indexing any new messages into ES. The
Graylog server itself seems to be just fine, and it feels like I am hitting
some kind of threshold on the ES side. Here is my configuration:

I have 14 x (8 CPU, 64GB RAM, 1TB RAID 50 array) machines. The entire RAM is
dedicated to ES exclusively. I have 14 shards with 1 replica and just one
Graylog2 index, and it's a 99.99% write-only index. With 1.5 billion messages
the disks were around 31% utilized, the CPUs were around 25% utilized, and
virtual memory was 62GB but resident memory was only 40GB.

I am running ES 0.19.1, and last week I moved to Graylog2 0.9.7, which has an
embedded ES client. My log message throughput is around 500 million a day and
I would like to keep at least 1 week of data in the cluster.

I have not changed any JVM settings on ES except the heap size.

To resolve the problem, I have to delete the entire index every time, and then
things start working fine again until the index reaches around 1.5 billion
messages. I have tried with more shards, but I get the same results every
time.

Any pointers on what I should look at?

Thanks,
-Amit Mohan

Hi Amit,

"Entire RAM is dedicated to ES exclusively."

If the above is true, then that's the problem. Lower your heap to, say,
8GB.
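
A quick way to confirm that heap pressure is actually what the nodes are hitting is to look at heap usage when indexing stalls. A minimal sketch against the nodes stats API, assuming a node reachable on localhost:9200; the field names below are the ones current Elasticsearch versions return, so the 0.19-era output may be shaped slightly differently:

```python
# Print per-node JVM heap usage from the nodes stats API.
# Assumptions: a node on localhost:9200 and the jvm.mem.* field names used by
# current Elasticsearch versions (the 0.19-era response differs slightly).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:9200/_nodes/stats/jvm") as resp:
    stats = json.loads(resp.read())

for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    used_gb = mem["heap_used_in_bytes"] / 2**30
    max_gb = mem["heap_max_in_bytes"] / 2**30
    print(f"{node['name']}: heap {used_gb:.1f} GB used of {max_gb:.1f} GB")
```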

And if you grab SPM for Elasticsearch from Sematext, you'll see your shards
bounce around the cluster after you restart your nodes. :)

Otis

As Otis mentioned, I would recommend allocating around 22GB to ES to start
with (and make sure you use the latest Java 1.6/6 version). The reason I
recommend starting with a 22GB heap allocation is that, in this case, the JVM
can do memory optimizations to effectively take less memory (pointer
compression).
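
As a rough illustration of that pointer-compression point (an assumption-laden sketch, not anything ES-specific): the JVM normally keeps compressed ordinary object pointers only while -Xmx stays below roughly 30-32GB, which is why ~22GB leaves comfortable headroom. Assuming a local `java` binary on the PATH, you can print the flag for a given heap size:

```python
# Check whether the JVM keeps compressed object pointers ("pointer
# compression") for a 22GB heap; compressed oops are normally disabled once
# -Xmx goes past roughly 30-32GB. Assumes a local `java` on the PATH.
import subprocess

out = subprocess.run(
    ["java", "-Xmx22g", "-XX:+PrintFlagsFinal", "-version"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    if "UseCompressedOops" in line:
        print(line.strip())
```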

Next, I would recommend using rolling indices. I am not sure how Graylog
indexes the data, but for logging-type data it's best to create an index per
"time span", like a day for example. The reason is that deleting old indices
is a snap, while deleting docs from an index will be considerably more
expensive. You can always search over more than one index. But it boils down
to Graylog and whether it supports this or not.
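
To make the rolling-indices idea concrete, here is a minimal sketch of the create/drop lifecycle against the plain REST API. The host, the graylog2-YYYY-MM-DD naming, and the one-week retention are assumptions for illustration; Graylog2 itself would still have to write into the daily index, so this is not a drop-in change:

```python
# Illustrative per-day ("rolling") index lifecycle via the REST API.
# Assumed: a node on localhost:9200, an index name pattern graylog2-YYYY-MM-DD,
# and one-week retention; this is a sketch, not a drop-in Graylog2 change.
import json
import urllib.request
from datetime import date, timedelta

ES = "http://localhost:9200"
RETENTION_DAYS = 7  # keep one week of logs, as described above

def request(method, path, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(ES + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def index_name(day):
    return f"graylog2-{day:%Y-%m-%d}"

# Create today's index up front (14 primaries, 1 replica, as in the cluster
# above). urlopen raises an HTTP error if the index already exists, which is
# acceptable for a sketch.
request("PUT", "/" + index_name(date.today()), {
    "settings": {"number_of_shards": 14, "number_of_replicas": 1}
})

# Dropping a whole expired index is a cheap metadata operation, unlike deleting
# hundreds of millions of documents out of one monolithic index.
request("DELETE", "/" + index_name(date.today() - timedelta(days=RETENTION_DAYS)))
```

Searching a whole week is then just a matter of querying the graylog2-* pattern instead of a single index name.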

Thanks guys! I am going to give 22GB of RAM to ES and see how it behaves over
the next couple of days. I will post the results here as and when I have them.

Otis: Thanks for pointing me to Sematext! It certainly looks very interesting
and I am going to give it a try.

-Amit Mohan

Hi Amit,

You might also find something helpful from here:
http://elasticsearch-users.115913.n3.nabble.com/Problems-with-GrayLog2-ES-setup-long-td3859362.html
