Problems with GrayLog2 + ES setup [long]


(MichaelGlad) #1

I'm having trouble with my GrayLog2 + ES 0.19.0 installation. It
seems I've hit a wall: Java is CPU bound and there's next to no disk
activity.
As I believe the cure could be tuning ES or adding hardware
resources, I'm posting to this list.

The indexing capacity I would like is:

  • Documents: long syslog messages (160 chars avg) from a busy mail
    filter
  • Message load exceeding 200 messages/second
  • Capacity for storing 60 days' worth of log messages, that's about 1
    billion.
  • I currently have some 800 million messages on disk, using about 5x140
    gigs of disk.
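As a quick sanity check, the figures above do work out to roughly a billion documents over the retention window:

```shell
# Back-of-the-envelope check of the retention figures:
# 200 msg/s * 86400 s/day * 60 days
msgs=$((200 * 86400 * 60))
echo "$msgs messages"   # 1036800000 -- about 1 billion
```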

My setup is (I'm using 64-bit Red Hat Linux):

  • One VM running the GrayLog2 server + an ES server with no local shards.
  • Two VMs with a total of 5 shards distributed with 3 / 2 on each.

The two VMs with shards have 32 gigs of memory and 4 cores in a VMware
environment.

I've applied the following changes to the bin/elasticsearch script:

ulimit -n 60000 # fs.file-max = 131000
ulimit -l unlimited
export ES_HEAP_SIZE=16g
export JAVA_HOME=/usr/java/latest # SUN JRE 1.6.31
export JAVA_OPTS="-Xloggc:/tmp/gc"

and disabled the swap area to prevent swapping.

Java on the two VMs containing shards uses about 12 gigs of memory
and all CPU resources.
The first VM is not significantly loaded.

What could a solution be -- "KIWI" (Kill It With Iron), or should I
rather change the ES configuration?

Regards, Michael


(Radu Gheorghe) #2

Hi Michael,

So what exactly happened? Inserts work slowly, or queries? Or both?

I'm also using Elasticsearch for logging on VMs (although I'm not
using Graylog), and my "wall" was on inserts, which were heavily
influenced by storage speed. You really need fast storage there, since
I can bet your index size is much bigger than your total RAM.

One thing you can do is increase the number of shards if
you have no replicas configured (although this implies recreating the
index, AFAIK). We've also done the following here:

  • increased the refresh interval to 3 seconds
  • compressed _source
  • disabled _all (you need to check first whether Graylog uses _all for
    searching :D)
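For reference, a sketch of how those changes might look against a 0.19-era ES HTTP API (the index and type names here are examples, not necessarily what Graylog2 creates):

```shell
# Refresh interval is a dynamic index setting, so it can be
# changed on a live index:
curl -XPUT 'http://localhost:9200/graylog2/_settings' -d '{
  "index": { "refresh_interval": "3s" }
}'

# _source compression and _all are mapping options, so they normally
# require recreating the index with the mapping in place:
curl -XPUT 'http://localhost:9200/graylog2_v2' -d '{
  "mappings": {
    "message": {
      "_source": { "compress": true },
      "_all":    { "enabled": false }
    }
  }
}'
```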

On Mar 26, 11:40 pm, MichaelGlad g...@aysabtu.dk wrote:



(Michael Sick) #3

Also, if your inserts are slow, are you using bulk inserts?
http://www.elasticsearch.org/guide/reference/api/bulk.html
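For completeness, a minimal bulk request sketch (index/type names and messages are made up; note the newline-delimited format and the required trailing newline):

```shell
# Each document is a pair of lines: an action line, then the source.
curl -XPOST 'http://localhost:9200/_bulk' --data-binary '
{"index":{"_index":"graylog2","_type":"message"}}
{"message":"postfix/smtp: status=sent","facility":"mail"}
{"index":{"_index":"graylog2","_type":"message"}}
{"message":"postfix/smtp: status=deferred","facility":"mail"}
'
```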

On Tue, Mar 27, 2012 at 5:52 AM, Radu Gheorghe radu0gheorghe@gmail.com wrote:



(MichaelGlad) #4

Hi Radu, inserts are fast enough to keep up with the ~200 incoming
messages/sec without piling up in the Graylog2 queue. The problems are
the serious CPU usage (2 x 3 cores) and very slow query responses
(~40-50 seconds). The storage is almost idle.

I've now, per your suggestions, increased the refresh interval to 10
secs and enabled compression. I'll have to consult the GL2 sources to
see if it uses _all. I've also increased the max Java heap size to 24G
out of the 32G of RAM and restarted ES. After spending 2 x 7 hours of
CPU time starting up, search performance is now acceptable (10 secs)
and the two ES data nodes only use some 10-20% CPU each.

So it seems I'm back to a useful state of the world. Faster query times
would be nice though.

  • Michael

On 27 Mar., 11:52, Radu Gheorghe radu0gheor...@gmail.com wrote:



(MichaelGlad) #5

I've looked at the GL2 sources at GitHub, and it seems that the bulk
API is used.

  • Michael

On 27 Mar., 14:35, Michael Sick michael.s...@serenesoftware.com
wrote:



(Radu Gheorghe) #6

Hi Michael,

Nice to hear things got better :)

Here are some other things you might want to try:

On Mar 27, 9:42 pm, MichaelGlad g...@aysabtu.dk wrote:



(Shay Banon) #7

What type of searches were being executed? It certainly might be that the
ES process needed more memory to accommodate those (especially if sorting /
facets are being used).

On Tue, Mar 27, 2012 at 8:42 PM, MichaelGlad glad@aysabtu.dk wrote:



(MichaelGlad) #8

Hi, thank you for the suggestions. Could it be that the extreme
CPU usage I'm seeing for a couple of hours following each start-up is
ES doing optimization on its own initiative? My shards are 140 gigs
each, and as I understand it each shard is a Lucene index, so
this might be resource intensive.

  • Michael

On 28 Mar., 07:46, Radu Gheorghe radu0gheor...@gmail.com wrote:



(MichaelGlad) #9

It is relatively simple queries like

gotit AND glad AND viagra

to see if the filter has caught any viagra spam mails bound for me.
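A query like that would typically reach ES as a query_string query, roughly like this (the index name is an example):

```shell
# Lucene query syntax is passed through by query_string:
curl -XGET 'http://localhost:9200/graylog2/_search?pretty' -d '{
  "query": {
    "query_string": { "query": "gotit AND glad AND viagra" }
  }
}'
```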

On 28 Mar., 12:55, Shay Banon kim...@gmail.com wrote:



(Radu Gheorghe) #10

AFAIK Graylog does a "match_all" and sorts by date each time you open
the interface, and then at a certain interval. I guess that's expected
for any "logging" solution.

How would that impact the memory requirements? I mean, I would expect
to need more memory, but by how much?

On Mar 28, 1:55 pm, Shay Banon kim...@gmail.com wrote:



(Shay Banon) #11

Hard to say "how much". Basically, when sorting, the values for the
field are loaded into memory to do it. You can see the current usage
under the node stats API, under field cache.
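On a 0.19-era cluster that stats call looks something like this (the exact path and field names may differ by version; the field cache figures should appear under each node's indices cache section):

```shell
# Node stats, including per-node field cache size/evictions:
curl 'http://localhost:9200/_cluster/nodes/stats?pretty'
```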

On Thu, Mar 29, 2012 at 8:25 AM, Radu Gheorghe radu0gheorghe@gmail.com wrote:



(Radu Gheorghe) #12

I see. Thanks a lot!


(MichaelGlad) #13

Thank you for all your kind help. I've learned a lot about ES.
I've located a reason why my searches are slow -- GL2 makes the
following Ruby call each time it returns to the main screen, e.g.
after doing a search:

search("*", :size => 0).total

With close to a billion messages, that takes some time :-)

I've now hacked the source to disable the check, and searches are
much faster. I'll contact the GL2 author and suggest adding a proper
option to disable counting the stored log entries all the time.

  • Michael


(Shay Banon) #14

A faster way to get the total document count is the count API (or a
search with search_type set to count) with a match_all query. An even
faster way is possibly the index stats API with docs:
http://www.elasticsearch.org/guide/reference/api/admin-indices-stats.html
and use the num docs on the primary stats (that's assuming no
filtering based on type is done).
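
The three alternatives above differ mainly in which endpoint they hit. A minimal sketch of the request shapes, in Python -- the cluster address and index name (`ES`, `INDEX`) are hypothetical placeholders, and the endpoints are as documented for 0.19-era Elasticsearch:

```python
import json

# Hypothetical cluster address and index name; adjust for your setup.
ES = "http://localhost:9200"
INDEX = "graylog2"

# Option 1: the count API with a match_all query -- no scoring, no
# document fetching, just a count.
count_url = ES + "/" + INDEX + "/_count"
count_body = json.dumps({"match_all": {}})

# Option 2: a search with search_type=count -- runs the query but
# returns no hits, only the total.
search_url = ES + "/" + INDEX + "/_search?search_type=count"
search_body = json.dumps({"query": {"match_all": {}}})

# Option 3: the index stats API -- reads the primaries' doc count
# from stats, no query executed at all.
stats_url = ES + "/" + INDEX + "/_stats"

print(count_url)
print(search_url)
print(stats_url)
```

Each can then be issued with a plain HTTP GET/POST; option 3 is the cheapest since it never touches the query layer, with the caveat Shay notes about per-type filtering.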


(Radu Gheorghe) #15

We have a slightly different approach. Please let me know if you see
something wrong with it.

Currently, we use the "hits" return value whenever we do a search. So
when you go to the main page, we do a "match_all" query sorted by
date to show the latest logs, and the "hits" total of that query is
the total number of logs.

Then, when we do a specific search, we still get a hit count and we
show that number, even though the interface itself only shows a
limited number of results.
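
The approach above can be sketched as a single request body that serves both purposes: the returned hits fill the page, and the total in the same response is displayed as the overall count. A minimal sketch -- the page size and the "date" sort field name are assumptions, not taken from any particular Graylog schema:

```python
import json

PAGE_SIZE = 50  # hypothetical number of results shown per page

# One query does double duty: "hits.hits" gives the latest PAGE_SIZE
# logs for display, and "hits.total" in the same response is shown
# as the overall log count -- no separate count request needed.
main_page_query = {
    "query": {"match_all": {}},
    "sort": [{"date": {"order": "desc"}}],
    "size": PAGE_SIZE,
}

print(json.dumps(main_page_query))
```

The trade-off versus the index stats API is that this still executes (and sorts) a real query, but it avoids issuing an extra request just for the count.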


(system) #16