Elasticsearch is crashing after I add metrics {} in filters and format => ... in output


#1

filter {
  if [type] == "apacheaccesslog" {
    grok {
      patterns_dir => "./patterns"
      match => { "message" => "%{APACHE_ACCESS_LOG}" }
      break_on_match => false
    }
    grok {
      match => { "request_uri" => "%{URIPATHPARAM:request_url}" }
      break_on_match => false
    }
    grok {
      match => [ "request_url", "(?<search>(?=^/s?).*)" ]
      break_on_match => false
    }
    grok {
      match => [ "request_url", "(?<all_urls>(?=^/).*)" ]
      break_on_match => true
      remove_tag => [ "_grokparsefailure" ]
    }
    metrics {
      meter => [ "search.%{search}", "all_urls.%{all_urls}" ]
      add_tag => "metric"
    }
  }
}

output {
  elasticsearch {
    host => localhost
    cluster => test_cluster
  }
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "count: %{search.count}, %{all_urls.count}"
      }
    }
  }
}

This is the logstash.conf I am using. The moment I added metrics {} in filters and format => in output, my Elasticsearch nodes started crashing after about a minute of processing. I have 3 nodes with 4 GB JVM heap each, processing ordinary Apache access logs. I didn't think I was doing anything unusual; wanted to see if you have a different opinion.

Thanks in advance.
Ben


#2

Oops, I think I see the problem. This is not the config that I am using. Let me update the config.


#4

Hmm, not sure why my patterns are not showing up here; I just realized they were missing when I updated the post. It looks like anything with a <> pattern doesn't render. The first grok pattern captures "search" with (?=^/s?), and the second captures "all_urls" with (?=^/).

Sorry for the confusion.


(Mark Walkom) #5

Try using code brackets to quote your config: that's a ` (backtick), then your code, then another. Or highlight the text and use the </> button to format it.


#6

Good to know. I have updated the original config. Hopefully now I'll get some feedback on what I am doing wrong. :smile:


(Mark Walkom) #7

It seems unlikely this would crash ES, what do the logs for ES show?


#8

I am seeing this in my logs as output.

Jun 03, 2015 11:10:55 AM org.elasticsearch.node.internal.InternalNode start
INFO: [logstash-m218176apss3262-26127-7964] started
count: %{search.count}, %{all_urls}
count: %{search.count}, %{all_urls}
count: %{search.count}, %{all_urls}
count: %{search.count}, %{all_urls}
count: %{search.count}, %{all_urls}
count: %{search.count}, %{all_urls}

Shouldn't these show up as actual numbers?
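(A plausible explanation, sketched here as a guess rather than a confirmed diagnosis: a metrics meter with a dynamic name such as "search.%{search}" creates one counter per distinct value, so the flushed metric event carries fields like search./some/path.count. A static sprintf reference like %{search.count} therefore never matches any field on the event, and Logstash passes the string through literally. With a static meter name, hypothetical here, the reference would resolve:

    metrics {
      # static meter name: every matching event increments the same counter,
      # so the metric event carries a predictable "requests.count" field
      meter => "requests"
      add_tag => "metric"
    }

    # in the stdout codec:
    format => "count: %{requests.count}"
)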


#9

This is what I see on my ES server. I increased memory from 4 GB to 20 GB but am still running into problems.

[2015-06-03 12:11:43,693][INFO ][cluster.metadata ] [Frankenstein's Monster] [logstash-2015.06.03] update_mapping [logs] (dynamic)
[2015-06-03 12:11:58,741][INFO ][cluster.metadata ] [Frankenstein's Monster] [logstash-2015.06.03] update_mapping [logs] (dynamic)
[2015-06-03 12:12:44,655][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3569][29] duration [38.9s], collections [2]/[40.9s], total [38.9s]/[4.8m], memory [15.4gb]->[14.4gb]/[19.8gb], all_pools {[young] [21.3mb]->[1.1gb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [15.2gb]->[13.3gb]/[18.5gb]}
[2015-06-03 12:13:19,559][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3571][30] duration [32.8s], collections [1]/[33.7s], total [32.8s]/[5.3m], memory [16.2gb]->[17.9gb]/[19.8gb], all_pools {[young] [380.5kb]->[20mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [16.1gb]->[17.9gb]/[18.5gb]}
[2015-06-03 12:13:49,039][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3572][31] duration [29.2s], collections [1]/[29.4s], total [29.2s]/[5.8m], memory [17.9gb]->[18.9gb]/[19.8gb], all_pools {[young] [20mb]->[432.7mb]/[1.1gb]}{[survivor] [0b]->[0b]/[149.7mb]}{[old] [17.9gb]->[18.5gb]/[18.5gb]}
[2015-06-03 12:14:21,784][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3573][32] duration [32.4s], collections [1]/[32.7s], total [32.4s]/[6.4m], memory [18.9gb]->[19.7gb]/[19.8gb], all_pools {[young] [432.7mb]->[1.1gb]/[1.1gb]}{[survivor] [0b]->[12.6mb]/[149.7mb]}{[old] [18.5gb]->[18.5gb]/[18.5gb]}
[2015-06-03 12:14:51,325][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3574][33] duration [29.2s], collections [1]/[29.5s], total [29.2s]/[6.9m], memory [19.7gb]->[19.8gb]/[19.8gb], all_pools {[young] [1.1gb]->[1.1gb]/[1.1gb]}{[survivor] [12.6mb]->[128.1mb]/[149.7mb]}{[old] [18.5gb]->[18.5gb]/[18.5gb]}
[2015-06-03 12:15:23,215][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3575][34] duration [31.8s], collections [1]/[31.8s], total [31.8s]/[7.4m], memory [19.8gb]->[19.8gb]/[19.8gb], all_pools {[young] [1.1gb]->[1.1gb]/[1.1gb]}{[survivor] [128.1mb]->[148.5mb]/[149.7mb]}{[old] [18.5gb]->[18.5gb]/[18.5gb]}
[2015-06-03 12:16:24,543][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3576][35] duration [28.9s], collections [1]/[28.9s], total [28.9s]/[7.9m], memory [19.8gb]->[19.8gb]/[19.8gb], all_pools {[young] [1.1gb]->[1.1gb]/[1.1gb]}{[survivor] [148.5mb]->[149.7mb]/[149.7mb]}{[old] [18.5gb]->[18.5gb]/[18.5gb]}
[2015-06-03 12:16:56,497][WARN ][monitor.jvm ] [Frankenstein's Monster] [gc][old][3577][37] duration [1m], collections [2]/[1m], total [1m]/[9m], memory [19.8gb]->[19.8gb]/[19.8gb], all_pools {[young] [1.1gb]->[1.1gb]/[1.1gb]}{[survivor] [149.7mb]->[149.1mb]/[149.7mb]}{[old] [18.5gb]->[18.5gb]/[18.5gb]}
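(A guess worth flagging, based on the repeated update_mapping [logs] (dynamic) lines above: meters named after event values, like "search.%{search}" and "all_urls.%{all_urls}", generate a new field name for every distinct URL, so Elasticsearch's dynamic mapping grows with each unique value. An unbounded mapping plus the resulting cluster-state churn is a classic source of exactly this kind of heap exhaustion. A bounded alternative keeps the meter names static, e.g.:

    metrics {
      # fixed meter names: the index mapping stays small no matter
      # how many distinct URLs flow through
      meter => [ "search_hits", "url_hits" ]
      add_tag => "metric"
    }
)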


(Mark Walkom) #10

The problem is the GC; old-gen GC is stop-the-world, which will make it look like ES has stopped.

How much data is in your cluster, how many indices/nodes/shards do you have, and what are your node specs?


#11

I haven't modified any default settings. I tried splitting my nodes from 1 up to 4, with some masters, some workers, and 1 load balancer, all with 4 GB memory, but it didn't last more than a minute. So I went back to 1 node with 20 GB memory, and that at least lasted longer (20 minutes). Shards are default. Node specs are also default; I tried changing some settings but it didn't do any good, so I left them as default. Here are the node/cluster details.

[somehost]# curl 'localhost:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open logstash-2015.06.03 5 1 209974 0 7.4gb 7.4gb
yellow open .kibana 1 1 1 0 2.4kb 2.4kb

[somehost]# curl 'localhost:9200/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
somehost 10.x.x.x 2 c - logstash-somehost-29267-7964
somehost 10.x.x.x. 88 45 2.61 d * Butterball

[somehost]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "test_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 1,
"active_primary_shards" : 6,
"active_shards" : 6,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 6,
"number_of_pending_tasks" : 0
}
[somehost]#

How can I get the size of the data? I checked the filesystem size and this is what I got; not sure if this is what you meant.
[somehost]# du -sh *
7.5G logstash-2015.06.03

Basically, I am feeding live Apache logs from production (around 1000-1500 events per minute).

Should I be scaling way more? The ES server has 42 GB memory and 8 CPUs with 8 cores each (ES: 20 GB, Logstash: 4 GB, Kibana: default).

Let me know your suggestion.
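(For reference, the usual guideline for this era of Elasticsearch, stated here as a general rule rather than a diagnosis of this cluster: give the ES heap no more than half the machine's RAM, and keep it below roughly 32 GB so the JVM retains compressed object pointers; 20 GB of 42 GB is within both limits. On the 1.x series the heap is set via the ES_HEAP_SIZE environment variable; the file path below varies by install and is only an example:

    # e.g. in the service's environment file (/etc/sysconfig/elasticsearch or similar)
    ES_HEAP_SIZE=20g

    # elasticsearch.yml: lock the heap in RAM so it is never swapped out
    bootstrap.mlockall: true
)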


(Mark Walkom) #12

7 GB of data is very small; it's surprising you are seeing that sort of GC.
What version are you on?


#13

I am on the latest versions available.
elasticsearch-1.5.2
kibana-4.0.2
logstash-1.5.0
I am just lost right now. I was planning to add lots of metrics, but at this point I can't even run this one metric.

