Below is my config for ELK v6.2. Within Kibana, I see a few default fields in addition to the 'message' field. However, I do not see the 'message' field parsed out into several fields like 'src_ip', etc., which is what I think my Grok filter should be doing. /var/log/logstash/logstash-plain.log is filled with 'Could not index event to Elasticsearch status 400...'. Is my Grok filter not working? Or do I need to set up a JSON template (or both)? I read up on JSON templates, but could not find a simple example, and I'm not sure when I should use Grok versus a JSON template. These are probably really basic questions. If anyone has feedback, or can point me in the right direction, I would appreciate the help.
Thank you.
input
{
  udp
  {
    # listen for messages on UDP port 514 and tag each event with type "http"
    type => "http"
    port => 514
  }
}

filter
{
  if [type] == "http"
  {
    grok
    {
      match => [ "message", "(%{IP:src_ip})(%{NUMBER:src_port})(\[%{HTTPDATE:timestamp}\])(%{IP:vip})(%{IP:dst_ip})(%{NUMBER:dst_port})(\"%{DATA:method}\")(\"%{DATA:path}\")(%{DATA:stat_code})(\"%{DATA:version}\")(%{NUMBER:response_size})(%{NUMBER:response_ms})(%{NUMBER:response_us})(\"%{DATA:referrer}\")(\"%{DATA:agent}\")" ]
    }
  }
}

output
{
  if [type] == "http"
  {
    # only events tagged "http" are sent to the "http" index
    elasticsearch
    {
      index => "http"
    }
  }
}
Don't test everything at once. Just use a simple stdout { codec => rubydebug } output until your filter works, then enable the elasticsearch output.
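For example, a throwaway output section along those lines (the 'http' type check just mirrors your existing config) could look like this:

output
{
  if [type] == "http"
  {
    # print every event, including all extracted fields, to the console
    stdout { codec => rubydebug }
  }
}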
If you want help with the 400 errors we need to be able to see them.
Don't use home-made grok expressions to parse standard HTTP logs. Use the predefined HTTPD_COMBINEDLOG pattern instead. You're overusing DATA patterns, making your expression extremely inefficient and potentially incorrect.
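For reference, trying the predefined pattern against a few sample messages is a one-liner (assuming your Logstash version ships HTTPD_COMBINEDLOG; older pattern files call it COMBINEDAPACHELOG):

grok
{
  # let the bundled pattern do the field naming for you
  match => [ "message", "%{HTTPD_COMBINEDLOG}" ]
}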
I'll try as you suggest, but could you please clarify (just trying to understand)? 1) I tested the Grok expressions, and they work for all examples that I attempted. 2) The logs are coming from a network device, for which there is no predefined pattern that I could locate. Also, for these particular logs, the messages are always in the same format. Curious why it is inefficient to use my own (could you please elaborate)? 3) Are Grok (or pattern files) used in conjunction with a JSON template? Or should I not use both at the same time? Are there cases where I should use one over the other?
I am sure these questions are basic, and I appreciate your time. Thank you.
Here is the error message. I get this message repeatedly, even when I remove the Grok filter.
[2018-07-02T15:54:00,485][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"http", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x4d52d585], :response=>{"index"=>{"_index"=>"http", "_type"=>"doc", "_id"=>"jt4zXWQBVp5kboZe6Uwa", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Limit of total fields [1000] in index [f5-reqres] has been exceeded"}}}}
I tested the Grok expressions, and they work for all examples that I attempted.
Sure. There's nothing in what you posted that suggests they don't work.
The logs are coming from a network device, for which there is no predefined pattern that I could locate.
Looking more closely... yes, HTTPD_COMBINEDLOG won't work out of the box, but it's pretty close, so it's an excellent source of inspiration once you get rid of the DATA patterns.
Also, for these particular logs, the messages are always in the same format. Curious why it is inefficient to use my own (could you please elaborate)?
The DATA pattern (i.e. .*?) matches zero or more occurrences of any character. When you use it in an expression you create ambiguity, and there's a good chance the regexp engine will have to do a lot of backtracking, i.e. "no wait, this particular way of matching things didn't work out; I'll start over". This can increase CPU usage significantly and even lead to grok filters timing out.
A second problem with DATA and GREEDYDATA is that the ambiguity can create incorrect matches when the input changes in ways you hadn't anticipated.
So, for performance and correctness reasons grok expressions should always be as specific as possible.
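As a sketch only (the separators and layout here are guesses and have to be adjusted to your actual messages), the most ambiguous pieces of your expression could be tightened like this:

grok
{
  # WORD matches a single run of word characters (GET, POST, ...) and NUMBER only digits,
  # so the regexp engine has far fewer ways to match and far less backtracking to do than with DATA
  match => [ "message", "(\"%{WORD:method}\")(\"%{NOTSPACE:path}\")(%{NUMBER:stat_code})" ]
}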
Are Grok (or pattern files) used in conjunction with a JSON template? Or should I not use both at the same time? Are there cases where I should use one over the other?
They're almost entirely unrelated. With the %{PATTERN:fieldname:datatype} notation you can, to some extent, control the data type of the extracted field, but only on the Logstash side. How that field is mapped on the Elasticsearch side is controlled by the index template, but absent an explicit mapping in the template, the Logstash-side data type can play a role.
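To illustrate the Logstash side of that (a sketch, not your full pattern), the trailing :int below only converts the captured values in the event; how Elasticsearch ends up mapping src_port and response_size still depends on the index template or on dynamic mapping:

grok
{
  # without :int these fields would be extracted as strings
  match => [ "message", "(%{NUMBER:src_port:int}).*(%{NUMBER:response_size:int})" ]
}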
Appreciate the input, thank you. I am reading and trying to better understand these concepts.
How about the 400 error message?
[WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"http", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x4d52d585], :response=>{"index"=>{"_index"=>"http", "_type"=>"doc", "_id"=>"jt4zXWQBVp5kboZe6Uwa", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Limit of total fields [1000] in index [f5-reqres] has been exceeded"}}}}
By default an ES index can only hold 1000 fields. This is tunable, but if you're running into that limit there's a good chance you're up to no good. Why do you have so many fields?
Obviously I am a novice, but it seems that words within the 'message' field are being used to create other fields? I have no idea what in my config would cause this. Commenting out my Grok filter seems to make no difference; I still get the same error repeatedly.
By default Logstash's elasticsearch output installs an index template for indexes whose names match logstash-*. The above is just a notification about that.