siasic
(support)
June 24, 2016, 2:13pm
1
hello,
I have a Hadoop cluster (Hortonworks) and I am trying to send the hdfs/yarn/... logs to Logstash to parse them.
The logs are shipped by Filebeat.
My logs look like this (I think it is a log4j format):
2016-06-12 03:00:20,432 INFO ipc.Client (Client.java:handleConnectionFailure(869)) - Retrying connect to server: d1hdpslave01.ouest-france.fr/128.1.228.46:8020. A
lready tried 25 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2016-06-12 03:00:20,660 ERROR datanode.DataNode (DataXceiver.java:run(278)) - d1hdpslave03.ouest-france.fr:50010:DataXceiver error processing unknown operation sr
c: /128.1.228.49:51734 dst: /128.1.228.49:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
Here is my Logstash config at the moment:
filter {
  grok {
    match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} {1,2}%{JAVACLASS:class} \(%{USERNAME:java_family}\.%{USERNAME:error_type}\:%{USERNAME:java_sub_family}\(%{INT:java_num}\)\) \- %{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  mutate {
    remove_field => [ "[beat]", "input_type", "offset" ]
  }
}
Has anyone managed to fully parse this kind of log? Or at least the Java error message a little further?
Thanks,
This should work. What do you get? Please show an example event produced by a stdout { codec => rubydebug } codec.
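For reference, a minimal debugging output section might look like this (a sketch, meant for temporary use while tuning the grok patterns):

```conf
# Print each event to stdout in a readable, pretty-printed form
# so you can inspect exactly which fields grok produced.
output {
  stdout { codec => rubydebug }
}
```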
siasic
(support)
June 30, 2016, 7:07am
3
Here is what I get:
2016-06-30T07:02:24.329Z dxxdpslavexx dXXdpslave0X.ouest-france.fr:50010:DataXceiver error processing unknown operation src: /128.1.22x.xx:49066 dst: /128.1.22x.xx:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
{
    "message" => "dxxdpslavxxx.ouest-france.fr:50010:DataXceiver error processing unknown operation src: /128.1.22x.xx:43851 dst: /128.1.22x.xx:50010\njava.io.EOFException\n\tat java.io.DataInputStream.readShort(DataInputStream.java:315)\n\tat org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)\n\tat org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)\n\tat java.lang.Thread.run(Thread.java:745)",
    "@version" => "1",
    "@timestamp" => "2016-06-30T07:02:32.405Z",
    "count" => 1,
    "source" => "/var/log/hadoop/hdfs/hadoop-hdfs-datanode-d1hdpslave02.log",
    "type" => "hdfs_datanode",
    "fields" => nil,
    "host" => "dxxdpslavexx",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "timestamp" => "2016-06-30 09:02:23,269",
    "severity" => "ERROR",
    "class" => "datanode.DataNode",
    "java_family" => "DataXceiver",
    "error_type" => "java",
    "java_sub_family" => "run",
    "java_num" => "278"
}
Is there a way to parse the "message" part a little further?
Sure, just continue the grok expression in the same manner as you started it. I'm not sure I understand the difficulty. How, exactly, would you like to have the remainder of the message parsed?
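For example, a second grok filter could pick apart the DataXceiver variant of the message. This is an untested sketch against the sample above; all of the field names (dx_host, dx_port, dx_operation, src_ip, dst_ip, ...) are invented for illustration:

```conf
# Hypothetical follow-up grok for the DataXceiver-style messages only;
# every field name here is made up for illustration.
grok {
  match => [ "message", "(?m)%{HOSTNAME:dx_host}:%{INT:dx_port}:DataXceiver error processing %{WORD:dx_operation} operation src: /%{IP:src_ip}:%{INT:src_port} dst: /%{IP:dst_ip}:%{INT:dst_port}" ]
  # Use a custom failure tag so messages with a different shape
  # are easy to spot without polluting the default _grokparsefailure tag.
  tag_on_failure => [ "_dataxceiver_grok_failure" ]
}
```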
siasic
(support)
June 30, 2016, 7:56am
5
The problem is that the messages can differ:
2016-06-30T07:52:08.822Z d1hdpslave01 Get corrupt file blocks returned error: Operation category READ is not supported in state standby
{
    "message" => "Get corrupt file blocks returned error: Operation category READ is not supported in state standby",
    "@version" => "1",
    "@timestamp" => "2016-06-30T07:52:08.822Z",
    "source" => "/var/log/hadoop/hdfs/hadoop-hdfs-namenode-d1hdpslave01.log",
    "type" => "hdfs_namenode",
    "count" => 1,
    "fields" => nil,
    "host" => "d1hdpslave01",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "timestamp" => "2016-06-30 09:52:08,774",
    "severity" => "WARN",
    "class" => "namenode.FSNamesystem",
    "java_family" => "FSNamesystem",
    "error_type" => "java",
    "java_sub_family" => "getCorruptFiles",
    "java_num" => "7324"
}
I don't know the full capabilities of Logstash. Maybe I can't go any further.
A single grok filter can try to match the message against multiple expressions (see the example in the grok filter documentation), so you don't have to write a single expression that matches every imaginable string.
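As a sketch of that idea, one grok filter can list several patterns and stop at the first one that matches. The second and third patterns below reuse the two message shapes from this thread; the field names are assumptions for illustration, not something your pipeline must use:

```conf
# One grok filter trying several patterns in order; the first match wins.
# Field names (dx_host, dx_operation, error_detail, ...) are illustrative.
grok {
  match => {
    "message" => [
      "%{HOSTNAME:dx_host}:%{INT:dx_port}:DataXceiver error processing %{WORD:dx_operation} operation %{GREEDYDATA:dx_rest}",
      "Get corrupt file blocks returned error: %{GREEDYDATA:error_detail}",
      "%{GREEDYDATA:unparsed_message}"
    ]
  }
}
```

The final catch-all pattern is optional: it guarantees a match so no event is tagged with _grokparsefailure, at the cost of hiding genuinely unexpected messages.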