Failed action with response of 400, dropping action


#1

Hi,

For the following log:

Job Start Summary: jobid=2291646.r-man2 project=p08 user=rnt651@raijin1 queue=express-node ncpus=16 mem_requested=8gb walltime=16:00:00 cpu_hrs=768.0 jobfs_local=8gb jobfs_global=None

When I trim ':' with logstash kv filter, the log can be passed to ES, but then the walltime becomes '160000'.

kv {
   trim => "<>:"
   field_split => " "
   prefix => "job_"
}

If I don't trim ':' I get the following error and the log is dropped by ES

{:timestamp=>"2015-09-07T17:33:30.017000+1000", :message=>"failed action with response of 400, dropping action: ["index", {:_id=>nil, :_index=>"logstash-2015.09.07", :_type=>"program", :_routing=>nil}, #0}, @accessors=#"r-man2", "message"=>"Job Start Summary: jobid=2291646.r-man2 project=p08 user=rnt651@raijin1 queue=express-node ncpus=16 mem_requested=8gb walltime=16:00:00 cpu_hrs=768.0 jobfs_local=8gb jobfs_global=None", "offset"=>253322067, "path"=>"/var/spool/PBS/server_logs/20150907", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-07T07:33:08.000Z", "pbs_timestamp"=>"09/07/2015 17:33:08", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2291646.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_program"=>"pbs_server", "received_at"=>"09/07/2015 17:33:08", "tags"=>["pbs_production", "pbs_job"], "job_jobid"=>"2291646.r-man2", "job_project"=>"p08", "job_user"=>"rnt651@raijin1", "job_queue"=>"express-node", "job_ncpus"=>16, "job_mem_requested"=>"8gb", "job_walltime"=>"16:00:00", "job_cpu_hrs"=>768.0, "job_jobfs_local"=>"8gb", "job_jobfs_global"=>"None", "pbs_user"=>"rnt651@raijin1"}, @lut={"[pbs_object_type]"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2291646.r-man2 project=p08 user=rnt651@raijin1 queue=express-node ncpus=16 mem_requested=8gb walltime=16:00:00 cpu_hrs=768.0 jobfs_local=8gb jobfs_global=None", "offset"=>253322067, "path"=>"/var/spool/PBS/server_logs/20150907", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-07T07:33:08.000Z", "pbs_timestamp"=>"09/07/2015 17:33:08", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2291646.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_program"=>"pbs_server", "received_at"=>"09/07/2015 17:33:08", "tags"=>["pbs_production", "pbs_job"], "job_jobid"=>"2291646.r-man2", "job_project"=>"p08", "job_user"=>"rnt651@raijin1", "job_queue"=>"express-node", "job_ncpus"=>16, "job_mem_requested"=>"8gb", "job_walltime"=>"16:00:00", "job_cpu_hrs"=>768.0, "job_jobfs_local"=>"8gb", "job_jobfs_global"=>"None", "pbs_user"=>"rnt651@raijin1"}, "pbs_object_type"], "pbs_object"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2291646.r-man2 project=p08 user=rnt651@raijin1 queue=express-node ncpus=16 mem_requested=8gb walltime=16:00:00 cpu_hrs=768.0 jobfs_local=8gb jobfs_global=None", "offset"=>253322067, "path"=>"/var/spool/PBS/server_logs/20150907", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-07T07:33:08.000Z", "pbs_timestamp"=>"09/07/2015 17:33:08", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2291646.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_program"=>"pbs_server", "received_at"=>"09/07/2015 17:33:08", "tags"=>["pbs_production", "pbs_job"], "job_jobid"=>"2291646.r-man2", "job_project"=>"p08", "job_user"=>"rnt651@raijin1", "job_queue"=>"express-node", "job_ncpus"=>16, "job_mem_requested"=>"8gb", "job_walltime"=>"16:00:00", "job_cpu_hrs"=>768.0, "job_jobfs_local"=>"8gb", "job_jobfs_global"=>"None", "pbs_user"=>"rnt651@raijin1"}, "pbs_object"], "tags"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2291646.r-man2 project=p08 user=rnt651@raijin1 queue=express-node ncpus=16 mem_requested=8gb walltime=16:00:00 cpu_hrs=768.0 jobfs_local=8gb jobfs_global=None",
...........
@metadata_accessors=#0}, @lut={}>, @cancelled=false>]", :level=>:warn}

Any idea why the log is dropped? I'm running logstash 1.5.4 and ES 1.4.5 on a CentOS 6.6 server.


(Mark Walkom) #2

Try looking in the ES logs to see if there is something that corresponds.

(Also, upgrade ES :wink: )


#3

Many thanks for your quick response Mark. I've confirmed there is nothing corresponding to this in ES log. I will upgrade ES to see if it can be resolved.


(Christian Dahlqvist) #4

It looks like only the part after "Job Start Summary: " is a proper k/v list. I would therefore suggest trying to first use a grok filter to isolate the k/v list and then applying the kv filter on the field containing the list without the trim parameter specified.


#5

I have upgraded ES to v1.7.1 but this problem still happens...


#7

Hi Christian,

Many thanks for your suggestion. I have followed your advice but unfortunately I still get the same error.

grok {
     match => [ "message", "Job Start Summary: %{GREEDYDATA:kv_target}" ]
 }
kv {
    source => "kv_target"
    field_split => " "
   prefix => "job_"
}

{:timestamp=>"2015-09-09T16:46:47.212000+1000", :message=>"failed action with response of 400, dropping action: ["index", {:id=>nil, :index=>"logstash-2015.09.09", :type=>"program", :routing=>nil}, #@metadata={"retry_count"=>0}, @accessors=#@store={"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2
325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "offset"=>182499027, "path"=>"/var/spool/PBS/server_logs/20150909", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-09T06:46:05.000Z", "pbs_timestamp"=>"09/09/2015 16:46:05", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2325253.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_program"=>"pbs_server", "received_at"=>"09/09/2015 16:46:05", "tags"=>["pbs_production", "pbs_job"], "kv_target"=>"jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=
express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "job_jobid"=>"2325253.r-man2", "job_project"=>"v71", "job_user"=>"mxz561@raijin3", "job_queue"=>"express-node", "job_ncpus"=>16,
"job_mem_requested"=>"16gb", "job_walltime"=>"24:00:00", "job_cpu_hrs"=>1152.0, "job_jobfs_local"=>"100mb", "job_jobfs_global"=>"None", "pbs_user"=>"mxz561@raijin3"}, @lut={"[pbs_object_type]"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "offset"=>182499027, "path"=>"/var/spool/PBS/server_logs/20150909", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-09T06:46:05.000Z", "pbs_timestamp"=>"09/09/2015 16:46:05", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2325253.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_program"=>"pbs_server", "received at"=>"09/09/2015 16:46:05", "tags"=>["pbsproduction", "pbs_job"], "kv_target"=>"jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "job_jobid"=>"2325253.r-man2", "job_project"=>"v71", "job_user"=>"mxz561@raijin3", "job_queue"=>"express-node", "job_ncpus"=>16, "job_mem_requested"=>"16gb", "job_walltime"=>"24:00:00", "job_cpu_hrs"=>1152.0, "job_jobfs_local"=>"100mb", "job_jobfs_global"=>"None", "pbs_user"=>"mxz561@raijin3"}, "pbs_object_type"], "pbs_object"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "offset"=>182499027, "path"=>"/var/
spool/PBS/server_logs/20150909", "program"=>"pbs_server", "type"=>"program", "@version"=>"1", "@timestamp"=>"2015-09-09T06:46:05.000Z", "pbs_timestamp"=>"09/09/2015 16:46:05", "pbs_event_class"=>"job", "pbs_node"=>"Server@r-man2", "pbs_object_type"=>"job", "job_id"=>"2325253.r-man2", "received_from"=>"r-man2", "pbs_host"=>"r-man2", "pbs_progr
am"=>"pbs_server", "received_at"=>"09/09/2015 16:46:05", "tags"=>["pbs_production", "pbs_job"], "kv_target"=>"jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "job_jobid"=>"2325253.r-man2", "job_project"=>"v71", "job_user"=>"mxz561@raijin3
", "job_queue"=>"express-node", "job_ncpus"=>16, "job_mem_requested"=>"16gb", "job_walltime"=>"24:00:00", "job_cpu_hrs"=>1152.0, "job_jobfs_local"=>"100mb", "job_jobfs_global"=>"None", "pbs_user"=>"mxz561@raijin3"}, "pbs_object"],
"tags"=>[{"host"=>"r-man2", "message"=>"Job Start Summary: jobid=2325253.r-man2 project=v71 user=mxz561@raijin3 queue=express-node ncpus=16 mem_requested=16gb walltime=24:00:00 cpu_hrs=1152.0 jobfs_local=100mb jobfs_global=None", "offset"=>182499027, "path"=>"/var/spool/PBS/server_logs/20150909",


#8

I changed the prefix from "job_" to something else and the logs pass ES now.


(system) #9