Logstash 2.1: Dynamically Altering Field Names


#1

Hi,
I've seen an issue in recent days with sending data to elasticsearch 2.1 where inbound messages contained an "_uid" field which clashed with the type of the in-built meta-field "_uid". This led to indexing failing, shards becoming unassigned and the cluster going into a red state.

Given elasticsearch's apparent heightened sensitivity around type handling, I feel I need to strip any leading underscores from inbound field names as a safety measure. However, as our log formats vary widely, we do not know in advance what the field names will be. Also, we do virtually no filtering in logstash as we pre-format our log lines as json, so our exposure to filtering is very limited...

So, can someone show me how I can parse inbound messages to strip leading underscores from field names? Also, am I taking the right approach here? Could I mitigate this issue by modifying the logstash template?

Regards,
David


(Magnus Bäck) #2

You can use a ruby filter for this. This should be reasonably close to actually working:

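ruby {
  code => "
    # Iterate over a snapshot of the keys so the event can be changed inside the loop
    event.to_hash.keys.each { |k|
      if k.start_with?('_')
        v = event[k]
        event.remove(k)
        # sub needs a regexp; the string '^_' would only match those two literal characters
        event[k.sub(/^_+/, '')] = v
      end
    }
  "
}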
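
The renaming logic can also be tried in plain Ruby before it goes into a filter. This is only a sketch on an ordinary Hash: `strip_leading_underscores` is a made-up helper name, not a Logstash API, and the sample field names are invented for illustration.

```ruby
# Hypothetical helper, not part of Logstash: returns a copy of `fields`
# with leading underscores stripped from the top-level keys.
def strip_leading_underscores(fields)
  fields.each_with_object({}) do |(key, value), cleaned|
    # A Regexp is required here; sub('^_', '') would search for the
    # literal two-character text "^_" and leave the key unchanged.
    cleaned[key.sub(/\A_+/, '')] = value
  end
end

# Invented sample data standing in for a parsed JSON log line.
sample = { '_uid' => 'abc123', 'message' => 'hello', '__internal' => 1 }
p strip_leading_underscores(sample)
```

Building a new Hash sidesteps the question of whether it is safe to add keys to a hash while iterating over it.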

#3

Thanks for that but it doesn't work. "k" is getting set to the name of the first argument of the input I'm using to test this, which is the path to the input data.


(Magnus Bäck) #4

Sorry, I don't get it. What do your messages look like?


#5

I got it to work with this:

filter {
  ruby {
    code => "
      event.to_hash.select { |k, v| k.start_with?('_') }.each { |k, v|
        event.remove(k)
        event.append({ k.slice(1, k.length) => v })
      }
    "
    add_tag => "stripped"
  }
}
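
One thing to watch with `slice(1, k.length)`: it drops exactly one character, so a field that starts with two underscores would still start with one afterwards. A small plain-Ruby comparison, using invented field names, shows the difference from a regexp that strips all of them:

```ruby
# Plain-Ruby comparison; the field names are made up for illustration.
names = ['_uid', '__meta', 'message']

# Same approach as the filter above: drop only the first character.
one_stripped = names.map { |n| n.start_with?('_') ? n.slice(1, n.length) : n }

# Alternative: remove every leading underscore.
all_stripped = names.map { |n| n.sub(/\A_+/, '') }

p one_stripped
p all_stripped
```

Whether that matters depends on whether any inbound field names actually carry more than one leading underscore.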


#6

Thanks for posting this. We've run into the same problem. Is this a bug in 2.1?

