Do filter plugins error if the field doesn't exist or is empty?


#1

If I use the date filter plugin to parse a field that usually has a date, but sometimes is empty, does it throw an error? What about the JSON filter? Is there any need for me to have a config setting that says if [field] == "" then remove [field]?


#2

If you give a non-existent field name to date/match or json/source they silently do nothing. If the field exists but is not the right format they will add a tag (_dateparsefailure or _jsonparsefailure) to the event.
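
To illustrate the behavior described above, here is a minimal sketch of a filter block. The field names `post_date` and `payload` and the ISO8601 format are assumptions made up for this example; they are not taken from the original poster's configuration.

```
filter {
  # Hypothetical field name. If [post_date] does not exist on the event,
  # the date filter does nothing. If it exists but cannot be parsed,
  # the event is tagged with _dateparsefailure.
  date {
    match => ["post_date", "ISO8601"]
  }

  # Hypothetical field name. A missing [payload] is skipped; a value that
  # is not valid JSON gets the _jsonparsefailure tag.
  json {
    source => "payload"
  }
}
```

Note that by default the date filter writes the parsed value to `@timestamp` and leaves the source field as it was.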

I do not understand the question about removing fields.


#3

So if it is a date parse failure, would that just mean Elasticsearch will index the document but that the field will be a string instead of a time format?


(Magnus Bäck) #4

So if it is a date parse failure, would that just mean Elasticsearch will index the document but that the field will be a string instead of a time format?

Well, Logstash never sends anything but strings, numbers, and bools to ES. There is no time type in JSON.

But yes, unless you look for a _dateparsefailure tag after a date filter the event will just continue through the pipeline.
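
A minimal sketch of checking for that tag, assuming a hypothetical field named `post_date`. Removing the field on failure is only one possible reaction; it is shown here as an illustration, not as a recommendation from this thread.

```
filter {
  date {
    # Hypothetical field name and format.
    match => ["post_date", "ISO8601"]
  }

  # Without a check like this, a failed event simply continues
  # through the pipeline with the tag attached.
  if "_dateparsefailure" in [tags] {
    mutate {
      remove_field => ["post_date"]
    }
  }
}
```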


#5

Okay, and so a document with a date parse failure will continue through the pipeline, get sent to Elasticsearch, and then what will happen? Will Elasticsearch see it's a string that isn't a valid date format, but still index the document, just without that field? This is what I'm having the most trouble wrapping my head around with regard to Elasticsearch. Sorry for so many questions, but I really want to understand what is happening here.

  1. If Elasticsearch doesn't have any explicit mapping that I set for field data types, then (from what I understand) it will try its best to guess what the data type is and create the mapping from that. Doesn't this cause issues? Like if the first document it indexes has a field with an incorrect data type of string (maybe it failed json or date parsing or something) then would Elasticsearch reject any documents later that have valid object or date data types?

  2. If I do explicitly set the mapping of data types, then (from what I understand) sending Elasticsearch a document without that field or with the field set to null will not be an issue at all, right? Elasticsearch will only throw errors if the field exists and if it is the wrong type?

  3. How does the following translate to Elasticsearch?

    {
    "field_name" : ""
    }

Does Elasticsearch think it's a string (even though it's empty) because there are quotes? Or does Elasticsearch say "this is an empty field, so I'll keep the field name searchable and there will never be any issues with this" (e.g. mapping rejections)?

Thank you


#6

Regarding the date filter silently doing nothing, I have been getting the following error:

[2017-12-22T17:29:11,951][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"metrics-2017-08", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x46355571>], :response=>{"index"=>{"_index"=>"metrics-2017-08", "_type"=>"doc", "_id"=>"rfBFf2ABluZYKCJJiRmu", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [page.post_date]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"\""}}}}}

Doesn't "Invalid format: \"\"" mean that the page.post_date field was equal to "", so it was empty? Why is this error happening if it was empty? I checked Kibana for any documents that have tags and there were none with _dateparsefailure.


#7

@arisbanach That is elasticsearch refusing to index a document that has the value "" for a field whose type, I would imagine, is set to date. "" is not a valid date. Check the field type in the index pattern under Management.

If the field type in elasticsearch has gotten set to date, then it will reject an attempt to index a document if the field is not parseable as a date.

If the field type has gotten set to string, then strings that look like dates will not be converted to dates.

You may have gotten _dateparsefailure on that document, but since es rejected it you will never see it in Kibana.
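
One way to avoid sending "" to a date-mapped field is the conditional asked about in the first post. This is a sketch only, assuming a hypothetical top-level field named `post_date`; a nested field would need Logstash's `[outer][inner]` field reference syntax instead.

```
filter {
  # Drop the field when it is an empty string, so the document
  # reaches Elasticsearch without it.
  if [post_date] == "" {
    mutate {
      remove_field => ["post_date"]
    }
  }

  # A missing field is silently ignored by the date filter, so this
  # is safe to run after the removal above.
  date {
    match => ["post_date", "ISO8601"]
  }
}
```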


#8

Okay thanks, that helps a lot. So if Elasticsearch just doesn't index the document at all if the mapping is a date and the value is "", then I'm more confused about how Elasticsearch handles empty field values...

I've seen (at least in Kibana) other fields that exist but have no data in them. I would assume they were sent to Elasticsearch as "field_name" : "", right? So does Elasticsearch only reject a document if the field that is empty is of a certain type? Like, if a text field is empty then it's okay, but if a date field is empty then it isn't and Elasticsearch will reject the whole thing?


#9

I believe that is correct.


#10

Doesn't this make the behavior harder to reason about? Where do I find what fields will cause Elasticsearch to reject the entire document if they're empty and which will be fine? Is there a reason Elasticsearch rejects the document if a date field is blank but doesn't reject it for a text field?


(Magnus Bäck) #11

Where do I find what fields will cause Elasticsearch to reject the entire document if they're empty and which will be fine?

I'd expect text (string) fields to be the only ones that can hold an empty value.

Is there a reason Elasticsearch rejects the document if a date field is blank but doesn't reject it for a text field?

Because an empty date doesn't make sense. What would it mean? Same thing with a number.

