Logstash Error: Invalid FieldReference: `[iso-8859-1]to`

I'm using ELK v7.9.1 and I get this error with some of the records coming from CouchDB:

logstash         | [2020-09-15T16:03:20,662][ERROR][logstash.javapipeline    ][main][69664fe8e111893f386d879e5cd848742ac1a327faf79e98fcfea9f1054190c6] A plugin had an unrecoverable error. Will restart this plugin.
logstash         |   Pipeline_id:main
logstash         |   Plugin: <LogStash::Inputs::CouchDBChanges ignore_attachments=>true, codec=><LogStash::Codecs::JSON id=>"json_ac28a3cf-c546-4bbc-8377-4789a73ab494", enable_metric=>true, charset=>"UTF-8">, password=><password>, port=>5984, host=>"host", id=>"69664fe8e111893f386d879e5cd848742ac1a327faf79e98fcfea9f1054190c6", db=>"incidents", username=>"admin", enable_metric=>true, secure=>false, heartbeat=>1000, keep_id=>false, keep_revision=>false, always_reconnect=>true, reconnect_delay=>10>
logstash         |   Error: Invalid FieldReference: `[iso-8859-1]to`
logstash         |   Exception: Java::OrgLogstash::FieldReference::IllegalSyntaxException
logstash         |   Stack: org.logstash.FieldReference$StrictTokenizer.tokenize(FieldReference.java:303)

As far as I can tell, [iso-8859-1]to never appears in any of the JSON documents. What is causing this error and how can I track down which documents are causing it?

Here's my .conf file:

input {

    couchdb_changes {
        host => "host"
        port => "5984"
        username => "username"
        password => "password"
        db => "incidents"
        codec => json
        ignore_attachments => true
    }
    
}

filter {

    date {
        match => [ "time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
    }

    mutate { add_field => { "doc_id" => "%{[@metadata][_id]}" } }

    if ([doc][all_headers][from]) {
        mutate { add_field => { "sender" => "%{[doc][all_headers][from]}" } }
    }

    if ([doc][subject]) {
        mutate { add_field => { "subject" => "%{[doc][subject]}" } }
    }

    if ([doc][email]) {
        mutate { add_field => { "reporter" => "%{[doc][email]}" } } 
    }

    if ([doc][time]) {
        mutate { add_field => { "time" => "%{[doc][time]}" } }
    }

    if ([doc][categories][mlclassifier][time]) {
        mutate { add_field => { "timeClassification" => "%{[doc][categories][mlclassifier][time]}" } }
    }

    if ([doc][categories][mlclassifier][category]) {
        mutate { add_field => { "category_auto" => "%{[doc][categories][mlclassifier][category]}" } }
    }

    if ([doc][categories][manual][category]) {
        mutate { add_field => { "category_manual" => "%{[doc][categories][manual][category]}" } }
    }

    # Remove original doc
    mutate { remove_field => [ "@timestamp", "doc_as_upsert", "@version", "doc" ] }

}

output {

    stdout { codec => "rubydebug" }
    elasticsearch {
        hosts => "http://host:9200"
        document_id => "%{[@metadata][_id]}"
        index => "incidents"
    }
    
}

The couchdb_changes input sets include_docs to true, so, as I understand it, the changes stream will include the documents. If any of those documents contain square brackets then the json codec will interpret them as field references.

I think the answer is to use a plain codec and sanitize the JSON before using a json filter.

So this problem will arise if a document contains square brackets anywhere in a key or value? The [iso-8859-1]to part isn't clear to me, because that literal string doesn't exist anywhere in the documents.

And when you say sanitize the JSON before using a json filter, would that have to be before the couchdb_changes input step? As in, sanitize the JSON before it's saved in CouchDB?

If you do that then you could continue to use a json codec. I was thinking of using mutate to modify anything in the JSON that looks like a field reference.
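Roughly what I had in mind, as an untested sketch (the gsub patterns are illustrative, and note they would rewrite brackets everywhere in the raw message, values included):

```
input {
    couchdb_changes {
        host => "host"
        port => 5984
        db => "incidents"
        codec => plain   # do not parse as JSON yet
        ignore_attachments => true
    }
}

filter {
    # Replace literal brackets before parsing, so keys like "foo[1]"
    # can no longer be read as field-reference syntax.
    mutate {
        gsub => [ "message", "\[", "(", "message", "\]", ")" ]
    }
    json { source => "message" }
}
```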

So just for testing purposes, I removed the codec => json so it defaults to plain, and I removed the filter and elasticsearch output steps. Even with this stripped-down configuration, Error: Invalid FieldReference: `[iso-8859-1]to` is thrown.

Doesn't this mean that the error is thrown before it even reaches the filter? If that's the case, how can I fix this without modifying the data in CouchDB?

The problem is that the input does not actually use a codec. You can configure one, since the codec option is defined in logstash/inputs/base, but the input ignores it.

The code blows up when it creates the event at this line of code. I see no alternative other than modifying the data in CouchDB. Extracting the key lines from the input that effectively act as the codec, this pipeline

input { generator { count => 1 lines => [ '{ "foo[1]": "bar" }' ] } }
filter {
    ruby {
        code => '
            m = event.get("message")
            data = LogStash::Json.load(m)
            # Event.new parses each key as a field reference, so a
            # key like "foo[1]" raises IllegalSyntaxException here
            event = LogStash::Event.new(data)
        '
    }
}
output { stdout { codec => rubydebug { metadata => false } } }

will produce the same error:

[logstash.filters.ruby    ][main][...] Ruby exception occurred: Invalid FieldReference: `foo[1]`
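If the documents do have to be fixed on the CouchDB side, the rewrite amounts to walking each document and renaming any key that the strict tokenizer would reject. A minimal Ruby sketch (sanitize_keys is a hypothetical helper name, and swapping brackets for parentheses is just one possible choice):

```ruby
# Recursively rewrite hash keys so they no longer look like
# Logstash field references: "foo[1]" becomes "foo(1)".
def sanitize_keys(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[key.to_s.tr('[]', '()')] = sanitize_keys(value)
    end
  when Array
    obj.map { |element| sanitize_keys(element) }
  else
    obj
  end
end

doc = { "foo[1]" => "bar", "headers" => { "[iso-8859-1]to" => "x" } }
sanitize_keys(doc)
# => {"foo(1)"=>"bar", "headers"=>{"(iso-8859-1)to"=>"x"}}
```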