couchdb_changes gives Invalid FieldReference error when parsing a CouchDB database that hosts a couchapp

Here is the log:

Jan 10 16:45:25 Andromeda logstash[12489]: [2020-01-10T16:45:25,081][INFO ][logstash.inputs.couchdbchanges][main] Connecting to CouchDB _changes stream at: {:host=>"localhost", :port=>"5984", :db=>"material_database"}
Jan 10 16:45:25 Andromeda logstash[12489]: [2020-01-10T16:45:25,083][INFO ][logstash.inputs.couchdbchanges][main] Using service uri : {:uri=>#<URI::HTTP http://localhost:5984/material_database/_changes?feed=continuous&include_docs=true&since=950-g1AAAAJjeJyd0EsKwkAMANDBCrr1BHoCmcm0nXZlb6LzaSmlKoIudKO48xR6E72J3qTOpwsXRZgSSCAhD5IaITQuA4UmcnuQpRIZATbHOkitRwOOxLRpmqoMuFjrxigBYAxw18IfRsx0FotW2luJREVBmfCVMiMtW-lqpbSgKajcV1oZ6dxKRyuFjKZhHnpKm6HO6KKLxu5GO1kNxyxWkPTSHk57Gm1nNUoB59j3X057Oe39oxEuZEx6aR-n2b_d3KWcgeS8a6_6AnUdodA&heartbeat=1000>}
Jan 10 16:45:25 Andromeda logstash[12489]: [2020-01-10T16:45:25,193][ERROR][logstash.javapipeline    ][main] A plugin had an unrecoverable error. Will restart this plugin.
Jan 10 16:45:25 Andromeda logstash[12489]:   Pipeline_id:main
Jan 10 16:45:25 Andromeda logstash[12489]:   Plugin: <LogStash::Inputs::CouchDBChanges port=>5984, db=>"material_database", id=>"403531702e10d9afa71b0a66780fe20bcc554cac3c90be7e534d12222897105f", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_a0ec6ba6-e456-4605-b443-61d4b7b5d20b", enable_metric=>true, charset=>"UTF-8">, host=>"localhost", secure=>false, password=><password>, heartbeat=>1000, keep_id=>false, keep_revision=>false, ignore_attachments=>true, always_reconnect=>true, reconnect_delay=>10>
Jan 10 16:45:25 Andromeda logstash[12489]:   Error: Invalid FieldReference: `a[href=#logout]`
Jan 10 16:45:25 Andromeda logstash[12489]:   Exception: Java::OrgLogstash::FieldReference::IllegalSyntaxException
Jan 10 16:45:25 Andromeda logstash[12489]:   Stack: org.logstash.FieldReference$StrictTokenizer.tokenize(FieldReference.java:283)

My config file:

input {
  couchdb_changes {
    db => material_database
    port => 5984
    type => material_database
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "material_database"
  }
}

How can I fix it?

A CouchDB _changes page should be returning JSON. That

Invalid FieldReference: `a[href=#logout]`

makes me wonder if it is returning HTML. What do you get when you request

http://localhost:5984/material_database/_changes?feed=continuous&include_docs=true&since=950-g1AAAAJjeJyd0EsKwkAMANDBCrr1BHoCmcm0nXZlb6LzaSmlKoIudKO48xR6E72J3qTOpwsXRZgSSCAhD5IaITQuA4UmcnuQpRIZATbHOkitRwOOxLRpmqoMuFjrxigBYAxw18IfRsx0FotW2luJREVBmfCVMiMtW-lqpbSgKajcV1oZ6dxKRyuFjKZhHnpKm6HO6KKLxu5GO1kNxyxWkPTSHk57Gm1nNUoB59j3X057Oe39oxEuZEx6aR-n2b_d3KWcgeS8a6_6AnUdodA&heartbeat=1000>

in a web browser? Does your CouchDB require authentication?

I don't want to post my company data, but the above query (with the trailing ">" removed) returns valid JSON. The thing is, I have a couchapp hosted on the CouchDB instance, so some of the returned keys and values look like HTML. Maybe that's what chokes it?

I found the offending key-value pair:
"selectors":{"a[href=#logout]":{"click":["doLogout"]}}
(There are a few more that say login, signup, etc.)
So, is this key invalid? I thought a JSON key could be any string. Or is Logstash trying to interpret the key in some way? If so, I need to disable that. How can I do that?

That is a known issue. The code uses event.set and there is no way to stop event.set parsing field references.

If the set of offending keys is fixed you could mutate+gsub them. If it is dynamic then you would have to use a ruby filter (reusing code from the json filter) and iterate over the resulting keys to mutate them if needed before doing the event.set call.
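As a rough illustration of the key-rewriting step described above, here is a minimal Ruby sketch. The helper name sanitize_keys and the choice of "_" as the replacement character are assumptions made for this example, not part of any plugin. It assumes the raw JSON is still an unparsed string at this point, and it walks the parsed structure recursively because the offending key in this thread is nested under "selectors".

```ruby
require "json"

# Hypothetical helper: recursively rewrite hash keys so that they contain no
# square brackets, which Logstash treats as field-reference syntax.
def sanitize_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), cleaned|
      # Replacing brackets with "_" is an arbitrary choice for this sketch.
      safe_key = key.to_s.gsub(/[\[\]]/, "_")
      cleaned[safe_key] = sanitize_keys(child)
    end
  when Array
    value.map { |item| sanitize_keys(item) }
  else
    value
  end
end

raw = '{"selectors":{"a[href=#logout]":{"click":["doLogout"]}}}'
doc = sanitize_keys(JSON.parse(raw))
puts doc.to_json
```

Inside a ruby filter, the same function could presumably be applied to the result of JSON.parse(event.get("message")) before each top-level key is passed to event.set. That only helps if the document reaches the filter stage as a raw string, which is not the case with the couchdb_changes input.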

I tried to add a filter block in my config file like so:

input {
  couchdb_changes {
    db => material_database
    port => 5984
    type => material_database
    keep_id => true
    keep_revision => true
  }
}

filter {
  mutate {
    gsub => [
      "selectors",".*",""
    ]   
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "material_database"
  }
}

That didn't work. Same error. Then I tried this filter block:

filter {
  prune { 
    blacklist_names => ["selectors"]
  }
}

Same result. So it seems I need to filter before the input runs. Swapping the order of the input block and the filter block didn't help. It would be great if I could tell the input plugin to skip that one document, but there doesn't seem to be such an option.

Any suggestions?

Oh, I see. The couchdb_changes input is parsing the JSON. No way to avoid the exception with that input. Maybe switch to an http_poller input and do the mutate before parsing it with a json filter.

Still no luck.

I changed the config to the following.

input {
  http_poller {
    urls => {
      local_couchdb => "http://localhost:5984/material_database/_changes?feed=continuous&include_docs=true&heartbeat=1000"
    }   
    schedule => { "every" => "1m" }
  }
}

filter {
  prune { 
    blacklist_names => ["selectors"]
  }
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "material_database"
  }
}

The http_poller doesn't fire even once in the 15 minutes after restarting Logstash, even though I've set "every" => "1m" (I hope "1m" doesn't mean 1 month). What did I do wrong?

Also, if I use http_poller, I need to somehow keep track of the seq_no of the CouchDB changes myself; otherwise it will read the whole DB every time. That doesn't seem trivial to me.

Agreed.
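For what it's worth, the bookkeeping itself can be sketched in a few lines of Ruby. This assumes a one-shot request (feed=normal) instead of feed=continuous, since a continuous feed keeps the connection open and may be the reason the poller never emits an event; that part is a guess. With feed=normal, CouchDB returns a single JSON object with a last_seq field that can be passed back as since on the next request. The helper names changes_url and next_since, and the sample response body, are made up for this example; where the checkpoint gets stored between runs is left open.

```ruby
require "json"
require "uri"

# Hypothetical helper: build a one-shot (feed=normal) _changes URL that
# resumes from a previously saved sequence token, if there is one.
def changes_url(base, db, since = nil)
  params = { "feed" => "normal", "include_docs" => "true" }
  params["since"] = since if since
  "#{base}/#{db}/_changes?#{URI.encode_www_form(params)}"
end

# Hypothetical helper: pull the checkpoint to persist out of a
# feed=normal response body.
def next_since(body)
  JSON.parse(body).fetch("last_seq")
end

# Made-up response body in the shape CouchDB uses for feed=normal.
sample = '{"results":[{"seq":"951-abc","id":"doc1","changes":[{"rev":"2-x"}]}],"last_seq":"951-abc","pending":0}'
checkpoint = next_since(sample)
puts changes_url("http://localhost:5984", "material_database", checkpoint)
```

This only covers the sequence tracking. Something outside http_poller would still have to persist the checkpoint and rebuild the URL on each run, so it does not make the approach a drop-in replacement for couchdb_changes.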

OK. Let me summarize the potential solutions I have tried, all of which failed:

  • Turning off field reference parsing. Impossible: there is no way to get around event.set.
  • Skipping the offending field. Impossible, because filters run only after the input.
  • Skipping the offending document. Impossible, because the input plugin has no internal filters and doesn't recognize filters (or views) in CouchDB, so there is no way to pre-filter.

In short, there is no easy solution.

I have just filed this bug on the couchdb_changes GitHub page:

This pull request from 2015 would provide a workaround by allowing filters, but it was never merged: https://github.com/logstash-plugins/logstash-input-couchdb_changes/pull/13