Remove fields from documents based on their value

I am using Logstash to read JSON data and load it into Elasticsearch. I use update with doc_as_upsert to update existing documents and insert new ones.
The application requires that there be no empty fields in Elasticsearch, so a field should be deleted from its document rather than updated to an empty value.
Long story short: if a field in the input JSON has an empty string value, I want to remove it from the target document.
Any ideas on how to tackle this? I was thinking about two pipelines: the first would update fields (even to empty values) and the second would remove the empty fields, but I don't know whether this is the right direction.
Also, a generic solution would be nice, so that I don't have to write tons of code hardcoding a check for every field.
Getting the whole document, filtering out a field and indexing the document again is not an option here: I usually update only a small subset of fields, and such an operation would kill performance.
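
For reference, my output is configured roughly like this (host, index name and id field are placeholders):

output {
  elasticsearch {
    hosts         => ["http://localhost:9200"]   # placeholder host
    index         => "test_idx1"                 # placeholder index
    document_id   => "%{id}"                     # placeholder id field
    action        => "update"
    doc_as_upsert => true                        # insert if the document does not exist yet
  }
}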

The update API can be used to delete fields. You could use a ruby script to iterate over the fields of an event and generate a list of empty fields.
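
A rough sketch of that idea, assuming only top-level string fields matter; the removed names end up in [@metadata][empty_fields] so a later stage could turn them into a remove script:

filter {
  ruby {
    code => '
      empty = []
      event.to_hash.each do |k, v|
        # collect top-level fields whose value is an empty string
        empty << k if v.is_a?(String) && v.empty?
      end
      # remember which fields were empty and drop them from the event
      event.set("[@metadata][empty_fields]", empty)
      empty.each { |k| event.remove(k) }
    '
  }
}

This only keeps the empty fields out of the partial update; deleting them from the already stored document still needs the update API.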

I know that this API supports requests such as:

POST test/_update/1
{
  "script" : "ctx._source.remove('some_field')"
}
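
If I read the docs correctly, several fields could be removed in one call with a parameterized script, something like this (the field names are just examples):

POST test/_update/1
{
  "script" : {
    "source" : "for (def f : params.fields) { ctx._source.remove(f) }",
    "params" : { "fields" : ["field1", "field2"] }
  }
}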

But how do I invoke it from Logstash?

In the past I have used logstash to generate a text file that I curled into elasticsearch, but it could probably be done using an http filter. You do not have to have an output in a logstash pipeline.
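
Roughly like this: the pipeline writes one bulk action line plus one script line per document, and the resulting file is posted to the _bulk API (index, id and file name are placeholders):

# contents of updates.ndjson, generated by the pipeline
{ "update" : { "_index" : "test_idx1", "_id" : "1" } }
{ "script" : { "source" : "ctx._source.remove('field1')" } }

# load it into Elasticsearch
curl -XPOST "http://localhost:9200/_bulk" \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @updates.ndjson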

Maybe it is because I am quite new to the Elastic Stack, but I do not get the idea here.
The http filter is a filter, which can be used to modify event data. But without an output in the pipeline, how will the event get to the Elasticsearch node?

An http filter can POST into elasticsearch.

I have a request that works in the Kibana console:

POST test_idx1/_update/1
{
  "script" : "ctx._source.remove('field1')"
}

I copied it as cURL:

curl -XPOST "http://localhost:9200/test_idx1/_update/1" -H 'Content-Type: application/json' -d "{ \"script\" : \"ctx._source.remove('field1')\" }"

From this I tried to derive an http filter:

filter {
  http {
    url         => "http://localhost:9200/test_idx1/_update/1"
    verb        => "POST"
    user        => "logstash_user"
    password    => "logstash_user_password"
    body_format => "json"
    body        => "{\"script\" : \"ctx._source.remove(\\\"field1\\\")\"}"
  }
}

After running Logstash, no error is logged, even at DEBUG level, and no update happens in Elasticsearch.
What am I missing?

Interesting thing to notice: it started working when I added a dummy file output. It seems that Logstash does not process the filter section when there is no output section in the pipeline.
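
For completeness, the dummy output was just something like this (the path is arbitrary):

output {
  # dummy sink so that Logstash actually runs the filter section
  file {
    path => "/tmp/logstash_dummy.log"
  }
}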
