Skip indexing based on field value

We're doing some nested indexing on an events property. Some of the events aren't needed in Elasticsearch and are producing a large number of unnecessary internal documents.

Is there a way to skip indexing a field based on its value?

i.e., if events.type = foo, skip indexing that event

I've been trying to find a solution all morning but no luck.

Thanks in advance for the help :slight_smile:

That's something you need to fix in the ingest layer.

If you are using Logstash, it's easy to call the drop filter.

@magnusbaeck wrote a nice example here:

@dadoonet Thank you for the quick reply.

As for the ingest layer, I do see the ability to drop a field at https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html. Is there a way to drop based on the field value?

We're currently using the Couchbase Elasticsearch connector found at https://github.com/couchbaselabs/elasticsearch-transport-couchbase to transport data from CB into ES from our application, so we can offer better search/reporting capabilities to our customers.

We've already requested that they add support for pipelines so we can route documents to an index based on the created_at field, and we hope to have it within the next 1-2 weeks.

From what I understand, we're pretty blocked until we can specify a pipeline to use the ingest layer?

The remove processor removes a field. It does not drop the document by itself.
If I understood correctly, you want the latter.

So you need to do that in another way.

I don't think this is doable as such. Maybe an ingest script, though, which tests the field value and then throws an exception that will "fail" the document.

@talevy may have more ideas?

Or read the content of CB from LS, if possible... Unsure whether that's easily feasible.

Correct. We’re basically looking to drop that specific nested doc.event where doc.event.type = X from indexing, but keep the entire doc and the rest of the events.

For more insight, we have 200k individual docs, but due to the nested events on doc.events that translates to 9m docs in the index. If we could ignore a specific event.type, it would shed millions of them.

It seems that ultimately we’ll likely just have to change our data model.

Can’t you do that as a view in Couchbase and replicate that view?

Using Logstash, you can do this with a conditional:

if [fieldname] == "value" {
  drop{ }
}
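
Note that drop discards the whole Logstash event. If the goal is to keep the document and remove only the nested events of one type, a ruby filter is one possible approach. The following is a minimal sketch, not a tested configuration. It assumes the data can be routed through Logstash at all, that the nested events arrive as an array of objects under a top-level events field, and that Logstash 5.0 or later is in use (for the event.get / event.set API). The field name events, the key type, and the value foo are taken from the question and are placeholders.

```
filter {
  ruby {
    # Assumption: [events] is an array of hashes, each with a "type" key.
    code => '
      events = event.get("events")
      if events.is_a?(Array)
        # Keep every nested event except those whose type is "foo".
        kept = events.reject { |e| e.is_a?(Hash) && e["type"] == "foo" }
        event.set("events", kept)
      end
    '
  }
}
```

With this in place the parent document is still indexed, and only the remaining entries of events become nested documents.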
