Excluding fields from _source to avoid storing files

Is there anything horribly wrong with that?
I'm trying to get files indexed (via the ingest attachment plugin) and searchable, but not stored in ES.
Like so:

  "_source": {
    "excludes": ["data", "attachment.content"]
  }
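For context, here is a minimal sketch of how that fragment might sit inside a full create-index request body. It assumes typeless mappings (Elasticsearch 7.x and later); the helper name `build_index_body` is hypothetical, and the code only builds and prints the JSON body rather than talking to a cluster.

```python
import json


def build_index_body(excluded_fields):
    """Build a create-index body whose mapping excludes fields from _source.

    Assumption: typeless mappings (Elasticsearch 7.x+). Older versions
    nest the mapping under a type name.
    """
    return {
        "mappings": {
            "_source": {
                # These fields are still analyzed and indexed,
                # but they are dropped from the stored _source.
                "excludes": list(excluded_fields)
            }
        }
    }


body = build_index_body(["data", "attachment.content"])
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of the index creation request.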

I've done a test run and it's behaving as expected: the content is searchable but not stored in the index.
Opinions?

Thanks for the time and effort :slight_smile:

I'd prefer using a remove processor in the ingest pipeline.
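As a rough sketch of that suggestion, the pipeline definition below runs the attachment processor and then a remove processor on the base64 field. The field name `data` comes from the question; the helper name `build_attachment_pipeline` and the description text are made up for illustration, and the code only builds the JSON body.

```python
import json


def build_attachment_pipeline(source_field="data"):
    """Build an ingest pipeline body: extract the attachment, then drop the base64 field."""
    return {
        "description": "Extract attachment text, then remove the base64 payload",
        "processors": [
            # Parses the base64 content and writes the result under "attachment".
            {"attachment": {"field": source_field}},
            # Removes the original base64 field before the document is indexed.
            {"remove": {"field": source_field}},
        ],
    }


pipeline = build_attachment_pipeline()
print(json.dumps(pipeline, indent=2))
```

Note that a field removed this way is gone before indexing, so it is neither stored nor searchable. Here only the base64 payload is removed; the extracted `attachment.content` stays in the document.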

Would the remove processor still keep the field's data indexed but not stored? In other words, does the remove processor only remove the field from the source, while the value is persisted in the index?

Sorry for hijacking this thread but the question came out of curiosity.

Thanks,
Vikas

Ah, my answer may have been a bad one.

I thought the goal was to remove the base64 content only.

One of the things I dislike about exclusion is that it will give you very bad results if at some point you want to use the reindex API.

Thanks for clarifying,

So in the OP's case, the only option is to perform an "index" operation, since "updates" to the index will lose the indexed data for the excluded field.

We could then use the reindex API to copy/overwrite the documents from another index into the original index. Would that be a good way to do this?

Thank you both for your answers.
Okay, I think I see the issue here.
Since neither the base64 nor the actual file content will be stored in the index, if we try to copy the document to another index via the reindex API, we won't be able to do so reliably for the terms of the file itself, because the reindex API works from _source to do the copy.
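To make that concrete, here is a small plain-Python model of what ends up in `_source` once the excludes are applied, which is all a reindex would have to work with. This is only an illustration, not Elasticsearch's own code: the function `apply_source_excludes` and the sample document are hypothetical, and it handles exact dotted paths only (real excludes also accept wildcards).

```python
import copy


def apply_source_excludes(document, excludes):
    """Return a copy of `document` with the excluded dotted paths removed."""
    stored = copy.deepcopy(document)
    for path in excludes:
        parts = path.split(".")
        node = stored
        # Walk down to the parent object of the field to drop.
        for key in parts[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            if isinstance(node, dict):
                node.pop(parts[-1], None)
    return stored


# Hypothetical document as it looks after the attachment processor has run.
doc = {
    "title": "report.pdf",
    "data": "<base64 payload>",
    "attachment": {
        "content": "quarterly results",
        "content_type": "application/pdf",
    },
}

stored_source = apply_source_excludes(doc, ["data", "attachment.content"])
print(stored_source)
```

The printed dictionary has no `data` and no `attachment.content`, so a copy built from it cannot make the file's text searchable in the destination index.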
Now, if a user wants to upload a different document by 'editing' the existing one, we would simply re-index it into segments and exclude the fields again. We are also only excluding the source fields related to that file, so we should still be able to update the document with new or changed metadata fields (assigned tags and categories, who it is shared with, etc.).
Does that sound right?
