Setting "_source.enabled": false does not remove the source data, and the index size does not change at all

Hi

Elasticsearch 7.11.2

I am indexing documents through an ingest pipeline (using the ingest-attachment plugin).

My pipeline looks like this:

PUT _ingest/pipeline/my_pipeline
{
    "description": "pipeline for indexing documents",
    "processors": [
        {
            "attachment": {
                "target_field": "attachment",
                "field": "sisu",
                "indexed_chars": -1
            }
        }
    ]
}

The "sisu" field holds the HTML source of the document, base64-encoded, so after indexing it looks like this:

"_source" : {
.....
"sisu" : "77u/PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4NCjwhRE9DVFlQRSBodG1sID4NCjxodG1sPg0KICA8aGVhZD4NCiAgICA8c3R5bGU+DQppbWcgew0KICAgIC1raHRtbC1.............."
}

Now I create two indexes: one with _source enabled and the other with _source disabled.

PUT myidx_src_enabled
{
    "mappings": {
        "dynamic": false,
        "_source": { "enabled": true },
        "properties": {
            "kohtuasjaObjektId": { "type": "integer" },
            "menetluseObjektId": { "type": "integer" },
            "failiId": { "type": "integer" },
            "lehekyljeId": { "type": "integer" },
            "lkNumber": { "type": "integer" },
            "sisu": { "type": "text", "index": false },
            "attachment": {
                "properties": {
                    "content": { "type": "text" }
                }
            }
        }
    }
}

I index a single HTML document, which is 35 KB in size.
The resulting index is 39.8 KB in size.
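
For illustration, a document could be sent through the pipeline roughly like this. This is only a sketch: the document ID 1, the failiId value, and the tiny base64 payload (which decodes to <html><body>test</body></html>) are placeholders, not the 35 KB document measured above.

```console
# Placeholder document; "sisu" is a small base64-encoded HTML snippet
PUT myidx_src_enabled/_doc/1?pipeline=my_pipeline
{
    "failiId": 1,
    "sisu": "PGh0bWw+PGJvZHk+dGVzdDwvYm9keT48L2h0bWw+"
}
```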

Now I create a new index with _source disabled:

PUT myidx_src_disabled
{
    "mappings": {
        "dynamic": false,
        "_source": { "enabled": false },
        "properties": {
            "kohtuasjaObjektId": { "type": "integer" },
            "menetluseObjektId": { "type": "integer" },
            "failiId": { "type": "integer" },
            "lehekyljeId": { "type": "integer" },
            "lkNumber": { "type": "integer" },
            "sisu": { "type": "text", "index": false },
            "attachment": {
                "properties": {
                    "content": { "type": "text" }
                }
            }
        }
    }
}

I index the exact same single HTML document, 35 KB in size.
The index size is the same: 39.8 KB.
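
The post does not say how the sizes were read, so the following is only one possible way to compare the two indexes, assuming both were created as shown above. Flushing first is a precaution so that both indexes are measured in the same on-disk state.

```console
# Write pending changes to disk so both indexes are measured in the same state
POST myidx_src_enabled,myidx_src_disabled/_flush

# List document count and store size for both indexes
GET _cat/indices/myidx_src_*?v&h=index,docs.count,store.size
```

With only one small document per index, fixed per-segment overhead can make up a noticeable share of the reported size, so a comparison over more documents is likely to be more telling.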

I have tried:

  • Disabling _source:
    "_source": { "enabled": false },
  • Specifying "store": false for "sisu":
    "sisu": { "type": "text", "index": false, "store": false },
  • Removing the "sisu" mapping altogether,
    i.e. removing this row: "sisu": { "type": "text", "index": false },

No matter what I do, "sisu" still seems to be present and the index size does not change at all.

Am I doing something wrong?

Regards
Raul

My advice would be to add a remove processor after the attachment processor, so that the sisu field is removed from the document completely before it is indexed.
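
A minimal sketch of what that could look like, reusing the pipeline from the question and appending a remove processor. The _simulate request and its sample document (a small base64 payload that decodes to <html><body>test</body></html>) are assumptions added here for checking the result; they are not from the original post.

```console
# Same pipeline as in the question, with "sisu" dropped after extraction
PUT _ingest/pipeline/my_pipeline
{
    "description": "pipeline for indexing documents",
    "processors": [
        {
            "attachment": {
                "target_field": "attachment",
                "field": "sisu",
                "indexed_chars": -1
            }
        },
        {
            "remove": {
                "field": "sisu"
            }
        }
    ]
}

# Dry run with a placeholder document to inspect what the pipeline produces
POST _ingest/pipeline/my_pipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "sisu": "PGh0bWw+PGJvZHk+dGVzdDwvYm9keT48L2h0bWw+"
            }
        }
    ]
}
```

The order matters: the remove processor has to run after the attachment processor, otherwise there is nothing left to extract from. The simulated document should come back with the attachment field populated and without sisu.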
