Trouble with reindexing and ingest pipeline

I have a problem with reindexing through an ingest pipeline. It works in most cases, but in some it does not, and I have no clue how to debug it or find the root cause.
Here is my scenario:
I have two indices, assetsquick-v1 and searchcontenthtml:

PUT /assetsquick-v1
{
  "mappings": {
    "_source": {
      "excludes": [
        "systemMetadata.com:System:searchContentHTML"
      ]
    },
    "dynamic" : "false",
    "properties" : {
      "systemMetadata" : {
        "dynamic" : "false",
        "properties" : {
          "com:System:searchContentHTML" : {
            "properties" : {
              "textValue" : {
                "type" : "text",
                "fields" : {
                  "ft3" : {
                    "type" : "text",
                    "analyzer" : "my_analyzer3"
                  },
                  "ft5" : {
                    "type" : "text",
                    "analyzer" : "my_analyzer5"
                  },
                  "ft8" : {
                    "type" : "text",
                    "analyzer" : "my_analyzer8"
                  },
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  },
                  "std" : {
                    "type" : "text",
                    "analyzer" : "standard"
                  }
                },
                "analyzer" : "my_analyzer"
              }
...
PUT /searchcontenthtml
{
  "mappings": {
    "dynamic": false,
    "properties" : {
      "asset": {"type": "keyword"},
      "textValue" : {
        "type" : "text",
        "index": false
      }
    }
  },
  "settings": {
    "index": {
      "max_ngram_diff": "1",
      "routing": {
        "allocation": {
          "include": {
            "_tier_preference": "data_content"
          }
        }
      },
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  }
}

an enrich policy:

PUT /_enrich/policy/searchcontenthtml-policy
{
  "match": {
    "indices": "searchcontenthtml",
    "match_field": "asset",
    "enrich_fields": ["textValue"]
  }
}

and an ingest pipeline that uses the policy:

PUT _ingest/pipeline/searchcontenthtml-pipeline
{
  "description": "Adds html content from searchcontenthtml index into systemMetadata.com:System:searchContentHTML",
  "processors": [
    {
      "enrich": {
        "field": "_id",
        "policy_name": "searchcontenthtml-policy",
        "target_field": "systemMetadata.com:System:searchContentHTML",
        "override": true
      }
    },
    {
      "remove": {
        "field": "systemMetadata.com:System:searchContentHTML.asset",
        "ignore_missing": true
      }
    }
  ]
}

Then I run this reindex:

POST /_reindex?refresh=true&timeout=60m
{
  "source": {
    "index": "assetsquick-v1"
  },
  "dest": {
    "index": "assetsquick-v2",
    "pipeline": "searchcontenthtml-pipeline"
  }
}

After the reindex, some documents cannot be found in assetsquick-v2 by searching on systemMetadata.com:System:searchContentHTML.textValue, even though I was able to find them in assetsquick-v1 and in searchcontenthtml. For most documents the search works fine.

GET assetsquick-v2/_search?_source_excludes=*&size=20
{
  "query": {
    "match": {
      "systemMetadata.com:System:searchContentHTML.textValue": "eclipse"
    }
  }
}

Any idea what the root cause could be, or where I should look?
The systemMetadata.com:System:searchContentHTML.textValue field contains very long texts (15 MB).
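
One way to narrow this down, sketched here under assumptions, is to run a single affected document through the pipeline with the simulate API and check whether the enrich processor adds anything. The `_id` value `my-asset-id` is a hypothetical placeholder for the id of a document that cannot be found in assetsquick-v2; the `_source` is left nearly empty because the enrich processor only looks at `_id`.

```console
# Dry-run the pipeline for one document; nothing is indexed.
# "my-asset-id" is a placeholder: use the _id of an affected document.
POST /_ingest/pipeline/searchcontenthtml-pipeline/_simulate
{
  "docs": [
    {
      "_index": "assetsquick-v1",
      "_id": "my-asset-id",
      "_source": {
        "systemMetadata": {}
      }
    }
  ]
}
```

If the simulated result contains systemMetadata.com:System:searchContentHTML.textValue, the enrich lookup matched for that id. If the field is missing, the lookup found no match, which points at the enrich data rather than at the reindex itself.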

More info: now I'm not able to reproduce the problem. I have just deleted and recreated the searchcontenthtml index, the policy, and the pipeline.
Any idea what could have caused the problem? Maybe the reindexing took too much space and time? This strange behaviour is giving me a headache, and I'm afraid to run the reindexing in production.

I've found the root cause.
When the policy is executed, a static enrich index is created from the source index. When you add a new document to the source index afterwards, the enrich index is not updated; you first have to execute the policy again.
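
As a minimal sketch of that fix, assuming the policy and pipeline names from the question: re-execute the policy whenever searchcontenthtml has changed, and only then run the reindex, so that the enrich processor looks up against current data.

```console
# Rebuild the enrich index from the current contents of searchcontenthtml.
POST /_enrich/policy/searchcontenthtml-policy/_execute

# Then run the reindex through the pipeline, as in the question.
POST /_reindex?refresh=true&timeout=60m
{
  "source": {
    "index": "assetsquick-v1"
  },
  "dest": {
    "index": "assetsquick-v2",
    "pipeline": "searchcontenthtml-pipeline"
  }
}
```

Documents added to searchcontenthtml after the last policy execution are the ones that would come out of the pipeline without the enriched field, which matches the "works for most documents" symptom.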

Thanks for sharing your solution!
