Reindexing in Elasticsearch does not return all documents

I have about 1.5 million documents in Elasticsearch. I want to reindex them into several new indices: each index holds the documents that contain certain keywords, plus one "null" index for the documents that contain none of the keywords used for the other indices. The new indices end up with fewer documents than expected. In particular, I expect about 1.2 million documents in the null index, but it contains only about 30k. I would appreciate any ideas on what I've done wrong here!

This is how I reindex the documents that contain certain keywords in any of several fields:

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_keywords"
  }
}'

Then I use must_not to create another index with the documents that contain neither keyword1 nor keyword2:

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "must_not": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_null"
  }
}'
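
The two request bodies above differ only in the bool clause (should vs. must_not) and the destination index, so they can be generated from one keyword list to keep them in sync. This is a minimal Python sketch, not part of the original commands; the helper names `multi_match_clauses` and `reindex_body` are made up for illustration, and the field and index names are copied from the requests above.

```python
import json

# Field names copied from the requests above.
FIELDS = ["content", "meta.raw.Message:Raw-Header:Subject"]


def multi_match_clauses(keywords, fields=FIELDS):
    """Return one multi_match clause per keyword, as in the hand-written requests."""
    return [{"multi_match": {"fields": list(fields), "query": kw}} for kw in keywords]


def reindex_body(keywords, dest_index, occur, source_index="mydocs_email_*"):
    """Build a _reindex body (hypothetical helper).

    occur is "should" (documents matching any keyword)
    or "must_not" (documents matching none of them).
    """
    if occur not in ("should", "must_not"):
        raise ValueError("occur must be 'should' or 'must_not'")
    clauses = multi_match_clauses(keywords)
    return {
        "source": {
            "index": source_index,
            "query": {"bool": {"filter": [{"bool": {occur: clauses}}]}},
        },
        "dest": {"index": dest_index},
    }


keywords = ["keyword1", "keyword2"]
keyword_body = reindex_body(keywords, "analysis_keywords", "should")
null_body = reindex_body(keywords, "analysis_null", "must_not")

# Print the body that would be sent as --data-raw for the null index.
print(json.dumps(null_body, indent=2))
```

The printed JSON can be pasted into the curl command as the --data-raw payload.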

The null index ended up with 29.7k documents. From the response below, it looks like I should expect about 1.28 million documents. The error also says I need to increase the field limit of the index, which I did after running the commands above, but the number of documents stays the same.

{"took":53251,"timed_out":false,"total":1277428,"updated":243,"created":29755,"deleted":0,"batches":30,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"analysis_null","type":"_doc","id":"/email/.......msg","cause":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [analysis_null] has been exceeded"},"status":400}]

I increased the field limit on the destination index before reindexing (rather than after), and that solved the issue:

> DELETE analysis_null
> 
> PUT analysis_null
> {
>   "settings": {
>     "index.mapping.total_fields.limit": 10000
>   }
> }
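
The order matters here: the index is deleted, recreated with the higher limit, and only then is the reindex run again. For anyone scripting this instead of using the console, the following is a rough Python sketch that builds the same two requests with the standard library. The helper names are made up for illustration, the host is the one from the curl commands in the question, and nothing is sent until the requests are passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

BASE_URL = "http://abcdef2344:9200"  # host taken from the question


def json_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def recreate_with_limit(index, field_limit):
    """Return the DELETE and PUT requests, in the order they should be sent."""
    settings = {"settings": {"index.mapping.total_fields.limit": field_limit}}
    return [
        json_request("DELETE", "/" + index),
        json_request("PUT", "/" + index, settings),
    ]


requests_in_order = recreate_with_limit("analysis_null", 10000)
for req in requests_in_order:
    print(req.get_method(), req.full_url)
```

After both requests succeed, the _reindex call from the question can be run again against the recreated index.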