Reindex creating a lot of deleted documents

Hello,

I've been doing a bit of reindexing on my cluster to change the analyzer on a field. No other settings are changed.

To clarify:
index AAA is the source and BBB is the destination.

AAA has a field called data with the following mapping:
"data": {
  "type": "text",
  "analyzer": "my_analyzer"
},
I create a brand-new index BBB and copy the mapping from AAA exactly, apart from the change to the data field, e.g.:
PUT BBB
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    }
  }
}

PUT BBB/_mapping/doc
{
  "properties": {
    // all other mappings the same
    "data": {
      "type": "text"
    }
  }
}

I then run:
POST _reindex
{
  "source": {
    "index": "AAA"
  },
  "dest": {
    "index": "BBB"
  }
}
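For a copy that runs for many hours, it can help to start the reindex as a background task so that a client timeout or retry does not accidentally start a second copy. The following is a minimal sketch, not what was run in this thread; `<task_id>` is a placeholder for the ID returned by the first request.

```
# Start the reindex in the background; the response contains a task ID
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "AAA"
  },
  "dest": {
    "index": "BBB"
  }
}

# Check progress of that task (<task_id> is a placeholder)
GET _tasks/<task_id>
```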

I get a lot of docs.deleted in the new index, yet I get no errors. Indexing shows a rate of 20K/s, but after 24h the doc count in the new index was only 20 million. When I went to cancel the task, there was a large number of reindex tasks (this may or may not be normal after 24h).
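For anyone checking the same symptoms, the requests below sketch how the counts and the running reindex tasks can be inspected. They assume the index names AAA and BBB from above; `<task_id>` is a placeholder for an ID taken from the task list.

```
# Live and deleted document counts for source and destination
GET _cat/indices/AAA,BBB?v&h=index,docs.count,docs.deleted

# List running reindex tasks with details
GET _tasks?detailed=true&actions=*reindex

# Cancel one task (<task_id> is a placeholder)
POST _tasks/<task_id>/_cancel
```

If the task list shows more than one top-level reindex from AAA to BBB, the same documents may be getting written more than once.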

Can you help me understand what this means and point me in the right direction to investigate why I'm getting deleted docs in the new index?

I processed 1.7 billion records from 3 other indexes last week just fine using the same process. I have 3 indexes with around 400M records each that show these symptoms, but I can't figure out why.

Thanks.

If anyone else comes across this issue: I resolved it by setting the version type to external in the reindex command, like so.

POST _reindex
{
  "source": {
    "index": "AAA"
  },
  "dest": {
    "index": "BBB",
    "version_type": "external"
  }
}

This took the indexing speed from 16K/s down to around 5K/s but it completed with no docs.deleted in the destination index.

Can anyone help me understand this issue? Do I somehow have duplicates in my data? I let Elasticsearch manage the IDs of my documents. Could this be something I need to investigate?
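One hedged way to look into the duplicates question: reindex keeps each document's _id, and writing a document whose _id already exists in the destination marks the old copy as deleted. A growing docs.deleted in BBB may therefore point to the same _id being written more than once (for example by overlapping reindex runs) rather than to duplicate data in AAA. The requests below are a sketch of that check, assuming nothing else writes to BBB.

```
# Number of live documents in source and destination
GET AAA/_count
GET BBB/_count

# Document stats for the destination, including the deleted count
GET BBB/_stats/docs
```

If the BBB count never exceeds the AAA count while the deleted count in BBB keeps rising, repeated writes of the same IDs are a more likely explanation than duplicates in the source.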

Any pointers would be appreciated.
