Errors reindexing documents from version 6 to version 7

I have several hundred daily indexes created under versions 5 and 6. I have recently upgraded to 7.10 and now want to clean up and standardise my index mappings.

I am attempting to reindex using a Ruby program and the gem elasticsearch-extensions-0.0.31:

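  def do_reindex(name = nil, new_name = nil)
    name ||= @name
    new_name ||= @options[:new_name]
    # Client-side reindex from the elasticsearch-extensions gem
    reindex = Elasticsearch::Extensions::Reindex.new source: { index: name, client: @client }, target: { index: new_name }
    reindex.perform
  end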

this results in output to stderr:

warning: 299 Elasticsearch-7.10.0-51e9d6f22758d0374a0f3f5c6e8f3a7997850f96 "[types removal] Specifying types in bulk requests is deprecated."

The reindex perform call returns {:errors=>120853}, i.e. one for each doc in the original index.

and in the cluster logs we see:

[2020-11-29T12:31:47,239][INFO ][o.e.a.b.TransportShardBulkAction] [secesprd02] [authentication_2019.09.02][1] mapping update rejected by primary
java.lang.IllegalArgumentException: mapper [event_source] cannot be changed from type [keyword] to [text]
at org.elasticsearch.index.mapper.ParametrizedFieldMapper.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.ParametrizedFieldMapper.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.ObjectMapper.doMerge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.RootObjectMapper.doMerge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.ObjectMapper.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.RootObjectMapper.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.Mapping.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.DocumentMapper.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.merge( ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest( [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun( [elasticsearch-7.10.0.jar:7.10.0]
at [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary( [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary( [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary( [elasticsearch-7.10.0.jar:7.10.0]
at$1.doRun( [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-7.10.0.jar:7.10.0]
at [elasticsearch-7.10.0.jar:7.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
at java.util.concurrent.ThreadPoolExecutor$ [?:?]
at [?:?]

since upgrading to 7.10 the mapping of the source index shows:

  "authx_2019.09.02": {
    "mappings": {
      "properties": {

i.e. no type

and I don't give any type in the reindex.

What am I doing wrong?

It looks like this client is out of date. The client is still using scroll + bulk API to emulate reindex API (source). I believe scroll+bulk pre-dates the official reindex API.

In the source code of the extension you mentioned, the scroll results are taken verbatim and mirrored to bulk requests. I’m sure the _type field is present in the hits from the scroll query, and that’s where the _type makes its way into the bulk requests and generates the errors.
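To illustrate the mechanism described above, here is a minimal sketch (not the extension's actual code) of turning one scroll hit into a bulk action while leaving out _type. The helper name hit_to_bulk_action and the sample hit are hypothetical; the `data` key assumes the action format accepted by the Ruby client's bulk helper.

```ruby
# Hypothetical helper: convert one scroll hit into a bulk "index" action,
# copying only the id and the document body, so _type and _score are dropped.
def hit_to_bulk_action(hit, target_index)
  {
    index: {
      _index: target_index,
      _id: hit['_id'],
      data: hit['_source']
    }
  }
end

# Made-up hit, shaped like a scroll result from an index created under 6.x.
hit = {
  '_index' => 'authx_2019.09.02',
  '_type' => 'doc',
  '_id' => 'abc123',
  '_score' => 1.0,
  '_source' => { 'event_source' => 'sshd' }
}

action = hit_to_bulk_action(hit, 'authentication_2019.09.02')
puts action.inspect
```

Building the action from scratch, rather than mutating the hit, means any metadata the scroll adds in future is not mirrored into the bulk request by accident.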

I’d recommend looking for a more up-to-date client, and/or looking over the Reindex API docs.

The deprecation warning is indeed an indication that your client is using an older, deprecated form of bulk request, but it's not the problem here. It will become a problem in the next major version, so you need to address it before you upgrade to 8.x, but you can ignore it for now.

The actual problem is included in the bulk indexing responses. It doesn't look like this library makes it easy to see this, but I think client-side transport logging will show you the details. I lack the Ruby expertise to tell you exactly how to get this, sorry.
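As a rough sketch of what digging the details out of a bulk response could look like, assuming you can get hold of the parsed response hash: the helper name bulk_failures and the sample response below are made up for illustration, although the errors/items/error layout follows the bulk API response format.

```ruby
# Hypothetical helper: collect the per-item failures from a bulk response hash.
def bulk_failures(response)
  return [] unless response['errors']

  response['items'].flat_map do |item|
    # Each item has a single key naming the action (index, create, update, delete).
    item.map do |action, result|
      next unless result['error']

      { action: action, id: result['_id'],
        type: result['error']['type'], reason: result['error']['reason'] }
    end.compact
  end
end

# Made-up response in the shape the bulk API returns.
response = {
  'took' => 3,
  'errors' => true,
  'items' => [
    { 'index' => { '_id' => '1', 'status' => 201 } },
    { 'index' => { '_id' => '2', 'status' => 400,
                   'error' => { 'type' => 'illegal_argument_exception',
                                'reason' => 'mapper [event_source] cannot be changed from type [keyword] to [text]' } } }
  ]
}

failures = bulk_failures(response)
failures.each { |f| puts "#{f[:action]} #{f[:id]}: #{f[:reason]}" }
```

Printing the distinct reasons rather than every failure is usually enough, since a mapping conflict tends to produce the same message for every document.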

The log you shared does suggest the problem might be related to mappings, specifically some confusion around the type of the event_source field. It's trying to guess the type of this field dynamically, which means you're not setting up the mappings before starting the reindex process. I recommend doing that.
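For example, a minimal sketch of building an explicit mapping for the target index before reindexing. The helper name explicit_mappings_body is hypothetical, and the two field types are taken from the error messages in this thread rather than from your real template.

```ruby
require 'json'

# Hypothetical helper: build a typeless (7.x style) create-index body with
# explicit field types, so these fields are not left to dynamic mapping.
def explicit_mappings_body(field_types)
  properties = field_types.each_with_object({}) do |(field, type), props|
    props[field.to_s] = { 'type' => type.to_s }
  end
  { 'mappings' => { 'properties' => properties } }
end

body = explicit_mappings_body(event_source: :keyword, level: :integer)
puts JSON.pretty_generate(body)

# Assuming `client` is an Elasticsearch::Client, the body could then be used as:
#   client.indices.create index: 'authentication_2019.09.02', body: body
```

The same properties could equally live in an index template, in which case the index only needs to be created with a name the template's pattern matches.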

Eh I just realised this library is developed by Elastic, so don't worry we'll get to this task before 8.x is released.

Ah, I missed the mention of event_source in the transport logs. Doh! I am fundamentally not visually orientated and I find picking out vital information from the screeds of Java traceback really difficult unless I know what I am looking for.

Turns out the problem is really simple! A typo in the index pattern for the target template!
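For anyone hitting the same thing, a quick way to catch this kind of typo is to check the target index name against the template's index patterns before reindexing. This is only a sketch: matches_index_pattern? is a hypothetical helper that handles just the `*` wildcard, and the misspelt pattern is invented for illustration.

```ruby
# Hypothetical helper: does an index name match a template index pattern?
# Only the `*` wildcard is handled; everything else is matched literally.
def matches_index_pattern?(pattern, index_name)
  parts = pattern.split('*', -1).map { |part| Regexp.escape(part) }
  Regexp.new('\A' + parts.join('.*') + '\z').match?(index_name)
end

target = 'authentication_2019.09.02'

puts matches_index_pattern?('authentication_*', target)  # intended pattern
puts matches_index_pattern?('authenticaton_*', target)   # invented typo
```

When no template pattern matches, the new index falls back to dynamic mapping, which fits the mapper conflict seen in the cluster log.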

Thanks for the help folks and I will work on the ruby side. I suspect that I am picking up the original reindex call (which will have been left for backward compatibility) and not the one in the extension.

That's why we're here. There is a bit of a knack to it...

The one in the extension does a client-side reindex (i.e. scroll + bulk indexing). Nothing wrong with that, but if you want the server-side reindex, that is the one in the main elasticsearch-api library instead.

The server-side one does basically the same thing (scroll + bulk indexing) just without involving the client. I don't think it would have worked in this case either, for the same reason.
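A minimal sketch of the request body for the server-side call, using the index names from this thread. The helper name reindex_body is hypothetical; the source/dest layout is the one the Reindex API expects.

```ruby
require 'json'

# Hypothetical helper: build the body for a server-side reindex request.
def reindex_body(source_index, dest_index)
  {
    'source' => { 'index' => source_index },
    'dest'   => { 'index' => dest_index }
  }
end

body = reindex_body('authx_2019.09.02', 'authentication_2019.09.02')
puts JSON.pretty_generate(body)

# Assuming `client` is an Elasticsearch::Client, this would be sent with:
#   response = client.reindex body: body
# and any per-document problems should show up in response['failures'].
```

Because the work happens inside the cluster, the failures come back in the response itself, which makes them easier to inspect than a bare error count.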

I now have the reindexing picking up the correct template, but I am now getting an error for every document: mapper [level] cannot be changed from type [integer] to [text]. Yet the mappings of both the original and the target indexes have:

 "level": {
          "type": "integer"

when I look at the index properties using cerebro.

Weird! Any idea how I have stuffed it up this time?

BTW this happens with both implementations of reindex! My program now has a switch to tell it which call to use.

I get it! Reindex is working off the _source, where level is in fact a text string. I used a script to explicitly cast it to int and that worked.
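For reference, a sketch of what such a cast could look like as a Painless script attached to a server-side reindex body. The script text and the helper name reindex_body_with_cast are assumptions for illustration, not the exact script used here, and whether the cast addresses the underlying cause is a separate question.

```ruby
require 'json'

# Assumed Painless script: parse `level` only when _source holds it as a string.
CAST_LEVEL_SCRIPT = <<~PAINLESS.strip
  if (ctx._source.level instanceof String) {
    ctx._source.level = Integer.parseInt(ctx._source.level);
  }
PAINLESS

# Hypothetical helper: reindex body that runs the cast on every document.
def reindex_body_with_cast(source_index, dest_index)
  {
    'source' => { 'index' => source_index },
    'dest'   => { 'index' => dest_index },
    'script' => { 'lang' => 'painless', 'source' => CAST_LEVEL_SCRIPT }
  }
end

body = reindex_body_with_cast('authx_2019.09.02', 'authentication_2019.09.02')
puts JSON.pretty_generate(body)
```

The instanceof guard keeps the script from failing on documents where level is already numeric or missing.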

editing since this may have been emailed...

No, it worked because a small change to the template json caused ES to ignore the "properties" completely.

So back to trying to figure out why it thinks there is a mismatch in the field types.
