Migrating data off of ES 0.90.3 and into ES 1.7.x

I have found myself in an uncomfortable spot where I have to migrate data off of ES 0.90.3 (I have no access to the server itself, just to the data) and into a different ES 1.7.x cluster (which I have full access to).

So far I have tried two approaches:

  • elasticdump (did not work, as it requires at least ES 1.2)
  • logstash, which gives me this error:

A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["some_host:9200"], index=>"id_prod", query=>"*", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {
"error":"SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed; shardFailures {
[i22flp_3TGWYPYsOkCG5OA][id_prod][0]: SearchParseException[[id_prod][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@2a];
}{
[i22flp_3TGWYPYsOkCG5OA][id_prod][2]: SearchParseException[[id_prod][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@2a];
}{
[i22flp_3TGWYPYsOkCG5OA][id_prod][1]: SearchParseException[[id_prod][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@2a];
}{
[i22flp_3TGWYPYsOkCG5OA][id_prod][4]: SearchParseException[[id_prod][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@2a];
}{
[i22flp_3TGWYPYsOkCG5OA][id_prod][3]: SearchParseException[[id_prod][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@2a]; }]",
"status":400} {:level=>:error}

I have verified that this is related to the source server (I removed the elasticsearch output from my logstash config, leaving just stdout for testing). I have no idea what's going on there, and the only thing I have found on the Internets is related to... the number of spaces in the data part of a query.

Any help much appreciated!

When you say you have access to the old data, do you mean the physical data itself (e.g. the data directory)?

I think the first thing I would try is to rsync that over to your new server's data directory, then start ES 1.7 and allow it to import the "dangling indices". I don't quite remember, but I don't think the underlying Lucene version has changed enough in that period of time to drop backwards compatibility with 0.90.

It'd be worth a try anyway, if you have the raw data directory.
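
If you do have it, a rough sketch of that copy step (the host and paths are placeholders I made up; point them at wherever the 0.90 data directory actually lives):

# Rough sketch: copy the old node's data directory into the new node's
# data path before starting ES 1.7, so it can pick the indices up as
# "dangling". Host and paths are placeholders.
import subprocess

subprocess.run(
    [
        "rsync", "-av",
        "old-host:/var/lib/elasticsearch/",  # old node's data directory
        "/var/lib/elasticsearch/",           # new node's data directory
    ],
    check=True,
)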

Otherwise, I'm not sure. Make sure you are using the HTTP output for Elasticsearch (not the Java one, which uses the transport client and requires identical ES versions on Logstash and ES). The logstash folks in the logstash forum and/or the #logstash IRC channel might be able to help more.

Worst case, you could always write a script to dump from one cluster into the other. The Python client, for example, has a reindex helper, which will execute a scan/scroll and reindex into a new index (or a new cluster); a sketch follows below. You could also write something similar in your language of choice. Not ideal, but it'd be an effective option.
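
A minimal sketch of that approach with the elasticsearch-py client (host names and the index name are placeholders, and you'd need a client version old enough to talk to 0.90):

# Minimal sketch: scan/scroll out of the old cluster and bulk-index into
# the new one using elasticsearch-py's reindex helper.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

source = Elasticsearch(["http://old-host:9200"])  # 0.90 cluster
target = Elasticsearch(["http://new-host:9200"])  # 1.7 cluster

reindex(
    source,        # client connected to the source cluster
    "id_prod",     # source index
    "id_prod",     # target index (same name on the new cluster)
    target_client=target,
    scroll="5m",
)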

Do you have access to the 0.90 cluster with a transport client?

No, only via HTTP.

How can I check this?

@rysiek can you use the port 9300, or the port that has been configured for internal cluster transport protocol?
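
A quick way to check from any machine that can reach the server (a minimal sketch; some_host is a placeholder):

# Minimal check that the transport port (9300 by default) accepts
# TCP connections; "some_host" is a placeholder.
import socket

try:
    with socket.create_connection(("some_host", 9300), timeout=5):
        print("port 9300 is reachable")
except OSError as exc:
    print("cannot connect:", exc)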

Yes, I seem to be able to connect to port 9300 on that server. I think I can safely assume, based on the history of that host, that ES has not been configured to use a different port for this.

I have released 0.90.11.5 of my knapsack export/import plugin, which is a transport client that uses a scan/scroll query to fill compressed archives:

http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/0.90.11.5/

I hope it works with 0.90.3

If you are lucky, you can export your documents into a compressed archive file and import it later, with another knapsack version, into a more recent ES version like 1.7.
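
The export itself is triggered over the plugin's REST endpoint. As a rough, hypothetical sketch (the _export endpoint name follows the knapsack README; verify it against the docs for the 0.90.11.5 release, and the host and index name are placeholders):

# Hypothetical sketch of triggering a knapsack export over HTTP.
# The _export endpoint name follows the knapsack README; verify it for
# the 0.90.11.5 release. Host and index name are placeholders.
import requests

resp = requests.post("http://plugin-node:9200/id_prod/_export")
print(resp.status_code, resp.text)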

And how do I install said plugin if I do not have access to the source server?..

You just ramp up a new node with the plugin installed; this new node connects to the source cluster on port 9300. You said the port is open.

Okay, solved. What we did was:

  • ramp up a new 0.90.3 node that we control
  • cluster it with the source node
  • wait till all the data has replicated over to our node
  • un-cluster it
  • upgrade the new node to the version of ES we run in production
  • use elasticdump to shunt data between the new node and the production cluster (rough sketch below).
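
For reference, the last step looked roughly like this, run once per index (host names are placeholders):

# Rough sketch of the final elasticdump step: shunt one index from the
# upgraded interim node into the production cluster.
# Hosts and the index name are placeholders.
import subprocess

subprocess.run(
    [
        "elasticdump",
        "--input=http://interim-node:9200/id_prod",
        "--output=http://prod-cluster:9200/id_prod",
        "--type=data",
    ],
    check=True,
)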