Reindex from Remote not working

I'm investigating upgrading from our ES 1.7.5 system (running on Windows) to ES 5.1.1. I'm doing some testing with Reindex from Remote to see if I can simply bring my current cluster's data over to a 5.x system. However, it's failing for me.
When I issue the command as shown in the Reindex From Remote documentation, I receive the following error:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "Cannot guess the xcontent type without mark/reset support on class org.apache.http.nio.entity.ContentInputStream"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "Cannot guess the xcontent type without mark/reset support on class org.apache.http.nio.entity.ContentInputStream"
   },
   "status": 400
}

To get more info, I raised rootLogger.level to debug. When I run the request again, I get a different error:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "Required [version]"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "Required [version]"
   },
   "status": 400
}

The debug logging doesn't show a request actually being posted to the remote system. All I see in the logs are GETs like:

[2016-12-20T16:04:38,708][DEBUG][o.e.c.RestClient         ] request [GET http://myremoteserver:9200/] returned [HTTP/1.1 200 OK]
[2016-12-20T16:04:38,714][DEBUG][tracer                   ] curl -iX GET 'http://myremoteserver:9200/'
# HTTP/1.1 200 OK
# Content-Type: text/plain; charset=UTF-8
# Content-Length: 9
#
# {"OK":{}}

Is there some incompatibility between 1.7.5 and 5.1.1? Is this some other known issue?

Thanks,
Tom Doman
tom@degreed.com

No, it isn't. That said, I recently fixed another problem with content detection, and "Cannot guess the xcontent type without mark/reset support on class org.apache.http.nio.entity.ContentInputStream" looks likely to be a bug.

Usually the content type detection isn't needed. What do you get back when you run curl -iX GET 'http://myremoteserver:9200/'? The response that the tracer logged doesn't look like what I'd expect from Elasticsearch.
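Some background on that error message: when a response's Content-Type doesn't identify the format (the tracer shows the remote answering text/plain), the body presumably has to be sniffed. Sniffing means reading the first bytes and then rewinding so the parser still sees them; Java streams rewind via mark/reset, which the HTTP client's ContentInputStream evidently doesn't support. A minimal Python sketch of the idea, assuming a hypothetical guess_content_type helper that only recognizes JSON and YAML (Elasticsearch's own detector handles more formats); seekable streams stand in for mark/reset:

```python
import io


def guess_content_type(stream):
    """Guess a body's format by peeking at its first bytes, then rewinding.

    Simplified illustration: only JSON ('{') and YAML ('---') are recognized.
    seek()/tell() play the role of Java's InputStream mark()/reset().
    """
    if not stream.seekable():
        # Without a way to rewind, peeking would consume bytes the parser needs.
        raise ValueError("cannot guess the content type without a rewindable stream")
    start = stream.tell()   # remember the position (like mark())
    head = stream.read(16)  # peek at the leading bytes
    stream.seek(start)      # rewind so the parser sees the whole body (like reset())
    head = head.lstrip()
    if head.startswith(b"{"):
        return "json"
    if head.startswith(b"---"):
        return "yaml"
    return None


body = io.BytesIO(b'{"OK":{}}')
print(guess_content_type(body))  # json
print(body.read())               # b'{"OK":{}}' (nothing was consumed)
```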

Yeah, what I showed above is pretty much the extent of it. It looks like it never actually issues the query to get data from the remote server, so there's no content to guess the type from. Note that the log snippet below comes from running with the log level turned up to debug, which doesn't produce the "Cannot guess the xcontent type" exception. Anyway, here's a little more from the log in case it helps:

[2016-12-21T11:16:57,023][DEBUG][o.a.h.i.n.c.PoolingNHttpClientConnectionManager] Connection released: [id: http-outgoing-0][route: {}->http://myremoteserver:9200][total kept alive: 1; route allocated: 1 of 10; total allocated: 1 of 30]
[2016-12-21T11:16:57,025][DEBUG][o.e.c.RestClient         ] request [GET http://myremoteserver:9200/] returned [HTTP/1.1 200 OK]
[2016-12-21T11:16:57,028][DEBUG][tracer                   ] curl -iX GET 'http://myremoteserver:9200/'
# HTTP/1.1 200 OK
# Content-Type: text/plain; charset=UTF-8
# Content-Length: 9
#
# {"OK":{}}
[2016-12-21T11:16:57,030][DEBUG][o.a.h.i.n.c.PoolingNHttpClientConnectionManager] Connection manager is shutting down
[2016-12-21T11:16:57,030][DEBUG][o.a.h.i.n.c.ManagedNHttpClientConnectionImpl] http-outgoing-0 10.0.0.253:35888<->13.85.73.78:9200[ACTIVE][r:r]: Close
[2016-12-21T11:16:57,532][DEBUG][o.a.h.i.n.c.InternalIODispatch] http-outgoing-0 [CLOSED]: Disconnected
[2016-12-21T11:16:58,032][DEBUG][o.a.h.i.n.c.PoolingNHttpClientConnectionManager] Connection manager shut down
[2016-12-21T11:16:58,038][DEBUG][r.suppressed             ] path: /_reindex, params: {}
java.lang.IllegalArgumentException: Required [version]
	at org.elasticsearch.common.xcontent.ConstructingObjectParser$Target.finish(ConstructingObjectParser.java:324) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.xcontent.ConstructingObjectParser$Target.access$000(ConstructingObjectParser.java:237) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.xcontent.ConstructingObjectParser.apply(ConstructingObjectParser.java:143) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.xcontent.ConstructingObjectParser.apply(ConstructingObjectParser.java:77) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource$1RetryHelper$1.onSuccess(RemoteScrollableHitSource.java:164) [reindex-5.1.1.jar:5.1.1]
	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:554) [rest-5.1.1.jar:5.1.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:309) [rest-5.1.1.jar:5.1.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:300) [rest-5.1.1.jar:5.1.1]
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) [httpcore-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.client.InternalRequestExecutor.inputReady(InternalRequestExecutor.java:83) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
[2016-12-21T11:16:58,076][DEBUG][o.a.h.i.n.c.InternalIODispatch] http-outgoing-0 [CLOSED] [content length: 9; pos: 9; completed: true]
[2016-12-21T11:16:58,084][DEBUG][i.n.h.c.c.ZlibCodecFactory] -Dio.netty.noJdkZlibDecoder: false
[2016-12-21T11:16:58,085][DEBUG][i.n.h.c.c.ZlibCodecFactory] -Dio.netty.noJdkZlibEncoder: false

Is there a stack trace under that?

Yep, sorry, I should have included that originally. I just added it to the log section above.

OK! So you've uncovered some bugs in reindex-from-remote, but I think they are in the error-reporting portion. You should be able to work around them. It looks like you are pointing at something that isn't responding with Elasticsearch's normal JSON. That is why I asked what you get when you run those curl commands manually. If they output what the tracer logged, then reindex-from-remote isn't going to work, because the thing on the other side doesn't look like Elasticsearch.
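One way to run that check before submitting a reindex is to look for the field the 5.1.1 error names. A minimal sketch, assuming (as the "Required [version]" error and the RemoteScrollableHitSource frame in the stack trace suggest) that reindex-from-remote starts by reading version.number from the remote's root endpoint. The function name remote_version is hypothetical; it takes the body of GET / as a string, so you can paste in whatever curl printed:

```python
import json


def remote_version(root_body):
    """Return version.number from the body of GET / on a remote cluster.

    Assumption: reindex-from-remote needs this field, so any body without it
    is treated as "not Elasticsearch". json.JSONDecodeError is a subclass of
    ValueError, so every failure surfaces as ValueError.
    """
    doc = json.loads(root_body)
    version = doc.get("version") if isinstance(doc, dict) else None
    if not isinstance(version, dict) or "number" not in version:
        raise ValueError(
            "no version.number in the root response; "
            f"this does not look like Elasticsearch: {root_body!r}"
        )
    return version["number"]


# Abbreviated, illustrative 1.x-style root response.
print(remote_version('{"name": "node-1", "version": {"number": "1.7.5"}}'))  # 1.7.5

try:
    remote_version('{"OK":{}}')  # the body the tracer logged
except ValueError as err:
    print(err)
```

Under that assumption, a body that fails this check would make the reindex fail at the version lookup, before any search is sent, regardless of the query in the request.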

Hmmm, if I use the Sense plugin for Chrome and send the following request to my remote 1.7.5 server, it responds w/ data, lots of it.

POST articles/article/_search
{
   "query": {
      "match_all": {}
   }
}

But if I post this to my 5.1.1 system, I get the error we've been discussing:

POST _reindex
{
   "source": {
      "remote": {
         "host": "http://myremoteserver:9200",
         "username": "user",
         "password": "pwd"
      },
      "index": "articles",
      "query": {
         "match_all": {}
      }
   },
   "dest": {
      "index": "articles_dest"
   }
}

It doesn't seem to matter what I put in the query; it always gives the same error, which makes me think the query request is never made. I don't see it in the trace log either.

And what happens if you shell into the 5.1 machine and run exactly

curl -iX GET 'http://myremoteserver:9200/'

?

No need to shell into the 5.1 machine; it's my local Windows 10 box. :) Here's what I get when I use curl:

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 9

{"OK":{}}

Which is exactly what we see in the logs.

If I use the following, I get a nice response:

curl -iX GET "http://user:pwd@myremoteserver:9200/_search"

Note: it's Windows, so it needs the double quotes. :)
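For reference, the user:pwd@ part of that URL is shorthand for an HTTP Basic Authorization header (RFC 7617), and curl sends that header with its very first request. A small sketch of what the URL turns into, using only the Python standard library; basic_auth_header is a hypothetical helper, not part of any Elasticsearch tooling:

```python
import base64
from urllib.parse import unquote, urlsplit


def basic_auth_header(url):
    """Return the HTTP Basic Authorization value for credentials in a URL.

    Returns None when the URL carries no user name. Credentials are
    percent-decoded, then "user:password" is base64-encoded.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return None
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("http://user:pwd@myremoteserver:9200/_search"))
# prints: Basic dXNlcjpwd2Q=
```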

OK. I think the trouble is that Elasticsearch isn't using preemptive authorization; instead it sends the request and retries only if it is told that it needs authorization. Can you make whatever is doing the password checking return a 403 Forbidden? Maybe reindex from remote should do preemptive authorization to work around situations like this.
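To make the difference concrete, here is a toy model of the two strategies. It is a sketch, not httpclient's actual logic: fake_remote, challenge_client and preemptive_client are hypothetical, and fake_remote only mimics what the tracer log above shows (an unauthenticated GET / gets 200 with {"OK":{}}). A client that waits for the standard 401 Unauthorized challenge never receives one, so it never sends the credentials:

```python
import base64

# Hypothetical credentials matching the http.basic.* settings in this thread.
VALID_AUTH = "Basic " + base64.b64encode(b"user:pwd").decode("ascii")


def fake_remote(headers):
    """Toy stand-in for the 1.7.5 node behind the basic-auth setup.

    Modeled on the tracer log: a request without valid credentials gets
    200 with a tiny body instead of a 401 challenge.
    """
    if headers.get("Authorization") == VALID_AUTH:
        return 200, '{"version": {"number": "1.7.5"}}'  # abbreviated, illustrative
    return 200, '{"OK":{}}'


def challenge_client(send, auth):
    """Send credentials only after the server answers 401 (simplified:
    real clients also require a WWW-Authenticate header)."""
    status, body = send({})
    if status == 401:
        status, body = send({"Authorization": auth})
    return status, body


def preemptive_client(send, auth):
    """Send credentials with the first request, as curl does."""
    return send({"Authorization": auth})


print(challenge_client(fake_remote, VALID_AUTH))   # (200, '{"OK":{}}')
print(preemptive_client(fake_remote, VALID_AUTH))  # (200, '{"version": {"number": "1.7.5"}}')
```

Against a server that does answer 401, both clients end up authenticated in this model; the preemptive one just saves a round trip.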

Isn't the "thing doing the password checking" ES 1.7.5? I expected 5.1 would do preemptive authorization since I've provided it with a user name and password.

Is it Shield?

Yeah, I expected it to do that when I first implemented it, but preemptive authorization isn't the default in httpclient, and most servers send the right response to trigger it. We test reindex-from-remote against modern versions of Shield, but not 1.7.

Nope, we don't use Shield in any of our environments. In all of the above, we're going directly to ES. For our non-sensitive dev environments, we just have basic auth set up.

That isn't a feature built into Elasticsearch though.

My 1.7.5 elasticsearch.yml file has the following configuration:

http.basic.enabled: true
http.basic.user: "user"
http.basic.password: "pwd"

If that's not "built in", I don't know what is. As I showed above, invoking a search via curl with my configured user name and password works properly. When I use Sense, the browser prompts me for the user name and password the first time. This feels like a bug to me. I assumed user error at first, but it sounds like it may be worthwhile to raise it as an issue.

Yeah, it is a bug. I'm trying to help you work around it because I'm not going to get a chance to fix it for a bit.

I've filed these issues:



All of which you found.

Perfect, thank you very much Nik, I appreciate it! I'm looking at all my options to get from 1.x to 5.x and if reindex from remote works, that looks like it'll be the best option for us to get there.

It looks like this plugin might be how you got basic auth. Do you know if this is it?