Es mapreduce integration test fails since 2.0

Hey guys,

I have an IT test for my MR job that does the following:

  1. Precreates index with mapping
  2. Creates test parquet files
  3. Executes MR job that reads parquet files and using EsOutputFromat stores them into ES.
  4. checks if data is in the index

In the test I was creating a node with local set to false and http set to true. And this test was working ok until elasticsearch version 1.7.3. es-hadoop version is 2.1.1
But now Im trying to migrate it to 2.0.0 and I'm having trouble. My MR job is not able to find any nodes and I can see from the console that published address is always where in previous ES version it was someting else. Not sure where the problem lies and I would appreciate any help.

So below the code how I'm creating the node and returning a client in my test:

private static Client buildHttpClient(String dataDirectory, String clusterName) {
  Settings.Builder settings = Settings.settingsBuilder()
      .put("http.enabled", "true")
      .put("path.home", "target/es")
      .put("", dataDirectory);
  node = nodeBuilder().local(false).clusterName(clusterName).settings(settings).node();
  return node.client();

This is my method to get the es.node (host / port) to provide in MR job

private String getEsNode() {
    final NodesInfoResponse nodeInfos =
    NodeInfo nodeInfo = nodeInfos.getNodes()[0];
    TransportAddress publishAddress = nodeInfo.getHttp().getAddress().publishAddress();
    InetSocketAddress address = ((InetSocketTransportAddress) publishAddress).address();
    return address.getHostName() + ":" + address.getPort();

Below console output on 1.7.3 version when test was passing:

15/10/30 13:23:13 INFO elasticsearch.node: [Mechamage] version[1.7.3], pid[29294], build[05d4530/2015-10-15T09:14:17Z]
15/10/30 13:23:13 INFO elasticsearch.node: [Mechamage] initializing ...
15/10/30 13:23:13 INFO elasticsearch.plugins: [Mechamage] loaded [], sites []
15/10/30 13:23:13 INFO elasticsearch.env: [Mechamage] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [420.6gb], net total_space [464.7gb], types [hfs]
15/10/30 13:23:14 WARN elasticsearch.bootstrap: JNA not found. native methods will be disabled.
15/10/30 13:23:15 INFO elasticsearch.node: [Mechamage] initialized
15/10/30 13:23:15 INFO elasticsearch.node: [Mechamage] starting ...
15/10/30 13:23:15 INFO elasticsearch.transport: [Mechamage] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
15/10/30 13:23:15 INFO elasticsearch.discovery: [Mechamage] EsTest/7-HKR-WURp6bB2SCmEPn0A
15/10/30 13:23:19 INFO cluster.service: [Mechamage] new_master [Mechamage][7-HKR-WURp6bB2SCmEPn0A][MPLKD186017-976][inet[/]]{local=false}, reason: zen-disco-join (elected_as_master)
15/10/30 13:23:19 INFO elasticsearch.http: [Mechamage] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
15/10/30 13:23:19 INFO elasticsearch.node: [Mechamage] started
15/10/30 13:23:19 INFO elasticsearch.gateway: [Mechamage] recovered [0] indices into cluster_state
15/10/30 13:23:19 INFO cluster.metadata: [Mechamage] [user-2-1] creating index, cause [api], templates [], shards [5]/[1], mappings []
15/10/30 13:23:19 INFO cluster.metadata: [Mechamage] [user-2-1] create_mapping [attributes]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See for further details.

And here console output on version 2.0.0 when test is failing:

15/10/30 13:15:00 INFO elasticsearch.node: [Solara] initialized
15/10/30 13:15:00 INFO elasticsearch.node: [Solara] starting ...
15/10/30 13:15:00 INFO elasticsearch.transport: [Solara] publish_address {}, bound_addresses {}, {[fe80::1]:9300}, {[::1]:9300}
15/10/30 13:15:00 INFO elasticsearch.discovery: [Solara] EsTest/cekAX7ZSQHeETODF3xkmJg
15/10/30 13:15:03 INFO cluster.service: [Solara] new_master {Solara}{cekAX7ZSQHeETODF3xkmJg}{}{}{local=false}, reason: zen-disco-join(elected_as_master, [0] joins received)
15/10/30 13:15:03 INFO elasticsearch.http: [Solara] publish_address {}, bound_addresses {}, {[fe80::1]:9200}, {[::1]:9200}
15/10/30 13:15:03 INFO elasticsearch.node: [Solara] started
15/10/30 13:15:03 INFO elasticsearch.gateway: [Solara] recovered [0] indices into cluster_state
15/10/30 13:15:04 INFO cluster.metadata: [Solara] [user-2-1] creating index, cause [api], templates [], shards [5]/[1], mappings []
15/10/30 13:15:04 INFO cluster.metadata: [Solara] [user-2-1] create_mapping [attributes]

Ah and this is the exception I'm getting from actual job:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring( ~[na:1.8.0_51]
at ~[elasticsearch-hadoop-2.1.1.jar:2.1.1]
at ~[elasticsearch-hadoop-2.1.1.jar:2.1.1]
at ~[elasticsearch-hadoop-2.1.1.jar:2.1.1]
at$EsRecordWriter.init( ~[elasticsearch-hadoop-2.1.1.jar:2.1.1]
at$EsRecordWriter.write( ~[elasticsearch-hadoop-2.1.1.jar:2.1.1]
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write( ~[hadoop-core-2.5.0-mr1-cdh5.3.0.jar:na]
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write( ~[hadoop-core-2.5.0-mr1-cdh5.3.0.jar:na]
at$Context.write( ~[hadoop-core-2.5.0-mr1-cdh5.3.0.jar:na]
at ~[bin/:na]
... 11 common frames omitted

ES-Hadoop 2.1 supports ES 1.x. ES-Hadoop 2.2 supports both ES 1.x and ES 2.x.
I'll emphasize this in the reference docs.

1 Like

Thanks man that helped.

But I used 2.2.0-m1 cause I think full release of es-hadoop is not yet
there right?

The latest release is not m1 but beta1 and I recommend using that instead.