Writing to dynamic/multi-resources not working with Pig and ES-Hadoop 2.2

cjuste · January 25, 2016, 4:47pm

Hello,

I'm evaluating ES-Hadoop to integrate it into our process. I'm using ES 2.1, Pig 0.14 and version 2.2.0-rc1 of ES-Hadoop.
I need to write to multiple indexes/types at the same time.

I'm reading a file containing the following line:

{"partitionId":15,"siteId":"br2ryqd66","visitorId":"0001525cf5423a334e3df","visitId":"00015bbf7a52c4cbba536","eventId":"eawe38cukbpqfmuaursoszqjnly819fs","ts":"2016-01-07T23:54:24.824Z","eventType":"visit","eventName":"visit_closed","eventLive":1,"visit":{},"partner":{},"visit_closed":{},"meta":{"type":"event", "index":"v00000262"}}

I'm trying to store it on ES with Pig using EsStorage.
The command

STORE A INTO 'v00000262/event' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.http.timeout = 5m', 'es.index.auto.create = false', 'es.mapping.id=eventId', 'es.mapping.timestamp=ts', 'es.mapping.parent=visitorId', 'es.mapping.exclude=meta','es.nodes=$es_url');

works perfectly but the command

STORE A INTO 'v00000262/{meta.type}' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.http.timeout = 5m', 'es.index.auto.create = false', 'es.mapping.id=eventId', 'es.mapping.timestamp=ts', 'es.mapping.parent=visitorId', 'es.mapping.exclude=meta','es.nodes=$es_url');

returns the error Invalid target URI HEAD@null/v00000262/{meta.type}

In the log, I have the following (I've removed the ES IP, but it's the right one):

================================================================================
Pig Stack Trace

ERROR 1002: Unable to store alias A

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1694)
at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[...:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:383)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:391)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:467)
at org.elasticsearch.hadoop.rest.RestRepository.indexExists(RestRepository.java:449)
at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:203)
at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:263)
at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:233)
at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
at org.apache.pig.PigServer.execute(PigServer.java:1356)
at org.apache.pig.PigServer.access$500(PigServer.java:113)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
... 14 more

I don't see where I've made a mistake...
Is it a bug, or have I forgotten something?

Thanks in advance

costin · January 27, 2016, 10:22pm

Can you turn on logging? Your example looks fine and should work - here's an example I just tried:

Doc with json lines such as:
{"number":"1","name":"Buckethead","url":"Bucketheadland.com","meta":{"type":"awesome"}}

and the following script:

A = LOAD 'artists.dat' USING PigStorage() AS (json: chararray);
STORE A INTO 'json-pig/nestedpattern-{meta.type}' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true');

yields the expected result.

P.S. Note that with JSON input, one cannot select or exclude fields - these options apply only to JSON that is generated by ES-Hadoop from Pig tuples.
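
To make the `{meta.type}` pattern concrete, here is a minimal Python sketch of what resolving such a resource pattern against one JSON line amounts to. It is only an illustration of the idea, not ES-Hadoop's implementation; `resolve_resource` is a hypothetical helper name.

```python
import json
import re

def resolve_resource(pattern, json_line):
    """Replace each {dotted.path} in pattern with the value found in the JSON doc."""
    doc = json.loads(json_line)

    def lookup(match):
        value = doc
        # Walk the dotted path, e.g. "meta.type" -> doc["meta"]["type"]
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the field is missing
        return str(value)

    return re.sub(r"\{([^{}]+)\}", lookup, pattern)

line = '{"number":"1","name":"Buckethead","url":"Bucketheadland.com","meta":{"type":"awesome"}}'
print(resolve_resource("json-pig/nestedpattern-{meta.type}", line))
# json-pig/nestedpattern-awesome
```

Each document gets its own target, so the pattern can only be resolved per record at write time, never once for the whole job.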

cjuste · January 28, 2016, 9:48am

Thanks for your help.

Here is my full code :

REGISTER ./elasticsearch-hadoop.jar

A = LOAD './test.json' USING TextLoader() as (json: chararray);
STORE A INTO 'v00000262/{meta.type}' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.http.timeout = 5m', 'es.index.auto.create = false', 'es.mapping.id=eventId', 'es.mapping.timestamp=ts', 'es.mapping.parent=visitorId', 'es.mapping.exclude=meta','es.nodes=$es_url');

With DEBUG enabled, I get the following logs in the Pig shell (Pig is in local mode; I don't know if it matters):

16/01/28 09:44:29 DEBUG pig.EsStorage: Elasticsearch input marked as JSON; bypassing serialization through [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] instead of [class org.elasticsearch.hadoop.pig.PigValueWriter]
16/01/28 09:44:29 DEBUG pig.EsStorage: Using pre-defined writer serializer [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] as default
16/01/28 09:44:29 DEBUG pig.EsStorage: Using pre-defined reader serializer [org.elasticsearch.hadoop.pig.PigValueReader] as default
16/01/28 09:44:29 DEBUG pig.EsStorage: JSON input specified; using pre-defined bytes/json converter [org.elasticsearch.hadoop.pig.PigBytesConverter] as default
16/01/28 09:44:29 DEBUG pig.EsStorage: Using pre-defined field extractor [org.elasticsearch.hadoop.pig.PigFieldExtractor] as default
16/01/28 09:44:29 ERROR rest.NetworkClient: Node [...:9200] failed (Invalid target URI HEAD@null/v00000262/{meta.type}); no other nodes left - aborting...
3628 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 1002: Unable to store alias A
16/01/28 09:44:29 ERROR grunt.Grunt: ERROR 1002: Unable to store alias A
Details at logfile: /shared/clement/pig_1453974266088.log

And the file /shared/clement/pig_1453974266088.log contains :

Pig Stack Trace
---------------
ERROR 1002: Unable to store alias A

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1694)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:558)
        at org.apache.pig.Main.main(Main.java:170)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[...:9200]]
        at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:383)
        at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:391)
        at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:467)
        at org.elasticsearch.hadoop.rest.RestRepository.indexExists(RestRepository.java:449)
        at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:203)
        at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:263)
        at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:233)
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)
        at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
        ... 
================================================================================

costin · January 29, 2016, 7:22pm

Can you please try to enable logging on the serialization and REST packages as indicated here?

This will provide more information, such as whether the connectivity actually works and which requests are made to ES.
This is an example from the test suite:

21:18:24,430 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","nodes":{"ynQtzsBVRQ-8FQSu1rHzvg":{"name":"Man-Spider","transport_address":"local[1]","host":"local","ip":"0.0.0.0","version":"2.2.0-SNAPSHOT","build":"0682430","http_address":"127.0.0.1:9500","attributes":{"local":"true"},"transport":{"bound_address":["local[1]"],"publish_address":"local[1]","profiles":{}}}}}]
21:18:24,432 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Closing HTTP transport to 127.0.0.1:9500
21:18:24,432 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Opening HTTP transport to 127.0.0.1:9500
21:18:24,432 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Tx [GET]@[127.0.0.1:9500][_nodes/http] w/ payload [null]
21:18:24,436 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","nodes":{"ynQtzsBVRQ-8FQSu1rHzvg":{"name":"Man-Spider","transport_address":"local[1]","host":"local","ip":"0.0.0.0","version":"2.2.0-SNAPSHOT","build":"0682430","http_address":"127.0.0.1:9500","attributes":{"local":"true"},"http":{"bound_address":["[::1]:9500","127.0.0.1:9500"],"publish_address":"127.0.0.1:9500","max_content_length_in_bytes":104857600}}}}]
21:18:24,439 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Closing HTTP transport to 127.0.0.1:9500
21:18:24,440  INFO pool-1-thread-1 mr.EsOutputFormat - Writing to [json-pig/nestedpattern-{meta.type}]
21:18:24,445 TRACE pool-1-thread-1 commonshttp.CommonsHttpTransport - Opening HTTP transport to 127.0.0.1:9500
21:18:24,452 DEBUG pool-1-thread-1 bulk.AbstractBulkFactory - JSON input; using internal field extractor for efficient parsing...
21:18:24,457 TRACE pool-1-thread-1 bulk.JsonTemplatedBulk - About to extract information from [{"number":"1","name":"Buckethead","url":"http://bucketheadland.com","meta":{"type":"1"}}]
21:18:24,457 TRACE pool-1-thread-1 field.JsonFieldExtractors - About to look for paths [[meta.type]] in doc ...

cjuste · February 1, 2016, 8:14am

Here is the log I get :

16/02/01 08:11:42 DEBUG pig.EsStorage: Elasticsearch input marked as JSON; bypassing serialization through [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] instead of [class org.elasticsearch.hadoop.pig.PigValueWriter]
16/02/01 08:11:42 DEBUG pig.EsStorage: Using pre-defined writer serializer [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] as default
16/02/01 08:11:42 DEBUG pig.EsStorage: Using pre-defined reader serializer [org.elasticsearch.hadoop.pig.PigValueReader] as default
16/02/01 08:11:42 DEBUG pig.EsStorage: JSON input specified; using pre-defined bytes/json converter [org.elasticsearch.hadoop.pig.PigBytesConverter] as default
16/02/01 08:11:42 DEBUG pig.EsStorage: Using pre-defined field extractor [org.elasticsearch.hadoop.pig.PigFieldExtractor] as default
16/02/01 08:11:42 TRACE commonshttp.CommonsHttpTransport: Opening HTTP transport to 10.0.1.155:9200
16/02/01 08:11:42 TRACE rest.NetworkClient: Caught exception while performing request [10.0.1.155:9200][v00000262/{meta.type}] - falling back to the next node in line...
org.elasticsearch.hadoop.rest.EsHadoopTransportException: Invalid target URI HEAD@null/v00000262/{meta.type}
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:405)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:104)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:383)
	at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:391)
	at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:467)
	at org.elasticsearch.hadoop.rest.RestRepository.indexExists(RestRepository.java:449)
	at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:203)
	at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:263)
	at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:233)
	at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)
	at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
	at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
	at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
	at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
	at org.apache.pig.PigServer.execute(PigServer.java:1356)
	at org.apache.pig.PigServer.access$500(PigServer.java:113)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
	at org.apache.pig.Main.run(Main.java:558)
	at org.apache.pig.Main.main(Main.java:170)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.commons.httpclient.URIException: escaped absolute path not valid
	at org.apache.commons.httpclient.URI.setRawPath(URI.java:2837)
	at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2023)
	at org.apache.commons.httpclient.URI.<init>(URI.java:147)
	at org.apache.commons.httpclient.HttpMethodBase.getURI(HttpMethodBase.java:265)
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:403)
	... 34 more
16/02/01 08:11:42 ERROR rest.NetworkClient: Node [10.0.1.155:9200] failed (Invalid target URI HEAD@null/v00000262/{meta.type}); no other nodes left - aborting...
102230 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 1002: Unable to store alias A
16/02/01 08:11:42 ERROR grunt.Grunt: ERROR 1002: Unable to store alias A

It seems to be a bug, don't you think?
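
For readers following the `escaped absolute path not valid` cause in the trace: braces are not legal in a URI path unless they are percent-encoded, and the log shows the unresolved pattern `v00000262/{meta.type}` being used as the target of the HEAD request. The Python sketch below only illustrates that URI rule with the standard library. It is not ES-Hadoop or commons-httpclient code, and `is_valid_raw_path` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Characters allowed unescaped in a URI path per RFC 3986 (unreserved,
# sub-delims, ':', '@', '/'), plus percent-escapes such as %7B.
_PATH_OK = re.compile(r"^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/]|%[0-9A-Fa-f]{2})*$")

def is_valid_raw_path(path):
    """Return True if path contains only characters legal in a raw URI path."""
    return _PATH_OK.match(path) is not None

static = "v00000262/event"
pattern = "v00000262/{meta.type}"

print(is_valid_raw_path(static))   # True
print(is_valid_raw_path(pattern))  # False
print(quote(pattern))              # v00000262/%7Bmeta.type%7D
```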

cjuste · February 4, 2016, 9:37am

I tried again with the release 2.2 but I still have the same error...

costin · February 21, 2016, 6:07pm

@cjuste it sure looks like a bug; however, as the logs I posted above show, I cannot reproduce it. In fact, the test suite already contains a test similar to your example.

Based on your configuration it looks like the $es_url is the only thing that is different - would changing that make a difference?
Also can you confirm you have only one version of ES and that is the latest one?

cjuste · February 22, 2016, 4:04pm

@costin I've just tried with a fresh install of elasticsearch.

I installed Elasticsearch on Ubuntu 14.04 LTS using the 2.2.0 .deb package. There's only one node.
I've just downloaded ES-Hadoop 2.2.0 (just in case).

Here's my full pig script :

%default es_url '10.0.1.145'

REGISTER ./elasticsearch-hadoop-pig.jar;

A = LOAD './test.json' USING TextLoader() as (json: chararray);
STORE A INTO 'v00000262/{meta.type}' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.http.timeout = 5m', 'es.index.auto.create = false', 'es.mapping.id=eventId', 'es.mapping.timestamp=ts', 'es.mapping.parent=visitorId', 'es.mapping.exclude=meta','es.nodes=$es_url');

Here is the log I get :

16/02/22 16:02:59 DEBUG pig.EsStorage: Elasticsearch input marked as JSON; bypassing serialization through [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] instead of [class org.elasticsearch.hadoop.pig.PigValueWriter]
16/02/22 16:02:59 DEBUG pig.EsStorage: Using pre-defined writer serializer [org.elasticsearch.hadoop.serialization.builder.NoOpValueWriter] as default
16/02/22 16:02:59 DEBUG pig.EsStorage: Using pre-defined reader serializer [org.elasticsearch.hadoop.pig.PigValueReader] as default
16/02/22 16:02:59 DEBUG pig.EsStorage: JSON input specified; using pre-defined bytes/json converter [org.elasticsearch.hadoop.pig.PigBytesConverter] as default
16/02/22 16:02:59 DEBUG pig.EsStorage: Using pre-defined field extractor [org.elasticsearch.hadoop.pig.PigFieldExtractor] as default
16/02/22 16:02:59 TRACE commonshttp.CommonsHttpTransport: Opening HTTP transport to 10.0.1.145:9200
16/02/22 16:02:59 TRACE rest.NetworkClient: Caught exception while performing request [10.0.1.145:9200][v00000262/{meta.type}] - falling back to the next node in line...
org.elasticsearch.hadoop.rest.EsHadoopTransportException: Invalid target URI HEAD@null/v00000262/{meta.type}
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:443)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:104)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:423)
	at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:431)
	at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:507)
	at org.elasticsearch.hadoop.rest.RestRepository.indexExists(RestRepository.java:467)
	at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:203)
	at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:263)
	at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:233)
	at ...
16/02/22 16:02:59 ERROR rest.NetworkClient: Node [10.0.1.145:9200] failed (Invalid target URI HEAD@null/v00000262/{meta.type}); no other nodes left - aborting...
5442 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 1002: Unable to store alias A
16/02/22 16:02:59 ERROR grunt.Grunt: ERROR 1002: Unable to store alias A

I'm running Pig 0.15.0 from Hortonworks on Ubuntu 14.04.

costin · February 22, 2016, 6:44pm

I'll try to replicate the issue myself. If you replace es_url with the actual value do you see any change in behaviour?

cjuste · February 23, 2016, 7:59am

It doesn't change anything, unfortunately...

cjuste · March 7, 2016, 4:33pm

I have OpenJDK 1.7.0_91 on the pig VM.
Could that cause any trouble?

costin · March 8, 2016, 7:49am

It might - any reason why you are not using Sun/Oracle JDK?

cjuste · March 8, 2016, 9:11am

I'm now using Oracle JDK 8, but this hasn't changed anything. However, I've found something else.
If I change my request, I get a totally different error. I reduced the parameters to just es.input.json and es.nodes.

STORE A INTO '{index}/{type}' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true','es.nodes=$es_url');

In this case, it connects correctly to ES (I get all the nodes' IP).
And I get the error:

java.lang.Exception: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: id must not be null
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: id must not be null
	at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:467)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:415)
	at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:145)
	at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:225)
	at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:248)
	at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:267)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:214)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:196)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.close(PigOutputFormat.java:146)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:670)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

It seems the problem comes from my mapping parameters (I've removed es.mapping.exclude, but that doesn't change anything).

costin · March 12, 2016, 2:11pm

Turn on REST logging to see what data is sent to ES and the error being returned.

cjuste · March 15, 2016, 1:40pm

I know why I got the previous error (id must not be null). I've specified routing and parent as required, so it's logical that I get an error if I don't specify them. But this means I successfully contacted ES, unlike with the earlier error (escaped absolute path not valid).

I deduce that the problem comes from combining mapping parameters with a dynamic index/type.
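
To rule out missing fields separately from the index/type pattern, one could check each JSON line for the fields that the `es.mapping.*` settings and the resource pattern refer to. This is only a rough Python sketch for checking the input file outside Pig. `REQUIRED_PATHS` and `missing_fields` are hypothetical names, and the two sample documents are made up for illustration.

```python
import json

# Field names taken from the EsStorage options used in this thread
# (es.mapping.id, es.mapping.timestamp, es.mapping.parent, {meta.type}).
REQUIRED_PATHS = ["eventId", "ts", "visitorId", "meta.type"]

def missing_fields(json_line, paths=REQUIRED_PATHS):
    """Return the dotted paths that are absent or null in one JSON document."""
    doc = json.loads(json_line)
    missing = []
    for path in paths:
        value = doc
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if value is None:
            missing.append(path)
    return missing

# Made-up sample lines, not taken from the real data set.
good = '{"eventId":"e1","ts":"2016-01-07T23:54:24.824Z","visitorId":"v1","meta":{"type":"event"}}'
bad = '{"eventId":"e2","ts":"2016-01-07T23:54:24.824Z","meta":{}}'
print(missing_fields(good))  # []
print(missing_fields(bad))   # ['visitorId', 'meta.type']
```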

You said your example works, but it doesn't specify any mapping. Have you tried with a mapping?
