Error when moving data from Hive to Elasticsearch using the elasticsearch-hadoop plugin

Hi all,

I raised the following issue in the past.

There, the advised solution was to generate a unique id for each record to prevent duplicates from being inserted.
I am creating my primary key with the md5 function: I take all the columns that together make a record unique and use their md5 hash as the primary key.
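
For readers following along, here is a minimal sketch of this kind of key generation, written in Python rather than HiveQL. It is not the poster's actual query: the separator, the NULL placeholder, and the function names are all assumptions made for illustration. It also shows one way such a key can come out less unique than intended: if the column values are concatenated without a separator, two different rows can produce the same string and therefore the same md5.

```python
import hashlib

# Hypothetical choices for this sketch (not from the original post):
SEP = "\x1f"         # a separator assumed never to occur in the data
NULL_TOKEN = "\\N"   # placeholder text for missing (NULL) values


def record_id(*columns):
    """md5 over the columns, joined with a separator, NULLs made explicit."""
    parts = [NULL_TOKEN if c is None else str(c) for c in columns]
    return hashlib.md5(SEP.join(parts).encode("utf-8")).hexdigest()


def naive_id(*columns):
    """md5 over the columns concatenated with no separator."""
    joined = "".join(str(c) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two different rows that concatenate to the same string "abc".
row_a = ("ab", "c")
row_b = ("a", "bc")

same_naive_key = naive_id(*row_a) == naive_id(*row_b)      # keys collide
same_safe_key = record_id(*row_a) == record_id(*row_b)     # keys differ
```

Whether this applies here depends on how the md5 input is actually built in the Hive query, which the post does not show.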

Before this primary key fix, the record count in Elasticsearch was higher than expected because of duplicate insertions.
Now, after the fix, the record count in Elasticsearch is lower than expected.
I am not able to find the reason.
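
One check that may help narrow this down: Elasticsearch keeps a single document per `_id`, so indexing two records with the same id leaves only one of them. The sketch below assumes the generated ids have been exported from Hive into a plain Python list (the sample list and the function names are hypothetical). It reports which ids repeat and how many documents the index would end up with if every write succeeded.

```python
from collections import Counter


def id_collisions(ids):
    """Return {id: occurrences} for every id that appears more than once."""
    counts = Counter(ids)
    return {key: n for key, n in counts.items() if n > 1}


def expected_doc_count(ids):
    """One document survives per distinct id, since same-id writes overwrite."""
    return len(set(ids))


# Hypothetical sample standing in for ids exported from the Hive table.
sample_ids = ["k1", "k2", "k1", "k3"]

collisions = id_collisions(sample_ids)        # ids shared by several rows
doc_count = expected_doc_count(sample_ids)    # documents left after overwrites
```

If the distinct-id count already matches the Elasticsearch count, the gap comes from key collisions; if it is higher, failed writes such as the task failure in the logs are the more likely cause.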
Here are my ES table properties.

TBLPROPERTIES(
  'es.mapping.id' = 'id',
  'es.nodes' = '%ES_NODES%',
  'es.port' = '%ES_PORT%',
  'es.index.auto.create' = 'false',
  'es.batch.size.bytes' = '1mb',
  'es.batch.size.entries' = '500',
  'es.batch.write.retry.count' = '100',
  'es.batch.write.retry.wait' = '60s',
  'es.batch.write.refresh' = 'false',
  'es.nodes.discovery' = 'false',
  'es.nodes.client.only' = 'false',
  'es.resource' = '%ES_RESOURCE%',
  'es.query' = '?q=*',
  'es.nodes.wan.only' = 'true')

Am I missing a property?

Harbeer

Also, here is the error I see in the logs.

[HISTORY][DAG:dag_1525939456125_0003_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1525939456125_0003_1_00_000002_0, creationTime=1525941957539, allocationTime=1525941958909, startTime=1525941963683, finishTime=1525942544544, timeTaken=580861, status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, diagnostics=Error: Error while running task ( failure ) : attempt_1525939456125_0003_1_00_000002_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Unfortunately, that error message does not help much. Can you check the job's task logs to see if there's anything else that might highlight a problem?
