Unable to index a document through ES-Hadoop (Spark): it works in local mode but fails from the cluster

Hello Everyone,
Need help with the issue below.
I am indexing a nested JSON document through ES-Hadoop, but it is failing with the following error:

org.apache.spark.util.TaskCompletionListenerException: Could not write all entries for bulk operation [1/1]. Error sample (first [5] error messages):
        org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse;org.elasticsearch.hadoop.rest.EsHadoopRemoteException: not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
        {"index":{}}

I am using the code below to write the document into AWS ES. In this code, nonModifiedDataFrame always has exactly one record. documentPath points to a single JSON file (around 100 MB), and I have to add indexid and indextimestamp as two extra columns. If I use sparkContext.textFile(documentPath) to create an RDD[String], I am unable to add the two extra columns to the JSON, so I took the DataFrame approach instead and clean up some keys in the JSON using the method replaceDotsWithUnderScore (a simplified sketch of that helper is shown after the code).

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.elasticsearch.spark.rdd.EsSpark

def indexThroughSpark(spark: SparkSession, indexid: String, documentPath: String,
                      indexName: String, indextimestamp: String): Unit = {
    // Read the single JSON document and add the two extra columns.
    val nonModifiedDataFrame = spark.read.json(documentPath)
      .withColumn("indexid", lit(indexid))
      .withColumn("indextimestamp", lit(indextimestamp))

    // Serialize each row back into a JSON string so the keys can be cleaned.
    val convertedString: RDD[String] = nonModifiedDataFrame.toJSON.rdd
    val replacedString = convertedString.map { line =>
      ModifyKeysForDots().replaceDotsWithUnderScore(line)
    }

    // Only the target index/type is configured here.
    val cfg = Map("es.resource" -> "indexfor_spark/_doc")

    // Each element of replacedString must be a complete JSON document.
    EsSpark.saveJsonToEs(replacedString, cfg)
  }
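
For context, the dot-replacement helper does roughly the following. This is only a simplified sketch of the idea (recursively renaming object keys that contain "." to use "_"), not the exact class I am running:

import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.{ArrayNode, ObjectNode}
import scala.collection.JavaConverters._

// Simplified sketch only -- not the exact ModifyKeysForDots class used above.
// It walks the JSON tree and replaces '.' with '_' in every object key.
case class ModifyKeysForDots() {
  @transient private lazy val mapper = new ObjectMapper()

  def replaceDotsWithUnderScore(json: String): String =
    mapper.writeValueAsString(clean(mapper.readTree(json)))

  private def clean(node: JsonNode): JsonNode = node match {
    case obj: ObjectNode =>
      val out = mapper.createObjectNode()
      obj.fields().asScala.foreach { e =>
        out.set(e.getKey.replace(".", "_"), clean(e.getValue))
      }
      out
    case arr: ArrayNode =>
      val out = mapper.createArrayNode()
      arr.elements().asScala.foreach(el => out.add(clean(el)))
      out
    case other => other
  }
}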

If I execute the above code in my local environment, i.e. with master set to local, it works fine and I can see the expected number of documents in the index, but if I run the same code on the cluster with master set to YARN, it fails.

Thanks in advance.

That seems pretty strange. The exception thrown from Elasticsearch says that it cannot determine the content type of the data, even though the content type should already be clear from the request. Can you share which versions of ES-Hadoop and Elasticsearch you are using?
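
In the meantime, one thing that might help narrow it down: not_x_content_exception generally means the bytes that reached Elasticsearch did not look like JSON at all, so it is worth checking whether every record is still well-formed JSON on the executors right before saveJsonToEs is called. A rough sketch, assuming the replacedString RDD from your code (the Jackson classes ship with Spark):

import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

// Count and sample records that are empty or no longer parse as JSON
// after the key cleanup. If this count is non-zero on YARN but zero in
// local mode, the corruption is happening on the executors.
val badRecords = replacedString.mapPartitions { it =>
  val mapper = new ObjectMapper() // one mapper per partition; ObjectMapper is not serializable
  it.filter(line => line.trim.isEmpty || Try(mapper.readTree(line)).isFailure)
}
println(s"Malformed records: ${badRecords.count()}")
badRecords.take(5).foreach(println)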
