Hi,
I am using PySpark with ES-Hadoop to process Elasticsearch data.
ES 7.4.0
Spark 2.3.1
PUT test
{
  "mappings": {
    "properties": {
      "price": {
        "type": "short"
      }
    }
  }
}
PUT test/_doc/1
{
  "price": 1
}
pyspark --driver-class-path ~/jars/elasticsearch-hadoop-7.4.0.jar --jars ~/jars/elasticsearch-hadoop-7.4.0.jar
conf = {
    "es.resource": "test",
    "es.nodes.wan.only": "true",
    "es.nodes": "http://localhost:9200",
    "es.port": "9200",
    "es.net.http.auth.user": "",
    "es.net.http.auth.pass": "",
}
rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf)
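The error below appears as soon as the RDD is materialized, for example with an action like this (rdd.first() here is just an illustration; any action such as collect() or count() triggers the same failing task):

# force evaluation of the RDD; the failing task is launched here
rdd.first()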
"""
ERROR:
Task 0.0 in stage 1.0 (TID 1) had a not serializable result: org.apache.hadoop.io.ShortWritable
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.ShortWritable, value: 1)
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {price=1})
- field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
- object (class scala.Tuple2, (1,{price=1}))
- element of array (index: 0)
- array (class [Lscala.Tuple2;, size 1); not retrying
Traceback (most recent call last):
"""
When I change short to long, I get the correct ES data. Why is the short type not serializable?
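For reference, the only change on the ES side in the working case is the field type in the mapping (a sketch, assuming the index is dropped and recreated before re-indexing the document):

PUT test
{
  "mappings": {
    "properties": {
      "price": {
        "type": "long"
      }
    }
  }
}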