Elasticsearch returns an empty list of objects to Spark

I have a problem with Elasticsearch.

I want to read data from an Elasticsearch index with PySpark. My data looks like this:

{
    "user_id": 123,
    "features": {
        "hashtags": [
            { "text": "hello", "count": 2 },
            { "text": "world", "count": 1 }
        ]
    }
}
...
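
For reference, this is roughly the schema I expect Spark to infer for these documents (just a sketch; the exact types, e.g. LongType for count, are my assumption):

from pyspark.sql.types import (
    ArrayType, LongType, StringType, StructField, StructType
)

# roughly the shape I expect each document to map to in Spark
expected_schema = StructType([
    StructField("user_id", LongType()),
    StructField("features", StructType([
        StructField("hashtags", ArrayType(StructType([
            StructField("text", StringType()),
            StructField("count", LongType()),
        ])))
    ]))
])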

When the data is loaded, Elasticsearch seems to return an empty list of objects. My DataFrame after the read looks like this:

+----------+-------------------+
|  features|            user_id|
+----------+-------------------+
|{[{}, {}]}|                123|
|    {[{}]}|                384|
|    {[{}]}|                 94|
|{[{}, {}]}|                880|
+----------+-------------------+

I read the data from Elasticsearch using this configuration:

tweets = sqlContext.read.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option("es.read.field.as.array.include", "features.hashtags")\
    .option("es.read.field.include", "user_id, features.hashtags")\
    .option("es.resource", "twitter")\
    .load().limit(10)
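
To narrow it down, I can inspect the inferred schema and repeat the read without the es.read.field.include option (a sketch against the same local cluster and twitter index; whether the include filter is the culprit is only my guess):

# inspect what the connector actually inferred and returned
tweets.printSchema()
tweets.show(truncate=False)

# sanity check (my assumption, not a confirmed fix): the same read without
# the es.read.field.include filter, to see whether source filtering is
# what empties out the hashtag objects
tweets_full = sqlContext.read.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option("es.read.field.as.array.include", "features.hashtags") \
    .option("es.resource", "twitter") \
    .load().limit(10)
tweets_full.select("user_id", "features.hashtags").show(truncate=False)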

Can you help me resolve it?

We're now actively discussing this at Elasticsearch return empty list of objects to spark · Issue #1784 · elastic/elasticsearch-hadoop · GitHub.
