Elasticsearch 2.0 and Spark - TimestampType conversion issue

(eliasah) #1

I'm trying to write a DataFrame into Elasticsearch 2.0 with the following schema:

|-- actionId: long (nullable = true)
|-- userId: long (nullable = true)
|-- saleDate: timestamp (nullable = true)

When the index is created during the job, the saleDate field seems to be converted to a long. Here is part of the mapping:

"saleDate": { "type": "long" },

Is this behavior expected? If so, how can I write a timestamp field without declaring the mapping beforehand?
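A minimal stdlib sketch of why a long can show up, assuming the connector serializes the timestamp as epoch milliseconds (which the `long` mapping above suggests but the thread does not confirm). A JSON number gives Elasticsearch's dynamic mapping nothing to recognize as a date, whereas an ISO-8601 string would normally be detected as a date by default, as long as `date_detection` has not been disabled. The document values are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical row matching the schema: actionId, userId, saleDate.
sale_date = datetime(2015, 11, 3, 10, 30, tzinfo=timezone.utc)

# Assumption: the timestamp goes over the wire as epoch milliseconds.
# In JSON this is a plain number, so dynamic mapping infers "long".
as_millis = {"actionId": 1, "userId": 42,
             "saleDate": int(sale_date.timestamp() * 1000)}

# Alternative: an ISO-8601 string, which default date detection can
# recognize and map as "date" without a predeclared mapping.
as_iso = {"actionId": 1, "userId": 42,
          "saleDate": sale_date.isoformat(timespec="milliseconds")}

print(json.dumps(as_millis))
print(json.dumps(as_iso))
```

The same instant is represented both ways; only the JSON type of `saleDate` differs, and that type is what dynamic mapping looks at.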

(Costin Leau) #2

There's no functionality in ES-Hadoop to force a type conversion beyond the existing type, and that is on purpose.
Elasticsearch can do that much more reliably than the connector if you declare the mapping a priori (index templates often help a lot here).
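As a rough illustration of declaring the mapping a priori, here is a sketch that builds an Elasticsearch 2.x index template body mapping `saleDate` as a `date`. The index pattern `sales*` and the type name `sale` are hypothetical placeholders. The body would be sent with `PUT /_template/<name>` before the Spark job creates the index. In 2.x the default date format already accepts epoch milliseconds, so a numeric timestamp should still be indexed as a date.

```python
import json


def sale_date_template(pattern, doc_type):
    """Build an ES 2.x index template body that maps saleDate as a date.

    `pattern` and `doc_type` are hypothetical placeholders.
    """
    return {
        # ES 2.x key for the index name pattern (later versions use "index_patterns").
        "template": pattern,
        "mappings": {
            doc_type: {
                "properties": {
                    "actionId": {"type": "long"},
                    "userId": {"type": "long"},
                    # Spelled out for clarity; this is the 2.x default format and
                    # accepts both ISO-8601 strings and epoch milliseconds.
                    "saleDate": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis",
                    },
                }
            }
        },
    }


body = sale_date_template("sales*", "sale")
print(json.dumps(body, indent=2))
```

Because the template matches any index whose name fits the pattern, the mapping applies even when the index is created on the fly during the job.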

(eliasah) #3

Not even if we provide the DataFrame's schema when writing to Elasticsearch?

(Costin Leau) #4

No. The schema is simply Spark's representation of the data. The mapping in ES is its own.
The conversion in ES relies on conventions and, if needed, is pluggable through ValueReader/ValueWriter.

By the way, have you tried using the es.mapping.date.rich parameter introduced in 2.1?

(Lior Baber) #5

It happened to me as well; see my Stack Overflow question.

You can see in my code that I did use the es.mapping.date.rich parameter (which is supposed to be true by default).
Are there any updates on this?


(system) #6