Elasticsearch Hadoop id mapping can't handle big numbers?

unruledboy · March 23, 2018, 2:20am

Hi,

I save an RDD to Elasticsearch from Scala like this:

EsSpark.saveJsonToEs(rdd, index, Map(
"es.mapping.id" -> "IncrementalId"))

When IncrementalId is a small number (say, less than 1 billion), the id mapping is fine. But when IncrementalId is much bigger (generated with the Snowflake algorithm), the mapped _id comes out wrong, like this:

IncrementalId        |  Mapped Id in Es ( _id )
106640324646928380   | "106640324646928384"
106640324661608450   | "106640324661608448"
106640324677337090   | "106640324677337088"

The mapped _id is off by just a few units, sometimes higher and sometimes lower.

Is it possible that Es cannot handle numbers this big when doing the id mapping?
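
The differences in the table match what a round trip through a 64-bit floating-point number would produce: a double represents every integer exactly only up to 2^53 (about 9 * 10^15), and between 2^56 and 2^57 it can only hold multiples of 16. The minimal sketch below checks this against the three ids above. It does not show where in the Spark / es-hadoop / Elasticsearch pipeline such a conversion would happen; the object name `BigIdPrecision` and the helper `viaDouble` are made up for illustration.

```scala
object BigIdPrecision {
  // Hypothetical helper: converts a Long to a 64-bit double and back,
  // which is what any component that reads the number as a double would do.
  def viaDouble(id: Long): Long = id.toDouble.toLong

  def main(args: Array[String]): Unit = {
    // The three IncrementalId values from the table above.
    val ids = Seq(106640324646928380L, 106640324661608450L, 106640324677337090L)
    ids.foreach(id => println(s"$id -> ${viaDouble(id)}"))
    // Prints:
    // 106640324646928380 -> 106640324646928384
    // 106640324661608450 -> 106640324661608448
    // 106640324677337090 -> 106640324677337088
  }
}
```

The values on the right are the same as the mapped _id values reported above, so the mismatch looks like a floating-point precision effect rather than a size limit on _id itself.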

warkolm · March 23, 2018, 2:35am

What did you map it as?

unruledboy · March 23, 2018, 3:27am

Hi, IncrementalId is a big integer, and I did not specify any other mapping. The "_id" metadata value is a string.

I ran the test on over 10,000 documents, and they all have the same problem. I assume you can reproduce it simply by using a big integer such as 106640324646928380 and setting "es.mapping.id" to map it to the unique _id metadata value.
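
One possible workaround, assuming the id is being altered because it is parsed as a JSON number somewhere along the way (not confirmed in this thread), is to write IncrementalId as a JSON string before calling saveJsonToEs, so there is no numeric value to round. The sketch below is only an illustration: `QuoteId` and `quoteId` are hypothetical names, and the regex handles only a top-level integer field named IncrementalId in a simple JSON document. Note that this also changes how the field itself is stored in the document (as a string rather than a number).

```scala
object QuoteId {
  // Matches "IncrementalId": <integer> (optional whitespace around the colon).
  private val IdField = "\"IncrementalId\"\\s*:\\s*(-?\\d+)".r

  // Hypothetical helper: wraps the integer value in quotes so it becomes a JSON string.
  def quoteId(json: String): String =
    IdField.replaceAllIn(json, m => "\"IncrementalId\":\"" + m.group(1) + "\"")
}
```

It could then be applied to each document first, for example with `rdd.map(QuoteId.quoteId)`, keeping the same "es.mapping.id" -> "IncrementalId" setting.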

system · April 20, 2018, 3:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
