Should I set the zero value of a floating-point field to 0.0 rather than 0, to help dynamic detection?

I think the answer is “Yes”, but I thought it was worth asking this question. Perhaps the answer is “No” or “It doesn’t matter”; in which case, I’ll have learned something about dynamic mapping, and I can save bytes by omitting the trailing .0.

I am a member of the development team for a product that extracts data from proprietary binary-format logs, and then forwards that data to Logstash; for example, as JSON Lines over TCP.

Currently, the product represents floating-point field values in JSON like this, where a zero value is represented as a single digit, for conciseness:

"wait": 10.123456
...
"wait": 0

In practice, Elasticsearch gets the mapping “right”: it maps wait to double. (I’m currently testing on a back-level Elasticsearch, 2.4; in Elastic 5.2, I’d expect float, because 5.x changed the dynamic mapping of floating-point numbers from double to float.)

But I’m curious how Elasticsearch does that, because, on its own, without sampling other values of that same field, the value 0 makes the wait field look like an integer (or long).

Perhaps I’ve just been “lucky” so far, but that thought makes me nervous. I’d like to get a better understanding of how Elasticsearch dynamically maps numbers in JSON to its own, more granular, numeric data types.
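
For anyone who wants to check what dynamic detection decided for an existing index, asking Elasticsearch for its mapping is enough. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200; the index name is a made-up example:

# Dump the index’s mappings, then look for the detected type of "wait".
curl 'localhost:9200/logstash-2017.02.20/_mapping?pretty'
# In my 2.4 testing, the relevant fragment of the response is:
#   "wait": { "type": "double" }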

I’ve read the Elastic documentation topic “Dynamic field mapping”. (Nit: while I can understand the reason for the separate rows, the values “floating point number” and “integer” under the column heading “JSON datatype” are both misleading and strictly incorrect: JSON has no such separate numeric data types.)

I confess I’ve not tried too hard to “trick” Elasticsearch—say, by feeding it a bunch of 0 values for a field in a new index, followed by a floating-point value for the same field. I’m curious: would Elasticsearch initially set the mapping to long, and then, when it met the floating-point value, adjust the mapping to float?

Back to the primary question, though: should I change “my” product to output 0.0 for floating-point fields, instead of just 0, to help Elasticsearch dynamic mapping? (And: why, or why not?)

I’ve tried this in Elastic 5.2. The answer is “Yes”: to get a float mapping, the first value that Elasticsearch sees for the field must itself look like a floating-point number.

In detail (a curl sketch of these experiments follows the list):

  • If I forward an event with a field whose value is 0 to a new Elasticsearch index (via the Logstash elasticsearch output, and hence via the Elasticsearch bulk API), and then query the mapping in Elasticsearch, the dynamically detected data type is long

  • If I do the same thing with the value 0.0, the data type is float

  • If I forward to a new index an event where that field has the value 0, and then follow it with an event where the same field has the value 0.0, the data type remains long
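
Here is a minimal curl sketch of those three experiments, cutting Logstash out and talking to Elasticsearch directly. The index names are made up; it assumes a scratch Elasticsearch 5.2 on localhost:9200. For brevity, it uses the single-document index API rather than the bulk API that Logstash uses; dynamic mapping behaves the same either way.

# Experiment 1: first value 0, so dynamic mapping picks long.
curl -XPUT 'localhost:9200/test-int/event/1' -d '{"wait": 0}'
curl 'localhost:9200/test-int/_mapping?pretty'
#   "wait": { "type": "long" }

# Experiment 2: first value 0.0, so dynamic mapping picks float.
curl -XPUT 'localhost:9200/test-float/event/1' -d '{"wait": 0.0}'
curl 'localhost:9200/test-float/_mapping?pretty'
#   "wait": { "type": "float" }

# Experiment 3: 0 first, then 0.0; the first document sets the mapping,
# and it stays long.
curl -XPUT 'localhost:9200/test-mixed/event/1' -d '{"wait": 0}'
curl -XPUT 'localhost:9200/test-mixed/event/2' -d '{"wait": 0.0}'
curl 'localhost:9200/test-mixed/_mapping?pretty'
#   "wait": { "type": "long" }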

I don’t get a mapping error from Elasticsearch when I forward a floating-point value for a field that has been mapped to long. I’m curious about that; I’m going to create a new topic.
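
Continuing with the hypothetical test-mixed index from the sketch above, the non-error looks like this. My unconfirmed guess, which I’ll take to the new topic, is that it’s the coerce mapping parameter (true by default), which truncates floating-point values sent to integer fields:

# "wait" is already mapped long in test-mixed, yet this succeeds:
curl -XPUT 'localhost:9200/test-mixed/event/3' -d '{"wait": 10.123456}'
# The response reports the document as created; there is no
# mapper_parsing_exception. (If coerce is the explanation, the indexed
# value would be truncated to 10, while _source keeps 10.123456.)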
