Should I set the zero value of a floating-point field to 0.0 rather than 0, to help dynamic detection?

I think the answer is “Yes”, but I thought it was worth asking this question. Perhaps the answer is “No” or “It doesn’t matter”; in which case, I’ll have learned something about dynamic mapping, and I can save bytes by omitting the trailing .0.

I am a member of the development team for a product that extracts data from proprietary binary-format logs, and then forwards that data to Logstash; for example, as JSON Lines over TCP.

Currently, the product represents floating-point field values in JSON like this, where a zero value is represented as a single digit, for conciseness:

"wait": 10.123456
...
"wait": 0

In practice, Elasticsearch gets the mapping “right”: it maps wait to double. (I’m currently testing on a back-level Elasticsearch, 2.4; in Elastic 5.2, I’d expect float, because 5.x changed the dynamic mapping of floating-point numbers from double to float.)

But I’m curious how Elasticsearch does that, because, on its own, without sampling other values of that same field, the value 0 makes the wait field look like an integer (or long).

Perhaps I’ve just been “lucky” so far, but that thought makes me nervous. I’d like to get a better understanding of how Elasticsearch dynamically maps numbers in JSON to its own, more granular, numeric data types.
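
For anyone who wants to check what dynamic detection decided for an existing index, asking Elasticsearch for its mapping is enough. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200; the index name is a made-up example:

# Dump the index’s mappings, then look for the detected type of "wait".
curl 'localhost:9200/logstash-2017.02.20/_mapping?pretty'
# In my 2.4 testing, the relevant fragment of the response is:
#   "wait": { "type": "double" }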

I’ve read the Elastic documentation topic “Dynamic field mapping”. (Nit: while I can understand the reason for the separate rows, the values “floating point number” and “integer” under the column heading “JSON datatype” are both misleading and strictly incorrect: JSON has no such separate numeric data types.)

I confess I’ve not tried too hard to “trick” Elasticsearch—say, by feeding it a bunch of 0 values for a field in a new index, followed by a floating-point value for the same field. I’m curious: would Elasticsearch initially set the mapping to long, and then, when it met the floating-point value, adjust the mapping to float?

Back to the primary question, though: should I change “my” product to output 0.0 for floating-point fields, instead of just 0, to help Elasticsearch dynamic mapping? (And: why, or why not?)

I’ve tried this in Elastic 5.2. The answer is “Yes”: to get a float mapping, the first value that Elasticsearch sees for the field must itself look like a floating-point number.

In detail (a curl sketch of these experiments follows the list):

  • If I forward an event with a field whose value is 0 to a new Elasticsearch index (via the Logstash elasticsearch output, and hence via the Elasticsearch bulk API), and then query the mapping in Elasticsearch, the dynamically detected data type is long

  • If I do the same thing with the value 0.0, the data type is float

  • If I forward to a new index an event where that field has the value 0, and then follow it with an event where the same field has the value 0.0, the data type remains long
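
Here is a minimal curl sketch of those three experiments, cutting Logstash out and talking to Elasticsearch directly. The index names are made up; it assumes a scratch Elasticsearch 5.2 on localhost:9200. For brevity, it uses the single-document index API rather than the bulk API that Logstash uses; dynamic mapping behaves the same either way.

# Experiment 1: first value 0, so dynamic mapping picks long.
curl -XPUT 'localhost:9200/test-int/event/1' -d '{"wait": 0}'
curl 'localhost:9200/test-int/_mapping?pretty'
#   "wait": { "type": "long" }

# Experiment 2: first value 0.0, so dynamic mapping picks float.
curl -XPUT 'localhost:9200/test-float/event/1' -d '{"wait": 0.0}'
curl 'localhost:9200/test-float/_mapping?pretty'
#   "wait": { "type": "float" }

# Experiment 3: 0 first, then 0.0; the first document sets the mapping,
# and it stays long.
curl -XPUT 'localhost:9200/test-mixed/event/1' -d '{"wait": 0}'
curl -XPUT 'localhost:9200/test-mixed/event/2' -d '{"wait": 0.0}'
curl 'localhost:9200/test-mixed/_mapping?pretty'
#   "wait": { "type": "long" }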

I don’t get a mapping error from Elasticsearch when I forward a floating-point value for a field that has been mapped to long. I’m curious about that; I’m going to create a new topic.
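
Continuing with the hypothetical test-mixed index from the sketch above, the non-error looks like this. My unconfirmed guess, which I’ll take to the new topic, is that it’s the coerce mapping parameter (true by default), which truncates floating-point values sent to integer fields:

# "wait" is already mapped long in test-mixed, yet this succeeds:
curl -XPUT 'localhost:9200/test-mixed/event/3' -d '{"wait": 10.123456}'
# The response reports the document as created; there is no
# mapper_parsing_exception. (If coerce is the explanation, the indexed
# value would be truncated to 10, while _source keeps 10.123456.)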
