I think the answer is “Yes”, but I thought it was worth asking this question. Perhaps the answer is “No” or “It doesn’t matter”; in which case, I’ll have learned something about dynamic mapping, and I can save bytes by omitting the trailing `.0`.
I am a member of the development team for a product that extracts data from proprietary binary-format logs, and then forwards that data to Logstash; for example, as JSON Lines over TCP.
Currently, the product represents floating-point field values in JSON like this, where a zero value is represented as a single digit, for conciseness:

```
"wait": 10.123456
...
"wait": 0
```
In practice, Elasticsearch gets the mapping “right”: it maps `wait` to `double`. (I’m currently testing in a backlevel Elasticsearch, 2.4; in Elastic 5.2, I’d expect this to be a `float`.)
But I’m curious how Elasticsearch does that, because, on its own, without sampling other values of that same field, the value `0` makes the `wait` field look like an integer (or `long`).
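(As a sketch of how one could check what dynamic mapping chose, here’s the field-mapping API called from Python with the `requests` library; the index name `logs` is hypothetical:)

```python
import requests

# Ask Elasticsearch which type it chose for the "wait" field.
# "logs" is a hypothetical index name; substitute the real one.
resp = requests.get("http://localhost:9200/logs/_mapping/field/wait")
print(resp.json())
```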
Perhaps I’ve just been “lucky” so far, but that thought makes me nervous. I’d like to get a better understanding of how Elasticsearch dynamically maps numbers in JSON to its own, more granular, numeric data types.
I’ve read the Elastic documentation topic “Dynamic field mapping”. (Nit: while I can understand the reason for the separate rows, the values “floating point number” and “integer” under the column heading “JSON datatype” are both misleading and strictly incorrect: JSON has no such separate numeric data types.)
I confess I’ve not tried too hard to “trick” Elasticsearch (say, by feeding it a bunch of `0` values for a field in a new index, followed by a floating-point value for the same field). I’m curious: would Elasticsearch initially set the mapping to `long`, and then, when it met the floating-point value, adjust the mapping to `float`?
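If I do get around to that experiment, a minimal sketch of it might look like this, in Python with `requests`, against a hypothetical throwaway index named `tricktest` (both assumptions of mine, not anything the product does):

```python
import json
import requests

es = "http://localhost:9200"
index = "tricktest"  # hypothetical throwaway index

# Start from a clean slate (a 404 response is fine if the index doesn't exist).
requests.delete(es + "/" + index)

# Feed dynamic mapping a 0 first, then see what type it picked.
requests.put(es + "/" + index + "/doc/1", data=json.dumps({"wait": 0}))
print(requests.get(es + "/" + index + "/_mapping/field/wait").json())

# Now send a floating-point value for the same field, and check
# whether the mapping changed.
requests.put(es + "/" + index + "/doc/2", data=json.dumps({"wait": 10.123456}))
print(requests.get(es + "/" + index + "/_mapping/field/wait").json())
```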
Back to the primary question, though: should I change “my” product to output `0.0` for floating-point fields, instead of just `0`, to help Elasticsearch dynamic mapping? (And: why, or why not?)
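(One last note on the cost of that change: if the serializer keeps the value typed as a float, the trailing `.0` comes for free. Python’s `json` module, for example, preserves the distinction; Python here is purely illustrative, the point being that the int/float type of the serializer’s input determines the output text:)

```python
import json

# A float-typed zero serializes with the trailing ".0"...
print(json.dumps({"wait": 0.0}))  # -> {"wait": 0.0}

# ...whereas an integer-typed zero does not.
print(json.dumps({"wait": 0}))    # -> {"wait": 0}
```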