BTW, please forgive me in advance for even mentioning the word Solr in
this forum because I know ES folks cringe at comparisons between the
two technologies. I understand they are different and I am simply
making an analogy for the "Data Input & Indexing behavior" angle ...
so bear with me here.
The stacktrace from the ES server's NumberFormatException (NFE) is at
the end of this post.
I have faced similar NumberFormatException issues before in Solr as
well. I think these happen simply because the underlying Lucene isn't
ready to accept/ignore an empty string for numbers or date/time data.
So I am assuming that this is no different for ES which is built atop
Lucene as well. (1) Let me know if you agree with me so far.
In Solr, I got around this by having its Data Import Handler run
scripts on the incoming documents to either place a number like -1 as
a placeholder or remove the field explicitly from the document.
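To illustrate the kind of pre-processing I mean, here is a rough Python sketch. It is not the actual Data Import Handler script; the function name clean_numeric_fields and its parameters are made up for this example, and the field names are borrowed from the data shown further down.

```python
def clean_numeric_fields(doc, numeric_fields, placeholder=None):
    """Return a copy of doc with empty-string numeric fields fixed.

    If placeholder is None, the offending field is dropped entirely;
    otherwise it is replaced with the placeholder (for example -1).
    """
    cleaned = dict(doc)  # shallow copy so the caller's document is untouched
    for field in numeric_fields:
        if cleaned.get(field) == "":
            if placeholder is None:
                del cleaned[field]
            else:
                cleaned[field] = placeholder
    return cleaned


# Hypothetical usage: drop the empty fields, or mark them with -1.
doc = {"nextDay": "", "ground": "4.99"}
dropped = clean_numeric_fields(doc, ["nextDay", "ground"])
marked = clean_numeric_fields(doc, ["nextDay", "ground"], placeholder=-1)
```

Dropping the field keeps fake values out of the index, while a placeholder like -1 keeps the field present but has to be filtered out again at query time.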
So with ES, I was hoping it would be more straightforward. My feed in
ES is the magical and much revered CouchDB river, and I try not to
define the mappings myself because ES does such a great job of
figuring them out. It is one of the many, many, many conveniences of
ES that I want to take advantage of.
I was hoping that ES would acknowledge the fact that letting empty
strings through (for core type fields like number, date and time) has
no merit and would simply ignore the empty values. (2) Is this a "bad"
thing to hope for?
The data that failed looks like:
"nextDay" : "",
"ground" : "",
So imagine my surprise: ES did well enough to guess that
shipping.nextDay was supposed to be a number, but then did not ignore
the junk pumped into it as an empty string.
(2) I'm not bad-mouthing ES, I'm asking: can we expect ES to tackle
this, or would we be wrong to place such an expectation on ES?
(3) If the data had a proper null value, ES would have handled it
already: when a field has a (JSON) null value and null_value has not
been set up, ES defaults to not adding the field at all. That is not
the case here, so what would the workaround be, if any? Sanitize my
data? Oh lord, the tears are rolling down my cheeks, please say that's
not my only option.
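If sanitizing does turn out to be the only option, a minimal sketch of it might look like the Python below. It assumes the documents can be rewritten before they reach ES; how that would hook into the CouchDB river is not shown and I don't know the answer. The function name empty_strings_to_null and the twoDay field are invented for illustration. It turns empty strings into JSON nulls so that the null handling described above would apply.

```python
import json


def empty_strings_to_null(value):
    """Recursively replace "" with None (serialized as JSON null).

    Note: this touches every empty string, including those in fields
    that are meant to be text, so a real version may need a whitelist.
    """
    if isinstance(value, dict):
        return {k: empty_strings_to_null(v) for k, v in value.items()}
    if isinstance(value, list):
        return [empty_strings_to_null(v) for v in value]
    return None if value == "" else value


# "twoDay" is a made-up field; "nextDay" and "ground" are from my data.
doc = {"shipping": {"nextDay": "", "ground": "", "twoDay": "12.50"}}
sanitized = empty_strings_to_null(doc)
print(json.dumps(sanitized))
```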
Please let me know what you think.
=== STACKTRACE ===
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse
Caused by: java.lang.NumberFormatException: empty String
... 14 more