Fields containing only whitespace are assigned null values by EsInputFormat in a JSON document

I am using EsInputFormat on EC2 to query data from Elasticsearch.

Example document:

{
  "feature_key": {
    "brand_name": "\u3000",
    "product_type": "HOME",
    "language_tag": "ja_JP"
  },
  "stats": {
    "1_count": 0,
    "2_count": 1,
    "3_count": 1
  }
}

The brand_name field contains the IDEOGRAPHIC SPACE character (U+3000, the
full-width space used in Japanese). I have disabled analysis on the
feature_key field and all its sub-fields. After indexing all documents, if I
query through the REST endpoint, the document is returned as is.

But querying the same document through EsInputFormat returns a null value in
the brand_name field:

{
  "feature_key": {
    "brand_name": null,
    "product_type": "HOME",
    "language_tag": "ja_JP"
  },
  "stats": {
    "1_count": 0,
    "2_count": 1,
    "3_count": 1
  }
}

How can I change this behavior? I do not want the _source/document to be
modified in any way.

Thanks,
Suchin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f30efa1e-4e75-4e59-9e4e-dba7c5982aed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

This looks like a bug, likely caused by the conversion to Hadoop Writable. Can
you please raise an issue on GitHub (under es-hadoop)?
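
To illustrate how a conversion step could turn this value into null, here is
a minimal Java sketch. It is not es-hadoop's code: the class name and the
hasText, lossyConvert and faithfulConvert helpers are hypothetical, and the
"blank means null" rule is only an assumption about the kind of check that
might be involved. It relies on two facts about the JDK: Character.isWhitespace
treats U+3000 as whitespace, while String.trim() does not strip it.

```java
// Hypothetical illustration only; none of this is taken from es-hadoop.
class IdeographicSpaceDemo {

    // True if the string contains at least one non-whitespace character.
    static boolean hasText(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Assumed lossy conversion: whitespace-only strings collapse to null.
    static String lossyConvert(String s) {
        return hasText(s) ? s : null;
    }

    // Conversion that leaves the value untouched, which is what the
    // original poster expects for _source.
    static String faithfulConvert(String s) {
        return s;
    }

    public static void main(String[] args) {
        String brandName = "\u3000"; // IDEOGRAPHIC SPACE

        System.out.println(Character.isWhitespace('\u3000'));    // true
        System.out.println(brandName.trim().isEmpty());          // false
        System.out.println(lossyConvert(brandName));             // null
        System.out.println(faithfulConvert(brandName).length()); // 1
    }
}
```

If the connector applies a check along these lines while building the
Writable, that would be consistent with the null seen above, but this needs
to be confirmed against the es-hadoop source.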

Thanks,

(In reply to Suchindra Agarwal's message of 4/7/15 10:11 PM.)

--
Costin
