How to ensure that search results have the same numeric type as their mapping? (Integer vs Long)

When I index some data in which some fields are stored as longs and then retrieve it with Java, those values are converted to Java Integer objects instead of staying Longs, despite their storage type in the index.

I have searched here for a way to make sure they keep their index storage type, but have not found a satisfactory answer. I can always check the type of each incoming number myself, but I would rather the Elasticsearch Java API did that for me. Is there any way to do that?

Prior discussions say this is because search results are returned as JSON with no type information, so Elasticsearch depends on the JSON library it uses, which turns small numbers into Integers and big numbers into Longs.
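If that explanation is right, the parser is applying a "smallest type that fits" rule to each integer it reads. The sketch below is a hypothetical stand-alone helper (not Jackson's or Elasticsearch's actual code) that applies that rule to integer tokens, to show why two values of the same long field can come back as different classes.

```java
import java.math.BigInteger;

public class JsonNumberTyping {

    // Illustration only: mimics the "smallest type that fits" rule that JSON
    // parsers apply to integer tokens. This is NOT Jackson's implementation.
    static Number parseIntegerToken(String token) {
        BigInteger value = new BigInteger(token);
        if (value.bitLength() < 32) {          // fits in a signed 32-bit int
            return value.intValue();           // boxed as Integer
        } else if (value.bitLength() < 64) {   // fits in a signed 64-bit long
            return value.longValue();          // boxed as Long
        }
        return value;                          // too big even for long
    }

    public static void main(String[] args) {
        // Two values of the same "long" field, as they appear in the JSON text:
        System.out.println(parseIntegerToken("42").getClass().getSimpleName());         // Integer
        System.out.println(parseIntegerToken("3000000000").getClass().getSimpleName()); // Long
    }
}
```

Nothing in the JSON text says "this was a long", so the only information left for the parser is the magnitude of the number.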

Is this the case? Is there any way around this?

EDIT: A better, more general way to ask this: is there any way to force the Elasticsearch Java API to return search hits as the same Java classes (within reason: Dates, Longs, etc.) that they were indexed from?

  • Tim

Hi, I have exactly the same problem. I did some debugging and reverse engineering, and it seems that the mappings added to the index are not applied in the get operation (prepareGet), e.g. LongFieldMapper. I haven't found anything in the documentation saying whether this is intended, nor any note on how a mapper can be specified explicitly for a get.
The only related project I noticed is elasticsearch-osem, which looks dead.
But I still believe there is a nice solution; I don't want to do the conversion manually either.

Hey,
I've made some progress: if the fields are listed explicitly in the get request, they are converted according to the mapping:
GetResponse getResponse = client.prepareGet(...).setFields("my_long_field").execute().actionGet();
getResponse.getField("my_long_field").getValue().getClass(); // -> Long

Values in the source (getResponse.getSource()), however, still go through the generic JSON conversion.

The field conversion appears to be implemented in org.elasticsearch.action.get.TransportGetAction.get() -> org.elasticsearch.index.get.ShardGetService.innerGet() -> ... (follow gFields).

So if you can list every field in the get request, this is easy (in that case fetchSource should also be disabled).
If you can't, the appropriate FieldMappers (see ShardGetService.innerGetLoadFromStoredFields()) would have to be applied to the returned source.
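Applying the real FieldMappers means depending on Elasticsearch internals. A lighter client-side approximation, sketched below under the assumption that you already have a field-name-to-mapping-type table, is to re-box the numbers in the source map yourself. `coerce` is a hypothetical helper, not part of the Elasticsearch API; it only handles top-level numeric fields with the core numeric types (long, integer, short, byte, double, float) and leaves everything else (dates, strings, nested objects) as parsed.

```java
import java.util.HashMap;
import java.util.Map;

public class SourceCoercion {

    // Hypothetical helper: re-applies numeric types from a field -> mapping-type
    // table (e.g. "my_long_field" -> "long") to a map such as getSource() returns.
    // Returns a copy; the input map is not modified.
    static Map<String, Object> coerce(Map<String, Object> source, Map<String, String> fieldTypes) {
        Map<String, Object> result = new HashMap<>(source);
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            Object value = result.get(e.getKey());
            if (!(value instanceof Number)) {
                continue; // absent or non-numeric: leave untouched
            }
            Number n = (Number) value;
            switch (e.getValue()) {
                case "long":    result.put(e.getKey(), n.longValue());   break;
                case "integer": result.put(e.getKey(), n.intValue());    break;
                case "short":   result.put(e.getKey(), n.shortValue());  break;
                case "byte":    result.put(e.getKey(), n.byteValue());   break;
                case "double":  result.put(e.getKey(), n.doubleValue()); break;
                case "float":   result.put(e.getKey(), n.floatValue());  break;
                default:        break; // other mapping types: keep the parsed value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("my_long_field", 42); // Integer, as the JSON parser produced it
        Map<String, String> types = new HashMap<>();
        types.put("my_long_field", "long");
        Object v = coerce(source, types).get("my_long_field");
        System.out.println(v.getClass().getSimpleName()); // Long
    }
}
```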

Note: I'm on Elasticsearch v1.7.3 (because of some legacy dependencies).

The fields parameter isn't meant for getting values out of _source; it only does that for backwards compatibility with Elasticsearch 0.90. I believe it won't fetch from the source in 5.0, but I could be mistaken.

Your workaround works, but it won't work forever. Your comment about JSON is mostly right (technically the documents could also be serialized as YAML, SMILE, or CBOR). If you are using sourceAsMap(), my suggestion is to pull the numbers out of the map, cast them to Number, and call longValue()/floatValue()/intValue(). They have already been boxed to fit into the map, so this adds no extra cost.
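A minimal sketch of that suggestion. `getLong` is a hypothetical helper name; the map stands in for whatever sourceAsMap() returns and is built by hand here so the example runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;

public class SourceNumbers {

    // Reads a numeric field as a long, whether the JSON parser boxed it as
    // Integer or Long. Returns null if the field is absent; a non-numeric
    // value (e.g. a String) throws ClassCastException at the cast.
    static Long getLong(Map<String, Object> source, String field) {
        Object value = source.get(field);
        if (value == null) {
            return null;
        }
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("small", 7);          // what the parser gives for a small value
        source.put("big", 3000000000L);  // what the parser gives for a large value
        System.out.println(getLong(source, "small")); // 7
        System.out.println(getLong(source, "big"));   // 3000000000
    }
}
```

Casting to Number rather than to Long is what makes this robust: it accepts both boxings without checking which one the parser chose.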

The reason they come back as JSON is that Elasticsearch documents don't keep their Java class information. source(Map) is purely an API-level construct: it immediately serializes the map to a byte array. Those objects never make it over the wire to Elasticsearch, and even if they did, it wouldn't know what to do with them.

Further, Elasticsearch doesn't parse the _source when it returns documents unless it has to. This is a fairly significant speedup. It only needs to parse it if you use source filtering or if you ask for the response in a different format from the one you indexed the documents with. It determines the documents' format by sniffing the first few bytes.
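To illustrate the idea of sniffing, here is a hypothetical sketch (not Elasticsearch's actual implementation; the enum and method names are made up). It relies only on properties of the formats themselves: SMILE documents start with the header `:)\n`, a YAML document can start with `---`, a CBOR map begins with an initial byte in 0xA0..0xBF (major type 5), and a JSON object's first non-whitespace character is `{`.

```java
import java.nio.charset.StandardCharsets;

public class FormatSniffer {

    enum Format { JSON, SMILE, YAML, CBOR, UNKNOWN }

    // Hypothetical sketch: guesses a document's format from its first bytes.
    static Format sniff(byte[] bytes) {
        if (bytes.length >= 3 && bytes[0] == ':' && bytes[1] == ')' && bytes[2] == '\n') {
            return Format.SMILE;   // SMILE header ":)\n"
        }
        if (bytes.length >= 3 && bytes[0] == '-' && bytes[1] == '-' && bytes[2] == '-') {
            return Format.YAML;    // YAML document-start marker
        }
        if (bytes.length >= 1 && (bytes[0] & 0xE0) == 0xA0) {
            return Format.CBOR;    // CBOR major type 5 (map)
        }
        for (byte b : bytes) {     // skip leading whitespace, then expect '{'
            if (b == '{') {
                return Format.JSON;
            }
            if (!Character.isWhitespace((char) b)) {
                break;
            }
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(sniff("{\"n\": 42}".getBytes(StandardCharsets.UTF_8)));  // JSON
        System.out.println(sniff("---\nn: 42\n".getBytes(StandardCharsets.UTF_8))); // YAML
    }
}
```

Only the first few bytes are inspected, which is why the source can be passed through untouched when no conversion or filtering is requested.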

Yes, fields can be cast manually from the source (sourceAsMap()), but the point here is to make Elasticsearch use the same mapping that was used when creating the index (.admin().indices().prepareCreate(...).addMapping(...)). I don't want to define/implement the mapping twice.
For example, if the mapping has "type": "long", I want to get a Long back automatically when I get the document (prepareGet).
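One way to avoid writing the types twice, assuming the mapping is built in code as a `Map<String, Object>` (I believe `addMapping` has an overload that accepts a Map), is to derive a field-to-type table from that same map on the client. `fieldTypes` is a hypothetical helper; it expects the map that directly contains `"properties"` (if your map is wrapped as `{type_name: {...}}`, pass the inner map) and records object sub-fields with dotted paths.

```java
import java.util.HashMap;
import java.util.Map;

public class MappingTypes {

    // Hypothetical helper: walks a mapping such as
    // {"properties": {"my_long_field": {"type": "long"}}}
    // and returns field path -> mapping type.
    static Map<String, String> fieldTypes(Map<String, Object> mapping) {
        Map<String, String> out = new HashMap<>();
        collect("", mapping, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> node, Map<String, String> out) {
        Object props = node.get("properties");
        if (!(props instanceof Map)) {
            return;
        }
        for (Map.Entry<String, Object> e : ((Map<String, Object>) props).entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Map<String, Object> field = (Map<String, Object>) e.getValue();
            Object type = field.get("type");
            if (type != null) {
                out.put(path, type.toString());
            }
            collect(path, field, out); // object fields carry their own "properties"
        }
    }

    public static void main(String[] args) {
        Map<String, Object> longField = new HashMap<>();
        longField.put("type", "long");
        Map<String, Object> props = new HashMap<>();
        props.put("my_long_field", longField);
        Map<String, Object> mapping = new HashMap<>();
        mapping.put("properties", props);
        System.out.println(fieldTypes(mapping)); // {my_long_field=long}
    }
}
```

The resulting table can then drive the per-field conversion of the source map after a get, so the mapping itself stays the single place where types are declared. This still converts on the client; it does not make Elasticsearch return typed values.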