Coerce long to double not working


(Benjamin Gathmann) #1

Hi there,
I am having problems importing data because some of my values do not always have the same type.
I expected this would be no big deal thanks to Elasticsearch's index.mapping.coerce feature.
So I went ahead and did (using Python):

report = open("mydocument.json",'rb').read()
es.index(index='abc', doc_type="reports", body=report)

However, this failed with a MapperParsingException

current_type [string], merged_type [long]

The documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/coerce.html) implies that index.mapping.coerce is true by default, but I recreated my index and explicitly set the option to true anyway, like so:

 es.indices.create(index='abc', body='{"settings": {"index.mapping.coerce": true}}')

Then I tried indexing my document again, but this time I ran into another MapperParsingException:

current_type [double], merged_type [long]

The value in question is simply a 0. I cannot imagine how coercing this to a double 0.0 can be a problem. Is there anything else I need to configure?

Btw, I am using Elasticsearch 2.1.

Benjamin


(Nik Everett) #2

Is there any chance you can reproduce the problem with a sequence of curl commands, something I could copy and paste into a terminal to reproduce it? Without a reproduction like that, helping with issues like this ends up being a crazy guessing game.


(Benjamin Gathmann) #3

Hi Nik,
Can I email you the file in question?

I do not use curl, but the Python Elasticsearch library, and the complete code is like this:

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': '192.168.56.102', 'port': 9200}])
es.indices.create(index='abc', body='{"settings": {"index.mapping.coerce": true}}')
report = open("mydocument.json",'rb').read()
es.index(index='abc', doc_type="reports", body=report)

Benjamin


(Benjamin Gathmann) #4

OK, here comes a very simple example using curl:

ben@ben-ubuntu14:~$ curl -XPUT '192.168.56.102:9200/test/' -d '{"settings":{"index.mapping.coerce": true}}'
{"acknowledged":true}
ben@ben-ubuntu14:~$ curl -XPUT '192.168.56.102:9200/test/some/1' -d '{"persons":[{"name":"Steven", "age":"unknown"},{"name":"Martin", "age":25}]}'
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Merging dynamic updates triggered a conflict: mapper [persons.age] of different type, current_type [string], merged_type [long]"}],"type":"mapper_parsing_exception","reason":"Merging dynamic updates triggered a conflict: mapper [persons.age] of different type, current_type [string], merged_type [long]"},"status":400}

(Nik Everett) #5

Sure. The email from my profile should reach me.

The trouble with the Python Elasticsearch library is that it limits the audience. If I wanted to reproduce exactly what you're doing, I'd have to set up Python and everything. If I turn it into curl commands, something might get lost in translation; if you do it, you can make sure the bug still comes up. Anyway, this is my guess at a recreation:

curl -XDELETE 'localhost:9200/abc?pretty'
curl -XPUT 'localhost:9200/abc?pretty' -d'{
  "settings": {
    "index.mapping.coerce": true
  }
}'
cat >> mydocument.json << HERE
 doc here
HERE
curl -XPOST localhost:9200/_bulk --data-binary "@mydocument.json"; echo
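For reference, the _bulk endpoint expects newline-delimited JSON: an action line followed by the document source, each on its own line, with the whole body ending in a newline. A minimal sketch of building such a file in Python (the index name and doc type are taken from the thread; the document body is a hypothetical stand-in):

```python
import json

# One bulk item = an action line plus the document source, each on its
# own line. The complete body must end with a trailing newline.
actions = [
    ({"index": {"_index": "abc", "_type": "reports"}},
     {"persons": [{"name": "Martin", "age": 25}]}),
]

bulk_body = ""
for action, doc in actions:
    bulk_body += json.dumps(action) + "\n" + json.dumps(doc) + "\n"

print(bulk_body)
```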

(Nik Everett) #6

Cool. Much simpler. I can look at that.


(Nik Everett) #7

Ok - got it. This works:

curl -XDELETE localhost:9200/test?pretty
curl -XPUT 'localhost:9200/test?pretty' -d '{  "settings":{"index.mapping.coerce": true}}'
curl -XPUT 'localhost:9200/test/some/1?pretty' -d '{"persons":[{"Jack":"Martin", "age":25}]}'
curl -XPUT 'localhost:9200/test/some/1?pretty' -d '{"persons":[{"name":"Steven", "age":"10"},{"name":"Martin", "age":25}]}'

You had two problems in your example:

  1. The first value that came in for age was a string, so dynamic mapping inferred string as the field's type. Dynamic field creation is funny like that; I usually turn it off in production because I want control of the types.
  2. Indexing that first persons list with age as a number got the field created properly, but then Elasticsearch complained that "unknown" wasn't a number. I don't think coercion is meant to handle things like "unknown".
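The distinction can be illustrated like this (an illustrative sketch, not Elasticsearch's actual code): coercion can turn the numeric string "10" into the long 10, but there is no sensible number for "unknown", so that value has to fail regardless of the coerce setting:

```python
def coerce_to_long(value):
    """Mimic numeric coercion: accept ints and numeric strings,
    reject anything that cannot be parsed as a number."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            raise ValueError("value [%s] cannot be coerced to long" % value)
    raise ValueError("unsupported type: %r" % type(value))

print(coerce_to_long("10"))      # numeric string coerces fine -> 10
print(coerce_to_long(25))        # a real number passes through -> 25
try:
    coerce_to_long("unknown")    # no numeric interpretation -> error
except ValueError as e:
    print(e)                     # value [unknown] cannot be coerced to long
```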

(Benjamin Gathmann) #8

I did some more testing, and the actual reason for my problem might be this:
if I send several entries of the same kind with different types in a single request, the coercion fails (and a mapper_parsing_exception is thrown).
Now look at this example:

ben@ben-ubuntu14:~$ curl -XPOST '192.168.56.102:9200/test/persons/' -d '{"name":"Steven", "age":"unknown"}'
{"_index":"test","_type":"persons","_id":"AVGMlpEeWllYL3SlR_qa","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
ben@ben-ubuntu14:~$ curl -XPOST '192.168.56.102:9200/test/persons/' -d '{"name":"Martin", "age":25}'
{"_index":"test","_type":"persons","_id":"AVGMltVoWllYL3SlR_qb","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

Here, the coercion seems to work.
But funnily enough, when I GET the entries, look at what is returned:

curl -XGET http://192.168.56.102:9200/test/person/AVGMlpEeWllYL3SlR_qa?pretty
{
    "_id": "AVGMlpEeWllYL3SlR_qa", 
    "_index": "test", 
    "_source": {
        "age": "unknown", 
        "name": "Steven"
    }, 
    "_type": "person", 
    "_version": 1, 
    "found": true
}

curl -XGET http://192.168.56.102:9200/test/person/AVGMltVoWllYL3SlR_qb?pretty
{
    "_id": "AVGMltVoWllYL3SlR_qb", 
    "_index": "test", 
    "_source": {
        "age": 25, 
        "name": "Martin"
    }, 
    "_type": "person", 
    "_version": 1, 
    "found": true
}

How can 25 be stored as an integer when "age" is mapped as a string?

 curl -XGET http://192.168.56.102:9200/test?pretty
"test": {
        "aliases": {}, 
        "mappings": {
            "person": {
                "properties": {
                    "age": {
                        "type": "string"
                    }, 
                    "name": {
                        "type": "string"
                    }
                }
            }
        },

(Nik Everett) #9

This certainly feels like a bug in the property creation. @jpountz, is this something you think you might have fixed in your recent changes to mappings? I don't believe so, but you are the expert there.


(Adrien Grand) #10

OK, I see the bug. To give some history: in 1.x, dynamic updates modified mappings in place, but this caused issues when different nodes generated different mappings, since they could not reach agreement anymore afterwards. That caused us to change the logic in 2.x: parsing a document now returns two things, a parsed document and an optional mapping update. When this mapping update is not null, it is first validated on the master node before the document is indexed. What's happening here is that parsing the first object in your document generates a mapping update that tries to create a string field, but since this field is not yet added to the mapping, it is not visible when parsing the 2nd object, which generates a mapping update to create a long field. Elasticsearch tries to merge the two updates before sending them to the master node, and it fails since they are not compatible. I opened https://github.com/elastic/elasticsearch/issues/15377
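The failure mode described above can be sketched roughly like this (an illustrative model, not Elasticsearch's actual code): each object in the array yields its own dynamic mapping update, and the merge step rejects conflicting types for the same field:

```python
def dynamic_mapping_update(doc):
    """Infer a field type for each value, the way dynamic mapping would."""
    update = {}
    for field, value in doc.items():
        if isinstance(value, bool):
            update[field] = "boolean"
        elif isinstance(value, int):
            update[field] = "long"
        elif isinstance(value, str):
            update[field] = "string"
    return update

def merge_updates(a, b):
    """Merge two mapping updates, failing on conflicting field types."""
    merged = dict(a)
    for field, merged_type in b.items():
        current_type = merged.get(field)
        if current_type is not None and current_type != merged_type:
            raise ValueError(
                "mapper [%s] of different type, current_type [%s], "
                "merged_type [%s]" % (field, current_type, merged_type))
        merged[field] = merged_type
    return merged

persons = [{"name": "Steven", "age": "unknown"},
           {"name": "Martin", "age": 25}]
updates = [dynamic_mapping_update(p) for p in persons]
try:
    merge_updates(updates[0], updates[1])   # "age": string vs long
except ValueError as e:
    print(e)   # mapper [age] of different type, current_type [string], merged_type [long]
```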


(Benjamin Gathmann) #11

Hi Adrien,
Thank you for this explanation. However, I see two unexplained phenomena
here:

  1. The mapper exception
  2. The fact that an integer is stored as an integer in a string field
    (instead of being coerced to string)

Details see my examples.

Your explanation deals with 1. I believe.

What is the mystery behind 2.?

Yours
Benjamin


(Adrien Grand) #12

Regarding 2., are you confused because the _source document still shows the age field as an integer? Elasticsearch never modifies the source. What the coerce option does is index and store the field as the type that is defined in the mappings; it does not change the representation of the field in the _source document.
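In other words (an illustrative sketch of the behavior, not the real implementation): the JSON you send is stored verbatim as _source, while the value that goes into the index is the coerced one:

```python
import json

# Mapping as it ended up in the thread: "age" was dynamically created as string.
mapping = {"age": "string", "name": "string"}

def index_document(doc):
    """Store the raw JSON verbatim as _source, but coerce values for the index."""
    source = json.dumps(doc)            # _source is kept as sent, never rewritten
    indexed = {}
    for field, value in doc.items():
        if mapping.get(field) == "string" and isinstance(value, int):
            value = str(value)          # coerce the number to match the mapping
        indexed[field] = value
    return source, indexed

source, indexed = index_document({"name": "Martin", "age": 25})
print(source)    # {"name": "Martin", "age": 25} -> the integer survives in _source
print(indexed)   # {'name': 'Martin', 'age': '25'} -> indexed per the mapping
```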


(Benjamin Gathmann) #13

Yes, exactly. Coming from relational DBMSs, this is kind of hard to understand. I guess I need to delve some more into the ultimate ES guide. :wink:



(Adrien Grand) #14

No worries!


(system) #15