Thanks!
Comments and more questions inline.
On Tue, Apr 10, 2012 at 3:29 PM, Igor Motov imotov@gmail.com wrote:
By default, elasticsearch tries to deduce field types from field values.
If you check the mappings after the first PUT request, you will see
something like this:

$ curl 'http://localhost:9200/test/_mapping?pretty=true'
{
  "test" : {
    "tweet" : {
      "properties" : {
        "content" : {
          "dynamic" : "true",
          "properties" : {
            "text" : { "type" : "string" },
            "title" : { "type" : "string" }
          }
        },
        "contentType" : { "type" : "string" },
        "user" : { "type" : "string" }
      }
    }
  }
}

As you can see from this mapping, elasticsearch is now treating content as
an object type field (see the object type mapping documentation), and it
will expect to see objects in this field until the index is deleted. If you
had run the second request first, it would have expected to see strings
there and failed on the object.
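The order dependence is easy to reproduce with a fresh index (a sketch; test2 is an assumed scratch index, and the second request is expected to fail):

```shell
# First document fixes "content" as a string field.
curl -XPUT 'http://localhost:9200/test2/tweet/1' -d '{ "content": "http://example.com/foo" }'
# This one now fails: an object arrives where a string has been mapped.
curl -XPUT 'http://localhost:9200/test2/tweet/2' -d '{ "content": { "title": "some news" } }'
```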
OK I understand the limitation now. The doc you referenced makes clear that
'once a field has been added, its type can not change'.
There are a couple of ways around it. You can use different field names for
different types:
curl -XPUT 'http://localhost:9200/test/tweet/1' -d '{ "user": "jane doe",
"contentType": "article", "content_article": { "title": "some news",
"text": "blah blah" } }'
curl -XPUT 'http://localhost:9200/test/tweet/2' -d '{ "user": "john doe",
"contentType": "url", "content_url": "http://example.com/foo" }'
curl -XPUT 'http://localhost:9200/test/tweet/3' -d '{ "user": "john doe",
"contentType": "number", "content_number": 123 }'

Or you can assign different elasticsearch types to records with different
content types (different elasticsearch types can have different mappings):

curl -XPUT 'http://localhost:9200/test/article/1' -d '{ "user": "jane
doe", "contentType": "article", "content": { "title": "some news", "text":
"blah blah" } }'
curl -XPUT 'http://localhost:9200/test/url/2' -d '{ "user": "john doe",
"contentType": "url", "content": "http://example.com/foo" }'
curl -XPUT 'http://localhost:9200/test/number/3' -d '{ "user": "john
doe", "contentType": "number", "content": 123 }'
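If the shape of your documents is known up front, you can also pin the mapping explicitly before indexing anything, so nothing is left to deduction. A sketch, assuming the test index does not exist yet:

```shell
# Create the index with an explicit mapping: "content" is declared as an
# object with known sub-fields instead of being deduced from the first document.
curl -XPUT 'http://localhost:9200/test' -d '{
  "mappings" : {
    "tweet" : {
      "properties" : {
        "user" :        { "type" : "string" },
        "contentType" : { "type" : "string" },
        "content" : {
          "properties" : {
            "title" : { "type" : "string" },
            "text" :  { "type" : "string" }
          }
        }
      }
    }
  }
}'
```

This only helps when you can predict the field types, of course; the deduced type and the declared type are equally fixed once set.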
Unfortunately, I can't always predict the schema of the documents to index.
Some follow a fixed schema that will never change; some follow a fixed
schema whose fields can take values of different types (JavaScript and JSON
allow that, as do other languages like Java through inheritance); others
follow schemas that evolve over time.
I think my use case is a pretty common one these days: semi-structured
JSON documents with open or evolving schemas.
Also, after a few more tests, I bumped into another serious problem with
arrays. For example a single PUT like this:
curl -XPUT 'http://localhost:9200/test3/mixedarray/1' -d '{ "array": [123,
"http://www.example.com/whatever"] }'
fails with:
{"error":"MapperParsingException[Failed to parse [array]]; nested:
NumberFormatException[For input string: \"http://www.example.com/whatever\"];
","status":400}
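One way to sidestep this (a sketch, not a built-in feature: the coercion happens on the client before indexing) is to send every array element as a string, at the cost of losing the numeric type:

```shell
# Coercing 123 to "123" on the client side keeps the array homogeneous,
# so the string mapping deduced for "array" accepts both elements.
curl -XPUT 'http://localhost:9200/test3/mixedarray/1' -d '{ "array": ["123",
"http://www.example.com/whatever"] }'
```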
My documents are simple valid JSON, so I was surprised to bump into these
schema mapping problems after reading on the project home page that
elasticsearch is 'schema-free & document oriented'.
Any thoughts on how to fix this? I'd be happy to help and contribute a
patch if you give me a few pointers and some initial ideas on how to
approach this.
Thanks!
- Jean-Sebastien