I am just trying out Elasticsearch 5.0 Alpha 1, and I cannot create an index because the string type has been removed.
$ curl -XPUT "http://localhost:9200/monument?pretty" --data-binary @monument.settings.json
{
  "error" : {
    "root_cause" : [ {
      "type" : "mapper_parsing_exception",
      "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    } ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    }
  },
  "status" : 400
}
This is a huge change. Is it a removal or a deprecation? What is the difference between the current string type and the new text type?
Yes, it is a huge change. text is much like what string was - it is tokenized and analyzed. keyword is still analyzed but isn't tokenized - or, rather, it is always tokenized as a single token. It is similar to not_analyzed from older versions.
We did the split because the two things turned out to be more different than they seem. For instance, keyword happily supports doc_values, the on-disk column-wise storage used for aggregations (defaults to true). text still, sadly, only supports fielddata, the in-memory column-wise storage used for aggregations (defaults to false because it likes to fill your heap).
There ought to be a blog post that'll explain it better than I can. I can't find it now so it probably doesn't exist yet. But it will exist soon. I promise.
I slightly disagree. It's not a big change from the perspective of internal field processing.
The confusion is in ES < 5, where you have plenty of combinations to set up "text" fields:
analyzed strings
not_analyzed strings
keyword analyzer for one token generation
plus fiddling with norms, doc_values etc.
Now with ES 5, it is easier, especially for beginners:
text fields always consist of analyzed strings and are not suitable for aggregations
keyword fields are single tokens, without norms, with doc_values enabled for aggregations
Note that Lucene is doing "the right thing" behind the scenes, where you previously had to study the effects of lots of knobs manually to turn on and off in your mapping setup.
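To make the split concrete, here is a minimal sketch of a 5.x-style mapping that covers both uses of one field: a text field for full-text search, with a keyword sub-field for exact matches and aggregations. The helper name build_mapping and the sub-field name raw are my own choices for illustration, not anything Elasticsearch requires; the field and type names are taken from the error message earlier in the thread.

```python
import json


def build_mapping(type_name, field_name):
    """Build an index body whose field is analyzed text, with a keyword
    sub-field (named "raw" here by assumption) for aggregations."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field_name: {
                        "type": "text",
                        "fields": {
                            # single-token copy of the same value
                            "raw": {"type": "keyword"}
                        },
                    }
                }
            }
        }
    }


body = build_mapping("monument", "appellation_courante")
# The printed JSON could be saved and sent with curl -XPUT as in the first post.
print(json.dumps(body, indent=2))
```

With a mapping like this you would search on appellation_courante and aggregate on appellation_courante.raw.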
My question was more about the string/text difference than keyword/text. I regret this backward compatibility break. May I suggest to alias string=text (if possible)?
How do existing indices (built with v2.x), with mappings containing the string type, handle incoming documents? Does using text instead of string change the internal data structure? The same question applies to index templates: what happens if, during migration, I have a template containing a string type?
The new text is different from string. So string cannot be aliased, and it is banned for good reasons: nobody should continue to use the string type in the expectation that it will somehow work as before. The result is prone to be different and would lead to conflicts, annoyance, frustration, etc.
Yes, text works on a different internal data structure. For example, it no longer allows not_analyzed and has no doc values.
Mappings and index templates have to be migrated, yes. Re-indexing everything is not pleasant, but it is unavoidable IMHO.
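For the mapping side of that migration, a rough sketch of rewriting a 2.x properties block might look like the following. It assumes the simple rule discussed in this thread (not_analyzed strings become keyword, analyzed strings become text); the function name migrate_properties is hypothetical, and it does not check whether other parameters on the field are still valid for the new type, so treat the output as a starting point to review by hand.

```python
import copy


def migrate_properties(properties):
    """Return a copy of a 2.x `properties` dict with `string` fields
    rewritten to `text` or `keyword` (assumed rule, see lead-in)."""
    migrated = copy.deepcopy(properties)
    for field in migrated.values():
        if field.get("type") == "string":
            index = field.get("index", "analyzed")
            if index == "not_analyzed":
                field["type"] = "keyword"
                del field["index"]
            elif index == "analyzed":
                field["type"] = "text"
                field.pop("index", None)
            # Any other `index` value is left untouched for manual review.
        # Recurse into object fields ("properties") and multi-fields ("fields").
        for key in ("properties", "fields"):
            if isinstance(field.get(key), dict):
                field[key] = migrate_properties(field[key])
    return migrated


old = {
    "code": {"type": "string", "index": "not_analyzed"},
    "nom": {
        "type": "string",
        "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
    },
}
new = migrate_properties(old)
```

The input dict is copied rather than modified, so the old mapping stays available for comparison.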
Why is this automatic migration not applied when manually creating an index (in my example, monument.settings.json contains a mapping used with 2.2)? For v5.0, it could just raise a deprecation warning.
This is OK:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string"
        }
      }
    }
  }
}
But this fails:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}
And even if it is not really useful, the following fails as well:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "null_value": "unknown"
        }
      }
    }
  }
}
Raises an error on the field "nom":
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
    }
  },
  "status": 400
}
But when you remove the "fournisseur" field (with the problematic null_value), there isn't any problem anymore:
PUT produits
{
  "mappings": {
    "produit": {
      "properties": {
        "nom": {
          "type": "string"
        }
      }
    }
  }
}
is OK.