ES5.0a1: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field

Hello,

I am just trying out Elasticsearch 5.0 Alpha 1, and I cannot create an index because the string type has been removed.

$ curl -XPUT "http://localhost:9200/monument?pretty" --data-binary @monument.settings.json
{
  "error" : {
    "root_cause" : [ {
      "type" : "mapper_parsing_exception",
      "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    } ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    }
  },
  "status" : 400
}
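For anyone scripting around this, here is a rough Python sketch of pulling the offending field name out of such an error body. The helper name `offending_field` is made up for illustration, and the regular expression only assumes the message wording shown in the response above, which may differ in other versions.

```python
import re

def offending_field(error_response):
    """Return the field name quoted at the end of the error reason, or None."""
    reason = error_response["error"]["caused_by"]["reason"]
    # Assumes the message ends with "... for field [<name>]" as in the output above.
    match = re.search(r"for field \[([^\]]+)\]", reason)
    return match.group(1) if match else None

# Trimmed copy of the response body shown above.
response = {
    "error": {
        "type": "mapper_parsing_exception",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "The [string] type is removed in 5.0. You should now use "
                      "either a [text] or [keyword] field instead for field "
                      "[appellation_courante]",
        },
    },
    "status": 400,
}

print(offending_field(response))
```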

This is a huge change. Is it a removal or a deprecation? What's the difference between the current string type and the new text type?

Yes, it is a huge change. text is much like what string was - it is tokenized and analyzed. keyword is still analyzed but isn't tokenized - or, rather, it is always tokenized as a single token. It is similar to not_analyzed from older versions.

We did the split because the two things turned out to be more different than they seem. For instance, keyword happily supports doc_values, the on-disk column-wise storage used for aggregations (defaults to true). text still, sadly, only supports fielddata, the in-memory column-wise storage used for aggregations (defaults to false because it likes to fill your heap).
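To make the split concrete, here is a minimal sketch of an index body using both types, written as a Python dict so it can be dumped to JSON. The field names `description` and `region` are hypothetical; only the `monument` type name and the `french` analyzer come from this thread.

```python
import json

# Hypothetical 5.x index body; field names are made up for illustration.
index_body = {
    "mappings": {
        "monument": {
            "properties": {
                # Full-text field: analyzed into tokens, no doc_values.
                # fielddata stays off by default to protect the heap.
                "description": {"type": "text", "analyzer": "french"},
                # Exact-value field: indexed as a single token,
                # doc_values on by default, so it suits aggregations.
                "region": {"type": "keyword"},
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is what you would send as the body of the PUT request that creates the index.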

There ought to be a blog post that'll explain it better than I can. I can't find it now so it probably doesn't exist yet. But it will exist soon. I promise.

There was a discussion at https://github.com/elastic/elasticsearch/issues/11901

I slightly disagree, it's not a big change from the perspective of internal field processing.

The confusion is in ES < 5, where you have plenty of combinations to set up "text" fields:

  • analyzed strings
  • not_analyzed strings
  • keyword analyzer for one token generation
  • plus fiddling with norms, doc_values etc.

Now with ES 5, it is easier, especially for beginners:

  • text fields always consist of analyzed strings and are not suitable for aggregations
  • keyword fields are single tokens, without norms, with doc values for aggregations

Note that Lucene is doing "the right thing" behind the scenes, where you previously had to study the effects of lots of knobs and turn them on and off manually in your mapping setup.

My question was more about the string/text difference than keyword/text. I regret this backward compatibility break. May I suggest to alias string=text (if possible)?

How do existing indices (built with v2.x), whose mappings contain the string type, handle incoming documents? Does using text vs string change the internal data structure? Same for index templates: what if, during migration, I have a template containing a string type?

The new text is different from string. So string cannot be aliased, and it is banned for good reasons: nobody should continue to use the string type in the expectation that it will somehow work as before. The result is likely to be different and would lead to conflicts, annoyance, frustration, etc.

Yes, text works on a different internal data structure: it no longer allows not_analyzed, and it has no doc values, for example.

Mappings and index templates have to be migrated, yes. Re-indexing everything is not pleasant but unavoidable, IMHO.
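As a rough illustration of what migrating a mapping by hand could look like, here is a Python sketch that walks a 2.x `properties` block and rewrites string fields. This is not Elasticsearch's own upgrade logic. The rule it applies (not_analyzed becomes keyword, analyzed or unset becomes text) is an assumption based on the descriptions in this thread, the function name `migrate_properties` is made up, and any other parameters are copied unchanged, so they still need a manual review.

```python
import copy

def migrate_properties(properties):
    """Return a copy of a 2.x `properties` dict with string fields rewritten."""
    migrated = {}
    for name, field in properties.items():
        field = copy.deepcopy(field)  # never modify the caller's mapping
        if "properties" in field:
            # Object field: recurse into its sub-fields.
            field["properties"] = migrate_properties(field["properties"])
        elif field.get("type") == "string":
            index = field.pop("index", "analyzed")
            if index == "not_analyzed":
                field["type"] = "keyword"
            elif index == "analyzed":
                field["type"] = "text"
            else:
                # Any other value (for example "no") is put back untouched
                # and left for manual review in this sketch.
                field["index"] = index
        migrated[name] = field
    return migrated

# Hypothetical 2.x mapping with a nested object field.
old_properties = {
    "nom": {"type": "string"},
    "fournisseur": {
        "type": "object",
        "properties": {
            "nom": {"type": "string", "index": "analyzed"},
            "pays": {"type": "string", "index": "not_analyzed"},
        },
    },
}

new_properties = migrate_properties(old_properties)
print(new_properties)
```

Multi-fields under `fields` are not handled here; a real migration script would need to recurse into those as well.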

I believe there is automatic migration for both. You can absolutely open a 2.x index in 5.0. It'd be crazy not to allow that.

Why is this automatic migration not applied when manually creating an index (in my example, monument.settings.json contains a mapping used with 2.2)? For v5.0, it could just raise a deprecation warning.

Thanks for your explanation though

5.0 should automatically upgrade simple string mapping definitions to text/keyword. Could you share the mapping of your appellation_courante field?

It's type string with analyzer french.

Gérald

I opened a PR: https://github.com/elastic/elasticsearch/pull/17861

Thanks for fixing that.
Just checked 5.0A2, it works smoothly now.

Awesome, thanks!

Well I cried "victory" too early:

Here are some mapping tests with 5.0A2

This is OK
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string"
        }
      }
    }
  }
}

But this fails:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}

And even if it is not really useful, the following fails as well:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "null_value": "unknown"
        }
      }
    }
  }
}

This one is really strange:
PUT produits
{
  "mappings": {
    "produit": {
      "properties": {
        "nom": {
          "type": "string"
        },
        "fournisseur": {
          "type": "object",
          "properties": {
            "nom": {
              "type": "string",
              "index": "analyzed",
              "null_value": "inconnu"
            },
            "pays": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Raises an error on the field "nom":
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
    }
  },
  "status": 400
}

But when you remove the "fournisseur" field (with the problematic null_value), there isn't any problem anymore:
PUT produits
{
  "mappings": {
    "produit": {
      "properties": {
        "nom": {
          "type": "string"
        }
      }
    }
  }
}
is OK.

The last problem comes from the fact that I have two "nom" fields, "nom" and "fournisseur.nom", and the error was about the second one: my fault!
Still, the error message could have been clearer.
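Since the error only reports the leaf name, a small helper that lists the full dotted path of every string field can make it easier to see which one is meant. This is only a sketch: the function name `string_field_paths` is made up, and it only handles plain `properties` nesting, not multi-fields.

```python
def string_field_paths(properties, prefix=""):
    """List the dotted paths of all fields still declared as type string."""
    paths = []
    for name, field in properties.items():
        path = prefix + name
        if "properties" in field:
            # Object field: descend and keep building the dotted path.
            paths.extend(string_field_paths(field["properties"], path + "."))
        elif field.get("type") == "string":
            paths.append(path)
    return paths

# The "produits" mapping from the post above.
properties = {
    "nom": {"type": "string"},
    "fournisseur": {
        "type": "object",
        "properties": {
            "nom": {"type": "string", "index": "analyzed", "null_value": "inconnu"},
            "pays": {"type": "string", "index": "not_analyzed"},
        },
    },
}

for path in string_field_paths(properties):
    print(path)
```

With the mapping above, both "nom" and "fournisseur.nom" show up, which makes the ambiguity in the error message visible.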