ES5.0a1: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field

gquintana · April 13, 2016, 8:22pm

Hello,

I am just trying out Elasticsearch 5.0 Alpha 1, and I cannot create an index because the string type has been removed.

$ curl -XPUT "http://localhost:9200/monument?pretty" --data-binary @monument.settings.json
{
  "error" : {
    "root_cause" : [ {
      "type" : "mapper_parsing_exception",
      "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    } ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [monument]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [appellation_courante]"
    }
  },
  "status" : 400
}

This is a huge change. Is it a removal or a deprecation? What's the difference between the current string type and the new text type?

nik9000 · April 13, 2016, 8:45pm

Yes, it is a huge change. text is much like what string was - it is tokenized and analyzed. keyword is still analyzed but isn't tokenized - or, rather, it is always tokenized as a single token. It is similar to not_analyzed from older versions.

We did the split because the two things turned out to be more different than they seem. For instance - keyword happily supports doc_values, the on-disk column-wise storage used for aggregations (defaults to true). text still, sadly, only supports fielddata, the in-memory column-wise storage used for aggregations (defaults to false because it likes to fill your heap).
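As a rough illustration of the split described above, here is a minimal Python sketch that builds a 5.0-style mapping body for the field named in the error message. The `raw` sub-field name and the choice to declare both a text field and a keyword sub-field are assumptions made for the example, not something prescribed in this thread.

```python
import json

# Hypothetical 5.0-style mapping for the field from the error message.
# "text" is analyzed for full-text search; the "raw" sub-field (the name is
# an assumption) is a keyword, indexed as a single token with doc_values.
mapping = {
    "mappings": {
        "monument": {
            "properties": {
                "appellation_courante": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
body = json.dumps(mapping, indent=2)
print(body)
```

With a mapping like this, searches would target appellation_courante, while aggregations and sorting would target appellation_courante.raw.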

There ought to be a blog post that'll explain it better than I can. I can't find it now so it probably doesn't exist yet. But it will exist soon. I promise.

jprante · April 13, 2016, 10:27pm

There was a discussion at https://github.com/elastic/elasticsearch/issues/11901

I slightly disagree, it's not a big change from the perspective of internal field processing.

The confusion is in ES < 5, where you have plenty of combinations to set up "text" fields:

analyzed strings
not_analyzed strings
keyword analyzer for one token generation
plus fiddling with norms, doc_values etc.

Now with ES 5, it is easier, especially for beginners:

text fields consist always of analyzed strings, not suitable for aggregations
keyword fields are single tokens, without norms, docvalued for aggregations

Note that Lucene is doing "the right thing" behind the scenes, where you previously had to study the effects of lots of knobs manually to turn on and off in your mapping setup.

gquintana · April 14, 2016, 6:29am

My question was more about the string/text difference than keyword/text. I regret this backward compatibility break. May I suggest aliasing string=text (if possible)?

How do existing indices (built with v2.x), with mappings containing the string type, handle incoming documents? Does using the text type instead of string change the internal data structure? Same for index templates: what happens if, during migration, I have a template containing a string type?

jprante · April 14, 2016, 8:39am

The new text is different from string. So string cannot be aliased, and it is banned for good reasons: nobody should continue to use the string type in the expectation that it will somehow work as before. The results are likely to differ, which would lead to conflicts, annoyance, frustration, etc.

Yes, text is meant to work on a different internal data structure; for example, it no longer allows not_analyzed or doc values.

Mappings and index templates have to be migrated, yes. Re-indexing everything is not pleasant, but it is unavoidable IMHO.
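For migrating a mapping or template by hand, a minimal sketch of the rewrite could look like the following. The helper name `migrate_properties` and the sample field names are hypothetical, and the rule is deliberately simplified: it only rewrites `type` and the old `index` setting, copies every other parameter unchanged, and refuses any `index` value it does not know. Parameters that the new types do not accept would still need manual review.

```python
import copy


def migrate_properties(properties):
    """Return a copy of a 2.x `properties` dict with string fields rewritten.

    Simplified rule (an assumption for this sketch):
      - string, index "analyzed" or absent -> text
      - string, index "not_analyzed"       -> keyword
    Any other `index` value raises ValueError so it can be handled manually.
    """
    migrated = {}
    for name, definition in properties.items():
        field = copy.deepcopy(definition)
        if field.get("type") == "string":
            index = field.pop("index", "analyzed")
            if index == "analyzed":
                field["type"] = "text"
            elif index == "not_analyzed":
                field["type"] = "keyword"
            else:
                raise ValueError(f"{name}: unhandled index value {index!r}")
        # Recurse into object fields so nested string fields are rewritten too.
        if "properties" in field:
            field["properties"] = migrate_properties(field["properties"])
        migrated[name] = field
    return migrated


# Hypothetical 2.x-style properties used only to exercise the helper.
old = {
    "name": {"type": "string"},
    "supplier": {
        "type": "object",
        "properties": {
            "country": {"type": "string", "index": "not_analyzed"},
        },
    },
}
new = migrate_properties(old)
print(new)
```

The input dict is left untouched, so the old and new definitions can be compared before anything is sent to the cluster.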

nik9000 · April 14, 2016, 2:26pm

I believe there is automatic migration for both. You can absolutely open a 2.x index in 5.0. It'd be crazy not to allow that.

gquintana · April 15, 2016, 7:01am

Why is this automatic migration not applied when manually creating an index (in my example, monument.settings.json contains a mapping used with 2.2)? For v5.0, it could just raise a deprecation warning.

Thanks for your explanation though

jpountz · April 18, 2016, 1:48pm

5.0 should automatically upgrade simple string mapping definitions to text/keyword. Could you share the mapping of your appellation_courante field?

gquintana · April 18, 2016, 4:23pm

It's type string with analyzer french.

Gérald

jpountz · April 19, 2016, 4:02pm

I opened a PR: https://github.com/elastic/elasticsearch/pull/17861

gquintana · May 9, 2016, 6:15pm

Thanks for fixing that.
Just checked 5.0A2, it works smoothly now.

jpountz · May 9, 2016, 7:10pm

Awesome, thanks!

gquintana · May 11, 2016, 7:10am

Well I cried "victory" too early:

Here are some mapping tests with 5.0A2

This is OK
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string"
        }
      }
    }
  }
}

But this fails:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}

And even if it is not really useful, the following fails as well:
PUT /product
{
  "mappings": {
    "product": {
      "properties": {
        "code": {
          "type": "string",
          "null_value": "unknown"
        }
      }
    }
  }
}

This one is really strange:
PUT produits
{
  "mappings": {
    "produit": {
      "properties": {
        "nom": {
          "type": "string"
        },
        "fournisseur": {
          "type": "object",
          "properties": {
            "nom": {
              "type": "string",
              "index": "analyzed",
              "null_value": "inconnu"
            },
            "pays": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Raises an error on the field "nom":
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [produit]: The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The [string] type is removed in 5.0. You should now use either a [text] or [keyword] field instead for field [nom]"
    }
  },
  "status": 400
}

But when you remove the "fournisseur" field (with the problematic null_value), there isn't any problem anymore:
PUT produits
{
  "mappings": {
    "produit": {
      "properties": {
        "nom": {
          "type": "string"
        }
      }
    }
  }
}
is OK.

gquintana · May 11, 2016, 7:44am

The last problem comes from the fact that I have 2 "nom" fields: "nom" and "fournisseur.nom": my fault!
Yet the error message could have been clearer.
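Since the error only reports the leaf name [nom], one way to see which field is actually involved is to list every string field by its full dotted path before sending the mapping. The sketch below does that for the produit mapping above; the helper name `find_string_fields` is hypothetical, and it only walks `properties` (not multi-field `fields`).

```python
def find_string_fields(properties, prefix=""):
    """Return the dotted paths of all fields still declared as type string."""
    paths = []
    for name, definition in properties.items():
        path = prefix + name
        if definition.get("type") == "string":
            paths.append(path)
        # Object fields carry their own `properties`; walk them recursively.
        nested = definition.get("properties")
        if nested:
            paths.extend(find_string_fields(nested, path + "."))
    return paths


# The `properties` section of the produit mapping from this thread.
produit = {
    "nom": {"type": "string"},
    "fournisseur": {
        "type": "object",
        "properties": {
            "nom": {"type": "string", "index": "analyzed", "null_value": "inconnu"},
            "pays": {"type": "string", "index": "not_analyzed"},
        },
    },
}

print(find_string_fields(produit))
# prints ['nom', 'fournisseur.nom', 'fournisseur.pays']
```

Seeing both nom and fournisseur.nom in the output makes it easier to spot that two different fields share the same leaf name.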
