Understanding doc_values

Hi all,

I read the documentation about doc_values and I'm trying to apply it in Elasticsearch 2.4.5.

As I understand it, I can enable doc_values and disable fielddata on a not_analyzed string and still run aggregations on it.

So I created a template that uses doc_values and not fielddata:

POST _template/test
{
  "order": 0,
  "template": "test",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "data": {
      "properties": {
        "name": {
          "type": "string",
          "fielddata": {
            "format": "disabled"
          },
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}

Then I indexed a document in a new index matching the pattern:

POST test/data/1
{
  "name": "The Undertaker"
}

I can search the index:

GET test/_search
{
  "query": {
    "match_all": {}
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "data",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "The Undertaker"
        }
      }
    ]
  }
}

But I can't run aggregations:

GET test/_search
{
  "size": 20,
  "aggs": {
    "nameagg": {
      "terms": {
        "field": "name",
        "size": 10
      }
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "Field data loading is forbidden on [name]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test",
        "node": "1smXs7ldTr67Vuwu-GKagQ",
        "reason": {
          "type": "illegal_state_exception",
          "reason": "Field data loading is forbidden on [name]"
        }
      }
    ]
  },
  "status": 500
}

I would expect to be able to aggregate on this field; at least, that is what I understood from the docs.
Why is it complaining about fielddata?
Can anyone help me understand this?

Thanks in advance.

This does look like an issue. The syntax looks correct for disabling fielddata and enabling doc_values, but it seems to think fielddata is enabled regardless.

I tested this, and since doc_values are the default for not_analyzed fields in 2.4.5, you can use:

POST _template/test
{
  "order": 0,
  "template": "test",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "data": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

With that template, your example works.
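
For anyone who wants to apply the same workaround to an existing template body, here is a minimal Python sketch. It only edits a dict in memory and does not talk to Elasticsearch. The helper name `strip_fielddata_overrides` is made up for illustration, and the sketch assumes the template has the same layout as the one in this thread (types under "mappings", fields under "properties").

```python
import copy
import json


def strip_fielddata_overrides(template):
    """Return a copy of a template body without per-field
    "fielddata" and "doc_values" settings (hypothetical helper)."""
    cleaned = copy.deepcopy(template)
    for type_mapping in cleaned.get("mappings", {}).values():
        for field in type_mapping.get("properties", {}).values():
            # Drop the explicit overrides so the field relies on defaults.
            field.pop("fielddata", None)
            field.pop("doc_values", None)
    return cleaned


# Template body from the original post.
original = {
    "order": 0,
    "template": "test",
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "data": {
            "properties": {
                "name": {
                    "type": "string",
                    "fielddata": {"format": "disabled"},
                    "index": "not_analyzed",
                    "doc_values": True,
                }
            }
        }
    },
}

simplified = strip_fielddata_overrides(original)
print(json.dumps(simplified, indent=2))
```

The printed body can then be sent with POST _template/test as shown above. The original dict is left unchanged because the helper works on a deep copy.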


OK, so it means:

  • doc_values are enabled by default for not_analyzed strings
  • when I index my document it will have doc_values, and aggregations will use them
  • fielddata is not disabled, but since my field is a not_analyzed string it will never use it (it uses doc_values instead)

Am I correct?
How can I confirm this is an issue? Should I open one on GitHub?

Fielddata is still going to be disabled in this case. Doc values will be used for sorting and aggregations.
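
The default rule discussed in this thread can be written down as a small Python sketch. This is an illustration of the 2.x behaviour as described here, not Elasticsearch code: the function name `string_uses_doc_values` is made up, and it only models string fields with "index" set to "analyzed" (the default) or "not_analyzed".

```python
def string_uses_doc_values(field_mapping):
    """Sketch of the 2.x default for string fields (hypothetical helper).

    analyzed strings     -> no doc_values, fielddata is used instead
    not_analyzed strings -> doc_values unless explicitly turned off
    """
    if field_mapping.get("type") != "string":
        raise ValueError("this sketch only models string fields")

    index = field_mapping.get("index", "analyzed")
    if index == "analyzed":
        return False
    if index == "not_analyzed":
        # An explicit "doc_values" setting wins; otherwise the default is on.
        return bool(field_mapping.get("doc_values", True))
    raise ValueError("index option not modelled in this sketch: %r" % index)


# The simplified mapping from this thread relies on the default.
print(string_uses_doc_values({"type": "string", "index": "not_analyzed"}))
# An analyzed string falls back to fielddata.
print(string_uses_doc_values({"type": "string"}))
```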

This is an issue (I was able to reproduce it), so yes, please do open an issue on GitHub.


Opened issue #25484.
