Why do I see fielddata when doc_values is enabled in aggregations?

According to the Elastic documentation, every field type except text (an analyzed string) supports doc_values, which I assumed would, when available, make fielddata unnecessary in aggregations.

However, this is not the case for me: whenever I run a terms aggregation on a keyword or ip field, I see it loaded as fielddata, although this does not happen for other types (e.g. session_id, which is a long in this case).

Is this the correct behavior? If so, how can I prevent fielddata creation?

I'm using Elasticsearch 6.5, and this is my mapping:

{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "codec": "best_compression"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "time": {
          "type": "date",
          "format": "epoch_millis"
        },
        "session_token": {
          "type": "keyword"
        },
        "session_ref": {
          "type": "keyword"
        },
        "session_id": {
          "type": "long"
        },
        "src": {
          "type": "ip"
        },
        "version": {
          "type": "byte"
        }
      }
    }
  }
}

This is a sample aggregation that causes fielddata to be loaded:

GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "size": 100
      }
    }
  }
}

And here is the fielddata status after the aggregation:

"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  },
  "total" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  }
}
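
For reference, per-field figures like the ones above can be requested from the indices stats API. The exact request used is not shown in this post, so the following is only a sketch that assumes the same index name and uses a wildcard on the field names:

```
# Per-field fielddata memory for the index (fields accepts wildcards)
GET test_ind/_stats/fielddata?fields=session_*

# Node-level view of the same information in tabular form
GET _cat/fielddata?v
```

Running the first request before and after the aggregation makes it easy to see which field's entry grows.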

And here are the segment stats:

"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "total" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  }
}

See the global ordinals discussion in the topic "Global ordinals performance and size on-heap".
Quoting from there:

"They are monitored in the fielddata stats, we could have a dedicated section but if you don't have any text field that loads fielddata then the memory reported can be attributed to global ordinals entirely."

Thanks for the clarification.
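
As a follow-up on the "how can I prevent it" part of the question, here is a hedged sketch rather than a recommendation. Since the memory reported here is global ordinals, one option on a keyword field is to ask the terms aggregation not to use them via `execution_hint`. The documentation suggests `map` only when very few documents match the query, so it may well be slower on a full-index aggregation like this one. The second request simply drops what has already been loaded; it will be rebuilt the next time an aggregation needs it.

```
# Terms aggregation that avoids building global ordinals for this request
GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "size": 100,
        "execution_hint": "map"
      }
    }
  }
}

# Clear the fielddata cache (including global ordinals) for the index
POST test_ind/_cache/clear?fielddata=true
```

Whether either of these is worthwhile depends on the cardinality of the field and how often the aggregation runs; global ordinals are usually the faster choice for repeated aggregations.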
