How to clear the cache of a keyword field in Elasticsearch?

Hi,

I am facing a cache issue with a keyword field in Elasticsearch. I have a document with a keyword field that was indexed with some value, for example abc. Later the same document was reindexed with the value abcd, but the old value is still present in the aggregation. How can I remove the old value abc, which shows a count of zero, from the aggregation?

Can someone help me clear the aggregation cache?

I have used the clear cache API on the fields and on the index as well, but it does not seem to work.

Hi @Bhavyagc

You will need to show an example as that does not sound like correct behavior...

Perhaps the document is not really being updated, and another document is being added instead.

Can you show some repeatable code / example?


Hi @stephenb ,

Please find the below example and the exact issue I am facing....

I'm having some problems with aggregations returning old data. When I query directly using bool/must/term, the data is not there, as expected. It appears only in aggregation results.

  1. I have a property called foo.
  2. Then I updated all the documents, changing the value of foo from x to y.

Querying directly for foo = x yields 0 results (old value) and foo = y yields 10 results (new value).

When aggregating on foo I receive x in the foo property (with doc_count=0) and y in the foo property (with doc_count=10, which seems correct). Why is x still appearing in the foo property with a zero doc count?

Elasticsearch version 8.15 running in Elastic Cloud.

Note: when I removed the index completely and populated the index again, everything was working as it should. It only appears to be happening on existing indices.
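
For reference, the direct check described above could be written roughly as follows. This is only a sketch: the index name myindex is a placeholder (no index name is given here), and it assumes foo is mapped as a keyword field.

```
# Hypothetical index name; counts documents whose foo is exactly "x"
GET myindex/_count
{
  "query": {
    "bool": {
      "must": [
        { "term": { "foo": "x" } }
      ]
    }
  }
}
```

Running the same request with "y" in place of "x" would give the count for the new value.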

Hi @Bhavyagc

When I asked for sample code, I meant something like this.

This code does not produce the result you are seeing, so you will need to show us a simple example that reproduces your results.

That is not to say what you are seeing is not valid; you just need to show us what you are actually doing...

PUT myindex
{
  "mappings": {
    "properties": {
      "foo" : {"type": "keyword"}
    }
  }
}
  
PUT myindex/_doc/1
{
  "foo" : "x"
}

PUT myindex/_doc/2
{
  "foo" : "x"
}

PUT myindex/_doc/3
{
  "foo" : "x"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo"
      }
    }
  }
}


PUT myindex/_doc/1
{
  "foo" : "y"
}

PUT myindex/_doc/2
{
  "foo" : "y"
}

PUT myindex/_doc/3
{
  "foo" : "x"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

PUT myindex/_doc/1
{
  "foo" : "y"
}

PUT myindex/_doc/2
{
  "foo" : "y"
}

PUT myindex/_doc/3
{
  "foo" : "y"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

The results of the aggregation in order are as expected

  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "x",
          "doc_count": 3
        }
      ]
    }
  }
}

  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 2
        },
        {
          "key": "x",
          "doc_count": 1
        }
      ]
    }
  }
}

  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 3
        }
      ]
    }
  }
}

BTW, I also tried with the _update API, for example:

POST myindex/_update/1
{
  "script" : "ctx._source.foo = 'y'"
}

Can you try a _refresh after you have finished your updates?
You can also look at the Shard request cache settings page for additional settings.
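
A minimal sketch of those two steps, assuming the index is named myindex as in the example earlier in this thread. The refresh makes recent writes visible to search, and the clear cache call with request=true drops only the shard request cache for that index.

```
# Make all operations performed since the last refresh visible to search
POST myindex/_refresh

# Clear only the shard request cache for this index
POST myindex/_cache/clear?request=true
```

Note that cached shard request results are invalidated automatically when a shard refreshes and its data has changed, so an explicit clear is normally not needed.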

Thank you for your response.

Can you please try with min_doc_count in the query after updating all values from x to y?

GET myindex/_search
{
  "size": 0, 

  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}

Below are the results of the aggregation:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 3
        },
        {
          "key": "x",
          "doc_count": 0
        }
      ]
    }
  }
}

Not sure what you are doing

PUT myindex
{
  "mappings": {
    "properties": {
      "foo" : {"type": "keyword"}
    }
  }
}
  
PUT myindex/_doc/1
{
  "foo" : "x"
}

PUT myindex/_doc/2
{
  "foo" : "x"
}

PUT myindex/_doc/3
{
  "foo" : "x"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}


POST myindex/_update/1
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/2
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/3
{
  "script" : "ctx._source.foo = 'x'"
}

GET _cat/indices/myindex?v

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}

POST myindex/_update/1
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/2
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/3
{
  "script" : "ctx._source.foo = 'y'"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}

Last search results in


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 3
        }
      ]
    }
  }
}

Can you provide a repeatable example?

Please find the steps I followed and the query.

PUT myindex
{
  "mappings":{
    "properties": {
      "foo" : {"type": "keyword"}
    }
  }
}
  
PUT myindex/_doc/1
{
  "foo" : "x"
}

PUT myindex/_doc/2
{
  "foo" : "x"
}

PUT myindex/_doc/3
{
  "foo" : "x"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}
  
POST myindex/_update/1
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/2
{
  "script" : "ctx._source.foo = 'y'"
}

POST myindex/_update/3
{
  "script" : "ctx._source.foo = 'x'"
}


GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}


POST myindex/_update/3
{
  "script" : "ctx._source.foo = 'y'"
}

GET myindex/_search
{
  "size": 0, 
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 0
      }
    }
  }
}

Please find the output below, in the same order as the search aggregation queries...

Output 1:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "x",
          "doc_count": 3
        }
      ]
    }
  }
}

Output 2:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 2
        },
        {
          "key": "x",
          "doc_count": 1
        }
      ]
    }
  }
}

Output 3:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "y",
          "doc_count": 3
        },
        {
          **"key": "x",**
          **"doc_count": 0**
        }
      ]
    }
  }
}

Why are you setting min_doc_count?
If you don't want buckets with a count of 0 to come back, you probably need to remove this setting or set it to 1.

Otherwise, maybe a force merge will clean this up. Could you try?
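
Some background on why x shows up: the terms aggregation documentation notes that with min_doc_count set to 0, some returned terms may belong only to deleted documents. An update marks the old version of a document as deleted, and its terms stay in the segment until that segment is merged. The two options above could be sketched like this, assuming the myindex and foo_agg names from the earlier examples. Whether the force merge actually removes the x bucket in this case has not been confirmed in this thread.

```
# Option 1: only return buckets that match at least one document (1 is the default)
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "foo_agg": {
      "terms": {
        "field": "foo",
        "min_doc_count": 1
      }
    }
  }
}

# Option 2: keep min_doc_count at 0 and merge away segments' deleted documents
POST myindex/_forcemerge?only_expunge_deletes=true
POST myindex/_refresh
```

Option 1 is the cheaper choice. A force merge can be expensive on a large index, so it is best tried on a test index first.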
