I'm currently getting huge fielddata stored in RAM when a user runs an aggregation on the _id field. I'd like to prevent that, because it brings the system down whenever someone does it. Is there any recommendation on how to do that?
Yes, the official documentation says the following about the _id field and aggregation and sorting:
The value of the _id field is also accessible in aggregations or for sorting, but doing so is discouraged as it requires to load a lot of data in memory. In case sorting or aggregating on the _id field is required, it is advised to duplicate the content of the _id field in another field that has doc_values enabled.
So you should copy the value of your _id field to a new field with doc_values: true in the index mapping, and aggregate or sort on that field instead.
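To make that concrete, here is a minimal sketch of what the mapping and the indexed documents could look like. It only builds the JSON bodies and does not talk to a cluster. The field name doc_id and the helper names are made up for illustration; use whatever fits your index.

```python
import json

# Hypothetical name for the field that holds a copy of _id.
ID_COPY_FIELD = "doc_id"


def build_mapping():
    """Index-creation body with a keyword field that uses doc_values."""
    return {
        "mappings": {
            "properties": {
                # keyword fields have doc_values on by default;
                # it is spelled out here to make the intent obvious.
                ID_COPY_FIELD: {"type": "keyword", "doc_values": True}
            }
        }
    }


def with_id_copy(doc_id, source):
    """Return a copy of the document body with the _id duplicated into it."""
    doc = dict(source)
    doc[ID_COPY_FIELD] = doc_id
    return doc


def terms_agg_on_copy(size=10):
    """Search body that aggregates on the copy instead of on _id."""
    return {
        "size": 0,
        "aggs": {"ids": {"terms": {"field": ID_COPY_FIELD, "size": size}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(with_id_copy("abc-123", {"title": "example"}), indent=2))
    print(json.dumps(terms_agg_on_copy(), indent=2))
```

You would still index each document under its usual _id (so duplicate prevention keeps working) and simply send the body returned by with_id_copy.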
You may also want to read this interesting blog post to understand fielddata and doc_values.
See also #49166 which will add an option to throw an exception rather than loading fielddata on the _id field.
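For later readers: as far as I know, this option ended up as a dynamic cluster setting named indices.id_field_data.enabled, but treat that name as an assumption and check the _id field documentation for your version before relying on it. A rough sketch of the request body you would send to the cluster settings API (PUT _cluster/settings), with a made-up helper name:

```python
import json

# Assumed setting name; verify it against the docs for your Elasticsearch version.
ID_FIELDDATA_SETTING = "indices.id_field_data.enabled"


def disable_id_fielddata_body(persistent=True):
    """Body for PUT _cluster/settings that turns off fielddata on _id."""
    scope = "persistent" if persistent else "transient"
    return {scope: {ID_FIELDDATA_SETTING: False}}


if __name__ == "__main__":
    print(json.dumps(disable_id_fielddata_body(), indent=2))
```

With the setting off, an aggregation or sort on _id should fail with an error instead of loading fielddata into the heap, which is the behaviour asked for here.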
@Bernt_Rostad The issue is that I'm using the _id field to prevent duplicates, so I do need to use it. I already have the _id in another field, but some users still try to aggregate on _id, either intentionally or by accident.
@DavidTurner -- That's exactly what I'm looking for. Surprised that it's an option that's only now going into the stack! Thanks