Solving aggregating/sorting on numerical fields

Hi there,
It's my first post here and I'm still a beginner with ES, so please feel free to point out any posting guidelines I might have missed :slight_smile:

I took over a rather old project (late 2017) that was using ES 5.5 and was working fine at the time, and have to update its components (this is my first time using ES). ES is used there to handle indexing and searching documents, in what is basically a book collection; two indices handle searching into the "structure" (titles, chapters, etc.) and four other indices, more comprehensive, handle full-text indexing.

While fixing the breaking changes from 5.5 to 7.10 went smoothly up until now (including dividing mapping types into multiple indices), and while my indexing scripts run fine, I run into some issues when performing full-text search on my application.

I get the following errors, observed when looking at my server logs (Tomcat/Catalina):
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [ordnum] in order to load field data by uninverting the inverted index. Note that this can use significant memory.
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [typedoc] in order to load field data by uninverting the inverted index. Note that this can use significant memory.
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [ordnum] in order to load field data by uninverting the inverted index. Note that this can use significant memory.

ordnum is an ordering number, and typedoc is a short string used as an internal "type" descriptor. Both are defined as follows in the indices (typedoc is only used in one index; ordnum is used in three indices, with exactly the same definition each time):

{
	"settings": {
		[...]
	},

	"mappings": {
		"properties": {
			"typedoc": {
				"type":"keyword",
				"index":"true"
			},
			[...]
			"ordnum": {
				"type":"long"
			},
			[...]
		}
	}
}

Note that both of these fields previously had index set to not_analyzed. Since that value no longer exists in recent versions of ES, I changed it to true for the keyword and removed it for the long. (I also tried changing the long to a keyword, but that didn't solve anything.)

While the entire query is too convoluted and "private" to disclose here, I can tell you that typedoc is aggregated like this (among numerous other entries in the "aggregations" query field): "typedoc": { "terms" : { "field" : "typedoc", "size" : 5 } }, while ordnum is used for sorting purposes in the sort field like this: "sort": [{"numvol": { "order": "asc" }}, {"ordnum": { "order": "asc" }}].
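For illustration, the two fragments above fit together into a request body roughly as follows. This is only a sketch in Python: since the real query is not shown, the match_all query is a placeholder assumption, and only the typedoc aggregation and the sort clause come from the application.

```python
import json

search_body = {
    # Placeholder query (assumption): the real one is not disclosed in this post.
    "query": {"match_all": {}},
    "aggregations": {
        # Terms aggregation on typedoc, limited to 5 buckets.
        "typedoc": {"terms": {"field": "typedoc", "size": 5}},
    },
    # Sort on numvol first, then on ordnum, both ascending.
    "sort": [
        {"numvol": {"order": "asc"}},
        {"ordnum": {"order": "asc"}},
    ],
}

# Serialise to the JSON that would be sent to the _search endpoint.
print(json.dumps(search_body, indent=2))
```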

While I do understand some of the problems here (e.g. ordnum is a numeric field and sorting is thus disabled), I don't really grasp some things at play here.

  1. I've seen many possible ways to solve these issues: changing the numeric fields to keywords, setting the "fielddata" attribute to 'true' (although it's not available on numeric types), setting the "doc_values" attribute to 'true', using multi-fields... What would be the best solution here? What are the main differences between all of those?

  2. Why does aggregating on typedoc bring an error despite it being a keyword-type field?
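To make the options in question 1 concrete, here is roughly how each would look in a mapping, written as a Python dict. The field names are hypothetical and chosen only to label each option; the comments reflect my understanding that doc_values are enabled by default on keyword and numeric fields, and that fielddata only applies to text fields.

```python
import json

# Hypothetical field names, one per option listed in question 1.
mapping_options = {
    "properties": {
        # keyword field: doc_values are on by default, so sorting and
        # terms aggregations should work without extra settings.
        "as_keyword": {"type": "keyword"},
        # numeric field: doc_values are also on by default.
        "as_long": {"type": "long"},
        # text field with fielddata: works, but can use significant memory.
        "as_text_fielddata": {"type": "text", "fielddata": True},
        # multi-field: full-text search on the text field, and
        # sorting/aggregating on the "raw" keyword sub-field.
        "as_multi_field": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        },
    }
}

print(json.dumps(mapping_options, indent=2))
```

With the multi-field variant, a sort or aggregation would target as_multi_field.raw rather than as_multi_field itself.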

Thanks in advance for your help!

Well, I somehow solved it myself. The key element is here:

While fixing the breaking changes from 5.5 to 7.10 went smoothly up until now (including dividing mapping types into multiple indices)

What happened was that, when splitting the original indices that each had multiple types, I did not notice or understand that there was a default mapping type, inherited by every other mapping type. So instead of copying each field from the default mapping type into every other mapping type, I treated it like a normal mapping type and put it in its own index. That was a mistake.
I deleted that index and copied each of its fields to every other index.
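The copy step can be sketched in Python as below. The index names and the split of fields between the default type and the other types are made up for the example (my real mappings are larger), and merge_default_mapping is just a helper name I chose; in this sketch, a field declared on the type itself takes precedence over the default one.

```python
import copy
import json


def merge_default_mapping(default_props, type_props):
    """Return a new dict with the default fields plus the type's own fields.

    Fields defined on the type itself override the default ones.
    """
    merged = copy.deepcopy(default_props)
    merged.update(copy.deepcopy(type_props))
    return merged


# Hypothetical layout: "ordnum" lived in the default mapping type,
# while each former type only declared its own fields.
default_props = {"ordnum": {"type": "long"}}
per_index_props = {
    "chapters": {"title": {"type": "text"}},
    "pages": {"typedoc": {"type": "keyword"}},
}

# One typeless mapping per new index, each including the default fields.
new_mappings = {
    index: {"mappings": {"properties": merge_default_mapping(default_props, props)}}
    for index, props in per_index_props.items()
}

print(json.dumps(new_mappings, indent=2))
```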

...That solved it, after reindexing everything!
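I assume I got those errors either because of residual shards from previous mapping schemas (unlikely), or because ES couldn't find those fields in the mappings of the indices being searched. In the latter case, dynamic mapping would probably have created them as text fields at indexing time, which would match the error message.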
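One way to check the second hypothesis is to look at what GET /<index>/_mapping returns and flag the sort and aggregation fields that are missing or mapped as text. The sketch below assumes the typeless response shape of ES 7.x; the function name and the sample response are hypothetical, with ordnum shown the way dynamic mapping usually maps a string value (text with a keyword sub-field).

```python
def find_suspect_fields(mapping_response, field_names):
    """List (index, field, reason) for fields that are unmapped,
    or mapped as text without fielddata enabled.

    mapping_response is assumed to look like the body of
    GET /<index>/_mapping in ES 7.x:
    {index: {"mappings": {"properties": {...}}}}
    """
    problems = []
    for index, body in mapping_response.items():
        props = body.get("mappings", {}).get("properties", {})
        for name in field_names:
            field = props.get(name)
            if field is None:
                problems.append((index, name, "not mapped"))
            elif field.get("type") == "text" and not field.get("fielddata", False):
                problems.append((index, name, "text without fielddata"))
    return problems


# Hypothetical response where "ordnum" was created by dynamic mapping.
sample = {
    "fulltext": {
        "mappings": {
            "properties": {
                "ordnum": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "typedoc": {"type": "keyword"},
            }
        }
    }
}

print(find_suspect_fields(sample, ["ordnum", "typedoc", "numvol"]))
```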

Hope that will help anyone in a similar situation.
