Understanding fielddata mapping parameter

stravag · March 1, 2021, 12:58pm

I'm attempting to build a kind of "next word prediction" using Elasticsearch. The goal is to suggest search terms based on different fields of the index.

I found a solution based on the work described in "Search like a Google with Elasticsearch. Autocomplete, Did you mean and search for items." by Volodymyr Bilyachat, which uses aggregations on a "synthetic" suggestion field with a shingle filter.

This works as I expected, but I had to enable fielddata: true on the suggestions field to get access to the tokens created by the shingle filter and aggregate on them.

According to the documentation (Text field type | Elasticsearch Reference [7.11] | Elastic), this will "... load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory..."

Can someone help me quantify what that can mean in terms of memory usage?

Let's say I have an index with 1 million documents, and the suggestion field contains English sentences averaging 20 words. My 2/4 shingle filter (shingle sizes 2 to 4, unigrams kept) would then create 74 tokens for each of those sentences.
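To make the 74 figure checkable, here is a minimal sketch of the arithmetic. It assumes the shingle filter also emits the single words (unigrams) and that nothing removes tokens beforehand (no stop words, for example). The function name `shingle_token_count` is made up for this illustration.

```python
def shingle_token_count(num_words, min_size=2, max_size=4, output_unigrams=True):
    """Count the tokens a shingle filter would emit for one sentence.

    Assumption: a sentence of n words contains n - k + 1 shingles of size k,
    and the original single words are kept when output_unigrams is True.
    """
    shingles = sum(
        max(num_words - k + 1, 0) for k in range(min_size, max_size + 1)
    )
    unigrams = num_words if output_unigrams else 0
    return shingles + unigrams


# 20 words: 19 + 18 + 17 shingles plus 20 unigrams
tokens_per_doc = shingle_token_count(20)
# Token occurrences across 1 million documents (not distinct terms)
total_tokens = tokens_per_doc * 1_000_000
print(tokens_per_doc, total_tokens)
```

Note that this counts token occurrences, not distinct terms; how many of those 74 million tokens are unique depends entirely on the data.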

Now, when I query and run a terms aggregation on that field with few restrictions, what gets loaded into memory when fielddata is enabled, and what can help me quantify that?

{
  "size": 0,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "suggestions",
        "include": "c.*"
      }
    }
  },
  "query": {
    "prefix": {
      "suggestions": {
        "value": "c"
      }
    }
  }
}
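One way to approach the measurement, rather than estimating, might be to run the aggregation once on a test index and then read the fielddata statistics that Elasticsearch reports. The sketch below only builds the request paths for the cat fielddata, node stats, and index stats endpoints; it does not send anything. The helper name `fielddata_stats_paths` and the index name `my-index` are placeholders, not something from my setup.

```python
from urllib.parse import urlencode


def fielddata_stats_paths(field, index=None):
    """Build request paths for Elasticsearch's fielddata statistics endpoints.

    These paths are meant to be issued as GET requests against a cluster
    (for example from Kibana Dev Tools or curl); this helper sends nothing.
    """
    query = urlencode({"fields": field})
    paths = {
        # Per-node, per-field fielddata size in tabular form
        "cat": f"/_cat/fielddata?v=true&{query}",
        # Per-node fielddata memory as JSON
        "nodes": f"/_nodes/stats/indices/fielddata?{query}",
    }
    if index is not None:
        # Fielddata memory for a single index ("index" value is a placeholder)
        paths["index"] = f"/{index}/_stats/fielddata?{query}"
    return paths


for name, path in fielddata_stats_paths("suggestions", index="my-index").items():
    print(name, "GET", path)
```

Comparing the reported size before and after the first aggregation on the field should give a concrete number for a given data set, which could then be extrapolated with care.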

