How to show the unique values of a field (database column) without fielddata

I have a requirement to get the unique values of text fields.

For example, I have one field with the values: Jira Software, Kibana, Bitbucket, Jira software, Solr, Jira software.

As output, I want to get: Jira software, Kibana, Bitbucket, Solr.

I have around 10-12 text fields, so I am setting fielddata=true and then using a terms aggregation to get the unique values of each text field. Is this the right approach, or are there other approaches you would suggest?

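// Text field with the keyword analyzer; fielddata=true is what lets the terms agg run on it.
_client.CreateIndex("test", c => c
    .Mappings(m => m
        .Map<TestOutput>(mm => mm
            .AutoMap()
            .Properties(props => props
                .Text(t => t
                    .Name(n => n.Global_Name)
                    .Analyzer("keyword")
                    .Fielddata(true))))));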

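// Terms aggregation over Global_Name. Note: a terms aggregation returns only the
// top 10 buckets by default; set .Size(...) on it to get more unique values back.
var results = await _elasticClient.SearchAsync<TestOutput>(sr => sr
    .Aggregations(a => a
        .Terms("distinct_Product", t => t
            .Field(f => f.Global_Name))));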
Please help me to figure out the right way to do this.

Yes, a terms aggregation is the way to get unique field values. It is not necessary to enable fielddata on the field to do this, only to use the keyword analyzer.

Without fielddata, I am getting this error:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [global_Name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"buyerdiscovery","node":"uVnd59kuSnWtrGTKqj2bew","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [global_Name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}

I just want to confirm: is enabling fielddata on text fields the right approach for getting unique values? It consumes a lot of memory, so is there another approach you would suggest?

One approach: can I first get the list of all values of a field using a match query and then remove the duplicate values from that list?

Code snippet:
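// Source filtering: return only Global_Name from each hit. Note: a search returns
// 10 hits by default, so on its own this does not fetch every document.
var results = await _elasticClient.SearchAsync<TestOutput>(sr => sr
    .Source(so => so
        .Includes(f => f
            .Field(ff => ff.Global_Name))));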
Then filter out the unique values of Global_Name in the application.
But I am confused about which approach is best.
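On the application-side approach: a search returns only 10 hits by default, so deduplicating in code means paging through every document (for example with the scroll API or search_after) and shipping all of those values to the client, which is the work a terms aggregation avoids. The sketch below assumes the `Global_Name` values have already been pulled out of each page of hits; `ClientSideDistinct.Collect` is a hypothetical helper, and the comparer is a parameter because the expected output in the question treats "Jira Software" and "Jira software" as one value.

```csharp
using System.Collections.Generic;

// Hypothetical helper: deduplicate values collected page by page from search hits.
// Assumes the caller has already extracted Global_Name from every hit of every page.
public static class ClientSideDistinct
{
    public static List<string> Collect(
        IEnumerable<IEnumerable<string>> pages, IEqualityComparer<string> comparer)
    {
        // The comparer decides whether "Jira Software" and "Jira software" are the same.
        var seen = new HashSet<string>(comparer);
        var result = new List<string>();

        foreach (var page in pages)
        {
            foreach (var value in page)
            {
                var trimmed = value.Trim();
                // HashSet.Add returns false for a value already seen, so only
                // the first occurrence is kept, in the order it was read.
                if (seen.Add(trimmed))
                {
                    result.Add(trimmed);
                }
            }
        }

        return result;
    }
}
```

With the six values from the question split over two pages, reading every page with a case-insensitive comparer gives four values; reading only the first page misses Solr, and an ordinal (case-sensitive) comparer keeps both spellings of Jira Software.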

Yes, to run a terms agg on a field mapped as text, you must enable fielddata on that field. And, yes, that practice is discouraged because of its resource usage and performance impact. In particular, it increases usage of the fielddata cache and, hence, the JVM heap.

Applying a terms aggregation to an analyzed text field is often done in error. For example, a field city indexed with the standard analyzer emits multiple tokens for cities with multi-word names, and a terms agg on such a field will return a bucket for each of those words, like "new" and "york" (the standard analyzer also lowercases tokens).

That's the purpose of the keyword type: its default analysis emits the whole value as a single token, so that, e.g., a terms aggregation returns a single "New York" bucket.
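To make the difference concrete, here is a small C# simulation, with no Elasticsearch involved, of how a terms aggregation buckets the same two documents when the field is analyzed versus when it is a keyword. `BucketSimulation` and its methods are hypothetical names for illustration, and `StandardLikeTokens` is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), not its actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative simulation only: no Elasticsearch calls are made here.
public static class BucketSimulation
{
    // Rough approximation of the standard analyzer: replace anything that is not
    // a letter or digit with a space, split on spaces, then lowercase each token.
    public static IEnumerable<string> StandardLikeTokens(string value) =>
        new string(value.Select(ch => char.IsLetterOrDigit(ch) ? ch : ' ').ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.ToLowerInvariant());

    // The keyword type emits the whole value as one unmodified token.
    public static IEnumerable<string> KeywordTokens(string value) => new[] { value };

    // Mimics a terms aggregation: one bucket per distinct token, with the
    // number of documents containing that token as the bucket count.
    public static Dictionary<string, int> TermsBuckets(
        IEnumerable<string> docs, Func<string, IEnumerable<string>> analyze) =>
        docs.SelectMany(doc => analyze(doc).Distinct())
            .GroupBy(token => token)
            .ToDictionary(g => g.Key, g => g.Count());
}
```

With the documents "New York" and "New Rochelle", the analyzed version yields the buckets `new` (2), `york` (1) and `rochelle` (1), while the keyword version yields `New York` (1) and `New Rochelle` (1).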

If you do have a scenario where a field has multi-word values and you intend for those words to be bucketed individually, then enabling fielddata on the field is the logical design; resource requirements will be greater.

If you have a scenario where a field has multi-word values and you want searches to work easily against those words (e.g. documents with city = "New York", "New Rochelle", etc. should match a search for "New"), but you want terms agg buckets to contain only "New York" and "New Rochelle", not "new", "york", and "rochelle", then the standard practice is to map city as a text field with a multi-field named raw mapped as keyword, and to use city.raw in your terms agg.
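The multi-field idea can be modelled in a few lines of C#: each value is kept once in analyzed form (what `city` would be searched on) and once verbatim (what `city.raw` would hold), and the aggregation runs over the raw form of the matching documents. This is only a simulation under those assumptions; `CityDoc`, `MultiFieldSimulation`, and the simple whitespace/punctuation tokenizer are hypothetical and stand in for Elasticsearch's own analysis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document model: the same source value is "indexed" twice,
// mirroring a text field (city) with a keyword sub-field (city.raw).
public sealed class CityDoc
{
    public string City { get; }
    // Analyzed form used for matching (a rough stand-in for the standard analyzer).
    public HashSet<string> CityTokens { get; }
    // Unanalyzed form, the equivalent of city.raw.
    public string CityRaw { get; }

    public CityDoc(string city)
    {
        City = city;
        CityTokens = new HashSet<string>(
            city.Split(new[] { ' ', '-', ',' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(t => t.ToLowerInvariant()));
        CityRaw = city;
    }
}

public static class MultiFieldSimulation
{
    // Search on the analyzed field: "New" matches every city containing that word.
    public static List<CityDoc> MatchCity(IEnumerable<CityDoc> docs, string word) =>
        docs.Where(d => d.CityTokens.Contains(word.ToLowerInvariant())).ToList();

    // Terms aggregation on the raw field: one bucket per whole value.
    public static Dictionary<string, int> TermsOnRaw(IEnumerable<CityDoc> docs) =>
        docs.GroupBy(d => d.CityRaw).ToDictionary(g => g.Key, g => g.Count());
}
```

Searching for "New" over New York, New Rochelle and Boston matches the first two through their analyzed tokens, and aggregating those hits on the raw values returns exactly the `New York` and `New Rochelle` buckets, with no `new` bucket.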


Thanks, Glen_Smith, for the reply. It works with the approach you suggested: I have mapped the fields as keyword and use them in the terms aggregation.
This code snippet is working for me:
// Map Global_Name as keyword so the terms agg can use it without fielddata.
_client.CreateIndex("test", c => c
    .Mappings(m => m
        .Map<TestOutput>(mm => mm
            .AutoMap()
            .Properties(props => props
                .Keyword(t => t
                    .Name(n => n.Global_Name))))));
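One thing to keep in mind with a plain keyword field: values are bucketed exactly as indexed, so "Jira Software" and "Jira software" from the question end up in separate buckets. If they should be merged, Elasticsearch keyword fields accept a custom normalizer (for example one using the lowercase filter), which lowercases the whole value without splitting it. The C# below only simulates that effect; `KeywordBuckets` and its methods are hypothetical names, not NEST APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative simulation only: how a keyword field would bucket values
// with and without a lowercase normalizer. No Elasticsearch client involved.
public static class KeywordBuckets
{
    // Plain keyword: the value is indexed exactly as sent.
    public static string Plain(string value) => value;

    // Keyword with a lowercase normalizer: the whole value is lowercased,
    // but never split into separate tokens.
    public static string Lowercased(string value) => value.ToLowerInvariant();

    // Terms-aggregation-style bucketing: bucket key -> document count.
    public static Dictionary<string, int> Terms(
        IEnumerable<string> values, Func<string, string> normalize) =>
        values.GroupBy(normalize).ToDictionary(g => g.Key, g => g.Count());
}
```

For the six values in the question, the plain version gives five buckets (with `Jira software` counted twice), and the lowercased version gives four, with `jira software` counted three times. Note that every bucket key is then lowercased, so `Kibana` comes back as `kibana`.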
