How to show unique values of a field (database column) without fielddata

Pinky_Agarwal · November 18, 2019, 6:46pm

I have a requirement to get the unique values of text fields.

For example, I have one field with these values: Jira Software, Kibana, Bitbucket, Jira software, Solr, Jira software.

As output I want to get: Jira software, Kibana, Bitbucket, Solr.

I have around 10-12 text fields, so I am setting fielddata = true and then using a terms aggregation to get the unique values of the text fields. Is this the right approach, or are there other recommended approaches?

_client.CreateIndex("test", c => c
    .Mappings(m => m.Map<TestOutput>(mm => mm.AutoMap()
        .Properties(props => props.Text(t => t.Name(n => n.Global_Name).Analyzer("keyword").Fielddata(true))))));

var results = await _elasticClient.SearchAsync<TestOutput>(sr => sr.Aggregations(a => a
    .Terms("distinct_Product", t => t
        .Field(f => f.Global_Name)
    )
));
Please help me to figure out the right way to do this.

Glen_Smith · November 18, 2019, 7:35pm

Yes, a terms aggregation is the way to get unique field values. It is not necessary to enable fielddata on the field to do this, only to use the keyword analyzer.

Pinky_Agarwal · November 19, 2019, 8:25am

Without fielddata I am getting this error. Response:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [global_Name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"buyerdiscovery","node":"uVnd59kuSnWtrGTKqj2bew","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [global_Name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}

Pinky_Agarwal · November 19, 2019, 8:32am

I just want to confirm: is enabling fielddata on text fields the right approach for getting the unique values? It consumes a lot of memory, so is there any other recommended approach?

One approach: can I first get the list of all values of a field using a match query and then remove the duplicate values from this list?

Code snippet:
var results = await _elasticClient.SearchAsync<TestOutput>(sr => sr.Source(so => so
    .Includes(f => f
        .Field(ff => ff.Global_Name))));
and then filter out the unique values of global_name.
But I am confused about which approach is best.

Glen_Smith · November 20, 2019, 7:32pm

Yes, in order to execute a terms agg on a field mapped as text, you must enable fielddata on that field. And, yes, that practice is discouraged due to concerns for resource usage and performance impact. In particular, this results in increased usage of fielddata cache, and, hence, JVM heap.

Applying a terms aggregation to an analyzed text field is often done in error. For example, a field, city, indexed with the standard analyzer will have multiple tokens emitted for cities with multi-word names, and a terms agg on such a field will result in buckets for each of those words, like "New" and "York".

That's the purpose of the keyword type: a type whose default analysis emits the whole value as a single token, so that e.g. a terms aggregation will return a bucket of "New York".
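To make the difference between the two bucketing behaviours concrete, here is a small self-contained C# sketch that only imitates analysis in memory; it does not call Elasticsearch. The class `AnalyzerSketch` and its methods are hypothetical names for illustration, and `StandardLike` is a rough stand-in for the standard analyzer (whitespace split plus lowercasing), not its real tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnalyzerSketch
{
    // Rough stand-in for the standard analyzer: split on spaces, lowercase each token.
    public static IEnumerable<string> StandardLike(string value) =>
        value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
             .Select(token => token.ToLowerInvariant());

    // Stand-in for a keyword field: the whole value is one unmodified token.
    public static IEnumerable<string> KeywordLike(string value) =>
        new[] { value };

    // Imitates a terms aggregation: bucket key -> number of documents containing that token.
    public static Dictionary<string, int> TermsBuckets(
        IEnumerable<string> values, Func<string, IEnumerable<string>> analyze) =>
        values.SelectMany(v => analyze(v).Distinct())   // count each document once per token
              .GroupBy(token => token)
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        var cities = new[] { "New York", "New Rochelle", "New York" };

        foreach (var bucket in TermsBuckets(cities, StandardLike))
            Console.WriteLine($"standard-like: {bucket.Key} = {bucket.Value}");

        foreach (var bucket in TermsBuckets(cities, KeywordLike))
            Console.WriteLine($"keyword-like: {bucket.Key} = {bucket.Value}");
    }
}
```

With the three sample values, the standard-like path yields the buckets new (3), york (2) and rochelle (1), while the keyword-like path yields New York (2) and New Rochelle (1). Note that the real standard analyzer also lowercases tokens, so on an actual text field the bucket keys would be "new" and "york" rather than "New" and "York".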

If you do have a scenario where a field has multi-word values and you intend for those words to be bucketed individually, then enabling fielddata on the field is the logical design; resource requirements will be greater.

If you have a scenario where a field has multi-word values and you want searches to work easily against those words (e.g. you need to find documents with city = "New York", "New Rochelle", etc. by searching for "New"), but you want terms agg buckets to contain only "New York" and "New Rochelle", and not "New", "York", and "Rochelle", then the standard practice is to map "city" as a text field with a multi-field "raw" mapped as keyword, and to use city.raw in your terms agg.
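Applied to the field from the original question, the multi-field pattern might look roughly like the sketch below. It assumes NEST 6.x (the `CreateIndex`/`Mappings` style used earlier in this thread; NEST 7.x moved index creation to `client.Indices.Create`), a hypothetical `TestOutput` class with a string `Global_Name` property, and a reachable cluster. The class and method names, the sub-field name "raw", and the bucket size of 1000 are illustrative assumptions, not values from the thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Nest;

// Hypothetical document type; the thread only shows its Global_Name property.
public class TestOutput
{
    public string Global_Name { get; set; }
}

public static class DistinctValuesSketch
{
    // Map Global_Name as text (for full-text search) with a keyword sub-field "raw".
    public static void CreateIndex(IElasticClient client)
    {
        client.CreateIndex("test", c => c
            .Mappings(m => m.Map<TestOutput>(mm => mm
                .AutoMap()
                .Properties(props => props
                    .Text(t => t
                        .Name(n => n.Global_Name)
                        .Fields(f => f.Keyword(k => k.Name("raw"))))))));
    }

    // Run the terms aggregation on the keyword sub-field, so no fielddata is needed.
    public static async Task<List<string>> GetDistinctNamesAsync(IElasticClient client)
    {
        var response = await client.SearchAsync<TestOutput>(s => s
            .Index("test")
            .Size(0)                        // only buckets are needed, not hits
            .Aggregations(a => a
                .Terms("distinct_Product", t => t
                    .Field(f => f.Global_Name.Suffix("raw"))
                    .Size(1000))));         // assumed cap; the default is 10 buckets

        return response.Aggregations
            .Terms("distinct_Product")
            .Buckets
            .Select(b => b.Key)
            .ToList();
    }
}
```

Keyword values are compared exactly, so "Jira Software" and "Jira software" would come back as two separate buckets unless the values are normalized before or during indexing.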

Pinky_Agarwal · November 26, 2019, 2:08am

Thanks, Glen_Smith, for the reply. It is working fine with the approach you suggested: I have mapped the fields as keyword and am using them in the terms aggregation.
This code snippet is working for me:
_client.CreateIndex("test", c => c
    .Mappings(m => m.Map<TestOutput>(mm => mm.AutoMap()
        .Properties(props => props.Keyword(t => t.Name(n => n.Global_Name))))));

