Significant term aggregation with Snowball analyzer

pramod.kumar2 · December 14, 2018, 10:29am

Hi,

I am using the Elasticsearch snowball analyzer on the product field of an index. I need to get significant terms from an Elasticsearch aggregation (the significant terms aggregation), but the results are not correct: the terms I get back do not match the words exactly as they appear in the hits.
For example, the productDescription values in the hits look like this:
"SWEET BISCUITS - MILK BIKIS MILK CREAM"
"BRITTANNIA PRODUCTS: MILK BIKIES CREAM 1 00GM X 100NOS"

and the significant term I am getting is:

{
  "key": "biki",
  "doc_count": 4,
  "score": 553.7252991452992,
  "bg_count": 260
}

Please suggest how I can get the original words as results, such as "BIKIES" and "BIKIS".

Below is a sample query:
{
  "_source": {
    "include": [
      "productDescription"
    ]
  },
  "query": {
    "bool": {
      "filter": [
        {
          "query_string": {
            "default_field": "productDescription.SnowField",
            "default_operator": "AND",
            "query": "(milk cream)"
          }
        },
        {
          "term": {
            "isUnique": true
          }
        },
        {
          "range": {
            "date": {
              "gte": "2018-01-01",
              "lte": "2018-12-31"
            }
          }
        }
      ],
      "must": [],
      "must_not": []
    }
  },
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "size": 500,
  "aggs": {
    "my_sample": {
      "sampler": {
        "shard_size": 20
      },
      "aggregations": {
        "keywords": {
          "significant_text": {
            "field": "productDescription.SnowField",
            "size": 10,
            "filter_duplicate_text": true
          }
        }
      }
    }
  }
}

Mark_Harwood · December 14, 2018, 10:46am

Use a field with an analyzer that doesn't stem or lowercase, e.g. the "whitespace" analyzer.

pramod.kumar2 · December 14, 2018, 11:35am

Hi Mark,

Thanks for the reply. The problem is not the lowercased results. The problem is that I got "biki" from the significant terms aggregation, while I need the words as they are, like "bikies" and "bikis", as you can see them in the returned hits. This was just a sample result; when I checked with different searches, I got many words that were misspelled (with the plural endings s/es removed). The requirement is to get meaningful words (suggestions).

Mark_Harwood · December 14, 2018, 11:41am

That's what "stemming" does.
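
To illustrate why both "BIKIS" and "BIKIES" collapse into the single term "biki", here is a minimal sketch. The `toy_stem` function is a hypothetical plural-stripping rule written for this example; it is not the real Snowball algorithm, which has many more rules, but it shows the same kind of effect on these two words.

```python
def toy_stem(token):
    """Toy stemmer (NOT real Snowball): lowercase, then strip 'es' or 's'."""
    word = token.lower()
    for suffix in ("es", "s"):
        # Only strip when a reasonably long stem is left behind.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Split on whitespace and stem each token, as a stemming analyzer would."""
    return [toy_stem(tok) for tok in text.split()]


descriptions = [
    "SWEET BISCUITS - MILK BIKIS MILK CREAM",
    "BRITTANNIA PRODUCTS: MILK BIKIES CREAM 1 00GM X 100NOS",
]

for text in descriptions:
    print(index_terms(text))
```

Only the stemmed form is stored in the index, so an aggregation over that field can only ever report "biki", never the original spellings.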

Mark_Harwood · December 14, 2018, 11:47am

Check out this blog which includes an example of taking potentially stemmed significant terms and using them in a terms query with a highlighter to show KWIC (Keywords In Context) examples of the discovered terms in text.
Note it talks about significant_terms rather than the new significant_text aggregation but the same principles still hold.
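
As a rough sketch of that idea (not the blog's actual code): take the stemmed terms the aggregation returned, run them through a `terms` query with a highlighter on the same stemmed field, and read the original words out of the highlighted fragments. The helper names and the `sample_response` below are hypothetical and hand-written for illustration; the sketch assumes the highlighter's default `<em>` tags.

```python
import re

FIELD = "productDescription.SnowField"  # stemmed field from the query above


def build_kwic_request(field, stemmed_terms):
    """Request body: match the stemmed terms and highlight them in context."""
    return {
        "query": {"terms": {field: stemmed_terms}},
        "highlight": {"fields": {field: {}}},
    }


def surface_forms(response, field):
    """Collect the original (unstemmed) words wrapped in <em> tags."""
    forms = set()
    for hit in response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            forms.update(re.findall(r"<em>(.*?)</em>", fragment))
    return forms


request_body = build_kwic_request(FIELD, ["biki"])

# Hand-written stand-in for a search response, not real Elasticsearch output.
sample_response = {
    "hits": {
        "hits": [
            {"highlight": {FIELD: ["SWEET BISCUITS - MILK <em>BIKIS</em> MILK CREAM"]}},
            {"highlight": {FIELD: ["BRITTANNIA PRODUCTS: MILK <em>BIKIES</em> CREAM"]}},
        ]
    }
}

print(sorted(surface_forms(sample_response, FIELD)))
```

The highlighter marks the words in the original text that analyze to the matched term, which is how the unstemmed spellings can be recovered.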

pramod.kumar2 · December 14, 2018, 11:52am

Hi Mark,

Yes, I knew that. It is due to the snowball analyzer, as I mentioned above. I am using snowball in the query because I need to include similar-sounding words in the results. I also tried removing the snowball analyzer from the aggregation field, and tried the keyword analyzer on it as well, but did not get exact results.

But is there any way to get exact results? For example, could I reindex the data with another analyzer, or do something else, so that the aggregation results appear exactly as they exist in the productDescription field?

Mark_Harwood · December 14, 2018, 12:04pm

It depends.
If your docs were orders where you wanted to know "which products are typically also bought with pasta?" then you might use a keyword field and significant_terms because you'd be examining significant patterns in repeated orders for exactly the same product.
If your docs were products you'd (hopefully) only ever have exactly one unique product description, so the keyword field would be of no use with any significance analysis (everything occurs once). If you were looking at some of the ingredients in the text of these descriptions (e.g. common ingredients mentioned in high-fat products) then you might use an analyzed text field and significant_text. Maybe indexing with shingles would help too. Remember the indexed field you search on can be different (e.g. stemmed) from the indexed field you use for significant_text analysis (e.g. whitespace).
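
A minimal sketch of that last point, written as Python dicts that serialize to request bodies. It assumes a multi-field mapping in which `SnowField` (from the query above) is the stemmed sub-field and `WhitespaceField` is a hypothetical sub-field added here for illustration; adding it would require a mapping change and a reindex. The `source_fields` setting is included on the assumption that the text lives in the `productDescription` field of `_source`.

```python
import json

# Mapping fragment: one JSON field, indexed two ways.
mapping_properties = {
    "productDescription": {
        "type": "text",
        "fields": {
            # Stemmed sub-field, used for searching.
            "SnowField": {"type": "text", "analyzer": "snowball"},
            # Hypothetical unstemmed sub-field, used for significance analysis.
            "WhitespaceField": {"type": "text", "analyzer": "whitespace"},
        },
    }
}

# Search on the stemmed field, but aggregate on the whitespace field.
search_body = {
    "query": {
        "query_string": {
            "default_field": "productDescription.SnowField",
            "default_operator": "AND",
            "query": "(milk cream)",
        }
    },
    "aggs": {
        "my_sample": {
            "sampler": {"shard_size": 20},
            "aggregations": {
                "keywords": {
                    "significant_text": {
                        "field": "productDescription.WhitespaceField",
                        # Assumption: the raw text is read from this _source field.
                        "source_fields": ["productDescription"],
                        "size": 10,
                        "filter_duplicate_text": True,
                    }
                }
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

With a setup along these lines the query still matches stemmed variants, while the aggregation would report terms as they were written. Because the whitespace analyzer does not lowercase or stem, "BIKIS" and "BIKIES" would be counted as separate terms.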

Nitesh_Kumar_dcpl · December 17, 2018, 10:39am

Hi Mark,

If I am using significant terms on multiple indexes, how can we specify the missing terms?

Mark_Harwood · December 17, 2018, 10:45am

Significant terms is a tool for discovering terms - I don't follow why you're asking a question about specifying them?

Nitesh_Kumar_dcpl · December 17, 2018, 11:30am

Hi Mark,

Let me explain a few things. I have 2 indexes, and I created the same alias on both so that I can search them at once. In one index the field is named productDescription, and in the second index it is productDesc. The issue with getting significant terms is that when I pass the productDescription field name in the aggregation, it says: "Aggregation [keywords] cannot process field [productDescription.StandardField] since it is not present". Is there any way to pass two fields to the significant terms aggregation, or otherwise to ignore the missing field (like the "missing" property we can pass in the terms aggregation, which is not supported in the significant terms aggregation)?

Mark_Harwood · December 17, 2018, 11:35am

Ah. So missing "fields".
If the overall goal is to blend the term stats from 2 fields in 2 indices the answer is "no".
Generally, significant terms will work best on a single index and single shard since all of the stats are available in one place. If you're trying to use it to spot low-frequency terms (e.g. something that only occurs twice) in a distributed system that makes life hard because every single-occurrence term on a local shard (of which there are typically many) suddenly becomes a candidate for global consideration.

system · January 14, 2019, 11:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
