Significant_text sorting

slovan9 · June 19, 2019, 12:17pm

Hello,

I'm new to Elasticsearch and cannot solve the following issue. I used the sampler and significant_text aggregations to find the most popular words in my dataset, and that worked very well. However, I would like to order the resulting buckets by their doc_count in descending (or ascending) order, and I have not been able to find a proper way to do it.

I tried the “order” parameter, among many other things, without success. Is it even possible to sort buckets when using the sampler and significant_text aggregations? Sorting worked with the terms aggregation but not with significant_text. Any ideas? Thanks a lot in advance for any help.

Mark_Harwood · June 19, 2019, 3:28pm

“Most popular” is not usually a desirable sort order for text because it would generally give you words like “the”, regardless of whatever your query was. That said, if you really want that behaviour you can use a custom scripted scoring heuristic to rank on either foreground or background count.

mr_searchng · June 20, 2019, 9:43am

Mark, I have the same problem.
Could you share your proposed approach as code?
I have another question: the Kibana dashboard has significant terms but not significant text. How do I make a significant text graph in a Visualize dashboard?

Thanks

slovan9 · June 20, 2019, 12:20pm

Dear Mark,
Thanks a lot for the prompt response. Would it be possible to provide example code, or just a snippet, of your proposed solution? Thanks in advance.

mr_searchng · June 20, 2019, 12:32pm

The order parameter is not working. I would also like to know how to add this code to a Kibana dashboard to visualize it.

Thanks.

Mark_Harwood · June 22, 2019, 10:04am

I don’t have access to a computer to give a full working snippet, but the docs you need are here.
You’d use a Painless script that just returns _superset_freq, i.e. the background count.
Kibana does not currently support significant_text. I expect the best thing to do is to open an issue on the Kibana GitHub repository asking for support to be added to something like the tag cloud visualization.

mr_searchng · June 22, 2019, 6:14pm

"script_heuristic": {
  "script": {
    "lang": "painless",
    "source": "params._subset_freq/(params._superset_freq - params._subset_freq + 1)"
  }
}

How do I integrate this code into the aggs part?

Another question:
Is there no way to visualize significant text in Kibana? If it is possible, can you share images?

Thanks for your answers

mr_searchng · June 22, 2019, 8:08pm

Hey Mark, by the way, can we visualize significant text data in a data table?
That would be enough; I don't need a graphic.

Good Work

Mark_Harwood · June 24, 2019, 8:24am

Here's a full reproduction:

GET signalmediaexact/_search?q=h5n1
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000
      },
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "content",
            "script_heuristic": {
              "script": {
                "lang": "painless",
                "source": "params._superset_freq + 0f"
              }
            }
          }
        }
      }
    }
  }
}

I added zero to get the script's numeric casting to work properly.
As predicted, the most popular words in my sample are dull, e.g. the, and, of, etc.

As for visualizing this in a Kibana data table, I suspect the answer is "no". It is best to ask in the Kibana forum.
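
If changing the heuristic is not an option, one possible alternative is to re-sort the returned buckets on the client. The Python sketch below assumes the aggregation names "sample" and "keywords" from the reproduction above and relies on each bucket carrying doc_count (foreground) and bg_count (background). The function name sort_buckets and the example response are made up for illustration. Note the limitation: this only reorders the buckets Elasticsearch already chose by significance score, so it is not a true "top N by count".

```python
def sort_buckets(response, by="doc_count", descending=True):
    """Return the significant_text buckets re-sorted on the client.

    `by` is "doc_count" (foreground count) or "bg_count" (background count).
    Assumes the aggregation names "sample" and "keywords" used in this thread.
    """
    buckets = response["aggregations"]["sample"]["keywords"]["buckets"]
    return sorted(buckets, key=lambda bucket: bucket[by], reverse=descending)


# Made-up response fragment, trimmed to the fields used here.
example_response = {
    "aggregations": {
        "sample": {
            "doc_count": 1000,
            "keywords": {
                "buckets": [
                    {"key": "h5n1", "doc_count": 40, "bg_count": 60},
                    {"key": "the", "doc_count": 950, "bg_count": 900000},
                    {"key": "flu", "doc_count": 120, "bg_count": 5000},
                ]
            },
        }
    }
}

if __name__ == "__main__":
    for bucket in sort_buckets(example_response, by="doc_count"):
        print(bucket["key"], bucket["doc_count"])
```

Passing descending=False gives ascending order, and by="bg_count" sorts on the background count instead.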

slovan9 · June 24, 2019, 9:19am

Thanks a lot Mark for your time, help and code. It works properly now.

mr_searchng · June 25, 2019, 6:59am

The results are not ranked by doc_count.

Thanks.

Mark_Harwood · June 25, 2019, 7:40am

Maybe you’re looking to use subset frequency rather than superset in the script?
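
To make the subset/superset switch concrete, here is a minimal Python sketch that builds the same request body as the reproduction earlier in the thread, with the frequency parameter made selectable. The function name build_significant_text_body is hypothetical; the field "content", the shard_size of 1000, and the "+ 0f" cast are taken from that reproduction. It only constructs and prints the JSON body and does not talk to a cluster.

```python
import json


def build_significant_text_body(freq_param="_subset_freq",
                                field="content",
                                shard_size=1000):
    """Build a sampler + significant_text body scored by a raw count.

    "_subset_freq" ranks by the foreground count (doc_count in the sample);
    "_superset_freq" ranks by the background count (bg_count).
    """
    if freq_param not in ("_subset_freq", "_superset_freq"):
        raise ValueError("freq_param must be _subset_freq or _superset_freq")
    return {
        "size": 0,
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            "script_heuristic": {
                                "script": {
                                    "lang": "painless",
                                    # "+ 0f" mirrors the cast used in the thread.
                                    "source": f"params.{freq_param} + 0f",
                                }
                            },
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Paste the printed JSON as the body of a _search request.
    print(json.dumps(build_significant_text_body("_subset_freq"), indent=2))
```

Because the score is then just the chosen count, the buckets come back ordered by that count in descending order.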

mr_searchng · June 25, 2019, 8:44am

I want it sorted by doc_count.

Mark_Harwood · June 25, 2019, 8:49am

Which? Foreground or background?
"_subset_freq" is the former and "_superset_freq" is the latter.

mr_searchng · June 25, 2019, 9:00am

_subset_freq is working. It would be perfect if we could visualize it in Kibana.

Thanks.

system · July 23, 2019, 9:00am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
