Significant_text sorting

Hello,

I'm new to Elasticsearch and I can't solve the following issue. I used the sampler and significant_text aggregations to find the most popular words in my dataset, and it worked very well. However, I would like to order the resulting buckets by their doc_count in descending (or ascending) order, and I can't find a proper way to do it.

I tried the “order” parameter (and many other things) without success. Is it even possible to sort buckets when using the sampler and significant_text aggregations? For example, sorting worked with the terms aggregation but not with significant_text. Any ideas? Thanks a lot in advance for any help.
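For comparison, this is the kind of request body where sorting by document count does work: a terms aggregation with an "order" clause. It is a minimal sketch that assumes a hypothetical keyword field named "category"; as described above, the same "order" clause does not work for significant_text.

```json
{
  "size": 0,
  "aggs": {
    "top_categories": {
      "terms": {
        "field": "category",
        "size": 10,
        "order": { "_count": "desc" }
      }
    }
  }
}
```

Changing "desc" to "asc" reverses the order for the terms aggregation.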

“Most popular” is not usually a desirable sort order for text, because it would generally give you words like “the” regardless of what your query was. That said, if you really want that behaviour, you can use a custom scripted scoring heuristic (script_heuristic) to rank on either the foreground or the background count.

Mark, I have the same problem.
Could you share your proposed approach as code?
I have another question: Kibana dashboards support significant terms but not significant text. How do I make a significant text graph in a Visualize dashboard?

Thanks

Dear Mark,
Thanks a lot for the prompt response. Would it be possible to provide example code, or just a snippet, of your proposed solution? Thanks in advance.


The order parameter is not working. I would also like to know how to add this to a dashboard in Kibana so I can visualize it.

Thanks.

I don’t have access to a computer to give a full working snippet, but the docs you need are here.
You’d use a Painless script that just returns the superset_freq for the background stats.
Kibana does not currently support significant_text. I expect the best thing to do is to open an issue on the Kibana GitHub repository asking for support to be added to something like the tag cloud visualization.

"script_heuristic": {
  "script": {
    "lang": "painless",
    "source": "params._subset_freq/(params._superset_freq - params._subset_freq + 1)"
  }
}

How do I integrate this code into the aggs part?

Another question:
Is there no way to visualize significant_text results in Kibana? If it is possible, can you share screenshots?

Thanks for your answers

Hey Mark, by the way, can we visualize significant_text data in a data table?
That would be enough; I don't need a graphic.

Good Work

Here's a full reproduction:

GET signalmediaexact/_search?q=h5n1
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000
      },
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "content",
            "script_heuristic": {
              "script": {
                "lang": "painless",
                "source": "params._superset_freq + 0f"
              }
            }
          }
        }
      }
    }
  }
}

I added a float zero ("+ 0f") to get the script's numeric casting to work properly.
As predicted, the most popular words in my sample are dull, e.g. "the", "and", "of", etc.
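An explicit Painless cast may be a more readable way to get the same numeric result as adding "0f". This is an untested assumption, not something verified in this thread. It shows only the heuristic fragment, which would replace the script_heuristic block in the reproduction above.

```json
"script_heuristic": {
  "script": {
    "lang": "painless",
    "source": "(double) params._superset_freq"
  }
}
```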

I suspect the answer is "no"; it's best to ask in the Kibana forum.

Thanks a lot Mark for your time, help and code. It works properly now.


The buckets are not ranked by doc_count.

Thanks.

Maybe you’re looking to use subset frequency rather than superset in the script?

I want it sorted by doc_count.

Which? Foreground or background?
"_subset_freq" is the former and "_superset_freq" is the latter.
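To rank by the foreground count, swap "_superset_freq" for "_subset_freq" in the reproduction above. A bucket's doc_count is its foreground count within the sampled documents, so this should order buckets by doc_count, highest first. This is a sketch of the request body. It assumes the same index, query (?q=h5n1) and "content" field as the reproduction, and changes only the script source.

```json
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000
      },
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "content",
            "script_heuristic": {
              "script": {
                "lang": "painless",
                "source": "params._subset_freq + 0f"
              }
            }
          }
        }
      }
    }
  }
}
```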

"_subset_freq" works. It would be perfect if we could visualize it in Kibana.

Thanks.
