Hi,
first of all: I really love the new significant terms aggregation as well
as the cardinality aggregation. Thanks a lot!
I have a few detailed questions:
What is bg_count? I assume it stands for background count, but what does
it actually mean?
At first I thought the score values were between 0 and 1, but there are
much bigger values. Can anyone give me a rough explanation?
Cheers
Valentin
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com .
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0aa544b5-a2a4-40ae-986d-03955a27ea60%40googlegroups.com .
For more options, visit https://groups.google.com/d/optout .
Hi Valentin,
What is bg_count (I assume background count) but what is the meaning of
it?
The bg_count is the number of documents in the whole index that contain
the term (not just those in the search result).
At first I thought the score values are between 0 and 1 but there are
much bigger values. Can anyone give me a rough explanation?
You can see the code of the computation here:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/aggregations/bucket/significant/InternalSignificantTerms.java#L94
This is a summarized version of the formula:
double subsetProb = #relative frequency in the search result#;
double supersetProb = #relative frequency in the whole index#;
double absoluteProbChange = subsetProb - supersetProb;
if (absoluteProbChange <= 0) {
    return 0;
}
double relativeProbChange = (subsetProb / supersetProb);
return absoluteProbChange * relativeProbChange;
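To make the summary above concrete, here is a minimal runnable sketch of the same calculation. It is not the actual Elasticsearch method: the class name SignificanceSketch, the method name score, its parameter names, and the zero guard are assumptions made for illustration. It also assumes the relative frequencies are document counts divided by set sizes, with subsetFreq playing the role of doc_count and supersetFreq the role of bg_count. It shows why scores are not limited to the range 0 to 1: relativeProbChange is a ratio, so it grows without bound as the term gets rarer in the whole index.

```java
public class SignificanceSketch {

    // Hypothetical helper mirroring the summarized formula above.
    // subsetFreq   = documents in the search result containing the term
    // subsetSize   = documents in the search result
    // supersetFreq = documents in the whole index containing the term
    // supersetSize = documents in the whole index
    public static double score(long subsetFreq, long subsetSize,
                               long supersetFreq, long supersetSize) {
        // Guard added for this sketch to avoid dividing by zero.
        if (subsetSize == 0 || supersetSize == 0 || supersetFreq == 0) {
            return 0;
        }
        double subsetProb = (double) subsetFreq / subsetSize;
        double supersetProb = (double) supersetFreq / supersetSize;
        double absoluteProbChange = subsetProb - supersetProb;
        if (absoluteProbChange <= 0) {
            return 0; // term is not more frequent in the result than in the index
        }
        double relativeProbChange = subsetProb / supersetProb;
        return absoluteProbChange * relativeProbChange;
    }

    public static void main(String[] args) {
        // 50% of the result vs. 12.5% of the index: 0.375 * 4
        System.out.println(score(50, 100, 125, 1000));         // 1.5
        // 50% of the result vs. about 0.1% of the index: the ratio dominates
        System.out.println(score(512, 1024, 1024, 1_048_576)); // 255.5
        // Less frequent in the result than in the index
        System.out.println(score(10, 100, 125, 1000));         // 0.0
    }
}
```

The absolute change can never exceed 1, but it is multiplied by a ratio that is large for terms that are rare in the index and common in the result, which is where the much bigger values come from.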
I guess in the future there will be support for other scoring functions
such as mutual information, chi-squared, or information gain.
Best regards,
Hannes
Hi Hannes,
thanks for the info. Scoring like mutual information sounds fun.
Best regards,
Valentin
On Friday, April 11, 2014 7:40:14 PM UTC+2, Hannes Korte wrote:
I guess in the future there will be support for other scorings like
mutual information, chi squared or information gain.
Best regards,
Hannes