Calculation of whymatch in elasticsearch


(usha2626) #1

Hi,

How do we implement a whymatch concept in Elasticsearch, i.e. find the total
number of fields in which the search term occurs and the frequency of that
search term?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/72b99acc-bd1e-4954-bc52-f09971397daf%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Britta Weber) #2

If you know the fields that are contained in the document, you could
use a function_score query. To count the number of fields that contain
a word, you can use function_score with a boost_factor like this:

{
   "query": {
      "function_score": {
         "functions": [
            {
               "filter": {
                  "term": {
                     "field1": "searchterm"
                  }
               },
               "boost_factor": 1
            },
            {
               "filter": {
                  "term": {
                     "field2": "searchterm"
                  }
               },
               "boost_factor": 1
            },
            .... (here be more filters)
         ],
         "boost_mode": "replace",
         "score_mode": "sum"
      }
   }
}

This will add 1 to the score for each field (field1, field2, ...) that
has the term "searchterm", so the final score for each document will
be the number of fields in the document containing the term. Is this
what you want?
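If the list of fields is long, the request body above can be generated rather than typed by hand. The following is only a sketch in Python: the helper name count_matching_fields_query and the extra field "field3" are made up for illustration, and it keeps the boost_factor key used above (later Elasticsearch releases renamed it to weight). Note that a term filter compares against the indexed token, so the search term should be given in its analyzed form (for example lowercased).

```python
import json


def count_matching_fields_query(fields, term):
    """Build a function_score body whose score is the number of
    listed fields that contain `term` (hypothetical helper)."""
    functions = [
        # One filter per field; each matching field contributes 1.
        {"filter": {"term": {field: term}}, "boost_factor": 1}
        for field in fields
    ]
    return {
        "query": {
            "function_score": {
                "functions": functions,
                # Replace the query score with the function result ...
                "boost_mode": "replace",
                # ... which is the sum over all matching filters.
                "score_mode": "sum",
            }
        }
    }


# "field3" is an assumed extra field, only here to show the pattern.
body = count_matching_fields_query(["field1", "field2", "field3"], "searchterm")
print(json.dumps(body, indent=3))
```

The printed JSON can be sent as the body of a search request; the _score of each hit is then the field count.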

For getting the term frequencies, you can check out text scoring in scripts here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

See also this thread:
https://groups.google.com/forum/#!msg/elasticsearch/9fOEN1uArIY/7bVZP22zYg8J

Cheers,
Britta


(Nik Everett) #3

I was thinking of putting together a simple "highlighter" that just returns
whether a field contains a match or not. This sounds like a nice logical
extension to that. It probably wouldn't actually be a "highlighter", but I
imagine it'd run during the highlight phase and function similarly. It'd
need something like the highlight_query, for example, to function
properly. Anyway, I believe the short answer is: if you are looking for
something specific, file an issue on GitHub.

Nik


(Britta Weber) #4

Hi Nik,

Would a script field also work for that? Something like:

{
   "script_fields": {
      "field_match1": {
         "script": "if(_index['field1']['searchterm'].tf() > 0){return 1;} else {return 0;}"
      },
      "field_match2": {
         "script": "if(_index['field2']['searchterm'].tf() > 0){return 1;} else {return 0;}"
      },
      ....
   }
}
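
The same kind of body can be generated for any number of fields. This is a rough Python sketch, not a tested recipe: the helper name field_match_script_fields is invented here, and the frequency flag rests on the assumption that returning _index[field][term].tf() directly yields the term frequency instead of a 0/1 flag.

```python
import json


def field_match_script_fields(fields, term, frequency=False):
    """Build a script_fields body with one entry per field
    (hypothetical helper). With frequency=True the script returns
    tf() itself (an assumption) instead of a 0/1 match flag."""
    script_fields = {}
    for i, field in enumerate(fields, start=1):
        accessor = "_index['%s']['%s'].tf()" % (field, term)
        if frequency:
            script = accessor
        else:
            script = "if(%s > 0){return 1;} else {return 0;}" % accessor
        # Caution: field and term are pasted into the script text, so
        # values containing quotes would break (or alter) the script.
        script_fields["field_match%d" % i] = {"script": script}
    return {"script_fields": script_fields}


body = field_match_script_fields(["field1", "field2"], "searchterm")
print(json.dumps(body, indent=3))
```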

Or did I get that wrong?


(Randall McRee) #5

Use the explain feature! It has all of that information.
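
Setting "explain": true in the search body adds an _explanation tree (value, description, details) to every hit. A possible way to pull the per-field term frequencies out of that tree is sketched below in Python. The sample tree is hand-written for illustration, not real output, and the description strings it assumes ("weight(field:term in N)" and "termFreq=...") follow the TF-IDF style explanations; other versions and similarities word them differently, so the regular expressions would need adjusting.

```python
import re

# Assumed description formats; they vary by version and similarity.
WEIGHT_RE = re.compile(r"weight\((?P<field>[^:()]+):(?P<term>.+?) in \d+\)")
TERM_FREQ_RE = re.compile(r"termFreq=(?P<freq>[0-9.]+)")


def term_frequencies(explanation, current=None, found=None):
    """Walk an explanation tree and collect {(field, term): frequency}."""
    if found is None:
        found = {}
    description = explanation.get("description", "")
    weight = WEIGHT_RE.match(description)
    if weight:
        # Remember which field/term this subtree is about.
        current = (weight.group("field"), weight.group("term"))
    freq = TERM_FREQ_RE.match(description)
    if freq and current is not None:
        found[current] = float(freq.group("freq"))
    for child in explanation.get("details", []):
        term_frequencies(child, current, found)
    return found


def node(value, description, details=()):
    """Tiny constructor for the hand-written sample below."""
    return {"value": value, "description": description, "details": list(details)}


# Hand-written sample, NOT real Elasticsearch output.
sample = node(0.5, "sum of:", [
    node(0.3, "weight(field1:searchterm in 0) [PerFieldSimilarity], result of:", [
        node(0.3, "fieldWeight in 0, product of:", [
            node(1.41, "tf(freq=2.0), with freq of:", [node(2.0, "termFreq=2.0")]),
        ]),
    ]),
    node(0.2, "weight(field2:searchterm in 0) [PerFieldSimilarity], result of:", [
        node(0.2, "fieldWeight in 0, product of:", [
            node(1.0, "tf(freq=1.0), with freq of:", [node(1.0, "termFreq=1.0")]),
        ]),
    ]),
])

freqs = term_frequencies(sample)
print(freqs)       # frequency per (field, term)
print(len(freqs))  # number of fields in which the term matched
```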


(Navneet Mathpal) #6

Hi Britta Weber,

I want to extract the total number of times a search term occurs in a
particular field.
Even if the search term is a phrase, the query should return the total
number of times that phrase occurs and also the frequency of each
individual term of that phrase in each field.

Thanks
Navneet Mathpal


Phrase frequency in a document and in the whole collection
(system) #7