Elasticsearch phrase term frequency .tf() containing multiple words

valerij_vasilcenko · October 28, 2014, 9:36am

I want to access the frequency of a phrase made up of multiple words, e.g.
"green energy".

I can access the tf of "green" and "energy" individually, for example:

{
  "query": {
    "function_score": {
      "filter": {
        "terms": { "content": ["energy", "green"] }
      },
      "script_score": {
        "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
        "lang": "groovy"
      }
    }
  }
}

This works fine. However, how can I find the frequency of the phrase
"green energy"? _index['content']['green energy'].tf() does not work.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vineeth_mohan_2 · October 28, 2014, 9:59am

Hello Valerij,

This won't work, because normally the string is tokenized into "green" and
"energy", so the index contains no single term "green energy".
If you use a shingle token filter with a shingle size of 2, it might work.
Alternatively, you can read the positions of both tokens in the script, and
if they are next to each other, count that as an occurrence.
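To make both suggestions concrete, here is a small local simulation in Python. It is not Elasticsearch code: it assumes a simple lowercase word tokenizer as a stand-in for the standard analyzer, and all helper names are hypothetical. `shingles()` imitates a shingle filter with min_shingle_size and max_shingle_size both set to 2 (as if output_unigrams were off), so "green energy" becomes a term of its own; `phrase_tf_via_positions()` imitates the position check, counting places where "green" is immediately followed by "energy". For comparison, `unigram_tf_sum()` mirrors the script in the question, which adds the two single-word counts.

```python
# Local simulation only: this is not Elasticsearch code. The tokenizer and all
# helper names here are hypothetical stand-ins chosen for illustration.
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: lowercase, keep runs of word chars.
    return re.findall(r"\w+", text.lower())


def unigram_tf_sum(text: str, words: list[str]) -> int:
    # Mirrors the script in the question: tf('energy') + tf('green').
    counts = Counter(tokenize(text))
    return sum(counts[w] for w in words)


def shingles(tokens: list[str], size: int = 2) -> list[str]:
    # Imitates a shingle filter with min and max shingle size 2 and no unigrams:
    # every pair of adjacent tokens becomes one space-joined term.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def phrase_tf_via_shingles(text: str, phrase: str) -> int:
    # Once shingles are indexed, "green energy" is an ordinary term with its own tf.
    return Counter(shingles(tokenize(text)))[phrase]


def phrase_tf_via_positions(text: str, first: str, second: str) -> int:
    # Record the positions of every term, then count positions p where
    # `first` occurs at p and `second` occurs at p + 1.
    positions: dict[str, set[int]] = {}
    for pos, tok in enumerate(tokenize(text)):
        positions.setdefault(tok, set()).add(pos)
    following = positions.get(second, set())
    return sum(1 for p in positions.get(first, set()) if p + 1 in following)


doc = "Green energy is cheap. Energy that is green, and more green energy."
print(unigram_tf_sum(doc, ["energy", "green"]))         # 6
print(phrase_tf_via_shingles(doc, "green energy"))      # 2
print(phrase_tf_via_positions(doc, "green", "energy"))  # 2
```

Both phrase counts agree, while the summed single-word tf overcounts. In Elasticsearch itself the shingle route needs the field reindexed with a custom analyzer, whereas the position route works at query time on the existing index at the cost of a heavier script. In the 1.x script API the positions should be reachable via `_index['content'].get('green', _POSITIONS)`, but that is worth checking against the docs for your version.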

Thanks
Vineeth

barry · October 29, 2014, 8:45am

You can also look at developing a custom analyzer so that your phrase is
not broken up at whitespace.

Selecting the correct combination of char filters and tokenizers will
retain phrases.

barry · October 29, 2014, 9:15am

You can also look at developing a custom analyzer so that your phrase is
not broken up at whitespace when it is indexed.

Selecting the correct combination of char filters and tokenizers will
retain phrases.

For example, the whitespace analyzer splits the text on whitespace:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo bar baz'
{
  "tokens" : [ {
    "token" : "foo",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bar",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "baz",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "word",
    "position" : 3
  } ]
}

The keyword analyzer, however, keeps the entire input as a single token:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=keyword' -d 'foo bAr baZ'
{
  "tokens" : [ {
    "token" : "foo bAr baZ",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}
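A small sketch of what the `_index['content']['green energy'].tf()` lookup from the question would see with each analyzer, assuming the tokenization shown in the two outputs above (split on whitespace and keep case, or keep the whole input as one token). It is a local Python simulation with hypothetical helper names, not an Elasticsearch API.

```python
# Local simulation only, mimicking the two _analyze outputs above; the helper
# names are hypothetical and this is not Elasticsearch code.
from collections import Counter


def whitespace_analyzer(text: str) -> list[str]:
    # Split on whitespace and keep case, like the whitespace analyzer output.
    return text.split()


def keyword_analyzer(text: str) -> list[str]:
    # Emit the whole input as a single token, like the keyword analyzer output.
    return [text]


def term_tf(tokens: list[str], term: str) -> int:
    # What a tf() lookup for one exact term would count in this token stream.
    return Counter(tokens)[term]


for field in ["green energy", "we need green energy"]:
    print(
        repr(field),
        term_tf(whitespace_analyzer(field), "green energy"),
        term_tf(keyword_analyzer(field), "green energy"),
    )
```

Under these assumptions the keyword analyzer makes "green energy" a term only when it is the entire field value, so counting the phrase inside longer text would still need a tokenizer and filter combination that emits two-word terms, such as the shingle filter mentioned earlier in the thread.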

Related topics

Topic	Replies	Views	Activity
Phrase frequency in a document and in the whole collection Elasticsearch	4	1563	July 5, 2017
Score based on term frequency only Elasticsearch	5	429	July 6, 2017
Score based on phrase frequency only Elasticsearch	1	629	July 6, 2017
Get the number of occurrences of one of matched keywords Elastic Search	1	47	February 13, 2025
Common terms query with a mix of phrases & single word terms Elasticsearch	7	490	July 6, 2017