Elasticsearch phrase term frequency .tf() containing multiple words

I want to access the frequency of a phrase made up of multiple words, e.g.
"green energy".

I can access the tf of "green" and "energy" individually, for example:

"function_score": {
  "filter": {
    "terms": { "content": ["energy", "green"] }
  },
  "script_score": {
    "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
    "lang": "groovy"
  }
}

This works fine. However, how can I find the frequency of the phrase "green
energy"? The following does not work:

_index['content']['green energy'].tf()

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Valerij,

This won't work as written, because the string is normally tokenized into the
separate terms "green" and "energy", so the index holds no single term
"green energy".
If you use a shingle token filter with the shingle size set to 2, it might work.
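
To illustrate the shingle idea, here is a rough Python sketch. It is not Elasticsearch code; the `shingles` helper and the sample text are made up for illustration. It models what a shingle filter of size 2 does: adjacent tokens are joined into one term, so "green energy" becomes a term of its own that can be counted.

```python
from collections import Counter


def shingles(tokens, size=2):
    """Join every run of `size` adjacent tokens into a single term."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


text = "green energy is cheap and green energy is clean"
tokens = text.lower().split()  # stand-in for the analyzer's tokenizer

# Index both the single words and the two-word shingles, then count them.
terms = Counter(tokens + shingles(tokens, 2))

print(terms["green energy"])  # 2
print(terms["green"])         # 2
```

In a real index the field would have to be analyzed with the shingle filter at index time, so that the two-word term is actually stored and .tf() has something to look up.
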
Alternatively, you can read the position values of both tokens in the
script, and if they are next to each other, count that as one
occurrence.
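
The position check can be sketched in plain Python as follows. This only models the logic and is not script code for Elasticsearch; the `positions` and `phrase_tf` helpers and the sample sentence are assumptions for illustration.

```python
def positions(tokens):
    """Map each term to the list of positions at which it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index


def phrase_tf(index, first, second):
    """Count the places where `second` directly follows `first`."""
    following = set(index.get(second, []))
    return sum(1 for p in index.get(first, []) if p + 1 in following)


tokens = "green energy beats coal energy but green tea is not green energy".split()
index = positions(tokens)

print(phrase_tf(index, "green", "energy"))  # 2
```

Here "green" occurs three times and "energy" three times, but only two of those occurrences are adjacent in the right order, so the phrase frequency is 2.
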

Thanks
Vineeth

You can also look at developing a custom analyzer so that your phrase is
not broken up at whitespace when indexed.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html

Selecting the correct combination of char filters and tokenizers will
retain phrases.
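
As a rough sketch of what such a custom analyzer could look like, the Python snippet below builds index settings that combine the `keyword` tokenizer with the `lowercase` token filter. The analyzer name `phrase_keyword` is made up for illustration, and the settings are only printed as JSON, not sent to a cluster.

```python
import json

# "phrase_keyword" is a hypothetical name; "keyword" and "lowercase" are
# built-in Elasticsearch tokenizer and token filter names.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "phrase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# Serialize to the JSON body one would pass when creating the index.
body = json.dumps(settings, indent=2)
print(body)
```

The field would also have to reference this analyzer in its mapping for the setting to take effect.
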

For example, using the whitespace analyzer will separate on whitespace:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo bar baz'
{
  "tokens" : [ {
    "token" : "foo",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bar",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "baz",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "word",
    "position" : 3
  } ]
}

However, using the keyword analyzer will retain the entire phrase:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=keyword' -d 'foo bAr baZ'
{
  "tokens" : [ {
    "token" : "foo bAr baZ",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}
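
One consequence worth keeping in mind can be shown with a small Python model of the two analyzers. The helper names and sample documents are assumptions for illustration, and this is not Elasticsearch code. With keyword-style analysis the whole field value becomes a single term, so a lookup for "green energy" only matches when the field contains exactly that phrase and nothing else.

```python
from collections import Counter


def whitespace_terms(text):
    """Model of whitespace analysis: one term per whitespace-separated word."""
    return text.split()


def keyword_terms(text):
    """Model of keyword analysis: the entire input is a single term."""
    return [text]


doc = "we invest in green energy"
ws = Counter(whitespace_terms(doc))
kw = Counter(keyword_terms(doc))

print(ws["green energy"])  # 0: only the single words are terms
print(kw["green energy"])  # 0: the only term is the whole sentence
print(kw[doc])             # 1

short = Counter(keyword_terms("green energy"))
print(short["green energy"])  # 1: the field is exactly the phrase
```

So a keyword-style analyzer fits fields that hold just the phrase (tags, for example), while for longer text the shingle or position approaches are likely a better match.
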
