Elasticsearch phrase term frequency .tf() containing multiple words

I want to access the frequency of a phrase made up of multiple words, e.g.
"green energy".

I can access the tf of "green" and "energy" individually, for example:

"function_score": {
  "filter": {
    "terms": { "content": ["energy", "green"] }
  },
  "script_score": {
    "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
    "lang": "groovy"
  }
}

This works fine. However, how can I find the frequency of the phrase "green
energy"? The following does not work:

_index['content']['green energy'].tf()

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Valerij,

This won't work because the string is normally tokenized into "green" and
"energy", so the index holds no single term "green energy".
If you use the shingle token filter with the shingle size set to 2, it might work.
Alternatively, you can read the position values of both tokens in the
script and, if they are next to each other, count that as an
occurrence.
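
The two ideas above (2-word shingles, and checking that the token positions are adjacent) can be illustrated outside Elasticsearch with a small Python sketch. This is only a model of the logic, not the Groovy scripting API; the function names and the sample text are made up for illustration, and the text is assumed to be lowercased and split on whitespace already.

```python
from typing import Dict, List, Set


def shingles(tokens: List[str], size: int = 2) -> List[str]:
    """Join every run of `size` adjacent tokens into one term (a word shingle)."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def phrase_tf_from_shingles(tokens: List[str], phrase: str) -> int:
    """Count the phrase as if it had been indexed as a single shingled term."""
    size = len(phrase.split())
    if size == 0:
        return 0
    return shingles(tokens, size).count(phrase)


def phrase_tf_from_positions(tokens: List[str], phrase: str) -> int:
    """Count the places where the phrase words sit at consecutive positions."""
    words = phrase.split()
    if not words:
        return 0
    # Map each token to the set of positions where it occurs,
    # roughly what a positional index stores per term.
    positions: Dict[str, Set[int]] = {}
    for pos, tok in enumerate(tokens):
        positions.setdefault(tok, set()).add(pos)
    # For every occurrence of the first word, check that each following
    # word of the phrase appears at the next position.
    return sum(
        all(start + offset in positions.get(word, set())
            for offset, word in enumerate(words))
        for start in positions.get(words[0], set())
    )


# Hypothetical document text, used only for this illustration.
doc_tokens = (
    "green energy is cheap and green power is energy too but green energy wins"
).split()

print(phrase_tf_from_shingles(doc_tokens, "green energy"))
print(phrase_tf_from_positions(doc_tokens, "green energy"))
```

Both functions should agree on the count. The shingle route pays at index time (more terms in the index), while the position route pays at query time (extra work per document in the script).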

Thanks
Vineeth

On Tue, Oct 28, 2014 at 3:06 PM, valerij.vasilcenko@googlemail.com wrote:

I want to access the frequency of a phrase made up of multiple words, e.g.
"green energy".


--

You can also look at developing a custom analyzer so that your phrase is
not broken up at white space.

Selecting the correct combination of char filters and tokenizers will
retain phrases.

On Tuesday, October 28, 2014 10:00:01 AM UTC, vineeth mohan wrote:

Hello Valerij,

This won't work because the string is normally tokenized into "green" and
"energy".

--

You can also look at developing a custom analyzer so that your phrase is
not broken up at white space when indexed.

Selecting the correct combination of char filters and tokenizers will
retain phrases.

For example, using the whitespace analyzer will separate on whitespace:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo bar baz'
{
"tokens" : [ {
"token" : "foo",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "bar",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 2
}, {
"token" : "baz",
"start_offset" : 8,
"end_offset" : 11,
"type" : "word",
"position" : 3
} ]
}

However, using the keyword analyzer will retain the entire phrase:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=keyword' -d 'foo bAr baZ'
{
"tokens" : [ {
"token" : "foo bAr baZ",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 1
} ]
}
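
For the custom analyzer route, the analysis section of the index settings might look roughly like the following. It is written as a Python dict so it can be dumped to JSON; the analyzer and filter names (whole_phrase, two_word_shingles, shingle_2) are made up for illustration, and the choice of a lowercase filter is an assumption. Note that with the keyword tokenizer the whole field value becomes one term, so a lookup of "green energy" only matches when the field contains exactly that phrase. The shingle variant, as suggested earlier in the thread, emits every pair of adjacent words as a term instead.

```python
import json

# Sketch of index settings; all names below are hypothetical.
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Emit only 2-word shingles, e.g. "green energy".
                "shingle_2": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                # Keeps the entire field value as a single lowercased term.
                "whole_phrase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
                # Splits on whitespace, lowercases, then builds word pairs.
                "two_word_shingles": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "shingle_2"],
                },
            },
        }
    }
}

# Print the JSON body that would accompany an index-creation request.
print(json.dumps(analysis_settings, indent=2))
```

The field mapping would still need to reference one of these analyzers, and existing documents would have to be reindexed before the new terms become available to the script.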
