Hit Counts within a Document

In our current application, it is important to know how many times a
search matched within each document. We are considering Elasticsearch,
but this is one area where I have yet to find a solution. The only
remotely workable approach I have found is to retrieve the highlighted
hits and count them. That could be rather time-consuming, since it may
mean parsing thousands of documents just to get the number of hits within
each one. The data is clearly there within Elasticsearch, since it is able
to highlight the hits; I just need to know how to get access to those
per-document hit counts.
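
The highlight-counting workaround described above could be sketched in Python roughly as follows. This is only an illustration: it assumes the search response has already been parsed into a dict, that highlighting was requested with the default `<em>` tag, and that the field is named `text`. The helper name `count_highlights` and the sample response are hypothetical.

```python
def count_highlights(response, field="text", tag="<em>"):
    """Return {doc_id: number of highlighted matches} for one field."""
    counts = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Each hit may carry a list of highlighted fragments per field.
        fragments = hit.get("highlight", {}).get(field, [])
        counts[hit["_id"]] = sum(fragment.count(tag) for fragment in fragments)
    return counts


# Hand-written sample shaped like a search response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"text": ["a <em>cat</em> and a <em>cat</em>"]}},
            {"_id": "2", "highlight": {"text": ["one <em>cat</em>"]}},
        ]
    }
}

print(count_highlights(sample_response))
```

Because highlighting by default returns only a limited number of fragments, counts obtained this way may undercount unless the highlighter is configured to return the whole field.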

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8e7a6b6c-a4c4-4d41-8fa4-035b19341c69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Darren,

What do you mean by number of hits?
Is it the number of occurrences of a term in a document?

Thanks
Vineeth

(Sent in reply to the message from Darren Trzynka darrentrzynka@gmail.com,
Fri, Sep 5, 2014 at 6:32 PM.)

Hello Darren,

If it is the term frequency of a word that you are looking for, you can use
script fields:

{
  "fields": [
    "text"
  ],
  "query": {
    "term": {
      "text": "god"
    }
  },
  "script_fields": {
    "tf": {
      "script": "_index['text']['god'].tf()"
    }
  }
}

SCRIPTING -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
SCRIPT FIELDS -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-script-fields.html#search-request-script-fields
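
To reuse the request above for other fields and terms, the body can be built programmatically. A minimal Python sketch follows; the helper name `term_frequency_request` is hypothetical, and the script string simply follows the `_index[field][term].tf()` form shown above. It only builds the JSON body and does not contact a cluster.

```python
import json


def term_frequency_request(field, term):
    """Build a search body that returns the term frequency per matching document."""
    return {
        "fields": [field],
        "query": {"term": {field: term}},
        "script_fields": {
            # Same script form as in the request above.
            "tf": {"script": "_index['%s']['%s'].tf()" % (field, term)}
        },
    }


body = term_frequency_request("text", "god")
print(json.dumps(body, indent=2))
```

Terms that contain a single quote would need escaping before being placed in the script string; this sketch does not handle that.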

Thanks
Vineeth

(Sent in reply to the message from vineeth mohan vm.vineethmohan@gmail.com,
Fri, Sep 5, 2014 at 6:45 PM.)

Vineeth,
Thanks for responding. What I am looking for is this: when I search for
various terms, how can I tell from the search results how often those
terms were hit within each document? For example, I perform a full-text
search on cat and 5 documents are returned. Today I could get the matched
terms from the highlighting, but that is of course quite nasty. Instead, I
would like the documents returned along with something like this for each
document:
Document 1 (group: 1): cat - 5
Document 2 (group: 2): cat - 3
Document 3 (group: 1): cat - 2
...
Document n - cat - #

Also, there is other metadata that it would be nice to aggregate on, so
that for the scenario above I could get an answer like:
group : 1 - cat - 7
group : 2 - cat - 3

Thanks
Darren

Hello Darren,

The following query does what you asked for (replace FIELD with the field
you are searching):

{
  "fields": [
    "FIELD"
  ],
  "query": {
    "term": {
      "FIELD": "cat"
    }
  },
  "script_fields": {
    "tf": {
      "script": "_index['FIELD']['cat'].tf()"
    }
  }
}

For the second one, use:

{
  "query": {
    "term": {
      "FIELD": "cat"
    }
  },
  "aggs": {
    "groupName": {
      "terms": {
        "field": "GROUP_FIELD"
      },
      "aggs": {
        "catStats": {
          "sum": {
            "script": "_index['FIELD']['cat'].tf()"
          }
        }
      }
    }
  }
}
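
To make clear what the nested aggregation computes, here is a small Python sketch of the same bucketing done on the client: documents are grouped by a group value and their per-document term frequencies are summed. The sample rows reuse the illustrative numbers from earlier in this thread and are not real query output; `sum_tf_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def sum_tf_by_group(docs):
    """Sum per-document term frequencies within each group (terms bucket + sum)."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["group"]] += doc["tf"]
    return dict(totals)


# Illustrative rows: group and term frequency of "cat" per matching document.
docs = [
    {"id": 1, "group": 1, "tf": 5},
    {"id": 2, "group": 2, "tf": 3},
    {"id": 3, "group": 1, "tf": 2},
]

print(sum_tf_by_group(docs))  # {1: 7, 2: 3}
```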

Thanks
Vineeth

(Sent in reply to the message from Darren Trzynka darrentrzynka@gmail.com,
Mon, Sep 8, 2014 at 6:24 PM.)

Vineeth,
I just saw your response today; I had come to the same conclusion
yesterday, after you gave me a nice working example! I took it a step
further, doing the same grouping by field that you did, and it came out
nicely. Something is sinking in with me anyway. :-)

Besides some possible language support issues, the biggest challenges I
see are stemming and case sensitivity. With stemming, you search for
federal and hits are returned on federal, federalizing, etc., so if you
look up only federal in the term counts, you won't find all the matches.
With case sensitivity, the user types "Federal cases", which by default
matches on federal and cases, so it seems you would need to lowercase the
terms before looking up their frequencies. What do you think about these
cases?
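
The case-sensitivity concern can be illustrated with a toy Python sketch. It assumes an analyzer that only lowercases and splits on non-letters (a stand-in, not Elasticsearch's actual analysis chain), so indexed terms are lowercase and a lookup with the user's original casing can miss. All names here are hypothetical.

```python
import re
from collections import Counter


def toy_analyze(text):
    """Lowercase and split on non-letters; a stand-in for a real analyzer."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(document):
    """Count terms the way a lowercasing analyzer would index them."""
    return Counter(toy_analyze(document))


doc_tf = term_frequencies("Federal cases and federal courts; federalizing the cases.")
user_query = "Federal cases"

# Looking up the raw user input misses "Federal": indexed terms are lowercase.
raw_lookup = {word: doc_tf[word] for word in user_query.split()}
# Running the query text through the same analysis finds the matches.
analyzed_lookup = {term: doc_tf[term] for term in toy_analyze(user_query)}

print(raw_lookup)
print(analyzed_lookup)
```

In this toy version, federalizing is counted as its own term. With a stemming analyzer the indexed term would presumably be the stem, so the frequency lookup would have to use the stemmed form of the query term as well.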

Thanks!
Darren

(Sent in reply to the message from vineeth mohan vm.vineethmohan@gmail.com,
Mon, Sep 8, 2014 at 11:28 AM.)

Hello Darren,

I am glad that my solution worked for you.

For those cases, the approach is to use multi-fields.
In one of the fields, keep the raw data by declaring its index setting as
not_analyzed. An example is cited at this link:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_multi_fields_3
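
As a rough illustration of the multi-field approach, the Python sketch below builds a mapping in which `text` is analyzed as usual and a `raw` sub-field keeps the original value unanalyzed. The field name `text`, the sub-field name `raw`, and the type name `doc` are assumptions for illustration; the syntax follows the string-type multi-fields form from the Elasticsearch documentation of that era and may differ in other versions.

```python
import json

# Mapping for a hypothetical type "doc": "text" is analyzed, "text.raw" is not.
mapping = {
    "doc": {
        "properties": {
            "text": {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

The unanalyzed sub-field would then be addressed as `text.raw` in queries and scripts.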

Thanks
Vineeth

(Sent in reply to the message from Darren Trzynka darrentrzynka@gmail.com,
Tue, Sep 9, 2014 at 9:27 PM.)