Count of Words (Text Based Search) Using Facets


(Hiro Gangwani) #1

Hi,
We are indexing PDF and Word documents in ES using the attachment type. Text-based
search is implemented using QueryBuilder and a field query. Is it possible to get,
for each result returned, the number of times the search term occurs in that
document?

For example:
Document A contains the keyword Java 50 times and Document B contains it 30
times.
When the search criterion is "Java" and the text-based search is executed, we get 2
documents in the search results.
Is it possible to get the count of Java in Document A and in Document B?
I have used term facets, which only give the count of documents that contain the
text Java (in this case, 2). What we need is the count of the word Java in each
document returned in the results.

We are stuck on this requirement and have been unable to find a solution.
Any help is appreciated; thanks in advance.

Hiro

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73629eee-7b58-44d4-87b3-aeb0d18b4c03%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jun Ohtani) #2

Hi Hiro,

I think you should use the term statistics that are available in scripts.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html#_term_statistics

I posted sample JSON and query DSL to a gist.

Note: the term "Java" is indexed as "java", because the standard analyzer applies a lowercase filter.
My sample script therefore uses "java", not "Java".
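
To make the distinction concrete: a term facet counts matching documents, while the term frequency exposed by term statistics is the number of occurrences of a term inside one document's field. The following is only a plain-Java illustration of that per-document count and of why the lowercased form matters. Elasticsearch reads this value from the index rather than rescanning the text, and the class name, method name, and the simple split-on-non-alphanumerics tokenizer are assumptions made for this sketch, not the standard analyzer.

```java
import java.util.Locale;

class TermFrequencySketch {
    // Counts how often `term` occurs in `text`, comparing lowercased tokens.
    // Splitting on non-alphanumeric characters is a rough stand-in for the
    // standard analyzer (an assumption of this sketch), not a reimplementation.
    static int termFrequency(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String docA = "Java is fun. I like JAVA, and java likes me.";
        String docB = "JavaScript is not Java.";
        System.out.println(termFrequency(docA, "Java")); // prints 3
        System.out.println(termFrequency(docB, "Java")); // prints 1
    }
}
```

Both documents would count once in a term facet, but the per-document counts differ, which is the number being asked for here.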

I hope this helps.

Regards

Jun Ohtani
johtani@gmail.com
blog : http://blog.johtani.info
twitter : http://twitter.com/johtani


(jsbonline2006) #3

Hi

When I run the query you mentioned, I get the following error, which I have added to your gist thread:
https://gist.github.com/johtani/8818938/

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar


(Jun Ohtani) #4

Hi Jayesh,

Sorry, I use the Chrome plugin "Sense".
Are you using the curl command?

If so, try the following command.

curl -XGET "http://localhost:9200/sample/doc_count/_search" -d'
{
  "query": {
    "query_string": {
      "default_field": "text",
      "query": "java"
    }
  },
  "script_fields": {
    "term_count": {
      "script": "_index[\"text\"][\"java\"].tf()"
    }
  }
}'
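
When the same request body is assembled in code rather than typed into curl, the quoting is the part that is easy to get wrong: the script contains double quotes, and those have to be escaped inside the JSON string. Here is a minimal plain-Java sketch of building that body. The class and method names are hypothetical, the escaping only covers backslashes and double quotes, and the field and term are assumed to be plain words without quotes of their own.

```java
class TermCountRequestSketch {
    // Escapes backslashes and double quotes so a value can sit inside a JSON
    // string. Control characters are not handled (assumption: plain input).
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the script expression, e.g. _index["text"]["java"].tf()
    // Assumes field and term contain no quotes or backslashes.
    static String tfScript(String field, String term) {
        return "_index[\"" + field + "\"][\"" + term + "\"].tf()";
    }

    // Builds a search body with a query_string query and one script field
    // named term_count that returns the term frequency per hit.
    static String requestBody(String field, String term) {
        return "{"
            + "\"query\":{\"query_string\":{"
            + "\"default_field\":\"" + jsonEscape(field) + "\","
            + "\"query\":\"" + jsonEscape(term) + "\"}},"
            + "\"script_fields\":{\"term_count\":{"
            + "\"script\":\"" + jsonEscape(tfScript(field, term)) + "\"}}"
            + "}";
    }

    public static void main(String[] args) {
        // Prints the JSON body; send it to the _search endpoint with any HTTP client.
        System.out.println(requestBody("text", "java"));
    }
}
```

In a real application a JSON library would be a safer way to produce the body; string concatenation is used here only to make the escaping visible.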

Does it make sense?

Regards,
Jun


(jsbonline2006) #5

Hi Jun,

I tried both the Head plugin and the curl command, but I get the error with
both approaches :(

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar


(Jun Ohtani) #6

Hi Jayesh,

Hmm, which version of ES are you using?
I use 1.0.0.RC1 and 0.90.10.

Regards,
Jun


(Jun Ohtani) #7

Hi Jayesh,

Term statistics are available in 0.90.10 and higher.

See:
http://www.elasticsearch.org/downloads/0-90-10/


(jsbonline2006) #8

Thanks Jun,
I was using 0.90.7. Let me try 0.90.10 or 1.0.

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar


(Hiro Gangwani) #9

Hi,
Can we get the corresponding code using the Java API? We are using the
QueryBuilders.fieldQuery method to search the text from files.

That would help.

Thanks,

Hiro


(Jun Ohtani) #10

Hi,

I updated the gist, adding TermFreqQuerySample.java.

I hope this helps.

Regards,


Jun Ohtani
johtani@gmail.com
blog : http://blog.johtani.info
twitter : http://twitter.com/johtani

