Word count per document


(Aharon Twizer) #1

Hi,

I'm new to ElasticSearch.

What I want to do is to upload a few hundred documents and then look for
words in those documents.

The most important part is to get the count of each word per document.
E.g., if I look for the word "boy", the answer I'll get is that it appears 3
times in document A and 5 times in document B.

Can I do that with ElasticSearch?

Thanks in advance!

Cheers,
Aharon.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f716d555-071f-44da-b868-6bc9ddd6455d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

Yes, take a look here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
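
For readers following the link: a term vectors response lists each indexed term of a field together with a `term_freq`, which is the per-document count asked about. Below is a minimal sketch of reading that value out of a response that has already been decoded from JSON. The field name `body`, the index name `docs`, and the sample values are assumptions for illustration, not output from a real cluster. Terms are stored as analyzed tokens, so the word usually has to be given in its indexed (typically lowercased) form.

```python
from typing import Any, Dict


def term_frequency(response: Dict[str, Any], field: str, word: str) -> int:
    """Return how often `word` occurs in `field`, or 0 if it is absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return terms.get(word, {}).get("term_freq", 0)


# Hand-written sample in the shape of a term vectors response (hypothetical data).
sample_response = {
    "_index": "docs",
    "_id": "A",
    "found": True,
    "term_vectors": {
        "body": {
            "terms": {
                "boy": {"term_freq": 3},
                "girl": {"term_freq": 1},
            }
        }
    },
}

print(term_frequency(sample_response, "body", "boy"))   # 3
print(term_frequency(sample_response, "body", "tree"))  # 0
```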

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Apr 18, 2014 at 2:52 PM, Aharon Twizer aharon.twizer@gmail.com wrote:



(Aharon Twizer) #3

Thanks Itamar.

But with the Term Vector I'll have to make a separate call for each
document (I can have up to 20K documents).

I want to be able to make a single call with the word I'm looking for and
to get the statistics for each document.
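
One way to cut down the number of calls is the multi term vectors endpoint (`_mtermvectors`), which accepts a list of document IDs in one request. This is not something suggested in the thread, and whether it is available depends on the Elasticsearch version in use, so treat the following as a sketch under that assumption. It only builds request bodies and reads decoded responses; sending the request is left out. The field name `body`, the batch size of 500, and the sample response are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def mtermvectors_bodies(
    doc_ids: Iterable[str], field: str, batch_size: int = 500
) -> Iterator[Dict[str, Any]]:
    """Yield one _mtermvectors request body per batch of document IDs."""
    ids = list(doc_ids)
    for start in range(0, len(ids), batch_size):
        yield {
            "ids": ids[start:start + batch_size],
            # term_statistics is switched off because only per-document
            # frequencies are needed here.
            "parameters": {"fields": [field], "term_statistics": False},
        }


def counts_per_document(
    response: Dict[str, Any], field: str, word: str
) -> Dict[str, int]:
    """Map each returned document ID to the term_freq of `word` (0 if absent)."""
    counts: Dict[str, int] = {}
    for doc in response.get("docs", []):
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = terms.get(word, {}).get("term_freq", 0)
    return counts


# Hand-written sample in the shape of a multi term vectors response.
sample = {
    "docs": [
        {"_id": "A", "term_vectors": {"body": {"terms": {"boy": {"term_freq": 3}}}}},
        {"_id": "B", "term_vectors": {"body": {"terms": {"boy": {"term_freq": 5}}}}},
    ]
}

print(counts_per_document(sample, "body", "boy"))  # {'A': 3, 'B': 5}
print(len(list(mtermvectors_bodies(["A", "B", "C"], "body", batch_size=2))))  # 2
```

With 20K documents this still means several requests rather than a single one, but far fewer than one per document.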



(Itamar Syn-Hershko) #4

You should be able to do this using the aggregations framework:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

The idea is that you bucket on document ID, then on terms, and then do a
count.

But I'm not sure it was designed to handle this scenario, where you have
tens of thousands of buckets and then many unique terms in each bucket.
Maybe someone from ES core can chime in on that.


On Fri, Apr 18, 2014 at 3:40 PM, Aharon Twizer aharon.twizer@gmail.com wrote:


