What I want to do is to upload a few hundred documents and then look for
words in those documents.
The most important part is to get the count of each word per document.
For example, if I look for the word "boy", the answer I'll get is that it
appears 3 times in document A and 5 times in document B.
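To make the desired result concrete, here is a minimal plain-Python sketch of the per-document count described above. It does not use Elasticsearch at all. The function name `count_word_per_document`, the sample documents, and the simple `\w+` tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_word_per_document(documents, word):
    """Return {document name: occurrences of `word`}, skipping documents without it.

    `documents` maps a (hypothetical) document name to its text.
    Matching is case-insensitive and uses a naive \\w+ tokenizer.
    """
    target = word.lower()
    counts = {}
    for name, text in documents.items():
        tokens = re.findall(r"\w+", text.lower())
        occurrences = Counter(tokens)[target]
        if occurrences:
            counts[name] = occurrences
    return counts


# Made-up sample data, only to show the shape of the answer.
docs = {
    "A": "The boy met a boy and another boy.",
    "B": "No children here.",
}
result = count_word_per_document(docs, "boy")
```

With this sample data, only document A contains the word, so `result` has a single entry for A.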
But with the Term Vector I'll have to make a separate call for each
document (I can have up to 20K documents).
I want to be able to make a single call with the word I'm looking for and
to get the statistics for each document.
On Friday, April 18, 2014 2:52:53 PM UTC+3, Aharon Twizer wrote:
Hi,
I'm new to Elasticsearch.
What I want to do is to upload a few hundred documents and then look for
words in those documents.
The most important part is to get the count of each word per document.
For example, if I look for the word "boy", the answer I'll get is that it
appears 3 times in document A and 5 times in document B.
You should be able to do this using the aggregations framework: the idea
is to bucket on document ID, then on terms, and then take a count.
But I'm not sure it was designed to handle this scenario, where you have
tens of thousands of buckets and then many unique terms in each bucket.
Maybe someone from ES core can chime in on that.
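A rough sketch of what such a request body might look like, built as a plain Python dict so it can be inspected without a cluster. The field names `doc_id` and `body`, the helper name, and the bucket limit are assumptions, not anything from this thread. It has not been run against Elasticsearch. Note also that, as far as I know, the `doc_count` of a terms bucket counts matching documents, so inside a one-document bucket it may report 1 rather than the number of occurrences in that document.

```python
import json


def build_per_document_terms_query(word, id_field="doc_id",
                                   text_field="body", max_docs=20000):
    """Build a search body: bucket by document ID, then by term.

    `id_field` and `text_field` are hypothetical field names; adjust
    them to the actual mapping. `max_docs` caps the number of
    per-document buckets returned.
    """
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "query": {"match": {text_field: word}},
        "aggs": {
            "per_document": {
                "terms": {"field": id_field, "size": max_docs},
                "aggs": {
                    "matching_terms": {
                        # `include` restricts the buckets to the searched term
                        "terms": {"field": text_field, "include": word}
                    }
                },
            }
        },
    }


body = build_per_document_terms_query("boy")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a search request. Whether this scales to tens of thousands of buckets is exactly the open question raised above.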