Term vector of all documents

tyka · September 9, 2016, 2:25am

I have a corpus of documents indexed, and I stored the term vectors at index time. Now I want to retrieve the term vectors of all documents that satisfy some filter.
I was able to get the term vector for a single document, or for a set of documents, by providing the document IDs. But is there a way to get term vectors for all the documents without providing document IDs?
Eventually, what I want is the frequency count of every term in a field, for every document in an index (i.e., a bag-of-words matrix).

I am using elasticsearch-py as a client.

Appreciate any pointers. Thanks!

tyka · September 9, 2016, 6:24pm

For this task, is there a way to aggregate on term vectors for a field?

javanna · September 12, 2016, 8:50am

There is no way to aggregate on term vectors. The only way I'm aware of to retrieve term vectors is per document ID. To retrieve them for all documents matching a query, you would run a search scroll and fetch the term vectors for each returned document by ID. There is also the multi term vectors API, which retrieves term vectors for several documents in one request and is a better fit, so you could batch the IDs.
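
A minimal sketch of the scroll-then-batch approach described above, assuming elasticsearch-py. The function names (`scan_ids`, `batched`, `term_counts`), the batch size, and the request options are illustrative choices, not something from this thread. Call signatures differ between client versions (older clients may also need a `doc_type`, newer ones prefer keyword arguments over `body`), so treat it as a starting point rather than a definitive implementation.

```python
from collections import Counter
from itertools import islice


def scan_ids(es, index, query):
    """Yield the _id of every document matching `query`, using a scroll."""
    # Imported lazily so the batching logic below can be tried without a cluster.
    from elasticsearch import helpers

    # _source is disabled because only the IDs are needed here.
    body = {"query": query, "_source": False}
    for hit in helpers.scan(es, index=index, query=body):
        yield hit["_id"]


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def term_counts(es, index, field, doc_ids, batch_size=500):
    """Yield (doc_id, Counter of term -> term_freq) for each document ID.

    IDs are sent to the multi term vectors API in batches, so the full ID
    list never has to be held in memory or sent in a single request.
    """
    for ids in batched(doc_ids, batch_size):
        resp = es.mtermvectors(
            index=index,
            body={
                "ids": ids,
                "parameters": {
                    "fields": [field],
                    # Only per-document term frequencies are needed.
                    "positions": False,
                    "offsets": False,
                    "payloads": False,
                    "term_statistics": False,
                    "field_statistics": False,
                },
            },
        )
        for doc in resp["docs"]:
            # Documents without stored vectors for `field` give an empty Counter.
            terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
            yield doc["_id"], Counter(
                {term: info["term_freq"] for term, info in terms.items()}
            )
```

With a real client this could be driven as `term_counts(es, "my-index", "text", scan_ids(es, "my-index", {"match_all": {}}))`, where `my-index` and `text` are placeholder names. Because both the scroll and the batching are generators, the IDs are streamed rather than collected up front, and each yielded `Counter` is one row of the bag-of-words matrix.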

tyka · September 12, 2016, 6:06pm

Thanks!
Yes, I tried the multi term vectors approach, but I still have to provide the list of document IDs, which is huge: on the order of hundreds of millions.
