I have a set of documents and I stored term vectors of a text field when indexing. Is it possible to get all the tokens stored in the term vectors of all documents?
I want to build a vocabulary for my documents, which is why I am thinking in this direction.
I don't think there is a way to get all term_vectors for all documents in a single call, but you can get term_vectors for a specific document by id. Maybe you can simply run a search scroll and retrieve term_vectors for each document? That may take a while depending on how many documents you have, but it should do it.
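To make the suggestion above a bit more concrete, here is a rough sketch of the merging step only. It assumes each per-document response has the usual term vectors shape (`term_vectors` -> field name -> `terms`); the function name `collect_vocabulary`, the field name `text`, and the sample responses are made up for illustration, and fetching the responses (scrolling over ids and requesting term vectors for each one) is left out.

```python
from typing import Any, Dict, Iterable, Set


def collect_vocabulary(responses: Iterable[Dict[str, Any]], field: str) -> Set[str]:
    """Union the terms of `field` across per-document term vectors responses."""
    vocabulary: Set[str] = set()
    for response in responses:
        # A document without the field (or without stored term vectors)
        # simply has no entry here, so fall back to empty dicts.
        field_vectors = response.get("term_vectors", {}).get(field, {})
        vocabulary.update(field_vectors.get("terms", {}).keys())
    return vocabulary


# Hand-written sample responses (hypothetical data), shaped like the
# output of a term vectors request for a single document.
sample_responses = [
    {"_id": "1", "term_vectors": {"text": {"terms": {"quick": {"term_freq": 1}, "fox": {"term_freq": 2}}}}},
    {"_id": "2", "term_vectors": {"text": {"terms": {"lazy": {"term_freq": 1}, "fox": {"term_freq": 1}}}}},
    {"_id": "3", "term_vectors": {}},  # document with no term vectors for the field
]

print(sorted(collect_vocabulary(sample_responses, "text")))
```

Because the function accepts any iterable, you can feed it a generator that yields responses as the scroll progresses, so only the vocabulary set itself has to stay in memory.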
Because I have a large number of documents, retrieving term vectors for each document seems very inefficient. But it looks like that's the best option available.