Returning term_vector info within a search query

Patrick_Lam · August 29, 2015, 6:32am

I posted a related question about this same issue at SO (http://bit.ly/1LIdc6y) and received some helpful advice, but I thought I'd post here just in case anybody has some more insight into this.

I'm working on building an application that uses Elasticsearch with Apache Spark. I'm trying to use ES to store/index the documents for query purposes and also use the ES analyzers to process the documents for machine learning (I know that ES is not really built specifically for this). Basically, I need to pull the (ES-analyzed) tokens from each document into Spark.

I know you can get the tokens and the count of each token per document in two ways: through the term_vector API and through the Analyze API. However, both are very slow and inefficient for large datasets, since they require a separate REST call for each document.
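For reference, the per-document calls can at least be batched with the multi term vectors endpoint (`_mtermvectors`). Below is a minimal sketch of the request body and of reducing the response to token counts. The field name `text` and the sample response are made up for illustration, and the HTTP call itself is left out.

```python
def mtermvectors_body(ids, field):
    """Build a body for POST /<index>/<type>/_mtermvectors (one call, many docs)."""
    return {
        "ids": list(ids),
        "parameters": {
            "fields": [field],
            # Only per-document term frequencies are needed here.
            "term_statistics": False,
            "field_statistics": False,
            "positions": False,
            "offsets": False,
        },
    }


def token_counts(response, field):
    """Map each found document id to {token: term_freq} for one field."""
    counts = {}
    for doc in response.get("docs", []):
        if not doc.get("found"):
            continue  # skip ids that do not exist in the index
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = {t: info["term_freq"] for t, info in terms.items()}
    return counts


# Hypothetical response, trimmed to the keys the parser reads.
sample = {
    "docs": [
        {
            "_id": "1",
            "found": True,
            "term_vectors": {
                "text": {"terms": {"quick": {"term_freq": 2}, "fox": {"term_freq": 1}}}
            },
        },
        {"_id": "2", "found": False},
    ]
}

body = mtermvectors_body(["1", "2"], "text")
counts = token_counts(sample, "text")
```

This still goes through REST, just with far fewer round trips; it does not attach term vectors to search hits.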

My question is this: Is there a way to get the information from the term_vector API returned as a result of the search query itself (through some setting within the request body for example)? I'm mainly interested in the tokens and their frequencies within each document. The closest I've seen is specifying the "fielddata_fields" option for my text field. This manages to return the tokens themselves but not the token frequencies within the document(s). Is there a way to return both using only the search query?
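The `fielddata_fields` request mentioned above might look roughly like this. The field name `text` and the `match_all` query are placeholders, not taken from the actual application.

```python
import json


def search_with_fielddata(field, size=10):
    """Search body asking ES to return the field data of `field` for every hit."""
    return {
        "query": {"match_all": {}},   # placeholder query
        "size": size,
        "fielddata_fields": [field],  # analyzed tokens come back under hits.hits[].fields
    }


body = search_with_fielddata("text")
print(json.dumps(body, indent=2))
```

Each hit then carries the analyzed tokens of the field, but no per-document frequencies, which is the gap described above.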

Alex_Ksikes · August 29, 2015, 2:42pm

You could have term vectors returned as part of a script field. In version 2.0 we are making fetch sub-phases pluggable, so you could plug in your own way of fetching term vectors, field data, etc. while searching. Hope this helps.
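A rough sketch of what the script-field route could look like on 1.x, assuming dynamic Groovy scripting is enabled and using the `_index['FIELD']['TERM'].tf()` accessor from the advanced scripting docs. It needs the terms up front, so it returns frequencies for a known vocabulary rather than a full term vector. The field and term names are placeholders.

```python
def tf_script_fields(field, terms):
    """One script field per term, each returning that term's frequency in the hit.

    Assumes field and term names contain no quote characters.
    """
    return {
        "tf_" + term: {"script": "_index['%s']['%s'].tf()" % (field, term)}
        for term in terms
    }


body = {
    "query": {"match_all": {}},  # placeholder query
    "script_fields": tf_script_fields("text", ["quick", "fox"]),
}
```

Each hit would then carry `tf_quick` and `tf_fox` under its `fields` key.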
