You cannot apply projection there, since "fields" is used internally as well. For fine-grained control over the mapping, consider using DataFrames, which are basically RDDs plus a schema.
Using Elasticsearch for such a basic query (selecting just one or two fields) this way is wasteful. Simply add "fields" to the query, as indicated here.
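For reference, a minimal sketch of what that could look like with the Scala connector, passing the field selection inside the query itself through the `es.query` setting. The node address, index/type and field names are made up for illustration, and depending on the Elasticsearch version you may want `"fields"` or `"_source"` filtering in the query body:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Illustrative settings only; adjust nodes, index/type and field names.
val conf = new SparkConf()
  .setAppName("es-field-selection")
  .set("es.nodes", "localhost:9200")
  // The query itself asks Elasticsearch to return only the listed fields.
  .set("es.query", """{"query": {"match_all": {}}, "fields": ["field1", "field2"]}""")

val sc = new SparkContext(conf)

// Each element is a (documentId, Map of the requested fields) pair.
val rdd = sc.esRDD("my-index/my-type")
rdd.take(5).foreach(println)
```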
I'll reiterate my point though: an RDD with a schema is a Spark DataFrame. That provides not just fine-grained control over the underlying structure but also pushed-down operations, that is, the connector translating the SQL into an actual ES query.
This documentation section provides more information.
Using an RDD while trying to select fields and the like will not only reinvent parts of Spark SQL that are already available, but will also cover only a subset of that functionality and ignore all the other optimizations on offer.
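To make the DataFrame route concrete, here is a minimal sketch using the connector's Spark SQL support; the node address, index/type and field names are assumptions for illustration, and the exact pushdown behaviour depends on the connector version:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("es-dataframe")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// The connector infers a schema from the index mapping.
val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .load("my-index/my-type")

// select() and filter() are pushed down: the connector asks Elasticsearch
// for only these fields and translates the filter into an ES query.
df.select("field1", "field2")
  .filter(df("field1") === "some-value")
  .show()
```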
Sounds like a great explanation for preferring DataFrames over RDDs. But I already have the ES queries created. I guess in such cases the es.mapping.include and es.mapping.exclude properties in the configuration must be used. However, this makes the configuration object specific to a particular index. I think I will have to move to DataFrames eventually!
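For what it's worth, a sketch of how those properties could be passed per read rather than on the global configuration object, which would keep the SparkConf index-agnostic. The settings-map overload of esRDD, the index/type and the field names below are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("es-include-exclude")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(conf)

// Index-specific field filtering supplied per call instead of on the
// global SparkConf; index/type and field names are illustrative only.
val perIndexSettings = Map(
  "es.mapping.include" -> "field1,field2",
  "es.mapping.exclude" -> "heavy_field"
)

val rdd = sc.esRDD("my-index/my-type", perIndexSettings)
rdd.take(5).foreach(println)
```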