Hi,
I am currently running some tests to see if I can use Apache Spark together with the elasticsearch-spark connector (part of elasticsearch-hadoop) to analyse data that I have in Elasticsearch.
I am testing using the following versions in a docker-compose setup:
Spark: 3.1.2
Elasticsearch: 7.12.1
The Spark job itself is a Java program with the following dependencies:
- spark-core_2.12:3.1.2
- elasticsearch-spark-30_2.12:7.13.2
I use spark-submit to submit the application to the Spark cluster.
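For reference, the submit command looks roughly like this (the class name, master URL, and jar path are placeholders):

```
spark-submit \
  --class com.example.WildcardTest \
  --master spark://spark-master:7077 \
  --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.13.2 \
  target/es-spark-test-1.0.jar
```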
In Elasticsearch I have a single index where a few different string fields are mapped. I have tested text, keyword, and wildcard, but I am most interested in the wildcard variant because of its properties for efficient wildcard searches.
```
...
"fielda" : { "type" : "keyword" },
"fieldb" : { "type" : "wildcard" },
...
```
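The relevant part of the mapping can be reproduced with something like this (the index name is a placeholder):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "fielda": { "type": "keyword" },
      "fieldb": { "type": "wildcard" }
    }
  }
}
```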
Problem
Fields of type wildcard simply do not seem to exist when accessing the data through the JavaPairRDD in the Spark job.
Examples:
When I run the Spark job and filter + count all documents where fieldb has a certain value, I get 0 hits, although I know it should give 2 hits.
I have tried the same code with fielda, which is of type keyword, and then the 2 hits are found.
I have also done a simple .first() to get hold of the first indexed document, and when I print all the entries in the associated key-value map, any field of type wildcard simply does not appear.
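To illustrate, here is a minimal sketch of the kind of job I am running (the host, index name, and filter value are placeholders; JavaEsSpark.esRDD returns a JavaPairRDD of document id to field map):

```java
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class WildcardTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("es-wildcard-test")
                .set("es.nodes", "elasticsearch")  // ES container hostname, placeholder
                .set("es.port", "9200");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Read the index as (document id -> field map) pairs.
        JavaPairRDD<String, Map<String, Object>> rdd =
                JavaEsSpark.esRDD(jsc, "my-index");  // "my-index" is a placeholder

        // Count documents where the field equals a known value. With fieldb
        // (wildcard) this returns 0; the same code with fielda (keyword) returns 2.
        long hits = rdd.values()
                .filter(doc -> "somevalue".equals(doc.get("fieldb")))
                .count();
        System.out.println("hits = " + hits);

        // Inspect the first document: wildcard fields do not appear in the map.
        System.out.println(rdd.first()._2());

        jsc.stop();
    }
}
```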
I have compared these results with the responses I get when searching Elasticsearch directly through the REST API, and there I can see and search both the keyword and wildcard fields without any problem.
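For example, a direct term query like the following (index name and value are placeholders) returns the expected 2 hits against both fields:

```
GET my-index/_search
{
  "query": { "term": { "fieldb": "somevalue" } }
}
```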
My questions:
- Is the wildcard field type not supported? And if not, does anybody know whether this feature is planned to be added to elasticsearch-hadoop/elasticsearch-spark soon?
- Is there any way to get wildcard fields to work with elasticsearch-spark?