Multi search query with spark

newvalue · September 13, 2015, 3:13pm

Hi all!
I want to scan documents that match for a list of queries. I want to use spark with elastic. I know that elasticsearch support multi query using endpoint _msearch, but I don't know to set up this type of query for spark intergration case (https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html).
The second question is that can you give some advices on chosing number of queries in a multi query (I only concern on the cases when my query type is search or scan).
Thanks!

eliasah · September 17, 2015, 11:54am

What you need to know is when you use the Spark connector with Elasticsearch, you are actually moving the data from the elasticsearch cluster into Spark itself then you can use what ever domain specific transformation or computation on but you are out of the Elasticsearch scope.

Thus, you can't apply the multi search query on your data within Spark. It order to search easily on the data, you'll need to use the DataFrame structure which allows you SQL search queries.

As for your second question, I'm afraid that depends on your cluster configuration and data size considering that you are only using the multi search query on Elasticsearch itself.

You can benchmark your cluster performance using load testing tool like Tsung or Apache JMeter and then you can decided on the number of queries that can suite your use case (time-wise or size-wise, per example)

Personally, I use Tsung as it easy very light and easy to configure and use.

EDIT: I have written a blog post concerning basic load testing Elasticsearch with Tsung. You can find it here.

costin · October 13, 2015, 2:25pm

The connector doesn't support msearch. Simply because it's quite complicated to map the results back the Hadoop/Spark consumer. And because doing so in parallel is fairly difficult.
What would be your use case? What are you trying to achieve ?

newvalue · October 13, 2015, 3:16pm

Thank you for all responses and sorry the late reply.
I agree that it's hard to map the results back the spark consumer. It is one of the reason that I quit trial this approach more.
I currently work on a project that I need to match a lot queries ( a contextual advertising system). Basically, we needs to match a lot of ads (which is described by a set of keywords) to content of each url that goes into our system. Firstly, I thought that we can use a search engine to fast matching keywords. But when the result of that is we have to join thousands of rdds. It became very slow in our practices. As I understand that we can apply multiple queries in a search (such as using msearch) to use the system more effectively, so we hope that the time of execution can be better.
After several tries, we found that It might be impossible, so we quit to trial this approach. By matching directly content using appropriately data structure instead, we get more reasonable time than the idea of using search. However, elastic search can be still useful for small update.

Weizhi_Li · March 24, 2017, 9:53pm

Hi Tuan,

I have the same problem. I have created 5,000 query to ES, it looks like ES cannot handle these 5,000 query in Spark.

However, I can do 100, it is quick.

So, what is your solution finally?

newvalue · March 25, 2017, 1:33am

Hi Weizhi Li,
You can combine thefollowing approaches:

Combine some queries at a time, like you said.
If number of queries is large, just check by hand if a document is matched with any of them.

Weizhi_Li · March 30, 2017, 10:56pm

Hi Tuan,

You are right. Now, I did the bucket query. Then, save the results to S3 sequence.

The speed is slow.

Topic		Replies	Views
Elastisearch-Hadoop how to do a bulk search in spark program Elasticsearch es-hadoop	2	973	October 10, 2017
Possibility of querying all types in elasticsearch-spark instead of the strict index/type es.resource Elasticsearch es-hadoop	7	1673	July 6, 2017
ES-Hadoop PySpark error Elasticsearch es-hadoop	2	2170	January 10, 2018
Perofrmance problem on es-hadoop + spark Elasticsearch es-hadoop	5	1279	July 6, 2017
Databricks Spark SQL and Elasticsearch Elasticsearch es-hadoop	5	501	April 19, 2023

Multi search query with spark

Related Topics