Any guesses?
SQL is not all about joins
I don't believe this is on our roadmap at all, because it's not as simple as just adding the command to the SQL interface.
You mean to say in future releases SQL JOIN support is not on the queue, rest other SQL commands will be available. Why Elastic can not give support of JOIN's?
As I said, it's not just as simple as providing it as a SQL command. Elasticsearch is a distributed system, and to do a join is Not A Simple Thing.
Ok, I was waiting for ElasticSearch to give support of SQL joins which would make life easier. However, Can we do task close to SQL joins in ElasticSearch with query DSL?
The question is "Why do you think you need Joins to solve your use case?".
Could explain what your use case is? May be with a simple example?
I will try to be as simple as possible.
I have two tables, say table1 and table2. These tables are joined using One to One relationship.
I have indexed these two tables data in elasticsearch as doc1 and doc2 using logstash jdbc input.
Without join how I will find common data in doc1 and doc2?
I know what a relational database is. I did most of my experience on that.
Having tables is not a use case. That's an implementation detail.
What is the use case? What is the content of table 1 and table 2?
What kind of data are your users searching for? What are the properties they need to search with?
Here is the use case.
We have two tables port(1 million records) and physicalport(3 million records).
port table contains column hortname and physicalport table contains alias.
In column hostname data is same as in column alias. Ex
hotname: CHHBHARTH02 in some row number
alias : CHHBHARTH02 in some row number
We have to find common data in these two tables. As in above example one data is common.
If we do this in MySQL it takes lots of time to give result,
So we want to experiment with Elasticsearch to get the result faster.
We have to find common data in these two tables.
If I fully understand the use case, there is no easy way to do that in elasticsearch IMO.
Unless someone else has an idea.
Ok. Let's wait till someone see this and reply with a solution.
BTW
Thanks David.
If I understand this correctly, you could easily solve this by normalising the data. A hostname table containing the hostname with an auto-generated ID first. Then the port and physical port tables use the ID in place of the actual string (ID of CHHBHARTH02 in hostname/alias column instead of that word). With proper indexes in places, joins should be much faster as well as the queries - normal and analytics type. This is a typical RDBMS use case and 1-3 million data is not huge at all, so queries and joins should be fast. I'm not sure why you're trying to do this on ES.
We want to experiment with ES to do similar kind of comparison which we do on relational tables.
Can somebody tell if It is right way to send bulk compare request using client.msearch API with term query?
Here is What I am doing.
- I am reading first doc 1 lac records from doc1 using ES javascript APIs.
- Instead of sending one by one compare request on ES I am doing bulk compare using msearch.
- I have divided 1 lac compare requests into 5 msearch API calls and calculating time of each response.
For 1 lac compare its taking 12 seconds but on MySQL its taking 4 seconds.
Is ES is for such kind of comparisons?
Not really I would say. Comparing large sets of data to me this sounds like a type of analysis that might be better performed in a relational database with proper indexing.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.