I have a question to ask you, specifically regarding the Search Connector for Microsoft SQL.
I have successfully connected to the Microsoft SQL database, and there are multiple tables that I would like to query by joining them.
The question is:
How can we perform a join query to connect multiple tables?
While I've looked into Sync rules, it seems that this option is primarily designed for filtering, as per my understanding. I previously inquired with support, and they suggested that we might consider using Logstash JDBC. I'm seeking other options on how we can effectively execute a join query to combine multiple tables.
My second question relates to content syncing, and I'd like to clarify my understanding regarding two types of content synchronization: Full Syncs and Incremental Syncs.
To confirm, Incremental Sync is exclusively available for the SharePoint Connector, while Full Sync is applicable to all connectors. My understanding of Full Sync is that it deletes the previously ingested data and replaces it with all the new data, ensuring that there are no issues with data duplication. Can you please confirm if this interpretation is correct?
How can we perform a join query to connect multiple tables?
Today, there's not a great way to do this. I would have said "with views," but we've had a recent bug report of views not working with our MySQL connector, and I haven't personally tried MSSQL views with our connector. I suggest giving that a try, and let us know how it works.
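To illustrate what I mean, here's a minimal sketch with made-up table and column names (adjust to your actual schema):

-- Sketch: a view that joins two (made-up) tables into one queryable object
CREATE VIEW dbo.PostsWithAuthors AS
SELECT p.Id, p.Title, a.Name AS AuthorName
FROM dbo.Posts AS p
INNER JOIN dbo.Authors AS a ON a.Id = p.AuthorId;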
If that doesn't work, I'm happy to share that Elastic 8.11 (due to be released in the next week or so) will ship a version of our MSSQL connector that supports "Advanced Sync Rules". While you're correct that Basic Sync Rules are mainly for filtering, Advanced Sync Rules allow you to do remote filtering, which for SQL translates into precise control over the query executed against your database. You can read the documentation for this feature here, but as a reminder, this is not yet released as of writing.
To confirm, Incremental Sync is exclusively available for the SharePoint Connector, while Full Sync is applicable to all connectors.
This is correct. We hope to offer support for Incremental Syncs for other data sources in the future, but today (<=8.11) this is only available in our SharePoint Online connector.
My understanding of Full Sync is that it deletes the previously ingested data and replaces it with all the new data, ensuring that there are no issues with data duplication
Not quite. Full Syncs do a full scan of data in the 3rd-party data source. The first Full Sync will take your index from empty to fully-populated. Full syncs after the first will:
1. Fetch all the IDs from the current Elasticsearch index
2. Do a full scan of data in the 3rd-party data source
3. Issue updates/creates to Elasticsearch as necessary
4. Issue deletes for any IDs that are currently present in the Elasticsearch index but were not found in the full data source scan.
This sequence is designed to ensure that the index can go through a Full Sync without requiring any downtime for a search application that uses it as a backend.
Hello @Sean_Story, I want to ask for your confirmation. Now that 8.11 is out and my team has upgraded our Elastic deployment, I've been trying out the SQL queries.
I want to understand the mechanism. For example, say I have 9 different queries that I need to apply.
If I apply all 9 queries, the connector will sync the data based on those queries to link up the multiple tables.
So when a user searches, will the search match against all the queries? That is, does 1 search = 9 SQL queries?
Or will 1 search = 1 query only, without processing all the other queries?
I'm worried there's a misunderstanding here. The connector, at a high level, is responsible for getting data from one source (in your case, Microsoft SQL) to an Elasticsearch index.
The Advanced Sync Rules allow you to be very precise about what data is ingested from Microsoft SQL into the Elasticsearch index. For example, if you had a table with auto-increment IDs, not using Advanced Sync Rules would fetch all the documents and put them in Elasticsearch. But using Advanced Sync Rules like:
[
  {
    "tables": ["table1"],
    "query": "SELECT * FROM table1 WHERE id = 1"
  },
  {
    "tables": ["table1"],
    "query": "SELECT * FROM table1 WHERE id = 3"
  },
  {
    "tables": ["table1"],
    "query": "SELECT * FROM table1 WHERE id = 5"
  }
]
Would index just the documents with IDs 1, 3, and 5.
When it comes to search time, through Elasticsearch, it all depends on what your query is. But if you ran a match_all query, you would get all documents that were indexed, regardless of which SQL query they were indexed with. So in this latest example, you'd get all three results of 1, 3, and 5 in your result set.
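For reference, match_all is just the standard Elasticsearch query that matches every document in an index; no special configuration is needed. A minimal sketch (the index name here is only a placeholder):

GET search-your-index/_search
{
  "query": {
    "match_all": {}
  }
}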
When the user searches in Elasticsearch, it does not execute any on-demand searches/queries through the connector. It only searches the data that is already in Elasticsearch.
Hello @Sean_Story, thank you for your reply, but could you explain what you mean by "if I did a match_all query"? Is there any configuration I need to do to set it to match all the queries?
Hello @Sean_Story, I have one more question that I would like to confirm: are Advanced Sync Rules queries case-sensitive?
For example, should the casing of table names in the query match the casing in Microsoft SQL?
If the table name in Microsoft SQL is 'ProductID', should the query be written with 'ProductID', or can it also be written as 'productid'? I just need clarification on that.
It will depend on your host and its settings. The queries are just being passed directly through a Python client; the connector does not enforce any casing restrictions of its own. You should be able to copy the queries as strings from whatever database tool you use to craft/test queries.
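If you want to check what your server will do, one thing you can try (a sketch; substitute your own database name) is to look at the database collation, which is what governs identifier case sensitivity in Microsoft SQL:

-- Collation names containing 'CI' are case-insensitive, 'CS' are case-sensitive
SELECT DATABASEPROPERTYEX('YourDatabaseName', 'Collation') AS DatabaseCollation;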
I've been playing around with the Advanced Sync Rules and wanted to do an INNER JOIN.
I've been trying even the simplest one, for example:
SELECT * FROM Bird INNER JOIN Chicken ON Bird.ChickenId = Chicken.Id
but it's not working, and when I take a look at my Docker logs it shows:
[FMWK][06:31:05][WARNING] [Connector id: u****************L, index name: search-m********, Sync job id: **************] Skipping custom query for tables Bird, Chicken as there are multiple tables with same primary key column name.
Can we really use an INNER JOIN query? I've tried other queries and they work fine, except for the inner join.
One of our engineers suggested that you may be able to get around this by using AS aliases to make sure that column names do not conflict in your query. The example they gave was:
SELECT users.id AS id, orders.order_id AS order_id, users.name AS user_name, orders.product_name AS product_name, orders.order_date AS order_date FROM users JOIN orders ON users.id = orders.user_id
I've already tried adding AS aliases, but it's still not working, and my Docker logs show it skipping the custom query. Below is my query:
[
  {
    "tables": [
      "Penguin",
      "Chicken"
    ],
    "query": "SELECT TOP 100 P.Id AS PenguinID, P.Name AS PenguinName, C.Id AS ChickenID FROM Penguin P INNER JOIN Chicken C ON C.Id = P.ChickenId WHERE C.IsDeleted ='0' and C.isPrivate ='0' and P.IsDeleted ='0'"
  }
]
Thanks for validating that, @aisyaharifin .
We've validated that this is a bug, and have reproduced it in our other SQL connectors too. I've filed a bug issue:
and hope to see it fixed soon. Thank you very much for bringing this to our attention.
In the meantime, you may be able to work around this by renaming the column names for your primary keys in your tables, or by creating a new table that is the result of your join and syncing that new table. I apologize for the inconvenience, but thank you again for the report and your persistence.
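For example, here's a rough sketch of that second option, reusing the table and column names from your query above (the new table name and the primary key constraint are just illustrative assumptions):

-- Materialize the join result into a new table (names here are illustrative)
SELECT P.Id AS PenguinId, P.Name AS PenguinName, C.Id AS ChickenId
INTO PenguinChicken
FROM Penguin P
INNER JOIN Chicken C ON C.Id = P.ChickenId
WHERE C.IsDeleted = '0' AND C.isPrivate = '0' AND P.IsDeleted = '0';

-- The connector expects a primary key on the tables it syncs
ALTER TABLE PenguinChicken
ADD CONSTRAINT PK_PenguinChicken PRIMARY KEY (PenguinId);

Keep in mind that you'd need to refresh that table yourself whenever the underlying data changes.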
I will discuss the potential workaround with my team.
I truly appreciate your prompt action and update on this matter, especially considering the limited support since this is still in beta.
To give you an overview, I have an application, let's call it myStaffPortal. This application functions like a company blog, containing multiple tables that store post data. We aim to establish links between these tables, specifically to join them in order to do the searching.
Once again, thank you so much for your assistance. I'm waiting for your updates on this!
I would like to give you an update on my workaround, where I created a view that is the result of the join.
However, it's not pulling in any documents from the DB, and when I checked the Docker logs it shows that there is no assigned primary key.
Below is the error:
[FMWK][04:00:35][WARNING] [Connector id: *********, index name: search-******, Sync job id: *********] Skipping tables ****Post from database ********** since no primary key is associated with them. Assign primary key to the tables to index it in the next sync interval.
I'm not a pro at databases, but as far as I know there is no primary key assigned for a view.
I need help with this as I'm approaching my deadline.
Is there any workaround for this? Can Elastic set the primary key, or can the DB do this?