I have a query to ask you specifically regarding the Search Connectors for Microsoft SQL.
I have successfully connected to the Microsoft SQL database, and there are multiple tables that I would like to query by joining them.
The question is :
How can we perform a join query to connect multiple tables?
While I've looked into Sync rules, it seems that this option is primarily designed for filtering, as per my understanding. I previously inquired with support, and they suggested that we might consider using Logstash JDBC. I'm seeking other options on how we can effectively execute a join query to combine multiple tables.
My second question relates to content syncing, and I'd like to clarify my understanding regarding two types of content synchronization: Full Syncs and Incremental Syncs.
To confirm, Incremental Sync is exclusively available for the SharePoint Connector, while Full Sync is applicable to all connectors. My understanding of Full Sync is that it deletes the previously ingested data and replaces it with all the new data, ensuring that there are no issues with data duplication. Can you please confirm if this interpretation is correct?
Thank you so much and have a nice day.
Hi @aisyaharifin , thanks for the good questions!
How can we perform a join query to connect multiple tables?
Today, there's not a great way to do this. I would have said "with views," but we've had a recent bug report of Views not working with our MySQL connector, and I haven't personally tried MSSQL views with tour connector. I suggest giving that a try, and let us know how it works.
If that won't work, I'm happy to share that Elastic 8.11 (due to be released in the next week or so) will ship a version of our MSSQL connector that has support for "Advanced Sync Rules". While you're correct that Basic Sync Rules are mainly for filtering, "Advanced Sync Rules" allow you to do remote filtering, which for SQL translates into giving you precise control over the query executed against your database. You can read the documentation for this feature here, but reminder that this is not released as of writing.
To confirm, Incremental Sync is exclusively available for the SharePoint Connector, while Full Sync is applicable to all connectors.
This is correct. We hope to offer support for Incremental Syncs for other data sources in the future, but today (<=8.11) this is only available in our Sharepoint Online Connector.
My understanding of Full Sync is that it deletes the previously ingested data and replaces it with all the new data, ensuring that there are no issues with data duplication
Not quite. Full Syncs do a full scan of data in the 3rd-party data source. The first Full Sync will take your index from empty to fully-populated. Full syncs after the first will:
- Fetch all the IDs from the current Elasticsearch index
- Do a full scan of data in the 3rd-party data source
- Issue updates/creates to Elasticsearch as necessary
- Issue deletes for any IDs not found in the full data source scan that are currently present in the Elasticsearch index.
This sequence is designed to ensure that the index can go through a Full Sync without having to require any down time for a search application that uses it as a backend.
Hello @Sean_Story , I want to ask your confimation, as the 8.11 already came out and my team already upgraded our Elastic, I've been trying around with the SQL Query.
I want to understand the mechanism where as for example, I had 9 different queries that I need to apply.
If I applied all the 9 queries, and then the connectors will sync the data based on the query to link up the multiple tables.
Hence, when the user did the searching, the searching will match with all the queries is it ? 1 search = 9 sql queries
Or will the 1 search = 1 queries only, it will not process all the other queries?
Is there limitation with the sql queries?
I'm worried there's misunderstanding here. The connector, at a high level, is responsible for getting data from one source (in your case, MySQL) to an Elasticsearch index.
The Advanced Sync Rules allow you to be very precise about what data is ingested from MySQL to the Elasticsearch index. For example, if you had a table with auto-increment IDs, not using Advanced Sync Rules would get all the documents and put them in Elasticsearch. But using Advanced Sync Rules like:
"query": "SELECT * FROM table1 where id==1"
"query": "SELECT * FROM table1 where id==3"
"query": "SELECT * FROM table1 where id==5"
Would index just the documents with IDs
When it comes to search time, through Elasticsearch, it all depends on what your query is. But if you did a
match_all query, you will get all documents that were indexed, regardless of which SQL query they were indexed with. So in this latest example, you'd get all three results of
5 in your result set.
When the user searches in Elasticsearch, it is not executing any on-demand searches/queries through the connector. It only searches the data that is in Elasticsearch.
Hello @Sean_Story thank you for you reply, but could you explain on what do you mean by if I did a match_all query? Is there any configuration I need to do to set it to match all the query?
Hello @Sean_Story I have one more question that I would like to confirm on, does the Advanced Sync Rules query is case-sensitive?
Such as, the query of table name case should reflect with those in Microsoft SQL?
If the table name in Microsoft SQL > 'ProductID' , hence the query should wrote 'ProductID' or can it be written as 'productid' also? Just need clarification on that.
This is a specific type of Elasticsearch query. You can read about it here: Match all query | Elasticsearch Guide [8.11] | Elastic
It will depend on your host and its settings. The queries are just being passed directly through a python client, the connector does not enforce any casing restrictions of its own. You should be able to copy the queries as strings from whatever database tool you use to craft/test queries.