Are there any plans to support incremental syncs for Postgres and MySQL database connectors?
Hi @andrej.peplinski, good question.
Both currently support the "naive" incremental sync, where we avoid sending data to Elasticsearch that we're sure hasn't changed since it was last ingested. What makes this naive, though, is that we still fetch all of this data from the source system.
We don't currently have a roadmap item to be smarter about how we fetch data from Postgres or MySQL for incremental syncs. But if you have a support relationship with Elastic, you can absolutely ask your contact to file an Enhancement Request on your behalf. If that's not an option for you, I'd be happy to put you in contact with one of our product managers, if you'd like to make a case for adding that feature.
Alternatively, our code is open, and we very much appreciate community pull requests.
Thanks for the quick response @Sean_Story. I will need to evaluate how performance-critical incremental syncs are for us and then potentially come back to you.
But I have one more follow-up question: What are the criteria for being "sure" that the data hasn't changed? Are timestamps being used (as mentioned here)? And if so, what is the expected timestamp field name?
For our database connectors, this is currently very unsophisticated: we're using the table's last change date (or, if it's a join, the most recent change time of any of the joined tables). So if anything has changed in the table, we're pulling all of it. See this code.
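To make that concrete, here is a rough sketch of the table-level check described above. It is not the connector's actual code: the function name `docs_to_index`, the `id` key, the `changed_at` field, and the `indexed_change_times` lookup are all hypothetical names chosen for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator, List


def docs_to_index(
    rows: Iterable[dict],
    table_change_times: List[datetime],
    indexed_change_times: Dict[int, datetime],
) -> Iterator[dict]:
    """Yield only the rows that might have changed since they were last indexed.

    Hypothetical sketch: all names here are illustrative assumptions.
    """
    # For a join, the most recent change time across all joined tables wins.
    table_changed_at = max(table_change_times)
    for row in rows:  # every row has already been fetched from the database
        previously_seen = indexed_change_times.get(row["id"])
        if previously_seen is not None and previously_seen >= table_changed_at:
            continue  # table untouched since this row was indexed, so skip sending it
        # The whole table shares one change time, so one changed row re-sends them all.
        yield {**row, "changed_at": table_changed_at}
```

Because the change time is tracked per table rather than per row, a single update anywhere in the table makes every row look changed, which is why this counts as "naive".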
I do imagine that if we implemented a non-naive incremental sync feature for our database connectors, we'd require that you specify a timestamp field that indicates when each row was last changed.
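As a sketch of what such a row-level approach could look like (this feature does not exist in the connectors today, so everything here is an assumption), the query below only fetches rows whose timestamp is newer than the previous sync. The function name `fetch_changed_rows` and its parameters are hypothetical, and `sqlite3` is used only so the example runs standalone; a Postgres or MySQL driver would use its own placeholder syntax.

```python
import sqlite3
from typing import List, Tuple


def fetch_changed_rows(
    conn: sqlite3.Connection,
    table: str,
    timestamp_column: str,
    last_sync: str,
) -> List[Tuple]:
    """Fetch only rows modified after `last_sync` (an ISO 8601 string).

    Hypothetical sketch: assumes the user names a column that is updated
    on every row change, and that timestamps are stored as ISO 8601 text.
    """
    # Identifiers cannot be bound as query parameters, so validate them first.
    for name in (table, timestamp_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{timestamp_column}" > ? '
        f'ORDER BY "{timestamp_column}"'
    )
    return conn.execute(query, (last_sync,)).fetchall()
```

With something like this, the database does the filtering, so unchanged rows are never fetched at all. The trade-off is that it cannot detect deleted rows, and it relies on the timestamp column being maintained reliably.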
You also may be interested to read: Elastic Connectors: Performance impact of incremental syncs — Search Labs