It is up to the SQL statement to control which rows are fetched when the
JDBC river restarts.
Note that rivers are deprecated. One of the reasons for retiring rivers is
their undefined state after a node restart: the JDBC river simply
re-runs the SQL statement.
Use the JDBC plugin in standalone feeder mode for better control over what data
is indexed, e.g. one initial run to index all data, and a cron job for
incremental data.
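A minimal sketch of the "initial run plus cron job" idea, assuming the feeder keeps its own watermark (the newest `createddate` it has indexed) in a bookkeeping table, so a restart knows where to resume. This is not the JDBC plugin's feeder API: SQLite stands in for SQL Server, and the table names `docs` and `feeder_state`, the function `run_once`, and the `index_fn` callback (which would push rows to Elasticsearch) are all hypothetical.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical schema: "docs" stands in for [MyDBName].[dbo].[MyTableName],
# "feeder_state" is an assumed table holding the last indexed createddate.

def run_once(conn: sqlite3.Connection, index_fn: Callable[[tuple], None]) -> int:
    """One feeder invocation (e.g. from cron): full load if no watermark, else incremental."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feeder_state (name TEXT PRIMARY KEY, last_run TEXT)"
    )
    row = conn.execute("SELECT last_run FROM feeder_state WHERE name = 'docs'").fetchone()
    last_run: Optional[str] = row[0] if row else None

    if last_run is None:
        # Initial run: index everything, oldest first.
        cur = conn.execute("SELECT id, createddate FROM docs ORDER BY createddate")
    else:
        # Incremental run: only rows newer than the stored watermark.
        cur = conn.execute(
            "SELECT id, createddate FROM docs WHERE createddate > ? ORDER BY createddate",
            (last_run,),
        )

    count = 0
    newest = last_run
    for rec in cur.fetchall():
        index_fn(rec)      # hypothetical hook: send the row to Elasticsearch
        newest = rec[1]    # rows are ascending, so the last one is the max
        count += 1

    # Persist the watermark so the next run (or a restart) resumes from here.
    if newest is not None and newest != last_run:
        conn.execute(
            "INSERT OR REPLACE INTO feeder_state (name, last_run) VALUES ('docs', ?)",
            (newest,),
        )
    conn.commit()
    return count
```

The timestamps are ISO-8601 strings here, which compare correctly as text only if they all share one format. A strict `>` can skip rows that arrive later with exactly the watermark timestamp; overlapping by a small margin and relying on `_id` to overwrite duplicates is one way around that.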
I can't look at the feeder setup now, but I could in the future.
Is my SQL statement incorrect?
Should I be doing something differently?
Does the river not use created_at and updated_at in this setup? I
don't have a WHERE clause because I thought the column strategy
would take that into account.
This is an example of what I see in SQL Server:
SELECT id as _id, * FROM [MyDBName].[dbo].[MyTableName] WHERE ({fn
TIMESTAMPDIFF(SQL_TSI_SECOND,@P0,"createddate")} >= 0)
When I populate @P0 with a timestamp, it seems to work fine.
On a restart, I'm guessing it doesn't know where to start.
Is there any way I can check values in Elasticsearch within the column
strategy, such as using MAX(CreatedDate), so that it can start there?
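Outside the river (which, as Jörg's reply notes, cannot do this itself), Elasticsearch can report the newest indexed date with a `max` aggregation, and that value could seed `@P0`. The sketch below only builds the search body and reads the result; it assumes the column was indexed as a date field named `createddate`, and the helper names `max_date_query` and `watermark_from_response` are hypothetical. Sending the request to the index's `_search` endpoint is left out.

```python
import json
import math
from datetime import datetime, timezone

def max_date_query(field: str = "createddate") -> str:
    """Search body asking only for the maximum of a date field (field name assumed)."""
    body = {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"max_created": {"max": {"field": field}}},
    }
    return json.dumps(body)


def watermark_from_response(resp: dict):
    """Turn the max aggregation result into a UTC datetime to bind to @P0.

    For date fields the aggregation's "value" is epoch milliseconds. An index
    with no matching documents yields no usable maximum (null, or a non-finite
    number depending on version); both are treated as "no watermark" here.
    """
    value = resp["aggregations"]["max_created"]["value"]
    if value is None or not math.isfinite(value):
        return None  # fall back to a full initial run
    return datetime.fromtimestamp(value / 1000.0, tz=timezone.utc)
```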
I fixed this by using stored procedures. I store the last run dates in my
SQL database and group them by month so as not to overload my underpowered
VMs.
So far so good: this approach works much better than the column strategy.
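The stored procedures themselves aren't shown, so the following only sketches the month-grouping idea under stated assumptions: given the stored last run date and the current time (naive datetimes assumed), split the gap into half-open windows that never cross a month boundary. The function name `month_windows` is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

def month_windows(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into half-open windows that never cross a month boundary."""
    windows = []
    cur = start
    while cur < end:
        # First instant of the month after cur (naive datetimes assumed).
        if cur.month == 12:
            nxt = datetime(cur.year + 1, 1, 1)
        else:
            nxt = datetime(cur.year, cur.month + 1, 1)
        stop = min(nxt, end)
        windows.append((cur, stop))
        cur = stop
    return windows
```

Each window would then drive one bounded query, e.g. `WHERE createddate >= ? AND createddate < ?`, with the stored last run date advanced only after that window has been indexed, so a restart resumes at the last completed window instead of re-reading everything.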
On Monday, April 20, 2015 at 10:59:31 AM UTC-4, Jörg Prante wrote:
The column strategy is a community effort; it can rewrite the WHERE clause of
an SQL statement with a timestamp filter.
I do not know enough about the column strategy.
You are correct: at node restart, a river does not know where to
resume. There is no way to resolve this within the river logic.
Jörg
On Mon, Apr 20, 2015 at 2:11 PM, GWired <garrett...@gmail.com> wrote: