JDBC river strategy

Hi, I am new to ES, but my company is starting to use it.

When I set up a river, I scheduled it to check for data changes at a
30-minute interval. My largest index on dev contains 230k documents, but in
production it is expected to grow to 300 million docs.
This 230k index puts a heavy load on the server when the river checks for
data changes: it keeps the core at 100% for approx. 5 minutes.
It looks like it is reindexing the whole index every time. I am using the
simple strategy. Can someone show me where I can find documentation on the
different strategies?
Here is a sample of my river statement:

{
  "_index" : "_river",
  "_type" : "transactions_test",
  "_id" : "_meta",
  "_score" : 1,
  "_source" : {
    "type" : "jdbc",
    "jdbc" : {
      "strategy" : "simple",
      "url" : "db server connect string",
      "user" : "username",
      "schedule" : "0 20/30 * * * ?",
      "password" : "password",
      "index" : "transactions_test",
      "type" : "transaction_test",
      "sql" : "SELECT * from my_transaction_table"
    }
  }
}
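The schedule "0 20/30 * * * ?" in the statement above appears to be a Quartz-style cron expression (seconds, minutes, hours, day-of-month, month, day-of-week), so the river fires at minute 20 and minute 50 of every hour, i.e. 48 times a day. The Python sketch below makes the resulting load concrete. It assumes, as observed above, that with the simple strategy and "SELECT * from my_transaction_table" every run re-reads every row; expand_field and runs_per_day are hypothetical helpers that only handle the "*", "N" and "start/step" field forms.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Hypothetical helper: values in [lo, hi] matched by one cron field.

    Only '*', a single number 'N', and 'start/step' are handled here.
    """
    if field == "*":
        return list(range(lo, hi + 1))
    if "/" in field:
        start, step = field.split("/")
        first = lo if start == "*" else int(start)
        return list(range(first, hi + 1, int(step)))
    return [int(field)]


def runs_per_day(expr: str) -> int:
    """Firings per day, assuming the day/month fields match every day."""
    # Quartz field order: seconds minutes hours day-of-month month day-of-week
    seconds, minutes, hours, *_ = expr.split()
    return (len(expand_field(seconds, 0, 59))
            * len(expand_field(minutes, 0, 59))
            * len(expand_field(hours, 0, 23)))


schedule = "0 20/30 * * * ?"
print(expand_field(schedule.split()[1], 0, 59))  # [20, 50]
print(runs_per_day(schedule))                    # 48

# Assumption: each run reads (and re-indexes) every row in the table.
for rows in (230_000, 300_000_000):
    print(f"{rows:>11,} rows/run -> {rows * runs_per_day(schedule):,} rows/day")
```

At 230k rows that is about 11 million rows read per day; at 300 million rows it would be about 14.4 billion, which is why re-reading the full table on every run does not scale.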

best regards
Erlendur

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5c27c8d2-2f00-4e18-98f9-28f2af828d9e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Erlendur,
In your case, you should use the column strategy instead of the simple one.
The column strategy requires two columns in the SQL DB:

  • cerated_at
  • update_at

Cheers, Ramy

(In reply to Erlendur Hákonarson, Tuesday, November 25, 2014, 11:04:17 UTC+1)

Sorry, I meant:

  • created_at
  • updated_at

(In reply to Ramy, Tuesday, November 25, 2014, 11:55:18 UTC+1)

Thanks Ramy,

but how does that strategy work?
Is there any doc on strategies I can view?
The only one I found was on the jprante GitHub wiki, and that only describes
the simple strategy.

And what if I am using tables from a system that I have no control over, and
those columns (created_at and updated_at) are not in those tables?
Am I maybe misunderstanding this column strategy?

best regards
Erlendur

(In reply to Ramy, Tuesday, November 25, 2014, 11:00:18 AM UTC)

As far as I understand, you need both columns in your database table;
otherwise the river won't be able to do the checks. The river compares the
updated_at date with its own date. If that date is equal to or later than the
river's, the river tries to insert/update your record in Elasticsearch.
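A minimal sketch of the comparison described above, using Python's built-in sqlite3 module and a hypothetical in-memory copy of my_transaction_table: rather than running SELECT * on every run, only rows whose updated_at is at or after the previous run are fetched. Timestamps are assumed to be stored as ISO-8601 text so that string comparison matches chronological order. This illustrates the idea only; it is not the river's own implementation or configuration, and changed_since is a hypothetical name.

```python
import sqlite3

# Hypothetical stand-in for my_transaction_table, with the created_at and
# updated_at columns the column strategy relies on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_transaction_table ("
    " id INTEGER PRIMARY KEY, amount REAL,"
    " created_at TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO my_transaction_table VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, "2014-11-25T10:00:00", "2014-11-25T10:00:00"),  # unchanged
        (2, 20.0, "2014-11-25T10:05:00", "2014-11-25T11:30:00"),  # edited later
        (3, 30.0, "2014-11-25T11:45:00", "2014-11-25T11:45:00"),  # new row
    ],
)


def changed_since(conn: sqlite3.Connection, last_run: str) -> list:
    """Return (id, amount, updated_at) for rows updated at or after last_run."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM my_transaction_table"
        " WHERE updated_at >= ? ORDER BY id",
        (last_run,),
    )
    return cur.fetchall()


last_run = "2014-11-25T11:20:00"  # start time of the previous scheduled run
changed = changed_since(conn, last_run)
print([row[0] for row in changed])  # [2, 3]: only these need re-indexing
```

The >= matches the "equal to or later" rule: a row updated exactly at the previous run's timestamp is picked up again rather than missed, at the cost of occasionally re-sending it.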

Take a look at this code snippet:

{
  "strategy": "column",
  "type": "jdbc",
  "jdbc": {
    "url": "db server connect string",
    "user": "username",
    "schedule": "0 20/30 * * * ?",
    "password": "password",
    "index": "transactions_test",
    "type": "transaction_test",
    "sql": "SELECT * from my_transaction_table"
  }
}

(In reply to Erlendur Hákonarson, Tuesday, November 25, 2014, 13:34:08 UTC+1)

Maybe this link will help you:
https://github.com/jprante/elasticsearch-river-jdbc/pull/137

and this code snippet:
{
  "strategy": "column",
  "type": "jdbc",
  "jdbc": {
    "url": "db server connect string",
    "user": "username",
    "schedule": "0 20/30 * * * ?",
    "password": "password",
    "index": "transactions_test",
    "type": "transaction_test",
    "sql": "SELECT * from my_transaction_table"
  }
}

(In reply to Erlendur Hákonarson, Tuesday, November 25, 2014, 13:34:08 UTC+1)

Thanks again, Ramy.
This was very helpful.

(In reply to Ramy, Tuesday, November 25, 2014, 1:40:56 PM UTC)
