Hi, I am new to ES, but my company is starting to use it.
When I set up a river, I scheduled it to check for data changes at a
30-minute interval. My largest index on dev holds 230k documents, but in
production it is expected to grow to 300 million docs.
This 230k index puts a heavy load on the server when the river checks for
data changes: the core sits at 100% for approx. 5 minutes.
It looks like it is reindexing the whole index every time. I am using the
simple strategy. Can someone show me where I can find documentation on the
different strategies?
Here is a sample of my river statement:
Hi Erlendur,
In your case, you should use the column strategy instead of the simple one.
The column strategy requires two columns in the SQL DB:
created_at
updated_at
Cheers, Ramy
Thanks Ramy,
but how does that strategy work? Is there any doc on strategies I can view?
The only one I found was on the jprante GitHub wiki, and that only describes
the simple strategy.
And what if I am using tables from a system that I have no control over, and
those tables do not have the created_at and updated_at columns?
Am I maybe misunderstanding this column strategy?
Best regards,
Erlendur
On Tuesday, November 25, 2014 11:00:18 AM UTC, Ramy wrote:
Sorry...
created_at
updated_at
As far as I understand, you need both columns in your database table.
Otherwise the river won't be able to do the checks. The river compares the
updated_at date with its own date. If that date is equal to or later than
the one from the river, the river tries to update/insert your record in
Elasticsearch.
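To make the comparison described above concrete, here is a minimal Python sketch. It is not the river's actual code, only an illustration of the idea as explained in this thread. The function name rows_to_index, the last_run variable, and the sample rows are all made up for the example.

```python
from datetime import datetime


def rows_to_index(rows, last_run):
    """Return the rows whose updated_at is equal to or later than last_run.

    last_run stands in for the river's "own date" (assumed here to be the
    time of the previous check); only these rows would be sent to ES.
    """
    return [row for row in rows if row["updated_at"] >= last_run]


# Hypothetical table contents: row 1 is old, row 2 was updated recently,
# row 3 was created (and therefore updated) recently.
rows = [
    {"id": 1, "created_at": datetime(2014, 11, 1), "updated_at": datetime(2014, 11, 1)},
    {"id": 2, "created_at": datetime(2014, 11, 1), "updated_at": datetime(2014, 11, 25, 10, 45)},
    {"id": 3, "created_at": datetime(2014, 11, 25, 10, 50), "updated_at": datetime(2014, 11, 25, 10, 50)},
]

last_run = datetime(2014, 11, 25, 10, 30)  # assumed time of the previous check
changed_ids = [row["id"] for row in rows_to_index(rows, last_run)]
print(changed_ids)
```

With a check like this, only rows 2 and 3 would be sent again, instead of all rows on every run. It also shows why the timestamp columns matter: without them there is nothing to compare against.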
And this code snippet:
{
  "strategy": "column",
  "type": "jdbc",
  "jdbc": {
    "url": "db server connect string",
    "user": "username",
    "schedule": "0 20/30 * * * ?",
    "password": "password",
    "index": "transactions_test",
    "type": "transaction_test",
    "sql": "SELECT * from my_transaction_table"
  }
}
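A note on the schedule line in the snippet above: assuming the river reads it as a Quartz-style cron expression (seconds, minutes, hours, and so on), the minutes field 20/30 would mean "start at minute 20, then every 30 minutes", which gives the 30-minute interval from the original question. A small Python sketch of that expansion follows; expand_minute_field is a hypothetical helper written for this post, not part of the river or of Quartz.

```python
def expand_minute_field(field):
    """Expand a 'start/step' minutes field into the minutes it matches.

    Only handles the simple 'start/step' form used in the snippet,
    not the full cron syntax.
    """
    start, step = (int(part) for part in field.split("/"))
    return list(range(start, 60, step))


schedule = "0 20/30 * * * ?"
minute_field = schedule.split()[1]  # second field is minutes in this format
fire_minutes = expand_minute_field(minute_field)
print(fire_minutes)
```

So under that assumption the river would run at minutes 20 and 50 of every hour.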