I'm setting up a system where I have a main SQL database which is synced
with elasticsearch. My plan is to use the main PHP library for
elasticsearch.
I was going to have a cron job run every thirty minutes to check for items in
my database that have an "active" flag but do not yet have an "indexed"
flag, which means they still need to be added to the index. The job would
then add each of those items to the index. Since I am taking this path, it
doesn't seem like I need the JDBC driver, as I can add items to
elasticsearch using the PHP library.
So, my question is, can I get away without using the JDBC driver?
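A minimal sketch of the cron pass described above, written in Python only for illustration (the thread is about the PHP client). The table name `items`, its columns, and the `send` callable are assumptions; `send` stands in for whatever client call actually indexes a document and is expected to return True on success.

```python
import json
import sqlite3


def find_unindexed(conn):
    """Return rows that are flagged active but not yet flagged indexed."""
    cur = conn.execute(
        "SELECT id, title FROM items "
        "WHERE active = 1 AND indexed = 0 ORDER BY id"
    )
    return [{"id": row[0], "title": row[1]} for row in cur.fetchall()]


def sync_once(conn, send):
    """One cron pass: push each pending row, then mark it as indexed.

    `send(doc_id, json_body)` is a hypothetical stand-in for the real
    client call; a row is only marked indexed when it returns True.
    """
    pushed = []
    for row in find_unindexed(conn):
        body = json.dumps({"title": row["title"]})
        if send(row["id"], body):
            conn.execute(
                "UPDATE items SET indexed = 1 WHERE id = ?", (row["id"],)
            )
            pushed.append(row["id"])
    conn.commit()
    return pushed


def demo():
    """Run two passes against an in-memory database with a fake sender."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items "
        "(id INTEGER PRIMARY KEY, title TEXT, active INTEGER, indexed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            (1, "first", 1, 0),   # active, pending
            (2, "second", 1, 1),  # already indexed
            (3, "third", 0, 0),   # inactive, skipped
            (4, "fourth", 1, 0),  # active, pending
        ],
    )
    sent = []

    def fake_send(doc_id, body):
        sent.append((doc_id, body))
        return True

    first = sync_once(conn, fake_send)
    second = sync_once(conn, fake_send)
    return first, second, sent
```

Marking a row as indexed only after a successful send means a failed push is simply retried on the next run.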
I've also put this question up on stackoverflow for anyone who might be
able to help me understand.
I'm sorry but that doesn't answer my question. It's elasticsearch that is
Java. I need to sync elasticsearch with my SQL DB. I'm stuck between these
two scenarios:
Scenario 1:
PHP website adds data to the SQL DB
JDBC driver used by elasticsearch to grab values from SQL DB into index
Scenario 2:
PHP website adds data to SQL
CRON job uses PHP elasticsearch library to convert SQL to JSON and send it
to elasticsearch to be indexed.
What do you mean by "PHP elasticsearch library can convert SQL to JSON"? How
can this be? It is only for Elasticsearch.
As a matter of fact, there is no "JDBC driver used by elasticsearch". There is
a plugin, elasticsearch-river-jdbc, which is a community effort. I assume you
mean this implementation?
What do you mean by "better" implementation, in regard to what requirements?
Jörg
Thank you for the reply. Yes, I meant the elasticsearch river. Simply put, I
want to synchronize the entries in my SQL database with elasticsearch, so
I can use elasticsearch for searching rather than doing fulltext search. I
want to know that when an item gets added to or removed from that database,
it also gets added to or removed from elasticsearch.
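One way to picture the add/remove requirement is as a comparison of two sets of IDs. The sketch below is an assumption about how such a reconciliation could look, not something proposed in the thread; `plan_sync` is a hypothetical helper, and how the two ID lists are fetched is left out.

```python
def plan_sync(db_ids, index_ids):
    """Compare IDs in the database with IDs in the search index.

    Returns (to_add, to_remove): IDs present only in the database need
    indexing, IDs present only in the index are stale and need deleting.
    """
    db_ids, index_ids = set(db_ids), set(index_ids)
    to_add = sorted(db_ids - index_ids)
    to_remove = sorted(index_ids - db_ids)
    return to_add, to_remove
```

For example, with database IDs 1, 2, 3 and index IDs 2, 3, 4, item 1 has to be indexed and item 4 has to be removed from the index.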
My understanding, which might be wrong, is that I can either use the PHP
elasticsearch library to push updates (adds / removes) to elasticsearch
when new items are added to SQL,
or I can use the JDBC river plugin for elasticsearch to connect to my
database directly and synchronize elasticsearch with the SQL database.
My two questions are:
Is my understanding above correct?
Does one option have advantages over the other?
James
Synchronization of data is a very broad question, because the
data organization in an RDBMS is very different from ES. You surely know
that. See also object-relational impedance mismatch.
The JDBC river plugin allows you to define SQL statements so you can easily
construct JSON out of it, for indexing into ES.
If you can map identifiers from your RDBMS to JSON doc IDs and allocate the
_id field in the JDBC river plugin, you are lucky. In that case you can
just overwrite existing docs in ES to keep up with the most recent version.
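To illustrate the ID mapping, here is a minimal sketch that builds a request body in the newline-delimited format of the Elasticsearch bulk API, reusing each row's primary key as the document `_id` so that re-sending a row overwrites the earlier version. The helper name, the `id` column, and the `items` index are assumptions, and depending on the Elasticsearch version the action line may also need a `_type`.

```python
import json


def bulk_index_body(rows, index_name, id_column="id"):
    """Build a bulk request body that indexes each row under its own ID.

    Each row becomes two lines: an action line carrying the target index
    and `_id`, followed by the document source without the ID column.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": str(row[id_column])}}
        source = {k: v for k, v in row.items() if k != id_column}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format expects every line, including the last, to end in \n.
    return "".join(line + "\n" for line in lines)
```

Because the `_id` is derived from the database key, running the same sync twice does not create duplicate documents.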
Synchronization also includes modifications and deletions to avoid stale
docs, and transactional ACID properties. I have no general solution for
this. The best approach is to provide time-windowed indices and drop indices
that are too old, similar to what Logstash does.
Jörg
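The time-windowed approach can be sketched as follows, assuming one index per day named in the `prefix-YYYY.MM.dd` style that Logstash uses by default. The `items` prefix and both helper names are hypothetical, and actually deleting the returned indices is left to whichever client is in use.

```python
from datetime import date, timedelta


def index_name_for(day, prefix="items"):
    """Name of the daily index that holds documents for `day`."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(existing, today, keep_days, prefix="items"):
    """Return the daily indices that fall outside the retention window.

    Only names starting with `prefix-` are considered, so unrelated
    indices are never proposed for deletion.
    """
    keep = {
        index_name_for(today - timedelta(days=n), prefix)
        for n in range(keep_days)
    }
    return sorted(
        name
        for name in existing
        if name.startswith(prefix + "-") and name not in keep
    )
```

With this layout, removing stale data is a matter of dropping whole indices rather than deleting individual documents.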
Thank you, that answers a lot of my questions. There is still the option of using the PHP library for Elasticsearch, where I can send documents directly to Elasticsearch in JSON format without needing a JDBC driver. Is this not a good option?
I do not know the PHP client in particular, but it is just another one of
the official Elasticsearch clients, like the Elasticsearch clients
for other languages such as Perl, Python, and Ruby.
An Elasticsearch client lets you talk to Elasticsearch, not to an RDBMS
database.
Jörg
I'm sorry, I'm sure this is clear to a lot of people but not to me. Can I not use the official Elasticsearch PHP client to add documents to elasticsearch? In that case, can my PHP website grab data from the database, convert it to the right format, and use this library to send it to elasticsearch to be indexed?
I want to close this issue but I still do not understand whether I should be
pushing documents from my database using the PHP client or using the JDBC
river to pull them into elasticsearch from the SQL database.
They can both achieve the same thing, but what is the use case that defines
when it is the right time to use each implementation?
You can use either style; it is a matter of taste or convenience.
With the JDBC plugin, you can also push data instead of pull.
Jörg
I would strongly prefer to keep control of indexing on the application side
and not inside Elasticsearch. In fact, the Elasticsearch team has talked about
deprecating river plugins. I do not have any numbers, but I would suspect that
the majority of users do not use a river plugin. And yes, the correct term is
the JDBC plugin, not driver. The wrong term confused many.
Ivan
Ah, that's really interesting, it's good to get some comparison. So you are saying I should use the official PHP library for document indexing?
James
I must admit I'm new to this so I find some of the information hard to understand. So sorry if I am asking stupid questions.