How to use river jdbc load data from many mysql shards into one es type?

fzhou · December 17, 2013, 6:43am

Hi,

I setup a 10 nodes es cluster, 2 master non-data nodes, 8 data nodes, es
0.90.7, want to load data from sharded mysql database cluster. Data in
mysql has same table structure in different shards, I want to load them
into one index type in es cluster, like /test2/SMSMT

I am not sure whether my solution is correct like

From ng30 load into SMSMT

curl -XPUT 'localhost:9200/_river/SMSMT30/_meta' -d '
{
"type" : "jdbc",
"jdbc" : {
"strategy" : "oneshot",
"poll" : "1m",
"max_retries" : 5,
"fetchsize" : 100,
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://172.16.2.95:3306/ng30",
“user" : "aa",
"autocommit" : true,
“password" : "aa123",
"sql" : "select ID as
_id,BusinessId,MessageSent,MessageTo,Type,MessageFrom,CallBack,Protocol,CampaignId,CustomerId,Industry,Subject,MessageText,FormattedMessageText,AppointmentId,ScheduledDeliveryTime,ActualDeliveryTime,ReadTime,ConfirmedTime,LatestDeliveryTime,MessageId,MessageStatusDescription,MessageStatus,Status,CreatedDate,CreatedUserID,LastModifiedDate,LastModifiedUserID,ScheduledTime,VisitId,BouncedTime,AssetId,DeliveryTime,ScompTime,ScompText,BouncedText,StructuredMessageText,ReferralID,AssetScheduleId,LastVisitTime,AppointmentHotlistId,EntityID
from SMSMT"
},
"index" : {
"index" : "test2",
"type" : "SMSMT",
"fetchsize" : 30000,
"bulk_size": 30000,
"max_bulk_requests": 40
}
}'

From ng19 load into SMSMT

mysql> select max(id),count(1) from ng30.SMSMT;
+-----------+----------+
| max(id) | count(1) |
+-----------+----------+
| 541008901 | 22317315 |
+-----------+----------+
1 row in set (1 min 9.71 sec)

curl -XPUT 'localhost:9200/_river/SMSMT19/_meta' -d '
{
"type" : "jdbc",
"jdbc" : {
"strategy" : "oneshot",
"poll" : "1m",
"max_retries" : 5,
"fetchsize" : 100,
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://172.16.2.95:3306/ng19",
“user" : "aa",
"autocommit" : true,
“password" : "aa123",
"sql" : "select ID+ 541009000 as
_id,BusinessId,MessageSent,MessageTo,Type,MessageFrom,CallBack,Protocol,CampaignId,CustomerId,Industry,Subject,MessageText,FormattedMessageText,AppointmentId,ScheduledDeliveryTime,ActualDeliveryTime,ReadTime,ConfirmedTime,LatestDeliveryTime,MessageId,MessageStatusDescription,MessageStatus,Status,CreatedDate,CreatedUserID,LastModifiedDate,LastModifiedUserID,ScheduledTime,VisitId,BouncedTime,AssetId,DeliveryTime,ScompTime,ScompText,BouncedText,StructuredMessageText,ReferralID,AssetScheduleId,LastVisitTime,AppointmentHotlistId,EntityID
from SMSMT"
},
"index" : {
"index" : "test2",
"type" : "SMSMT",
"fetchsize" : 30000,
"bulk_size": 30000,
"max_bulk_requests": 40
}
}'

From ng6 load into SMSMT

mysql> select max(id)+541009000,count(1) from ng19.SMSMT;
+-------------------+----------+
| max(id)+541009000 | count(1) |
+-------------------+----------+
| 945025141 | 7771470 |
+-------------------+----------+

curl -XPUT 'localhost:9200/_river/SMSMT6/_meta' -d '
{
"type" : "jdbc",
"jdbc" : {
"strategy" : "oneshot",
"poll" : "1m",
"max_retries" : 5,
"fetchsize" : 100,
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://172.16.2.95:3306/ng6",
“user" : "aa",
"autocommit" : true,
“password" : "aa123",
"sql" : "select ID+94502600 as
_id,BusinessId,MessageSent,MessageTo,Type,MessageFrom,CallBack,Protocol,CampaignId,CustomerId,Industry,Subject,MessageText,FormattedMessageText,AppointmentId,ScheduledDeliveryTime,ActualDeliveryTime,ReadTime,ConfirmedTime,LatestDeliveryTime,MessageId,MessageStatusDescription,MessageStatus,Status,CreatedDate,CreatedUserID,LastModifiedDate,LastModifiedUserID,ScheduledTime,VisitId,BouncedTime,AssetId,DeliveryTime,ScompTime,ScompText,BouncedText,StructuredMessageText,ReferralID,AssetScheduleId,LastVisitTime,AppointmentHotlistId,EntityID
from SMSMT"
},
"index" : {
"index" : "test2",
"type" : "SMSMT",
"fetchsize" : 30000,
"bulk_size": 30000,
"max_bulk_requests": 40
}
}'

From ng35 load into SMSMT

mysql> select max(id)+94502600,count(1) from ng6.SMSMT;
+------------------+----------+
| max(id)+94502600 | count(1) |
+------------------+----------+
| 1597134881 | 10162439 |
+------------------+----------+
1 row in set (1 min 37.13 sec)

curl -XPUT 'localhost:9200/_river/SMSMT35/_meta' -d '
{
"type" : "jdbc",
"jdbc" : {
"strategy" : "oneshot",
"poll" : "1m",
"max_retries" : 5,
"fetchsize" : 100,
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://172.16.2.95:3306/ng35",
“user" : "aa",
"autocommit" : true,
“password" : "aa123",
"sql" : "select ID+1597134900 as
_id,BusinessId,MessageSent,MessageTo,Type,MessageFrom,CallBack,Protocol,CampaignId,CustomerId,Industry,Subject,MessageText,FormattedMessageText,AppointmentId,ScheduledDeliveryTime,ActualDeliveryTime,ReadTime,ConfirmedTime,LatestDeliveryTime,MessageId,MessageStatusDescription,MessageStatus,Status,CreatedDate,CreatedUserID,LastModifiedDate,LastModifiedUserID,ScheduledTime,VisitId,BouncedTime,AssetId,DeliveryTime,ScompTime,ScompText,BouncedText,StructuredMessageText,ReferralID,AssetScheduleId,LastVisitTime,AppointmentHotlistId,EntityID
from SMSMT"
},
"index" : {
"index" : "test2",
"type" : "SMSMT",
"fetchsize" : 30000,
"bulk_size": 30000,
"max_bulk_requests": 40
}
}'

After I issued above commands, I did not see the data added into
/test2/SMSMT, but I did see /jdbc index got created, and 4 sessions coming
into mysql from es cluster, they are running above sql queries and sending
data.

Can you help me review above work or guide me the correct way to use river
to load large data in parallel?

Thank you
Frank

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7fa6f9dd-36e1-459a-b7b1-97ac5b13c5c5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · December 18, 2013, 1:25am

If you use JDBC river 2.3.0, then you must change your river creation
parameter structure. Do not use the "index" subsection any longer. It is
now merged with the "jdbc" subsection.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGMWMP2kOZgPajKdOxfW7aSW7g48Zpv%2BBPcKSqoUhTvWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.