JDBC River && _parent

Hi,

I'm currently trying to understand how to use JDBC River. I just got some
good results this afternoon. :)

Now I have a few questions regarding the river :

  • if I have 2 tables with 1 for parent items & two which are more child
    items, should I have 3 different rivers ?
  • how do I set the _parent field? Is it the id of the parent object in ES?
    And then, how do I make this consistent with the preceding question?

Thanks.

Kind regards,
Yann

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The JDBC river uses a "pseudo column" method for parent/child aware
indexing. Just like you can state a _parent attribute in the bulk API,
you can name a column '_parent' in the JDBC source to use this attribute.

It does not matter in what order you index parent/children docs as long
as the parent exists (I admit I'm not even sure about this), so you should
create the parent docs first. You can index the parents with one run of the
default "oneshot" river, and then add another river for the children,
or even several. It's as simple as that.

Yes, the _parent is the _id attribute of the parent document.

Best regards,

Jörg
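
For readers who want to see the "pseudo column" idea spelled out, here is a minimal sketch of two river definitions, one for parents and one for children. It reuses only the river fields that appear later in this thread (type, jdbc.driver, jdbc.url, jdbc.user, jdbc.password, jdbc.sql, index.index, index.type). The table names, column names, index name, type names, JDBC URL and credentials are all hypothetical placeholders, and the child type in the target index is assumed to already carry a _parent mapping. It targets the Elasticsearch and JDBC river versions of this thread's era (2013), not current releases.

```shell
#!/bin/sh
# Sketch only: every table, column, index, type and river name is hypothetical.
ES=${ES:-localhost:9200}

# Wrapper so an unreachable cluster produces a readable message.
es() { curl -s "$@" || echo "request failed: is Elasticsearch running at $ES?" >&2; }

# River for the parent rows: only the "_id" pseudo column is used.
parent_river=$(cat <<'EOF'
{
  "type": "jdbc",
  "jdbc": {
    "driver": "oracle.jdbc.OracleDriver",
    "url": "jdbc:oracle:thin:@//dbhost:1521/MYDB",
    "user": "myuser",
    "password": "secret",
    "sql": "select id as \"_id\", name from parent_table"
  },
  "index": { "index": "myindex", "type": "parent_type" }
}
EOF
)

# River for the child rows: the foreign key is exposed as the "_parent"
# pseudo column, so each child doc is indexed with its parent's _id.
child_river=$(cat <<'EOF'
{
  "type": "jdbc",
  "jdbc": {
    "driver": "oracle.jdbc.OracleDriver",
    "url": "jdbc:oracle:thin:@//dbhost:1521/MYDB",
    "user": "myuser",
    "password": "secret",
    "sql": "select id as \"_id\", parent_id as \"_parent\", name from child_table"
  },
  "index": { "index": "myindex", "type": "child_type" }
}
EOF
)

# Register both rivers; each one lives in its own type of the _river index.
es -XPUT "$ES/_river/my_parents/_meta" -d "$parent_river"
es -XPUT "$ES/_river/my_children/_meta" -d "$child_river"
```

With a second child table, a third river of the same shape as my_children would be added, which matches the "one river per table" reading of the answer above.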

Hi Jörg,

Thanks for the answer. It does seem really simple. Nevertheless, if I
index a live DB using the "simple" strategy, might I run into problems
when a child gets indexed before its parent?

Kind regards,
Yann Barraud

You can create children docs before parent docs. It's not the river.
It's an ES feature.

See https://gist.github.com/jprante/4984178 for a simple demonstration.

Kind regards,

Jörg
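
The "children before parents" behaviour can be tried without any river at all. The following is a rough sketch, not a copy of the gist linked above: the index name demo, the types person and work, and the field names are hypothetical, and the calls assume an Elasticsearch version from this thread's era, where _parent is declared in the child type's mapping and the parent id is passed with the parent URL parameter.

```shell
#!/bin/sh
# Sketch only: index "demo", types "person"/"work" and all fields are hypothetical.
ES=${ES:-localhost:9200}

# Wrapper so an unreachable cluster produces a readable message.
es() { curl -s "$@" || echo "request failed: is Elasticsearch running at $ES?" >&2; }

# The child type declares which type its parents have.
child_mapping='{
  "mappings": {
    "work": { "_parent": { "type": "person" } }
  }
}'

# 1. Create the index with the child mapping in place.
es -XPUT "$ES/demo" -d "$child_mapping"

# 2. Index a child first. The parent id (1) is only used for routing,
#    so the parent document does not have to exist yet.
es -XPUT "$ES/demo/work/1?parent=1" -d '{ "work_name": "P1Book1" }'

# 3. Index the parent afterwards.
es -XPUT "$ES/demo/person/1" -d '{ "person_name": "Person1" }'

# 4. Make both docs searchable, then ask for persons that have a work child.
es -XPOST "$ES/demo/_refresh"
es -XGET "$ES/demo/person/_search?pretty" -d '{
  "query": { "has_child": { "type": "work", "query": { "match_all": {} } } }
}'
```

If the last request returns the person document, the relation was resolved even though the child was written first.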

Great! I definitely love ES!!

Hi Jörg,

I am having trouble creating a parent-child relation between two tables
using JDBC river. I have two tables "person" and "work".

create table person (
    person_id numeric,
    person_name varchar2(100)
)

create table work (
    work_id numeric,
    person_id numeric,
    work_name varchar2(500),
    genre varchar2(100),
    publisher varchar2(500)
)

I create a person index using this:

curl -XPUT 'localhost:9200/_river/person/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "strategy" : "simple",
        "driver" : "oracle.jdbc.OracleDriver",
        "url" : "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=10000)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=TPAI.WORLD)))",
        "user" : "myuser",
        "password" : "test123",
        "sql" : "select person_id as \"_id\", person_name from person",
        "poll" : "100m"
    },
    "index" : {
        "index" : "jdbcriver_person",
        "type" : "jdbc"
    }
}'

I specify the mapping using this:

curl -XPOST 'localhost:9200/_river/work/_mapping' -d '{
    "work" : {
        "_parent" : { "type" : "person" }
    }
}'

But when I run the command below, I get a RoutingMissingException[routing
is required for [_river]/[work]/[_meta]].

curl -XPUT 'localhost:9200/_river/work/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "strategy" : "simple",
        "driver" : "oracle.jdbc.OracleDriver",
        "url" : "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=10000)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=TPAI.WORLD)))",
        "user" : "myuser",
        "password" : "test123",
        "sql" : "select work_id as \"_id\", person_id as \"_parent\", work_name, genre, publisher from work",
        "poll" : "100m"
    },
    "index" : {
        "index" : "jdbcriver_work",
        "type" : "jdbc"
    }
}'

Can you tell me if I am missing something?

Your data for a child document is not enough to fulfil the routing (the
parent id in that case).

Can you please gist the stacktrace of the exception? And some example
data so I can better reproduce?

Thanks,

Jörg
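
One possible reading of the exception, offered as an assumption and not confirmed anywhere in this thread: the message names [_river]/[work]/[_meta], which is the river definition document itself, not a row from the work table. The _parent mapping above was sent to the _river index, so every document of type work in _river, including _meta, would need a parent for routing. A sketch of an alternative setup follows. It puts the _parent mapping on the data index instead, and it lets both rivers write into one shared index under distinct types, since parent and child documents must live in the same index. The index name jdbcriver and the types person and work are hypothetical, and the delete-mapping call assumes an Elasticsearch version of this thread's era.

```shell
#!/bin/sh
# Sketch only: shared index "jdbcriver" and types "person"/"work" are hypothetical.
ES=${ES:-localhost:9200}

# Wrapper so an unreachable cluster produces a readable message.
es() { curl -s "$@" || echo "request failed: is Elasticsearch running at $ES?" >&2; }

# 1. Drop the "work" type (and its _parent mapping) from the _river index,
#    so the river definition can be stored without routing.
es -XDELETE "$ES/_river/work"

# 2. Declare the parent/child relation on the data index instead.
data_index='{
  "mappings": {
    "work": { "_parent": { "type": "person" } }
  }
}'
es -XPUT "$ES/jdbcriver" -d "$data_index"

# 3. Child river: same SQL as in the question, but writing to the shared
#    index under its own type. The person river would target
#    "index": "jdbcriver", "type": "person" in the same way.
work_river=$(cat <<'EOF'
{
  "type": "jdbc",
  "jdbc": {
    "strategy": "simple",
    "driver": "oracle.jdbc.OracleDriver",
    "url": "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=10000)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=MYDB)))",
    "user": "myuser",
    "password": "test123",
    "sql": "select work_id as \"_id\", person_id as \"_parent\", work_name, genre, publisher from work",
    "poll": "100m"
  },
  "index": { "index": "jdbcriver", "type": "work" }
}
EOF
)
es -XPUT "$ES/_river/work/_meta" -d "$work_river"
```

Whether the river version in use accepts a pre-created target index without complaint is not something this thread confirms, so treat step 2 as something to verify locally.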

Hi Jörg,

Here is the data that I put into my tables.

Insert into PERSON (PERSON_ID, PERSON_NAME) Values (1, 'Person1');
Insert into PERSON (PERSON_ID, PERSON_NAME) Values (2, 'Person2');
COMMIT;

Insert into WORK (WORK_ID, PERSON_ID, WORK_NAME, GENRE, PUBLISHER)
Values (1, 1, 'P1Book1', 'Fiction', 'MyHouse');
Insert into WORK (WORK_ID, PERSON_ID, WORK_NAME, GENRE, PUBLISHER)
Values (2, 1, 'P1Book2', 'Non-Fiction', 'MyHouse');
Insert into WORK (WORK_ID, PERSON_ID, WORK_NAME, GENRE, PUBLISHER)
Values (3, 2, 'P2Book1', 'Fiction', 'MyHouse');
Insert into WORK (WORK_ID, PERSON_ID, WORK_NAME, GENRE, PUBLISHER)
Values (4, 2, 'P2Book2', 'Non-Fiction', 'MyHouse');
COMMIT;

The only entries that I see in the elasticsearch logs are shown below. I do
not get any exception stack trace.

[2013-05-17 10:19:59,123][INFO ][cluster.metadata ] [ESNode1] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2013-05-17 10:19:59,295][INFO ][cluster.metadata ] [ESNode1] [_river] update_mapping [person] (dynamic)
[2013-05-17 10:19:59,308][DEBUG][river.jdbc ] [ESNode1] [jdbc][person] found river source org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource for strategy simple
[2013-05-17 10:19:59,310][DEBUG][river.jdbc ] [ESNode1] [jdbc][person] found river target org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth for strategy simple
[2013-05-17 10:19:59,311][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] waiting for cluster...
[2013-05-17 10:19:59,313][DEBUG][river.jdbc ] [ESNode1] [jdbc][person] found river task org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow for strategy simple
[2013-05-17 10:19:59,314][INFO ][river.jdbc ] [ESNode1] [jdbc][person] starting JDBC river: URL [jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=10000)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=MYDB)))], driver [oracle.jdbc.OracleDriver], strategy [simple], index [jdbcriver_person]/[jdbc]
[2013-05-17 10:19:59,377][INFO ][cluster.metadata ] [ESNode1] [jdbcriver_person] creating index, cause [api], shards [5]/[1], mappings []
[2013-05-17 10:19:59,597][INFO ][cluster.metadata ] [ESNode1] [_river] update_mapping [person] (dynamic)
[2013-05-17 10:19:59,633][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource] merged 2 rows
[2013-05-17 10:19:59,633][DEBUG][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] {"jdbc":{"created":"2013-05-17T14:19:59.315Z","version":1,"digest":"2Mq+WYAP7eWwiikKMAlUPvrn8V3g65I855gkUWl0ZNE="}}
[2013-05-17 10:19:59,644][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] next run, waiting 1.6h
[2013-05-17 10:19:59,647][INFO ][cluster.metadata ] [ESNode1] [_river] update_mapping [person] (dynamic)
[2013-05-17 10:20:00,313][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new bulk [1] of [2 items], 1 outstanding bulk requests
[2013-05-17 10:20:00,321][INFO ][org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk [1] success [2 items] [8ms]
[2013-05-17 10:20:00,324][INFO ][cluster.metadata ] [ESNode1] [jdbcriver_person] update_mapping [jdbc] (dynamic)
[2013-05-17 10:20:12,268][INFO ][cluster.metadata ] [ESNode1] [_river] create_mapping [work]

Thanks, just give me some time to reproduce this.

Jörg
