JDBC river query results collapsing to JSON issue


(jrizzi1) #1

I am having an issue with the JDBC river collapsing rows during the bulk insert.

My records have some single-value properties and some properties that can have multiple values (names, addresses and emails).

There are around 4.5 million rows in total, which collapse down to about 600k.

If the river SQL criteria is restricted to a single record (where id="001"), it works fine.

But during the bulk process, i.e. with all of my rows, only one of the multi-value properties comes out correct; the other properties are missing data.

Here is an example of the query output that the river collapses to JSON.
It has 2 middle names, 2 last names, and 5 addresses.

_id pref_mail_name pref_class_year record_status_code first_name middle_name last_name street1 street2 street3 city state_code zipcode email_address
0000003934 Kelly A. Draper 1999 A Kelly Ann Draper 13679 Stoney Springs Dr Chardon OH 44024-8918 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly Ann Draper 1400 McDonald Investment Ctr 800 Superior Ave E Ste 1400 Cleveland OH 44114-2617 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly Ann Draper 13156 Aldenshire Dr Chardon OH 44024-8921 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly Ann Draper 100 7th Ave Ste 150 Chardon OH 44024-7808 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly Ann Draper 13765 Equestrian Dr Burton OH 44021-9552 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly A. McElroy 13679 Stoney Springs Dr Chardon OH 44024-8918 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly A. McElroy 1400 McDonald Investment Ctr 800 Superior Ave E Ste 1400 Cleveland OH 44114-2617 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly A. McElroy 13156 Aldenshire Dr Chardon OH 44024-8921 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly A. McElroy 100 7th Ave Ste 150 Chardon OH 44024-7808 kelly_a_draper@yahoo.com
0000003934 Kelly A. Draper 1999 A Kelly A. McElroy 13765 Equestrian Dr Burton OH 44021-9552 kelly_a_draper@yahoo.com

After a river run, the indexed doc has all 5 addresses, but only one middle name and one last name; the other middle name and last name were never indexed.

{
  "_source": {
    "pref_mail_name": "Kelly A. Draper",
    "street2": [
      " ",
      "800 Superior Ave E Ste 1400"
    ],
    "street1": [
      "13679 Stoney Springs Dr",
      "1400 McDonald Investment Ctr",
      "13156 Aldenshire Dr",
      "100 7th Ave Ste 150",
      "13765 Equestrian Dr"
    ],
    "state_code": "OH",
    "middle_name": "A.",
    "zipcode": [
      "44024-8918",
      "44114-2617",
      "44024-8921",
      "44024-7808",
      "44021-9552"
    ],
    "pref_class_year": "1999",
    "record_status_code": "A",
    "city": [
      "Chardon",
      "Cleveland",
      "Burton"
    ],
    "first_name": "Kelly",
    "last_name": "McElroy",
    "street3": " ",
    "email_address": "kelly_a_draper@yahoo.com"
  }
}
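
For reference, the document one would expect from the ten rows above can be sketched in a few lines of Python. This is only an illustration of the collapse described in this post, assuming that rows sharing an `_id` are merged by collecting the distinct values of each column in order of appearance. `collapse_rows` and the abbreviated row data are hypothetical and are not taken from the river's source.

```python
def collapse_rows(rows):
    """Merge rows that share an _id into one document.

    Assumption: a column with one distinct value stays a scalar and a
    column with several distinct values becomes a list.
    """
    docs = {}
    for row in rows:
        doc = docs.setdefault(row["_id"], {})
        for column, value in row.items():
            if column == "_id":
                continue
            values = doc.setdefault(column, [])
            if value not in values:
                values.append(value)
    # Unwrap single-value lists so they read like the indexed document.
    return {
        doc_id: {c: v[0] if len(v) == 1 else v for c, v in doc.items()}
        for doc_id, doc in docs.items()
    }


# Abbreviated version of the sample rows: 2 names x 5 streets = 10 rows.
names = [("Ann", "Draper"), ("A.", "McElroy")]
streets = [
    "13679 Stoney Springs Dr",
    "1400 McDonald Investment Ctr",
    "13156 Aldenshire Dr",
    "100 7th Ave Ste 150",
    "13765 Equestrian Dr",
]
rows = [
    {"_id": "0000003934", "first_name": "Kelly",
     "middle_name": middle, "last_name": last, "street1": street}
    for (middle, last) in names
    for street in streets
]

expected = collapse_rows(rows)["0000003934"]
print(expected["middle_name"], expected["last_name"])
```

Under that assumption both middle names and both last names would end up in the document, which is what is missing from the indexed result.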

I have also tried using bracket notation to create objects, but the same issue exists, only now the properties are nested.

My river looks like this:

PUT /_river/matcher/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "serverurl",
    "user" : "USER",
    "password" : "#########",
    "sql" : "select e.id_number as \"_id\", e.pref_mail_name as \"pref_mail_name\", e.pref_class_year as \"pref_class_year\", e.record_status_code as \"record_status_code\", a.street1 as \"street1\", a.street2 as \"street2\", a.street3 as \"street3\", a.city as \"city\", a.state_code as \"state_code\", a.zipcode as \"zipcode\", n.first_name as \"first_name\", n.middle_name as \"middle_name\", n.last_name as \"last_name\", email.email_address as \"email_address\" from entity e left join name n on e.id_number = n.id_number left join email on e.id_number = email.id_number left join address a on e.id_number = a.id_number where e.person_or_org = 'P' and e.record_status_code IN ('A', 'L', 'D')",
    "index" : "matcher",
    "type" : "entity",
    "bulk_size" : 160,
    "max_bulk_requests" : 5
  }
}

Let me know if I can provide additional info.


(Jörg Prante) #2

From what I understand, you want a single ES document with name:address
relations as 1:N relation, where the only ID available is for the name
(here in the example: 0000003934 for Kelly A. Draper).

It would help to define more identifiers for each address also, so you
could index the addresses in one index, and person names in the other
index, with two rivers.

The support for nested objects in SQL pseudo column bracket notation is
somewhat limited in JDBC river. If anyone feels like improving this,
patches/pull requests would be very welcome!

At the moment I feel without any identifiers or given enumeration scheme,
it is impossible to identify a sequence of JSON objects in a nested
document that can be collapsed/grouped.

Jörg



(jrizzi1) #3

Hi Jörg,

I wanted a single ES document. I have a primary table with unique IDs; names are a 1:N relation to it, and addresses are a 1:N relation as well.

The reason is that we need to search on names and addresses to find the unique ID of the individual, so we can do further processing.

How could I search over several indices and merge the results back together to get the one unique ID that best matches? My initial test of splitting these apart into different indices shows addresses and entities mixed together in the same result set, and I have no idea how to find what they have in common.

The addresses and names don't really have unique identifiers of their own; they are sequenced under the primary ID. For example, if the primary table ID is '1001' and that person has three addresses, the unique keys for those rows would be id='1001', sequence='1'; id='1001', sequence='2'; and so on.
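
One way to give each address or name row an identifier of its own would be to combine the primary ID with the sequence number. A minimal sketch, assuming a `-` separator is acceptable; `child_id` is a hypothetical helper and not part of the JDBC river:

```python
def child_id(parent_id: str, sequence: int) -> str:
    """Build a unique child identifier from the parent ID and the row sequence."""
    return f"{parent_id}-{sequence}"


# Three address rows for primary table ID '1001'.
address_ids = [child_id("1001", seq) for seq in (1, 2, 3)]
print(address_ids)  # ['1001-1', '1001-2', '1001-3']
```

The same concatenation could presumably be done in the SQL select, so that the river receives the combined value as the `_id` of each child row.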


(Jörg Prante) #4

Have you tried parent/child?

The idea is to execute has_parent queries on the address type to find the parent ID:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-query.html

I can prepare an example ...
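
In the meantime, the body of such a query can be sketched as follows. It assumes a parent type named `entity` (the type used in the river definition earlier in the thread) with a `last_name` field, and an `address` child type whose mapping declares `_parent`. The helper `has_parent_query` is made up for illustration. The body would be sent to the `_search` endpoint of the child type.

```python
import json


def has_parent_query(parent_type, field, text):
    """Build a search body that returns child documents whose parent matches."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"match": {field: text}},
            }
        }
    }


# Assumed names: parent type "entity", parent field "last_name".
body = has_parent_query("entity", "last_name", "McElroy")
print(json.dumps(body, indent=2))
```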

Jörg



(Jörg Prante) #5

Is this parent/child example sketching the challenge you are facing?

https://gist.github.com/jprante/11191387

Jörg



(jrizzi1) #6

Thank you for that gist, it really helps to see a full setup like that from start to finish.

I had originally attempted a parent-child relationship for this data by using _parent in separate rivers, but decided to go with a single index: when I tried to return child information, the hits only contained the parent properties, not the child properties, and I wasn't sure how to query additionally to get the child property data.

But in reality I will more than likely need to switch to this setup to retain all of my data, which is the lesser of the evils.
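
For what it is worth, the child properties can probably be fetched with a second request once the parent ID is known. A rough sketch of the two request bodies, assuming types named `entity` (parent) and `address` (child) and a `city` field on the child; the type and field names are placeholders and the two helper functions are hypothetical:

```python
import json


def find_parents_by_child(child_type, field, text):
    """Step 1: search the parent type for entities that have a matching child."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"match": {field: text}},
            }
        }
    }


def find_children_of(parent_type, parent_ids):
    """Step 2: search the child type for the children of the given parents."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"ids": {"values": list(parent_ids)}},
            }
        }
    }


step1 = find_parents_by_child("address", "city", "Chardon")
step2 = find_children_of("entity", ["0000003934"])
print(json.dumps(step1))
print(json.dumps(step2))
```

The first body would go to the parent type's `_search` endpoint and the second to the child type's, using the IDs returned by the first request.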



(Jörg Prante) #7

I see the use case for building deep nested docs from JDBC river (and other
data sources).

In JDBC river, there are open questions about nested result sets, which is
similar.

I have to think more about multiple SQL statements and create "merge
points" to construct bigger JSON from them in a natural way...

Jörg



(jrizzi1) #8

That sounds like good news, looking forward to that.

The only thing that really bothers me about the original issue is that the river does collapse the results to JSON correctly, with multiple 1:N relationships, if I run it on a smaller dataset.

For instance, if I include a criterion like "where id <= 4000" in my river SQL, the query returns 25k rows that get collapsed into 3000 indexed documents, and the 1:N data is correct, at least from spot checking over 40 docs with multiple 1:N data.

It is only with larger SQL results that some 1:N data goes missing, which leads me to believe that something is going wrong in the bulk process, right?
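
One explanation that would fit these symptoms, offered purely as a hypothesis: if the river merges only consecutive rows with the same `_id`, and a large join without an ORDER BY returns the rows for one `_id` in more than one run, then each run becomes its own document and the later one replaces the earlier one in the index. The sketch below simulates that. `collapse_consecutive` and `index_docs` are hypothetical stand-ins that do not reproduce the river's code, and the second row is made up.

```python
from itertools import groupby


def collapse_consecutive(rows):
    """Merge only adjacent rows with the same _id (assumed river behaviour)."""
    for doc_id, group in groupby(rows, key=lambda r: r["_id"]):
        doc = {}
        for row in group:
            for column, value in row.items():
                if column != "_id" and value not in doc.setdefault(column, []):
                    doc[column].append(value)
        yield doc_id, doc


def index_docs(docs):
    """Simulate indexing: a later document with the same _id replaces the earlier one."""
    index = {}
    for doc_id, doc in docs:
        index[doc_id] = doc
    return index


rows = [
    {"_id": "0000003934", "last_name": "Draper", "city": "Chardon"},
    {"_id": "0000003935", "last_name": "Smith", "city": "Burton"},  # made-up row
    {"_id": "0000003934", "last_name": "McElroy", "city": "Chardon"},
]

unordered = index_docs(collapse_consecutive(rows))
ordered = index_docs(collapse_consecutive(sorted(rows, key=lambda r: r["_id"])))
print(unordered["0000003934"]["last_name"])  # ['McElroy']
print(ordered["0000003934"]["last_name"])    # ['Draper', 'McElroy']
```

This would match the sample document from the first post, where the surviving values (middle name "A.", last name "McElroy", all five streets) are exactly those of the last five rows. If that is what happens, adding "order by e.id_number" to the river SQL would be worth a try.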



(Jörg Prante) #9

I hesitate to add "oversophisticated" code to the JDBC river that collapses
without reason.

Somehow the definition should set "merge points" to control the zone of
JSON object/array growth. Maybe an extension of the bracket notation is all
that is needed.

At least, I will add extensive logging to the river so it can be traced
easily how the JSON docs are built from SQL.

Jörg


