I am having an issue with the JDBC river collapsing rows during the bulk insert.
My records have some single-valued properties and can also have multi-valued properties (names, addresses, and emails).
There are around 4.5 million rows in total, which collapse down to 600k documents.
If I restrict the river's SQL with where id="001", it works fine.
But during the full bulk run, i.e. over all of my rows, only one of the multi-valued properties comes out correct; the other properties are missing data.
Here is an example of the query output that the river collapses to JSON.
It has 2 middle names, 2 last names, and 5 addresses.
_id | pref_mail_name | pref_class_year | record_status_code | first_name | middle_name | last_name | street1 | street2 | street3 | city | state_code | zipcode | email_address
0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13679 Stoney Springs Dr | | | Chardon | OH | 44024-8918 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 1400 McDonald Investment Ctr | 800 Superior Ave E Ste 1400 | | Cleveland | OH | 44114-2617 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13156 Aldenshire Dr | | | Chardon | OH | 44024-8921 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 100 7th Ave Ste 150 | | | Chardon | OH | 44024-7808 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13765 Equestrian Dr | | | Burton | OH | 44021-9552 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13679 Stoney Springs Dr | | | Chardon | OH | 44024-8918 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 1400 McDonald Investment Ctr | 800 Superior Ave E Ste 1400 | | Cleveland | OH | 44114-2617 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13156 Aldenshire Dr | | | Chardon | OH | 44024-8921 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 100 7th Ave Ste 150 | | | Chardon | OH | 44024-7808 | kelly_a_draper@yahoo.com
0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13765 Equestrian Dr | | | Burton | OH | 44021-9552 | kelly_a_draper@yahoo.com
After a river run, the indexed doc has all of the addresses, but only one middle name and one last name; the other values were never indexed.
"_source": {
    "pref_mail_name": "Kelly A. Draper",
    "street2": [
        " ",
        "800 Superior Ave E Ste 1400"
    ],
    "street1": [
        "13679 Stoney Springs Dr",
        "1400 McDonald Investment Ctr",
        "13156 Aldenshire Dr",
        "100 7th Ave Ste 150",
        "13765 Equestrian Dr"
    ],
    "state_code": "OH",
    "middle_name": "A.",
    "zipcode": [
        "44024-8918",
        "44114-2617",
        "44024-8921",
        "44024-7808",
        "44021-9552"
    ],
    "pref_class_year": "1999",
    "record_status_code": "A",
    "city": [
        "Chardon",
        "Cleveland",
        "Burton"
    ],
    "first_name": "Kelly",
    "last_name": "McElroy",
    "street3": " ",
    "email_address": "kelly_a_draper@yahoo.com"
}
}
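To make the expected result concrete, here is a rough Python sketch of the collapse I would expect for this record. It is only an illustration of the outcome I am after, not the river's actual implementation: collapse_rows is a hypothetical helper of mine, and the sample rows are a trimmed subset of the query output above.

```python
from collections import OrderedDict

def collapse_rows(rows):
    """Hypothetical helper: group rows by _id and collect distinct values per field."""
    docs = OrderedDict()
    for row in rows:
        doc = docs.setdefault(row["_id"], OrderedDict())
        for field, value in row.items():
            if field == "_id":
                continue
            seen = doc.setdefault(field, [])
            if value not in seen:  # keep each distinct value once, in first-seen order
                seen.append(value)
    # Fields with a single distinct value become scalars, the rest stay lists.
    return {
        _id: {f: (v[0] if len(v) == 1 else v) for f, v in doc.items()}
        for _id, doc in docs.items()
    }

# Trimmed subset of the query output (2 names x 2 addresses).
rows = [
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "Ann",
     "last_name": "Draper", "street1": "13679 Stoney Springs Dr"},
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "Ann",
     "last_name": "Draper", "street1": "13156 Aldenshire Dr"},
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "A.",
     "last_name": "McElroy", "street1": "13679 Stoney Springs Dr"},
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "A.",
     "last_name": "McElroy", "street1": "13156 Aldenshire Dr"},
]

docs = collapse_rows(rows)
# middle_name and last_name should each come out as a two-element list.
print(docs["0000003934"]["middle_name"], docs["0000003934"]["last_name"])
```

In other words, I would expect middle_name and last_name to be arrays in the indexed doc, the same way street1 and zipcode are.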
I have attempted using bracket notation to create objects, but the same issue exists, only now the properties are nested.
My river looks like this:
PUT /_river/matcher/_meta
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "serverurl",
        "user" : "USER",
        "password" : "#########",
        "sql" : "select e.id_number as \"_id\", e.pref_mail_name as \"pref_mail_name\", e.pref_class_year as \"pref_class_year\", e.record_status_code as \"record_status_code\", a.street1 as \"street1\", a.street2 as \"street2\", a.street3 as \"street3\", a.city as \"city\", a.state_code as \"state_code\", a.zipcode as \"zipcode\", n.first_name as \"first_name\", n.middle_name as \"middle_name\", n.last_name as \"last_name\", email.email_address as \"email_address\" from entity e left join name n on e.id_number = n.id_number left join email on e.id_number = email.id_number left join address a on e.id_number = a.id_number where e.person_or_org = 'P' and e.record_status_code IN ('A', 'L', 'D')",
        "index" : "matcher",
        "type" : "entity",
        "bulk_size" : 160,
        "max_bulk_requests" : 5
    }
}
Let me know if I can provide additional info.