I am having an issue with the JDBC river when it collapses rows during a bulk insert.
My records have some single-valued properties and some properties that can have multiple values (names, addresses, and emails).
There are around 4.5 million rows in total, which collapse down to about 600k documents.
If the river's SQL criteria are restricted to a single record (where id="001"), it works fine.
But during the bulk process, i.e. over all of my rows, only one of the multi-valued properties comes out correct; the other properties are missing data.
Here is an example of the query output that the river collapses to JSON.
It has 2 middle names, 2 last names, and 5 addresses:
_id	pref_mail_name	pref_class_year	record_status_code	first_name	middle_name	last_name	street1	street2	street3	city	state_code	zipcode	email_address
0000003934	Kelly A. Draper	1999	A	Kelly	Ann	Draper	13679 Stoney Springs Dr	 	 	Chardon	OH	44024-8918	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	Ann	Draper	1400 McDonald Investment Ctr	800 Superior Ave E Ste 1400	 	Cleveland	OH	44114-2617	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	Ann	Draper	13156 Aldenshire Dr	 	 	Chardon	OH	44024-8921	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	Ann	Draper	100 7th Ave Ste 150	 	 	Chardon	OH	44024-7808	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	Ann	Draper	13765 Equestrian Dr	 	 	Burton	OH	44021-9552	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	A.	McElroy	13679 Stoney Springs Dr	 	 	Chardon	OH	44024-8918	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	A.	McElroy	1400 McDonald Investment Ctr	800 Superior Ave E Ste 1400	 	Cleveland	OH	44114-2617	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	A.	McElroy	13156 Aldenshire Dr	 	 	Chardon	OH	44024-8921	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	A.	McElroy	100 7th Ave Ste 150	 	 	Chardon	OH	44024-7808	kelly_a_draper@yahoo.com
0000003934	Kelly A. Draper	1999	A	Kelly	A.	McElroy	13765 Equestrian Dr	 	 	Burton	OH	44021-9552	kelly_a_draper@yahoo.com
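To make the expected result concrete, here is a minimal Python sketch of the collapse expected for this record: group rows by _id and keep each distinct value per column. The collapse function and the reduced column set are illustrative assumptions, not the river's actual implementation.

```python
from collections import OrderedDict

# Three of the ten rows above, reduced to a few columns for brevity.
rows = [
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "Ann",
     "last_name": "Draper", "city": "Chardon"},
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "Ann",
     "last_name": "Draper", "city": "Cleveland"},
    {"_id": "0000003934", "first_name": "Kelly", "middle_name": "A.",
     "last_name": "McElroy", "city": "Chardon"},
]


def collapse(rows):
    """Group rows by _id, keeping each distinct value per column in first-seen order."""
    docs = OrderedDict()
    for row in rows:
        doc = docs.setdefault(row["_id"], OrderedDict())
        for column, value in row.items():
            if column == "_id":
                continue
            values = doc.setdefault(column, [])
            if value not in values:
                values.append(value)
    # Columns with one distinct value become scalars; the rest stay lists.
    return {
        _id: {col: vals[0] if len(vals) == 1 else vals for col, vals in doc.items()}
        for _id, doc in docs.items()
    }


docs = collapse(rows)
print(docs["0000003934"])
```

With this grouping, middle_name and last_name each keep both values, while first_name stays a single value.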
After a river run, the indexed doc has all 5 addresses but only one middle name and one last name; the other values were never indexed.
"_source": {
"pref_mail_name": "Kelly A. Draper",
"street2": [
" ",
"800 Superior Ave E Ste 1400"
],
"street1": [
"13679 Stoney Springs Dr",
"1400 McDonald Investment Ctr",
"13156 Aldenshire Dr",
"100 7th Ave Ste 150",
"13765 Equestrian Dr"
],
"state_code": "OH",
"middle_name": "A.",
"zipcode": [
"44024-8918",
"44114-2617",
"44024-8921",
"44024-7808",
"44021-9552"
],
"pref_class_year": "1999",
"record_status_code": "A",
"city": [
"Chardon",
"Cleveland",
"Burton"
],
"first_name": "Kelly",
"last_name": "McElroy",
"street3": " ",
"email_address": "kelly_a_draper@yahoo.com"
}
}
I have also tried using bracket notation to create nested objects, but the same issue exists, only now the properties are nested.
My river looks like this:
PUT /_river/matcher/_meta
{
"type" : "jdbc",
"jdbc" : {
"url" : "serverurl",
"user" : "USER",
"password" : "#########",
"sql" : "select e.id_number as \"_id\", e.pref_mail_name as \"pref_mail_name\", e.pref_class_year as \"pref_class_year\", e.record_status_code as \"record_status_code\", a.street1 as \"street1\", a.street2 as \"street2\", a.street3 as \"street3\", a.city as \"city\", a.state_code as \"state_code\", a.zipcode as \"zipcode\", n.first_name as \"first_name\", n.middle_name as \"middle_name\", n.last_name as \"last_name\", email.email_address as \"email_address\" from entity e left join name n on e.id_number = n.id_number left join email on e.id_number = email.id_number left join address a on e.id_number = a.id_number where e.person_or_org = 'P' and e.record_status_code IN ('A', 'L', 'D') ",
"index" : "matcher",
"type" : "entity",
"bulk_size" : 160,
"max_bulk_requests" : 5
}
}
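Because the SQL uses double-quoted column aliases, those quotes have to be escaped inside the JSON body. A minimal sketch, assuming Python is used to build the request body: json.dumps does the escaping. The sql string here is a shortened, hypothetical stand-in for the full query above.

```python
import json

# Shortened, hypothetical version of the query; the full statement would go here.
sql = (
    'select e.id_number as "_id", n.middle_name as "middle_name" '
    "from entity e left join name n on e.id_number = n.id_number"
)

river = {
    "type": "jdbc",
    "jdbc": {
        "url": "serverurl",
        "user": "USER",
        "password": "#########",
        "sql": sql,
        "index": "matcher",
        "type": "entity",
        "bulk_size": 160,
        "max_bulk_requests": 5,
    },
}

# json.dumps escapes the inner double quotes as \" so the body is valid JSON.
body = json.dumps(river, indent=2)
print(body)
```

The printed body can then be sent as the payload of the PUT /_river/matcher/_meta request.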
Let me know if I can provide additional info.