Spark 2.0.2 reading from ES 1.7.0 with elasticsearch-spark-20_2.11:5.0.1 fails with java.io.InvalidClassException: org.apache.spark.sql.execution.FilterExec


(Chanh Le) #1

Hi everyone,
I have a problem with Zeppelin, which runs Spark 2.0.2 and reads from ES 1.7.0 using org.elasticsearch:elasticsearch-spark-20_2.11:5.0.1.
I am using Spark to join data from two ES clusters.
I create a DataFrame from ds1.
Its schema looks like this:
|-- actionType: integer
|-- adServing: string
|-- ageRange: string
|-- ageYr1: long
|-- ageYr2: long
|-- antsPv: string
|-- author: integer
|-- browserId: integer
|-- browserVerId: integer
|-- clickAdServing: string
|-- clickBeacon: string
|-- coId: integer
|-- count: integer
|-- countryCode: string
|-- cuid: string
|-- devFamilyId: integer
|-- devTypeId: integer
|-- domainId: integer
|-- eventIds: string
|-- fAgeRange: string
|-- fGender: string
|-- gender: integer
|-- goals: string
|-- hasProfile: integer
|-- imkSum: string
|-- imkTime: string
|-- incrBounce: integer
|-- incrVisit: integer
|-- intSum: string
|-- intTime: string
|-- ip: string
|-- ispId: integer
|-- keywords: string
|-- loggedDate: timestamp
|-- loggedHour: integer
|-- loggedTime: timestamp
|-- newVisit: integer
|-- object: integer
|-- order: string
|-- orderStatus: integer
|-- osFamilyId: integer
|-- osId: integer
|-- register: integer
|-- resId: integer
|-- revenue: long
|-- section: integer
|-- sessionId: long
|-- siteId: long
|-- zoneId: integer

and the schema of ds2:
root
|-- adId: integer
|-- ageRange: string
|-- blackList: long
|-- broserVerId: long
|-- browser: string
|-- browserId: long
|-- browserVer: string
|-- campaignId: long
|-- carrierId: long
|-- channelId: long
|-- conAgeRange: string
|-- coordinates: string
|-- countryCode: string
|-- creativeId: long
|-- devId: long
|-- devModel: string
|-- gender: integer
|-- gent: integer
|-- his: string
|-- hostname: string
|-- ip: string
|-- loggedTime: timestamp
|-- networkId: long
|-- os: string
|-- osId: long
|-- refUrlId: long
|-- refZone: string
|-- res: string
|-- secondPrice: double
|-- sectionId: long
|-- spendCost: double
|-- spentBuyCost: double
|-- urlId: long
|-- userId: long
|-- viewToClick: integer
|-- websiteId: integer
|-- zoneId: long

I join the two datasets using SQL:

select campaignId, count(b.*) as click, count(a.actionType) as pageview, avg(timeSpent) as timesSpent
from insight a
left join clickDetail b on b.his = a.clickBeacon
where a.loggedTime between '2016-11-30 00:00:00.0' and '2016-12-01 00:00:00.0'
and siteId=5384240
and campaignId IN (580473,583396,583768)
and timeSpent > 0
group by campaignId
order by timesSpent desc

and I get this error:

java.io.InvalidClassException: org.apache.spark.sql.execution.FilterExec; local class incompatible: stream classdesc serialVersionUID = -3682127751384284016, local class serialVersionUID = -1001413048249876169

BTW, if I use version 2.4.2 with Spark 1.x, the query works fine.
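
For context: Java serialization raises this exception when the serialVersionUID recorded in the incoming stream differs from the one of the class loaded locally. That usually means the sending and receiving JVMs have different builds of the same class. Whether the mismatch here comes from different Spark builds on the Zeppelin side and on the cluster is an assumption, not something confirmed in this thread. A minimal sketch for printing the UID a given JVM sees, so it can be compared with the two values in the error, could look like this (SerialVersionCheck and serialVersionUidOf are hypothetical names, not part of Spark or elasticsearch-hadoop):

```java
import java.io.ObjectStreamClass;
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of Spark or elasticsearch-hadoop.
final class SerialVersionCheck {

    // Returns the serialVersionUID that Java serialization uses for the class,
    // or an empty value if the class does not implement Serializable.
    static OptionalLong serialVersionUidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        return desc == null
                ? OptionalLong.empty()
                : OptionalLong.of(desc.getSerialVersionUID());
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments, for example
        // org.apache.spark.sql.execution.FilterExec (needs the Spark jars on the classpath).
        ClassLoader loader = SerialVersionCheck.class.getClassLoader();
        for (String name : args) {
            try {
                // Load without running static initializers; only the class descriptor is needed.
                Class<?> cls = Class.forName(name, false, loader);
                System.out.println(name + " -> " + serialVersionUidOf(cls));
            } catch (ClassNotFoundException e) {
                System.out.println(name + " -> not found on this classpath");
            }
        }
    }
}
```

Running this once with the jars Zeppelin uses and once with the jars installed on a worker, and comparing the printed values with the two UIDs in the exception message, would show which side carries which build of the class.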

