Hi all,
I have a table in Hive with 1,412,444 records
and the following fields:
timestamp TIMESTAMP, clientid STRING, kitnumber STRING, sessionid STRING, sessionduration INT, countofsession INT, dayssincelastsession INT, totalengegedtime INT,
pagevisible INT, pagehidden INT, eventvalue INT, totalevents INT, hits INT, eventcategory STRING, eventaction STRING, eventlabel STRING, vimeouploaddate STRING, usertype STRING,
page STRING, pagedepth INT, pagetitle STRING, screenname STRING, landingpage STRING, landingscreenname STRING, secondpage STRING, exitpage STRING,
regionisocode STRING, city STRING, latitude DOUBLE, longitude DOUBLE, serviceprovider STRING,
devicecategory STRING, javaenabled STRING, operatingsystem STRING, operatingsystemversion STRING, screencolors STRING, screenresolution STRING, browser STRING, browsersize STRING, browserversion STRING, mobiledeviceinfo STRING, mobileinputselector STRING
When I load the same data into an external Hive table backed by Elasticsearch,
the record count becomes 2,290,171 and many duplicate records appear.
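For context, the Elasticsearch-backed table was created roughly like the sketch below, using the elasticsearch-hadoop storage handler. The index/type name, node address and table names here are placeholders rather than my exact values, and most columns are omitted for brevity:

CREATE EXTERNAL TABLE sessions_es (
  `timestamp` TIMESTAMP,
  clientid STRING,
  kitnumber STRING,
  sessionid STRING
  -- ... remaining columns match the Hive table above
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'sessions/session',   -- placeholder index/type
  'es.nodes'    = 'localhost:9200'      -- placeholder node address
);

-- loaded with a plain insert from the source table
INSERT OVERWRITE TABLE sessions_es
SELECT * FROM sessions_hive;            -- sessions_hive: the Hive table described above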
Can anyone help me figure out why this is happening?
Thanks in advance,
Kishore.