Data in Elasticsearch doesn't match when running the same job again

I have a Hive script that creates multiple external tables pointing to one index and stores the data there. Later on we visualise that data in ES.

While converting the code from Hive to PySpark, I noticed that when running the same PySpark job the data does not match. Not all values are affected, only some.

What could be the reason for it? My guess was that in Hive we have multiple tables, so the data is not mixed, whereas in PySpark I am sending all the DataFrames to the same index. But even if I run the same PySpark job twice, I get a data mismatch.
At first I thought my code was bad, but the Spark and Hive code return the same values when I run them manually in the terminal.

Thanks for reaching out here. A few follow-up questions:

  • What version of Elastic are you using?
  • Do you have a code example you can share with us?
  • I'd also like to learn more about exactly what isn't matching.

Best,

Jessica

This is a duplicate post of

https://discuss.elastic.co/t/data-mismatches-happening-while-sending-data-to-elastic-search-index-using-pyspark/377188/4

Thanks, @elasticforme. Let's use that post for any further conversation on that subject.