How to generate the same sampled dataset from Spark?

In Spark, I want to sample a large dataset from my Elasticsearch index. I tried the following command:

df = spark.read.format("org.elasticsearch.spark.sql") \
            .option("query", myquery) \
            .option("pushdown", "true") \
            .load(spark.conf.get(""))

where myquery is:

  {
    "query": {
      "filtered": {
        "query": {
          "function_score": {
            "functions": [
              {
                "random_score": {
                  "seed": 1
                }
              }
            ],
            "score_mode": "sum"
          }
        }
      }
    }
  }

and mysize is about 10K. The problem is that I don't get the same result each time I execute the command, even with the same seed. Why? What is the best way to sample the data without first having to match all the data and then run a sample command on the huge resulting DataFrame?

Thank you for your help
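
For illustration, one way to make a sample repeatable is to let the keep-or-drop decision depend only on each document's ID and a seed, instead of on a score computed at query time. The following is a minimal sketch in plain Python, not the connector's own mechanism; the function name `keep_in_sample`, the `doc-N` IDs, and the 10% fraction are all hypothetical.

```python
import hashlib


def keep_in_sample(doc_id: str, fraction: float, seed: int = 1) -> bool:
    """Decide membership from (seed, doc_id) only, so the result is repeatable."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the document if its bucket falls below the requested fraction.
    return bucket < int(fraction * 2**64)


# Hypothetical IDs standing in for the index's document IDs.
doc_ids = [f"doc-{i}" for i in range(100_000)]

# The same documents are selected regardless of the order they arrive in.
sample_a = {d for d in doc_ids if keep_in_sample(d, 0.1)}
sample_b = {d for d in reversed(doc_ids) if keep_in_sample(d, 0.1)}
print(sample_a == sample_b)
```

Because the decision is a pure function of the ID, it does not depend on which shard or partition serves the document or in what order. Note that this sketch still looks at every ID; it only addresses repeatability, not the cost of matching all the data first.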
