esRDD not returning SCRIPT_FIELDS values


(Kyunam Kim) #1

The returned HashMap doesn't include the 'distance' key.
Do I need to use esJsonRDD instead?

{
  "query": {
    ...
  },
  "fields": [
    "_source"
  ],
  "script_fields": {
    "distance": {
      "params": {
        "lat": 26.622713,
        "lon": -81.857765
      },
      "script": "doc['point'].distance(lat,lon)"
    }
  }
}

Thanks,
Q


(Costin Leau) #2

Added minor formatting to your post to make it readable; please do so yourself in the future. Thanks!

Sorry, but I'm not sure what you are asking. Can you be a bit more explicit about what you are trying to achieve, what the expected result is, and what you currently get?

Is this a query DSL question or ES-Hadoop specific?


(Kyunam Kim) #3

This is an ES-Hadoop question.
The query looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "term": {
          "coname.raw": {
            "value": "abc inc"
          }
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "city.raw": "fort myers"
              }
            },
            {
              "term": {
                "zip": "12345"
              }
            },
            {
              "query": {
                "match": {
                  "street": {
                    "query": "1234 xyz st",
                    "minimum_should_match": "75%"
                  }
                }
              }
            }
          ],
          "should": [
            {
              "geo_distance": {
                "distance": 500,
                "distance_unit": "m",
                "point": {
                  "lat": 12.34,
                  "lon": -56.78
                }
              }
            }
          ]
        }
      }
    }
  },
  "fields": [
    "_source"
  ],
  "script_fields": {
    "distance": {
      "params": {
        "lat": 12.34,
        "lon": -56.78
      },
      "script": "doc['point'].distance(lat,lon)"
    }
  }
}

In Sense, I get the result back like this:

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 16.71316,
      "hits": [
         {
            "_index": "myindex",
            "_type": "businesses",
            "_id": "AVHLPmOawjmarqgxrsIp",
            "_score": 16.71316,
            "_source": {
               "locnum": "871121299",
               "coname": "abc inc",
               "street": "xyz st",
               "city": "fort myers",
               "state": "fl",
               "state_name": "florida",
               "zip": "12345"
            },
            "fields": {
               "distance": [
                  109.94139695812709
               ]
            }
         }
      ]
   }
}

In script_fields, I specified distance, and I get it back with no problem in Sense.
When I run this query through sc.esRDD(), I don't receive distance -> 109.94139695812709.

So the question is: how do I tell sc.esRDD() to include distance -> 109.94139695812709 in the returned RDD?

Hope this helps.

Thanks,
Q


(Costin Leau) #4

Currently, ES-Hadoop supports either _source or fields, but not both. This is a problem for dynamic/custom metadata such as highlighting, and there's an issue open for it:

Until that is fixed, a workaround in your case would be to remove the _source or break the query into two parts; not ideal, I know...
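
To illustrate the "two parts" idea, here is a rough sketch in plain Python. It does not call ES-Hadoop or Spark at all: it only shows how one search body could be split into a _source-only body and a script_fields-only body, and how the two result sets could be recombined by document id. The helper names split_query and merge_by_id are made up for this example, and the sample result dicts are an assumption about what each query would return. In Spark, the merge step would presumably be a join on the two pair RDDs keyed by document id.

```python
import copy


def split_query(body):
    """Split one search body into (source_body, script_body).

    source_body asks only for _source; script_body asks only for script_fields.
    The input body is left unmodified.
    """
    # Part 1: drop script_fields (and the explicit "fields" list) so that
    # only the document source is requested.
    source_body = copy.deepcopy(body)
    source_body.pop("script_fields", None)
    source_body.pop("fields", None)

    # Part 2: keep script_fields but stop asking for _source.
    script_body = copy.deepcopy(body)
    remaining = [f for f in script_body.get("fields", []) if f != "_source"]
    if remaining:
        script_body["fields"] = remaining
    else:
        script_body.pop("fields", None)
    return source_body, script_body


def merge_by_id(source_docs, script_docs):
    """Combine two {doc_id: {field: value}} dicts into one dict per document."""
    merged = {}
    for doc_id, src in source_docs.items():
        combined = dict(src)
        combined.update(script_docs.get(doc_id, {}))
        merged[doc_id] = combined
    return merged


# Abbreviated version of the query from this thread (query clause shortened).
body = {
    "query": {"term": {"coname.raw": {"value": "abc inc"}}},
    "fields": ["_source"],
    "script_fields": {
        "distance": {
            "params": {"lat": 12.34, "lon": -56.78},
            "script": "doc['point'].distance(lat,lon)",
        }
    },
}
source_body, script_body = split_query(body)

# Hypothetical results of running each body separately, keyed by document id.
source_docs = {"AVHLPmOawjmarqgxrsIp": {"coname": "abc inc", "zip": "12345"}}
script_docs = {"AVHLPmOawjmarqgxrsIp": {"distance": 109.94139695812709}}

merged = merge_by_id(source_docs, script_docs)
print(merged)
```

The cost of this approach is that the query runs twice, which is the "not ideal" part mentioned above.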


(Costin Leau) #5

P.S. Thanks for the follow-up; it made the topic much clearer.


(Kyunam Kim) #6

Thank you for your response.

- Q
