Painless QA query


(Colin) #1

I'm trying to QA some documents with painless, hoping I could get a little help on what I'm doing wrong why my query.

I have about a billion documents that are being used for application search. During the loading process, there were some errors on the codes and one of the fields wasn't populated with the correct values. Lucky for us, it's something we can abstract the value of from other fields. I realize that it would probably be better for the application to put this together but this is what I'm stuck making work. Here's an example of what the docs should look like before and after:

Before:

{
  "field_one": "foo",
  "field_two": "bar",
  "field_three": "error"
}

Before:

{
  "field_one": "foo",
  "field_two": "bar",
  "field_three": "foo bar"
}

I was able to fix the fields with an update by query and am very happy with those results. My issue is I don't always know what's in field three as the data that was incorrectly populated isn't consistent.

This may not be the best approach, but what I would like to do is write a query that finds all documents that do not match what the constructed field would be after the update. I was trying to use painless to build what the actual value would be and then do a bool query so I could include or exclude those values to push the updates. Any suggestions on what is wrong with my syntax below and why it's not working for me? I'd also be interested if anyone has a better way to go about doing this - thanks!

{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "source": "ctx._source.field_three = ctx._source.field_one + ctx._source.field_two",
            "lang": "painless"
          }
        }
      }
    }
  }
}

(Gordon Brown) #2

There are a few problems I see with your query.

  1. You're using ctx, which isn't present in the Script Query context - you need to use doc.
  2. You can't access _source from the Script Query context, only fields which have doc_values enabled, and use need to use doc.<field_name>.value to access their value.
  3. Your query uses a single equals (=), which is assignment in Painless. To check equality, you need to use a double equals (==).
  4. In your example, field_three includes a space between the foo and bar, but the script does not add a space.

The query works when you change it like so (although note that I did shorten the field names):

GET testindex/_search
{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "source": "doc.three.value == doc.one.value + \" \" + doc.two.value"
          }
        }
      }
    }
  }
}

However, this only works if one, two, and three are keyword fields in your mapping. The values of the fields are only accessible in scripts if they are mapped as keyword, rather than as text. See the doc_values documentation for more on that front.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.