I'm trying to QA some documents with painless, hoping I could get a little help on what I'm doing wrong why my query.
I have about a billion documents that are being used for application search. During the loading process, there were some errors on the codes and one of the fields wasn't populated with the correct values. Lucky for us, it's something we can abstract the value of from other fields. I realize that it would probably be better for the application to put this together but this is what I'm stuck making work. Here's an example of what the docs should look like before and after:
Before:
{
"field_one": "foo",
"field_two": "bar",
"field_three": "error"
}
Before:
{
"field_one": "foo",
"field_two": "bar",
"field_three": "foo bar"
}
I was able to fix the fields with an update by query and am very happy with those results. My issue is I don't always know what's in field three as the data that was incorrectly populated isn't consistent.
This may not be the best approach, but what I would like to do is write a query that finds all documents that do not match what the constructed field would be after the update. I was trying to use painless to build what the actual value would be and then do a bool query so I could include or exclude those values to push the updates. Any suggestions on what is wrong with my syntax below and why it's not working for me? I'd also be interested if anyone has a better way to go about doing this - thanks!
{
"query": {
"bool": {
"must": {
"script": {
"script": {
"source": "ctx._source.field_three = ctx._source.field_one + ctx._source.field_two",
"lang": "painless"
}
}
}
}
}
}