I have a large amount of log data in Elasticsearch. Most of it is ASCII, but very occasionally a document contains a non-ASCII character, which breaks a naive downstream Python application.
What query can I use to find the documents that contain non-ASCII characters? Yes, I'll need to fix the downstream code anyway, but I'd like to know what is generating the non-ASCII log documents. For example, if all I know about a document is that it contains a U+FFFD character, how do I search for that?
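To make the goal concrete, here is a minimal Python sketch of the check I'd like Elasticsearch to do for me, plus one guess at a query body for the U+FFFD case. The field name `message.keyword` and both helper names are hypothetical, and the `wildcard` query is only an assumption on my part: it would need an unanalyzed keyword field that actually holds the full log line, and a leading wildcard is likely to be slow on a large index.

```python
import json


def non_ascii_chars(text):
    """Return (index, character, code point label) for each non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def wildcard_query_for(char, field="message.keyword"):
    """Build a wildcard query body matching terms that contain `char`.

    `field` is a hypothetical name; it is assumed to be a keyword
    (not analyzed) field, otherwise the character may have been
    dropped by the analyzer at index time.
    """
    return {"query": {"wildcard": {field: {"value": f"*{char}*"}}}}


# Client-side check on a single (made-up) log line.
sample = "disk full on node\ufffd7"
print(non_ascii_chars(sample))

# Request body that could be sent to the _search endpoint.
# json.dumps escapes the character as \ufffd, which is still valid JSON.
print(json.dumps(wildcard_query_for("\ufffd")))
```

The first helper is what I can already do after pulling documents out of the index; what I'm after is a way to do the equivalent filtering inside Elasticsearch, ideally for any non-ASCII character rather than one known code point.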