How do I search for non-ASCII characters?

I've got a load of log data in Elasticsearch. Most of it is ASCII, but very occasionally there's a non-ASCII character, which breaks a naive downstream Python application.

What query can I use to find the documents that contain non-ASCII characters? Yes, I'll need to fix the downstream code anyway, but it would be interesting to know what is generating the non-ASCII log documents. For example, if all I know about a document is that it contains a U+FFFD character, how do I search for that?
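
To make the question concrete, here is a rough Python sketch of the two things I'm after; it is not a known-good answer. `non_ascii_chars` is a hypothetical client-side helper that reports the offending characters in a string. `wildcard_query_for` builds a request body for Elasticsearch's `wildcard` query. The field name `message.keyword` is an assumption: it presumes the log text is also indexed as an un-analyzed `keyword` sub-field, and the default dynamic mapping skips values longer than 256 characters, so long log lines may not match. Leading wildcards can also be slow on a large index.

```python
import json

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the character from the example above


def non_ascii_chars(text):
    """Return (index, char, 'U+XXXX') for every non-ASCII character in text."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def wildcard_query_for(char, field="message.keyword"):
    """Build a wildcard query body matching values that contain `char`.

    `field` is a hypothetical name; it must be a keyword (un-analyzed) field
    for the pattern to be tested against the whole stored value.
    `char` is assumed not to be a wildcard metacharacter (`*` or `?`).
    """
    return {"query": {"wildcard": {field: {"value": f"*{char}*"}}}}


if __name__ == "__main__":
    sample = "GET /index.html 200 caf\ufffd"
    print(non_ascii_chars(sample))
    # ensure_ascii=True escapes U+FFFD as \ufffd, so the body is safe to paste.
    print(json.dumps(wildcard_query_for(REPLACEMENT_CHAR), ensure_ascii=True))
```

The printed JSON would be sent as the body of a `_search` request. That only covers one known character, though; I still don't know how to match any non-ASCII character in a single query.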
