Hi,
We have a text field in Elasticsearch that holds a JSON document which has been escaped. We're trying to find documents whose field contains a particular pattern of JSON fields, but we cannot work out the regexp required.
The ES field contains this precise string:-
,\"jobs\":{\"data\":[{\"type\":\"job\",\"id\":\"19029355471\"},
The ID number can change; we want to find documents with similar JSON structures.
Using the Dev Tools console in Kibana, no matter how we write the regexp, we get either a Bad string syntax error in the browser or an error from the API such as json_parse_exception / Unrecognized character escape ':' (code 58)\n.
Example of a failing regexp:-
,\\\"jobs\\\"\:\{\\\"data\\\"\:\[\{\\\"type\\\"\:\\\"job\\\",\\\"id\\\"\:\\\"19029355471\\\"\},
We've even tried replacing each punctuation character after the comma with a . (dot). This does not return an error, but it does not return any results either.
This is what the full query looks like in this case:-
GET myindex*/_search
{
  "query": {
    "regexp": {
      "fields.payload": ".*,..jobs......data.......type.....job.....id.....19029355471...,.*"
    }
  }
}
I've done some searching, and some people seem to think that the json_parse_exception problem is with ES's JSON parser rather than Lucene. Others say that Lucene's regexp language is a little bit specialised.
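As a sketch of how the two layers might be kept apart: JSON strings only allow the escapes \" \\ \/ \b \f \n \r \t and \uXXXX, so a sequence like \: is rejected before Lucene ever sees the pattern. The Python below builds the Lucene pattern first and then lets json.dumps add the JSON layer. The helper names (lucene_escape, build_query) are made up for illustration, and the reserved-character set is our reading of the Lucene regexp syntax rather than something we have verified against the cluster. It also assumes the field is indexed unanalyzed (keyword): regexp is a term-level query, so on an analyzed text field the pattern is matched against individual tokens, not the whole string.

```python
import json

# Characters we assume Lucene's regexp syntax treats as reserved
# (standard operators plus the optional ones); each is escaped with a backslash.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def lucene_escape(literal: str) -> str:
    """Escape a literal string so Lucene matches it character for character."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in literal)


def build_query(field: str, pattern: str) -> dict:
    """Wrap a Lucene pattern in a regexp query body (hypothetical helper)."""
    return {"query": {"regexp": {field: {"value": pattern}}}}


# The stored text contains literal backslash-quote pairs, so raw strings
# are used to write it exactly as it appears in the field.
prefix = r',\"jobs\":{\"data\":[{\"type\":\"job\",\"id\":\"'
suffix = r'\"},'

# Layer 1: the Lucene pattern. The ID is any run of digits, and the
# pattern is padded with .* because regexp must match the whole term.
pattern = ".*" + lucene_escape(prefix) + "[0-9]+" + lucene_escape(suffix) + ".*"

# Layer 2: JSON encoding doubles every backslash and escapes every quote.
body = json.dumps(build_query("fields.payload", pattern), indent=2)

print("GET myindex*/_search")
print(body)
```

With this approach each stored \" becomes \\\" in the Lucene pattern and \\\\\\\" in the JSON body, while : and , are left unescaped.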
How can we formulate this search?
Thanks,
J.