I know I can modify the current ingestion process to include more fields to make this example easier, but that's not the real goal. I'm trying to demonstrate using REGEX to isolate specific data within a general data field of an index, so other users within our teams can take advantage of Elasticsearch and Kibana.
We have a general index used for process auditing. It has some 25+ fields and is used across many processes and products. Most fields are easily leveraged (server name, IP address, application, specific activity, category, user, user ID, client name, client ID, success/fail, etc.). We also have a general 'data' field that captures relevant detail for the event recorded in this audit log.
This 'data' field can hold a wide spectrum of info, from an empty field to 2K of various relevant info. Often there are multiple pieces of data, delimited in some way as K-V pairs.
My issue is that the "=" between the key and the value is acting like a string terminator, preventing me from including the key as part of the specific search.
My example looks for activity associated with a high asset value. In one activity where an asset is being managed, the 'data' field holds KV pairs delimited by "|", with "=" separating the key from the value, like:
...|Units=1000|Price=631.04|UnitsValue=631040.00|...
For me, it's the "=" that's giving me the headache. It's as if this "=" is treated as a terminator of the string and I cannot look beyond it once I isolate "UnitsValue".
With REGEX, I can find where we have [0-9]{6}.[0-9]{2} for 6 digits before the decimal and 2 after.
But I'm unable to find "UnitsValue=[0-9]{6}.[0-9]{2}".
In fact, using REGEX, I can find "UnitsValue", but I cannot find "UnitsValue=".
I can find "UnitsValue?", but not "UnitsValue.". The "?" allows the match to end right after the key, while the "." requires one more character to follow it, and that fails.
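One possible explanation worth checking, sketched below in Python. It rests on an assumption the post does not confirm: that 'data' is mapped as an analyzed text field. Regexp queries are term-level and anchored, so each pattern has to match one whole indexed term, and an analyzer would typically drop "=" and "|" while splitting the text into terms. The `rough_terms` helper is a hypothetical, very rough stand-in for an analyzer (not the real tokenizer), and Python's `re` stands in for Lucene's regex engine.

```python
import re

SAMPLE = "Units=1000|Price=631.04|UnitsValue=631040.00"


def rough_terms(text):
    """Hypothetical stand-in for an analyzer: lowercase the text and
    split on anything that is not a letter, a digit, or a dot."""
    return [t for t in re.split(r"[^0-9a-z.]+", text.lower()) if t]


def term_level_match(pattern, text):
    """Return the terms the pattern matches in full, which is how an
    anchored, term-level regexp query would be evaluated."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [t for t in rough_terms(text) if compiled.fullmatch(t)]


PATTERNS = [
    "UnitsValue",
    "UnitsValue?",
    "UnitsValue.",
    "UnitsValue=",
    "[0-9]{6}.[0-9]{2}",
    "UnitsValue=[0-9]{6}.[0-9]{2}",
]

for pattern in PATTERNS:
    print(pattern, "->", term_level_match(pattern, SAMPLE))
```

Under this model the key and the number land in separate terms, so any pattern that has to span the "=" finds nothing, which lines up with the behaviour described above. If that turns out to be the cause, the pattern would have to run against an unanalyzed copy of the field (if the mapping has one) and be wrapped in ".*" on both sides, since the whole value would then be a single term.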
I've unsuccessfully tried escaping the "=" with "\=" and "\\=".
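A side note on the escaping attempts, as a minimal sketch: the request body is JSON, which has its own escaping rules on top of the regex syntax. The snippet uses Python's strict `json` parser to show that a single backslash before "=" is not a valid JSON escape, while a doubled backslash decodes to one backslash in the pattern. Whether Dev Tools is equally strict is an assumption here, and `try_parse` is a hypothetical helper for illustration.

```python
import json

# Raw strings, so the backslashes reach the JSON parser unchanged.
single_backslash = r'{"value": "UnitsValue\="}'
double_backslash = r'{"value": "UnitsValue\\="}'


def try_parse(body):
    """Return the decoded pattern, or a short note if the JSON is invalid."""
    try:
        return json.loads(body)["value"]
    except json.JSONDecodeError as err:
        return f"rejected: {err.msg}"


print(try_parse(single_backslash))  # not valid JSON for a strict parser
print(try_parse(double_backslash))  # decodes to the pattern UnitsValue\=
```

As far as I know, "=" is not among the reserved characters of the Lucene regex syntax, so escaping it should not be needed in the first place.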
I've tried enabling various options. I've tried this as a query in Kibana Dev Tools and as a query filter in Kibana Discover.
This one finds all 'data' fields with the key "UnitsValue":
GET <index>/_search
{
  "query": {
    "regexp": {
      "data": {
        "value": "UnitsValue",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score_blended"
      }
    }
  }
}
The POST and GET verbs give the same result.
By changing the value to:
"value": "UnitsValue=",
or
"value": "UnitsValue=[0-9]{6}.[0-9]{2}",
or
"value": "UnitsValue.",
or
"value": "UnitsValue=",
it always fails.
Note if I use
"value": "[0-9]{5}.[0-9]{2}"
it does use REGEX to find large values within that 'data' field, but I have to combine it with other filters to be sure the value it finds is the UnitsValue.
I'm able to use other filters to isolate an action where the largest value (100K or more) should be the UnitsValue, but this is not ideal. (What if future 'data' fields include other KV pairs that are not the total value and manage to exceed 100K?)
Is there some setting that would let my REGEX query search beyond the "="?
I know I could expand the ingestion process to parse these into specific fields, but that's not the real goal here. I'm ultimately assembling some good examples to demonstrate to our users how to leverage REGEX to locate other event-specific info, and I need to include that "=" as part of the string to test. We commonly have similar KV pairs in the data field, so including the "=" as part of that filter will be highly useful.
Also note: using regex101.com, I can easily search my sample data using the "=" as part of the REGEX query. That site doesn't offer the Lucene flavor of REGEX, though, so it can't help me figure out how to do this in Elasticsearch.
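To make the regex101 comparison concrete, here is a minimal sketch using Python's `re` module as a stand-in for a general-purpose regex engine. It is not Lucene. The sample string is the one from this post, and the pattern is the same one that fails in the regexp query. The one Lucene behaviour it tries to mimic is anchoring: Lucene patterns must match the entire string they are tested against, which `re.fullmatch` approximates.

```python
import re

SAMPLE = "...|Units=1000|Price=631.04|UnitsValue=631040.00|..."
PATTERN = "UnitsValue=[0-9]{6}.[0-9]{2}"  # "." matches any character here

# Unanchored search, which is what a tester like regex101 does by default.
found = re.search(PATTERN, SAMPLE)
print(found.group(0))

# Anchored match of the bare pattern: it has to cover the whole string.
print(re.fullmatch(PATTERN, SAMPLE))

# Anchored match with wildcards on both sides of the pattern.
print(re.fullmatch(".*" + PATTERN + ".*", SAMPLE) is not None)
```

So a pattern that works on regex101 can still fail in an anchored engine unless it accounts for everything before and after the part of interest.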