Regular Expression Query DSL

I have something I think should be easy to do, but I'm struggling because of how regex is handled in a Query DSL query.

I have records that look like this:

Main App Name
sub-app1 - (Main App Name)
sub-app2 - (Main App Name)
sub-app3 - (Main App Name)

Main App Name2
sub-app1 - Main App Name2
sub-app2 - Main App Name2
sub-app3 - Main App Name2

I want a filter that shows only:
Main App Name
Main App Name2

...plus any other records that aren't main apps, but none of the sub-apps.

I tried a filter that matched on the parentheses, but that doesn't seem to work. I also tried a filter that matches the hyphen in front of the Main App Name2 sub-apps, but that doesn't work either. Whatever I choose, I end up filtering out either everything containing Main App Name and Main App Name2, or nothing.
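For reference, here is a minimal sketch of the first case. It assumes the values live in a field called `name` with a `name.keyword` sub-field (the field names used later in this thread), and that only sub-apps contain the " - " separator. A `regexp` query is matched against the whole keyword term, so the pattern is implicitly anchored at both ends; `.* - .*` therefore means "contains ' - ' anywhere", and `must_not` drops those records while keeping "Main App Name", "Main App Name2" and anything else without the separator.

```json
{
  "query": {
    "bool": {
      "must_not": [
        { "regexp": { "name.keyword": ".* - .*" } }
      ]
    }
  }
}
```

The same pattern against the analyzed `name` field would not work, because there the regex is compared against individual lowercased words, none of which contain " - ".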

Hello @dandcp

How are you attempting the query? Could you post a screenshot?

Here's some info you might find helpful: Kibana filter regex 'string starts with' doesn't work

I have a data table visualization (for example) with the following records:

I want a query that removes some of these records and keeps the rest:

Workplace Accommodation Request (WPA) - Appian BPM
Work Order Authorization - Workflow (Salesforce)
Work Order Authorization - Appian BPM
Random App Here
Salesforce Apps
Appian BPM Apps
Random App 2

I want to filter out the lines that end with "(Salesforce)" or "- Appian BPM", but keep the two lines "Salesforce Apps" and "Appian BPM Apps" and anything else (e.g. "Random App Here").

I have tried the following query:

{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "name": "*(Salesforce)"
          }
        },
        {
          "match": {
            "name": "*Appian BPM"
          }
        }
      ]
    }
  }
}

...but that removes everything containing Salesforce or Appian from the results (even the two that aren't in parentheses or preceded by a hyphen).

@mattkime any thoughts?

Apologies for the delay. This is trickier than it looks, because on a text field the supported wildcard queries match individual terms and have no notion of where the field value ends.

What potential values might the field have? So far I've been treating it like a freeform text field which is proving difficult to work with. I wonder if a keyword field would work better.

@mattkime I'm kind of happy that it isn't as simple as 1-2-3, because I spent a good amount of time trying to get it to work. I don't have a ton of Kibana experience, but I've worked with regexes enough that I expected this to be relatively easy going in. With the way things are anchored (or not anchored), though, it's proving difficult. The field is a freeform text field. When you say a "keyword" field would work better, how do I go about creating that?

Here's a good explanation: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
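A sketch of what such a mapping looks like: the `name` field stays a `text` field for full-text search, and a `keyword` sub-field (a multi-field) stores the unanalyzed value. This is the mapping Elasticsearch generates by default when it dynamically maps a string field, so `name.keyword` very likely exists already. If it doesn't, you would create an index (the name `my-index` is hypothetical, e.g. `PUT my-index`) with this body and reindex the data into it.

```json
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

`ignore_above: 256` means values longer than 256 characters are not indexed in `name.keyword`, so queries on the sub-field will not see them.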

I'll try to summarize the problem; please confirm whether it's accurate:

You can search for *(Salesforce), but it will also find "something (Salesforce) something", so the wildcard isn't very meaningful. This is with a text field: it's searching for terms in the text regardless of their position.
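One way to see this directly is the `_analyze` API (in Kibana Dev Tools, `GET _analyze`), assuming `name` uses the default `standard` analyzer. The `match` query runs its input through the same analyzer, so this body shows what the first clause of the query above actually searches for:

```json
{
  "analyzer": "standard",
  "text": "*(Salesforce)"
}
```

The response should contain a single token, `salesforce`: the `*` and the parentheses are dropped and the text is lowercased. "*Appian BPM" likewise becomes the tokens `appian` and `bpm`. That is why "Salesforce Apps" and "Appian BPM Apps" were excluded as well.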

With a keyword field, the text isn't analyzed: the whole field value is a single term. This time, a wildcard query for *(Salesforce) only matches values that end in (Salesforce), because there's no trailing *.
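For contrast, a sketch using the built-in `keyword` analyzer, which behaves like a `keyword` field: it emits the input unchanged as one token, with case, punctuation and spacing intact.

```json
{
  "analyzer": "keyword",
  "text": "Work Order Authorization - Workflow (Salesforce)"
}
```

The response should be one token equal to the full string, which is what lets a pattern like `*(Salesforce)` act as an "ends with" test.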

I must be doing something wrong, because even when I select the name.keyword field I can't do a wildcard search.

I tried:

{
  "query": {
    "bool": {
      "must_not": [
        {"match": {"name": "*(Salesforce)"}},
        {"match": {"name": "*- Appian BPM"}}
      ]
    }
  }
}

...but once again, that filters out anything with Salesforce in the name as well.

Try querying name.keyword instead of the name field directly: elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
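Putting it together, a sketch of the query from earlier rewritten against the keyword sub-field (assuming `name.keyword` exists as in the default mapping). Note that switching only the field name is not enough: a `match` query on `name.keyword` would look for the literal term `*(Salesforce)` and exclude nothing. The clauses also need to be `wildcard` queries:

```json
{
  "query": {
    "bool": {
      "must_not": [
        { "wildcard": { "name.keyword": "*(Salesforce)" } },
        { "wildcard": { "name.keyword": "*- Appian BPM" } }
      ]
    }
  }
}
```

Against the sample records, this should drop the three lines ending in "(Salesforce)" or "- Appian BPM" and keep "Salesforce Apps", "Appian BPM Apps" and the random apps. Keyword matching is case-sensitive, and patterns that start with `*` can be slow on large indices, so this is a reasonable fit for a small data table rather than a general solution.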
