Wildcard queries and custom templates


(Trondhindenes) #1

hi,
We're in the process of moving from the Logstash built-in template to our own, in order to have more control over our fields and data types. I'm seeing some problems with wildcard searches against the new index that I'd very much like some expert advice on.
Firstly, our ES template looks like this:

{
  "company_app_log": {
    "order": 0,
    "template": "company_app_log*",
    "settings": {
      "index": {
        "refresh_interval": "5s"
      }
    },
    "mappings": {
      "formattederror": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "errorid": {
            "type": "integer"
          }
        }
      },
      "telemetry": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          }
        }
      },
      "http_measurement": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "httpstatuscode": {
            "type": "integer"
          },
          "@timestamp": {
            "type": "date"
          },
          "elapsedmilliseconds": {
            "type": "integer"
          }
        }
      }
    },
    "aliases": {}
  }
}
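For reference, a template like this is installed via the template API, along these lines (a sketch; the endpoint and template name are assumptions based on the JSON above — note that the body passed to the API is the inner object, without the outer "company_app_log" wrapper):

```
PUT /_template/company_app_log
{
  "order": 0,
  "template": "company_app_log*",
  "settings": { "index": { "refresh_interval": "5s" } },
  "mappings": { ... the three type mappings shown above ... },
  "aliases": {}
}
```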

One document in the index looks like this:

{
	"_index": "company_log-uat-2016.11",
	"_type": "http_measurement",
	"_id": "AVhTY6gacmfALXNtkch5",
	"_score": null,
	"_source": {
		"Level": "Information",
		"@version": "1",
		"@timestamp": "2016-11-11T11:42:48.481Z",
		"count": 1,
		"fields": null,
		"beat": {
			"hostname": "Z49OS2SWB114T",
			"name": "Z49OS2SWB114T"
		},
		"input_type": "log",
		"tags": [
			"environment:uat",
			"servicefamily:services-external",
			"servicegroup:api",
			"beats_input_codec_json_applied",
			"http_measurement"
		],
		"offset": 3615,
		"type": "log",
		"host": "Z49OS2SWB114T",
		"logtype": "http_measurement",
		"computername": "Z49OS2SWB114T",
		"httpmethod": "GET",
		"requesturi": "https://company.com/api/web/asset/1595262/play?protocol=HLS",
		"httpstatusstring": "OK",
		"httpstatuscode": 200,
		"elapsedmilliseconds": 221
	}
}

I'm struggling to understand how I can wildcard search in Kibana using this data.
In regular "logstashed" indices I could, for example, use this query:

requesturi.raw: "*web*"

but the corresponding

requesturi: "*web*"

doesn't return any results from this index. If I remove the quotation marks it works, but that leaves me unable to search for terms containing special characters, such as:

requesturi: "*api/web/asset/*"

Am I doing something wrong in my template?
For the record, we're not on ELK 5.0; we're still running the previous version of the stack.


(Trondhindenes) #2

I dug a bit deeper into this, and it turns out that on an index where the default Logstash template is used, each string field gets both an analyzed field ("thing") and a non-analyzed field representing the same data ("thing.raw"). The non-analyzed field shows the same behavior as fields covered by my custom template, which is to be expected since they're both non-analyzed string fields.

What I struggle to understand is: I seem to be able to perform wildcard queries on a non-analyzed field, as long as I don't use any special characters. So, my question is: Do I have to use an analyzed field just to have support for wildcard queries containing special characters (like "/")?
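For context, the Logstash default template achieves this with a multi-field mapping, roughly like the following sketch (from memory, not copied from the actual template; details such as ignore_above may differ):

```json
"requesturi": {
  "type": "string",
  "index": "analyzed",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```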


(Michael McCandless) #3

Are you sure your template is resulting in an un-analyzed string field for requesturi? You can ask ES for the mappings on the index to confirm what the template actually did.

I ask that because the default analyzer would split on characters like / which might explain why wildcards with / are not working.
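For example, something like this should show the actual field mappings (index name taken from the document you pasted):

```
GET /company_log-uat-2016.11/_mapping
```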

You must use an un-analyzed field if you want wildcards like *api/web/asset* to work.

Mike McCandless


(Trondhindenes) #4

Thanks for responding Mike,
I can't seem to find anything in the documentation on how to list the templates applied to a given index; I can only list them globally, that's all.

However, like I wrote above a "regular" index using Logstash's built-in template shows the same symptom on the "raw" field (which is also non-analyzed) - I'm unable to use special characters together with wildcard searches on a "raw" field.

For example, this doesn't work (on an index using the default Logstash template):

uriStem.raw: "/client/1/*"

However, on the same index, this works fine:

uriStem.raw: "/client/1/catchup"

On the analyzed "version" of the same field, this works:

uriStem: "/client/1/*"

This seems to be the opposite of your explanation? (Not saying it's wrong, just trying to understand the behavior)


(Trondhindenes) #5

So I managed to find the mappings for the index, and everything looks correct there:
(subset):

          "messagelevel": {
            "type": "string",
            "index": "not_analyzed"
          },
          "offset": {
            "type": "long"
          },
          "requesturi": {
            "type": "string",
            "index": "not_analyzed"
          },
          "source": {
            "type": "string",
            "index": "not_analyzed"
          },
          "sourcecontext": {
            "type": "string",
            "index": "not_analyzed"
          },
          "tags": {
            "type": "string",
            "index": "not_analyzed"
          },
          "type": {
            "type": "string",
            "index": "not_analyzed"

(Trondhindenes) #6

...and Kibana agrees as well:


(Trondhindenes) #7

So, I guess it boils down to: How come this works:

          "query": "sourcecontext: Api.Play.Agents.PlayAgent"

but this doesn't (i.e., it gives zero hits)?

          "query": "sourcecontext: Api.Play.Agents.Play*"

The field is confirmed to be a string and "not analyzed".


(Michael McCandless) #8

Hmm, instead of passing a query string at search time, can you use the DSL queries to explicitly construct a wildcard query? This just simplifies the problem to bypass any analysis the query parser may be doing.
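Something along these lines (field and value borrowed from your earlier examples; treat this as a sketch):

```json
{
  "query": {
    "wildcard": {
      "requesturi": "*api/web/asset/*"
    }
  }
}
```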

And yes, the situation is indeed the opposite of what I'd expect 🙂

Mike McCandless


(Trondhindenes) #9

I can, but I can't expect my users to use DSL queries instead of Kibana. It seems that (part of) the problem is how Kibana constructs its queries. But you're right, a wildcard query via the query API works perfectly.

Still not sure how to solve this one.


(Trondhindenes) #10

Just for ref, this seems to force Kibana to issue a wildcard query:

{"wildcard": {"sourcecontext": "Stuff*"}}

However, leading wildcards do not seem to work:

{"wildcard": {"sourcecontext": "*uff*"}}

(Michael McCandless) #11

OK the fact that the DSL query produces the correct results is good news ... it means something is going wrong when parsing your query.

Does anyone know if ES gives you a way to see what exact (Lucene level) query was created by parsing the query string?

Anyway, by default, wildcard terms are not analyzed by the query parser behind query string queries, so it's odd it's not working with query string.

But / is a reserved character ... try escaping it by prefixing with \?
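I.e., something like (untested):

```
requesturi: *api\/web\/asset\/*
```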

Separately, I'm not sure why you see leading wildcard NOT working w/ a DSL query. Does it also not work if you run the query directly against ES (not through Kibana)?

Note that leading wildcard queries are exceptionally costly to run ... it requires Lucene to scan every unique term in the index.
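As for seeing the parsed query: the validate API with explain may show the rewritten Lucene-level query (a sketch; I haven't verified this against your exact version):

```
GET /company_log-uat-2016.11/_validate/query?explain
{
  "query": {
    "query_string": {
      "query": "requesturi: *web*"
    }
  }
}
```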

Mike McCandless


(Trondhindenes) #12

Just to keep updating this thread in case anyone else stumbles upon the same: The following query seems to work in Kibana, and caters for special characters:

 "query": "requesturi: *\"web/asset*\""

That is, keep the wildcards outside the quotation marks that enclose your search term.
In Kibana, that would be typed as:

requesturi: *"web/asset*"

This also seems to work:

requesturi: *"web/asset"*
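Spelled out as the full request body Kibana sends, that would look roughly like this (a sketch; the analyze_wildcard option is an assumption about Kibana's default query-string settings):

```json
{
  "query": {
    "query_string": {
      "query": "requesturi: *\"web/asset*\"",
      "analyze_wildcard": true
    }
  }
}
```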

(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.