Wildcard queries and custom templates


(Trondhindenes) #1

hi,
We're in the process of moving from the Logstash built-in template to our own, in order to have more control over our fields and data types. I'm seeing some problems with wildcard searches against the new index that I'd very much like some expert advice on.
Firstly, our ES template looks like this:

{
  "company_app_log": {
    "order": 0,
    "template": "company_app_log*",
    "settings": {
      "index": {
        "refresh_interval": "5s"
      }
    },
    "mappings": {
      "formattederror": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "errorid": {
            "type": "integer"
          }
        }
      },
      "telemetry": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          }
        }
      },
      "http_measurement": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "index": "not_analyzed",
                "type": "string"
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "httpstatuscode": {
            "type": "integer"
          },
          "@timestamp": {
            "type": "date"
          },
          "elapsedmilliseconds": {
            "type": "integer"
          }
        }
      }
    },
    "aliases": {}
  }
}
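For reference, a template like this is installed via the template API, along these lines (a sketch; the endpoint and template name are assumptions based on the JSON above — note that the body passed to the API is the inner object, without the outer "company_app_log" wrapper):

```
PUT /_template/company_app_log
{
  "order": 0,
  "template": "company_app_log*",
  "settings": { "index": { "refresh_interval": "5s" } },
  "mappings": { ... the three type mappings shown above ... },
  "aliases": {}
}
```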

One document in the index looks like this:

{
	"_index": "company_log-uat-2016.11",
	"_type": "http_measurement",
	"_id": "AVhTY6gacmfALXNtkch5",
	"_score": null,
	"_source": {
		"Level": "Information",
		"@version": "1",
		"@timestamp": "2016-11-11T11:42:48.481Z",
		"count": 1,
		"fields": null,
		"beat": {
			"hostname": "Z49OS2SWB114T",
			"name": "Z49OS2SWB114T"
		},
		"input_type": "log",
		"tags": [
			"environment:uat",
			"servicefamily:services-external",
			"servicegroup:api",
			"beats_input_codec_json_applied",
			"http_measurement"
		],
		"offset": 3615,
		"type": "log",
		"host": "Z49OS2SWB114T",
		"logtype": "http_measurement",
		"computername": "Z49OS2SWB114T",
		"httpmethod": "GET",
		"requesturi": "https://company.com/api/web/asset/1595262/play?protocol=HLS",
		"httpstatusstring": "OK",
		"httpstatuscode": 200,
		"elapsedmilliseconds": 221
	}
}

I'm struggling to understand how I can wildcard search in Kibana using this data.
In regular "logstashed" indices I could, for example, use this query:

requesturi.raw: "*web*"

but the corresponding

requesturi: "*web*"

doesn't return any results from this index. If I remove the quotation marks it works, but that leaves me unable to search for terms containing special characters, such as:

requesturi: "*api/web/asset/*"

Am I doing something wrong in my template?
For the record, we're not on ELK 5.0; we're still running the previous version of the stack.


(Trondhindenes) #2

I dug a bit deeper into this, and it turns out that on an index where the default Logstash template is used, each string field gets both an analyzed field ("thing") and a non-analyzed field representing the same data ("thing.raw"). The non-analyzed field shows the same behavior as fields covered by my custom template, which is to be expected since they're both non-analyzed string fields.

What I struggle to understand is: I seem to be able to perform wildcard queries on a non-analyzed field, as long as I don't use any special characters. So, my question is: Do I have to use an analyzed field just to have support for wildcard queries containing special characters (like "/")?
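For context, the Logstash default template achieves this with a multi-field mapping, roughly like the following sketch (from memory, not copied from the actual template; details such as ignore_above may differ):

```json
"requesturi": {
  "type": "string",
  "index": "analyzed",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```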


(Michael McCandless) #3

Are you sure your template is resulting in an un-analyzed string field for requesturi? You can ask ES for the mappings on the index to confirm what the template actually did.

I ask that because the default analyzer would split on characters like / which might explain why wildcards with / are not working.
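For example, something like this should show the actual field mappings (index name taken from the document you pasted):

```
GET /company_log-uat-2016.11/_mapping
```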

You must use an un-analyzed field if you want wildcards like *api/web/asset* to work.

Mike McCandless


(Trondhindenes) #4

Thanks for responding Mike,
I can't seem to find anything in the documentation on how to list the templates applied to a given index; I can only list them globally, that's all.

However, like I wrote above a "regular" index using Logstash's built-in template shows the same symptom on the "raw" field (which is also non-analyzed) - I'm unable to use special characters together with wildcard searches on a "raw" field.

For example, this doesn't work (on an index using the default Logstash template):

uriStem.raw: "/client/1/*"

However, on the same index, this works fine:

uriStem.raw: "/client/1/catchup"

On the analyzed "version" of the same field, this works:

uriStem: "/client/1/*"

This seems to be the opposite of your explanation? (Not saying it's wrong, just trying to understand the behavior)


(Trondhindenes) #5

So I managed to find the mappings for the index, and everything looks correct there:
(subset):

          "messagelevel": {
            "type": "string",
            "index": "not_analyzed"
          },
          "offset": {
            "type": "long"
          },
          "requesturi": {
            "type": "string",
            "index": "not_analyzed"
          },
          "source": {
            "type": "string",
            "index": "not_analyzed"
          },
          "sourcecontext": {
            "type": "string",
            "index": "not_analyzed"
          },
          "tags": {
            "type": "string",
            "index": "not_analyzed"
          },
          "type": {
            "type": "string",
            "index": "not_analyzed"

(Trondhindenes) #6

...and Kibana agrees as well:


(Trondhindenes) #7

So, I guess it boils down to: How come this works:

          "query": "sourcecontext: Api.Play.Agents.PlayAgent"

but this doesn't (i.e., it gives zero hits)?

          "query": "sourcecontext: Api.Play.Agents.Play*"

The field is confirmed to be a string and "not analyzed".


(Michael McCandless) #8

Hmm, instead of passing a query string at search time, can you use the DSL queries to explicitly construct a wildcard query? This just simplifies the problem to bypass any analysis the query parser may be doing.
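Something along these lines (field and value borrowed from your earlier examples; treat this as a sketch):

```json
{
  "query": {
    "wildcard": {
      "requesturi": "*api/web/asset/*"
    }
  }
}
```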

And yes, the situation is indeed the opposite of what I'd expect 🙂

Mike McCandless


(Trondhindenes) #9

I can, but I can't expect my users to use DSL queries instead of Kibana. It seems that (part of) the problem is how Kibana constructs its queries. But you're right, a wildcard query via the query API works perfectly.

Still not sure how to solve this one.


(Trondhindenes) #10

Just for ref, this seems to force Kibana to issue a wildcard query:

{"wildcard": {"sourcecontext": "Stuff*"}}

However, leading wildcards do not seem to work:

{"wildcard": {"sourcecontext": "*uff*"}}

(Michael McCandless) #11

OK the fact that the DSL query produces the correct results is good news ... it means something is going wrong when parsing your query.

Does anyone know if ES gives you a way to see what exact (Lucene level) query was created by parsing the query string?

Anyway, by default, wildcard terms are not analyzed by the query parser behind query string queries, so it's odd it's not working with query string.

But / is a reserved character ... try escaping it by prefixing with \?
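I.e., something like (untested):

```
requesturi: *api\/web\/asset\/*
```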

Separately, I'm not sure why you see leading wildcard NOT working w/ a DSL query. Does it also not work if you run the query directly against ES (not through Kibana)?

Note that leading wildcard queries are exceptionally costly to run ... it requires Lucene to scan every unique term in the index.
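As for seeing the parsed query: the validate API with explain may show the rewritten Lucene-level query (a sketch; I haven't verified this against your exact version):

```
GET /company_log-uat-2016.11/_validate/query?explain
{
  "query": {
    "query_string": {
      "query": "requesturi: *web*"
    }
  }
}
```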

Mike McCandless


(Trondhindenes) #12

Just to keep updating this thread in case anyone else stumbles upon the same: The following query seems to work in Kibana, and caters for special characters:

 "query": "requesturi: *\"web/asset*\""

That is, keep the wildcards outside the quotation marks that enclose your search term.
In Kibana, that would be typed as:

requesturi: *"web/asset*"

This also seems to work:

requesturi: *"web/asset"*
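Spelled out as the full request body Kibana sends, that would look roughly like this (a sketch; the analyze_wildcard option is an assumption about Kibana's default query-string settings):

```json
{
  "query": {
    "query_string": {
      "query": "requesturi: *\"web/asset*\"",
      "analyze_wildcard": true
    }
  }
}
```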

(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.