Searching requested files in Kibana which end in a number and the file extension

YvorL · March 16, 2017, 4:23pm

I bumped into a strange issue. When I tried to look for the most requested documents (pdf) on one of the analyzed sites, I saw that some docs were definitely missing. First I used "request:*.pdf", then I checked "request:pdf". That's when I noticed that the wildcard query missed ALL documents whose request ended in a number before '.pdf', such as 'calendar_2017.pdf'. That is odd, because it is a string field, so I don't understand how a number can cause this issue. Is there something I can do without reindexing the data?

Brandon_Kobel · March 16, 2017, 4:43pm

@YvorL would you mind posting the mapping for the specific field that you're having issues searching on?

YvorL · March 16, 2017, 4:49pm

Is this what you're looking for?
"request": {
  "type": "text",
  "norms": false,
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}

Brandon_Kobel · March 16, 2017, 7:16pm

@YvorL if you're using request:*.pdf in the Kibana query bar, it's translated into a query_string query against an analyzed text field.

The standard analyzer splits the following text into the following tokens:

calendar_2017.pdf -> calendar_2017, pdf
01302017.pdf -> 01302017, pdf
something.pdf -> something.pdf

The standard analyzer is generally meant for text fields, but it explains why *.pdf doesn't return anything for 1 and 2 above.
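To make the split above a bit more concrete, here is a rough Python sketch. It is not the real tokenizer: it only assumes that an underscore joins letters and digits, and that a dot is kept inside a token only when it sits between two letters or between two digits, which is my simplified reading of the word-boundary rules the standard tokenizer follows. The function names `approx_standard_tokens` and `wildcard_hits` are made up for this example, and `fnmatchcase` is just a stand-in for matching a wildcard against individual indexed terms.

```python
from fnmatch import fnmatchcase


def approx_standard_tokens(text):
    """Simplified approximation of the standard analyzer (assumption, not the real thing)."""
    tokens = []
    current = ""
    for i, ch in enumerate(text):
        # Letters, digits and underscores always extend the current token.
        if ch.isalnum() or ch == "_":
            current += ch
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # A dot stays inside the token only between letter/letter or digit/digit.
        joins = bool(current) and ch == "." and (
            (prev_ch.isalpha() and next_ch.isalpha())
            or (prev_ch.isdigit() and next_ch.isdigit())
        )
        if joins:
            current += ch
        else:
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    # The standard analyzer also lowercases its tokens.
    return [t.lower() for t in tokens]


def wildcard_hits(pattern, terms):
    """Return the terms that a wildcard pattern would match, one term at a time."""
    return [t for t in terms if fnmatchcase(t, pattern)]


for name in ["calendar_2017.pdf", "01302017.pdf", "something.pdf"]:
    terms = approx_standard_tokens(name)
    print(name, "->", terms, "| *.pdf matches:", wildcard_hits("*.pdf", terms))
```

With these assumptions, only something.pdf still has a term containing ".pdf", so the wildcard has nothing to match for the other two. The actual tokens for any string can be checked against a cluster with the _analyze API, passing "analyzer": "standard" and the text.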

You should be able to use request.keyword: *.pdf which isn't executing the query against the analyzed field and should return what you're looking for.
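If you want to run the same search outside the query bar, the request body could look roughly like the sketch below. The helper names are hypothetical; the first body mirrors what the query bar sends (a query_string query), the second uses a plain wildcard query on the keyword sub-field from your mapping. Treat both as a starting point rather than the exact request Kibana generates.

```python
import json


def query_string_body(query):
    """Body for a query_string query, the kind the Kibana query bar produces."""
    return {"query": {"query_string": {"query": query}}}


def keyword_wildcard_body(field, pattern):
    """Body for a term-level wildcard query against a non-analyzed field."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


# Both target the keyword sub-field, so the whole request string is one term.
qs_body = query_string_body("request.keyword:*.pdf")
wc_body = keyword_wildcard_body("request.keyword", "*.pdf")

print(json.dumps(qs_body, indent=2))
print(json.dumps(wc_body, indent=2))
```

Either way the pattern has a leading wildcard, so every term in the field may have to be checked.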

If you are able to reindex your data, pulling out the extension either using a pattern analyzer or some other mechanism during ingest would be much more performant.
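As an illustration of "pulling out the extension", here is a minimal Python sketch of the pattern idea. The regex, the `extract_extension` helper and the `request_ext` field name are assumptions for the example; in practice the same pattern would live wherever your ingest step runs, and the extension could then be searched as an exact keyword value instead of with a leading wildcard.

```python
import re

# Trailing ".<letters or digits>" at the end of the path (assumed pattern).
EXT_RE = re.compile(r"\.([A-Za-z0-9]+)$")


def extract_extension(request):
    """Return the lowercased file extension of a request path, or None."""
    # Drop any query string before looking for the extension.
    path = request.split("?", 1)[0]
    match = EXT_RE.search(path)
    return match.group(1).lower() if match else None


def enrich(doc):
    """Copy the document and add a hypothetical 'request_ext' field."""
    enriched = dict(doc)
    enriched["request_ext"] = extract_extension(doc["request"])
    return enriched


print(enrich({"request": "/docs/calendar_2017.pdf"}))
```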

YvorL · March 16, 2017, 7:35pm

@Brandon_Kobel
I still don't see why the first two are split into keywords if the last one isn't. My understanding is that if the text is continuous (and in this case, it'll be the URI), then a dot or an underscore won't act as a keyword separator. It's a text field, and it should handle numbers like any other character.
Regardless, it seems that this is the intended behavior. I was also avoiding searching in an unanalyzed field with a leading asterisk, because it won't be a one-time query. That leaves me with reindexing the data.

Thank you for your time!

Brandon_Kobel · March 16, 2017, 8:02pm

@YvorL unfortunately, the details of the tokenizer are outside my expertise, but I'll move this to the Elasticsearch forum and hopefully they can enlighten us both.

system · April 13, 2017, 8:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
