I am using Elasticsearch 5.5.1. My documents have a path attribute with values like foo/bar/file.txt, and I want to find files by folder. For example, a search for foo/bar/ should find foo/bar/file.txt (and foo/bar/file2.txt), but not files in sub-folders such as foo/bar/baz/file.txt. I would also like to aggregate and sort by folder. I think the most efficient way to do this is a keyword sub-field that indexes only the folder part of the path. However, I'm having trouble searching on such a field.
Consider the following index creation request:
PUT path_index
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "folder_filter": {
            "pattern": "(.*/)[^/]+",
            "type": "pattern_replace",
            "replacement": "$1"
          }
        },
        "analyzer": {
          "folder": {
            "tokenizer": "keyword",
            "char_filter": [
              "folder_filter"
            ]
          }
        },
        "normalizer": {
          "folder": {
            "char_filter": [
              "folder_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "pathType": {
      "properties": {
        "path": {
          "type": "keyword",
          "fields": {
            "folder": {
              "type": "text",
              "analyzer": "folder",
              "fielddata": true
            },
            "folderKeyword": {
              "type": "keyword",
              "normalizer": "folder"
            }
          }
        }
      }
    }
  }
}
The above creates a "folder" text sub-field which keeps only the folder part of the path (foo/bar/ for foo/bar/file.txt). That does what I want, and searches on it work correctly. I also have a "folderKeyword" sub-field which is meant to do the same thing, but as a keyword with a normalizer. "folderKeyword" does not work as I expect.
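As a quick sanity check of the folder_filter pattern outside Elasticsearch, the sketch below applies the same regular expression with Python's re module. This is only an approximation: Elasticsearch's pattern_replace uses Java regular expressions, and the helper name folder_of is made up for illustration. For a pattern this simple the two engines should agree.

```python
import re

# Same pattern and replacement as the folder_filter char filter.
# Python writes the group reference as \1 where the filter uses $1.
FOLDER_PATTERN = re.compile(r"(.*/)[^/]+")

def folder_of(path: str) -> str:
    """Hypothetical helper: mimic the char filter on a single value."""
    return FOLDER_PATTERN.sub(r"\1", path)

# A file in a folder, a file in a sub-folder, and a file with no folder.
for path in ["foo/bar/file.txt", "foo/bar/baz/file.txt", "file.txt"]:
    print(path, "->", folder_of(path))
# foo/bar/file.txt -> foo/bar/
# foo/bar/baz/file.txt -> foo/bar/baz/
# file.txt -> file.txt
```

A value without any slash is left unchanged, because the pattern needs at least one "/" to match.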
I add one document to my index:
PUT path_index/pathType/0
{
  "path": "foo/bar/file.txt"
}
Then this search finds that document:
POST path_index/_search
{
  "query": {
    "term": {"path.folder": "foo/bar/"}
  }
}
But this search fails to find the document:
POST path_index/_search
{
  "query": {
    "term": {"path.folderKeyword": "foo/bar/"}
  }
}
I don't see the difference. Why does "folderKeyword" fail?
Additional info: A prefix query for "foo/bar/" works on "path.folder" but fails on "path.folderKeyword". A prefix query without the trailing slash ("foo/bar") on "path.folderKeyword" succeeds, while a term query for "foo/bar" on the same field fails. Sorting on "path.folder" and on "path.folderKeyword" shows the same sort value in both cases: "foo/bar/".
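One thing that may be worth checking is what the folder_filter pattern does to the search strings themselves, in case the normalizer is also applied to the term at query time. That is an assumption, not something confirmed here for 5.5.1. The sketch below uses Python's re module as a stand-in for the Java regex engine, and apply_folder_filter is a hypothetical helper name.

```python
import re

# Pattern and replacement copied from the folder_filter char filter
# (Python spells the back-reference \1 instead of $1).
PATTERN = r"(.*/)[^/]+"

def apply_folder_filter(value: str) -> str:
    """Hypothetical helper: replace every match, as pattern_replace does."""
    return re.sub(PATTERN, r"\1", value)

# The two search strings tried against path.folderKeyword.
for term in ["foo/bar/", "foo/bar"]:
    print(repr(term), "->", repr(apply_folder_filter(term)))
# 'foo/bar/' -> 'foo//'
# 'foo/bar' -> 'foo/'
```

If query terms do pass through the normalizer, "foo/bar/" would be looked up as "foo//" and "foo/bar" as "foo/", which would be consistent with the term and prefix results listed above.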