I have a field that stores the full path of a file on disk. The file's basename is unique and I'd like to be able to query for just the basename as well.
The simplest solution would be to add a second field containing just the basename, but I wanted to try doing it with an analyzer instead:
I created a path_tree_rev_tokenizer of type path_hierarchy with delimiter / and reverse set to true, so that it emits the path from the end: /home/bob is tokenized as [home/bob, bob].
I created a path_tree_rev analyzer that uses this tokenizer.
I mapped a text field called file.path that uses the path_tree_rev analyzer.
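For reference, the setup described in these steps would look roughly like the following index-creation body. This is a minimal sketch assuming Elasticsearch 7.x or later (typeless mappings); the tokenizer and analyzer names come from the post, while the Python wrapper and the INDEX_BODY name are only illustrative.

```python
import json

# Hypothetical index-creation body mirroring the steps above (assumes ES 7.x+).
INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # path_hierarchy with reverse=True emits suffixes of the path.
                "path_tree_rev_tokenizer": {
                    "type": "path_hierarchy",
                    "delimiter": "/",
                    "reverse": True,
                }
            },
            "analyzer": {
                "path_tree_rev": {
                    "type": "custom",
                    "tokenizer": "path_tree_rev_tokenizer",
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # file.path expressed as an object "file" with a text sub-field "path".
            "file": {
                "properties": {
                    "path": {"type": "text", "analyzer": "path_tree_rev"}
                }
            }
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```

The printed JSON can be used as the body of a PUT /<index> request.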
In a simple test separate from my main application code, I created a document with /home/bob in the file.path field and queried it with
{
  "query": {
    "match": {
      "file.path": "bob"
    }
  }
}
and it matched my document successfully.
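Why the isolated test matches: with reverse enabled, each token is a suffix of the path that starts at the beginning of the string or right after a delimiter, so the last token is always the basename. The match query runs "bob" through the same analyzer at search time (by default), producing the single term bob, which equals the last indexed token. The function below is a rough pure-Python model of that behaviour, not Elasticsearch's implementation. It also emits the full input as the first token, which as far as I know is what Lucene's reverse path tokenizer does, so its list may differ slightly from the one quoted above; either way the basename is the last token.

```python
from typing import List


def reverse_path_hierarchy(path: str, delimiter: str = "/") -> List[str]:
    """Rough model of a path_hierarchy tokenizer with reverse enabled.

    Emits the whole input, then every suffix that starts right after a
    delimiter, so the last token is the basename. Edge cases such as
    repeated delimiters are not modelled faithfully.
    """
    tokens = [path]
    for i, ch in enumerate(path):
        # Skip a trailing delimiter, which would yield an empty suffix.
        if ch == delimiter and i + 1 < len(path):
            tokens.append(path[i + 1:])
    return tokens


def match_query_hits(indexed_value: str, query_text: str) -> bool:
    """Model of a match query (default OR operator): the query text is analyzed
    with the same analyzer and hits if any query term equals an indexed term."""
    indexed_terms = set(reverse_path_hierarchy(indexed_value))
    query_terms = reverse_path_hierarchy(query_text)
    return any(term in indexed_terms for term in query_terms)


print(reverse_path_hierarchy("/home/bob"))  # ['/home/bob', 'home/bob', 'bob']
print(match_query_hits("/home/bob", "bob"))  # True
```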
However, when I copied and pasted the exact same definitions into my main application and re-created the index with the new field and analyzer, Elasticsearch could not find a "real" document by querying for its basename. The only difference I can see is that in my application the field is named source.file.path.
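Since the field name is the only difference, one thing worth ruling out is that the custom analyzer is attached to a different field than the one being queried. In the mapping, source.file.path has to appear as nested object properties (a dotted field name in a mapping is expanded the same way), and the query has to use the full dotted name. If the analyzer were still defined under a top-level file.path, source.file.path would be dynamically mapped with the default analyzer instead. The helpers below are a hypothetical sketch that builds the nested mapping and the matching query for a dotted field name; they are not part of any Elasticsearch client.

```python
from typing import Any, Dict


def mapping_for_dotted_field(dotted: str, field_def: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: expand e.g. 'source.file.path' into nested
    object 'properties' with field_def on the innermost name."""
    parts = dotted.split(".")
    node: Dict[str, Any] = dict(field_def)
    # Wrap from the innermost name outwards: path -> file -> source.
    for name in reversed(parts[1:]):
        node = {"properties": {name: node}}
    return {"properties": {parts[0]: node}}


def match_query(dotted: str, value: str) -> Dict[str, Any]:
    # The query must name the same full path the analyzer is mapped on.
    return {"query": {"match": {dotted: value}}}


PATH_FIELD = {"type": "text", "analyzer": "path_tree_rev"}
mapping = mapping_for_dotted_field("source.file.path", PATH_FIELD)
query = match_query("source.file.path", "bob")
print(mapping)
print(query)
```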
Is there something I'm doing wrong here, or is there a way to diagnose why the query is not succeeding?
I then discovered that I can dump all of the terms indexed for a given field, so I tried that on my full application, where searching by basename is not working.
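Three standard APIs help narrow this down: the field mapping API shows which analyzer source.file.path actually ended up with, _analyze with a field parameter shows how that field's analyzer tokenizes a sample path, and _termvectors lists the terms indexed for the field in one stored document. The sketch below only builds the requests with Python's standard library; the base URL, the index name my-index, the document ID and the absence of authentication are placeholder assumptions.

```python
import json
import urllib.request
from typing import List


def build_diagnostic_requests(base_url: str, index: str, doc_id: str,
                              field: str, sample_path: str) -> List[urllib.request.Request]:
    headers = {"Content-Type": "application/json"}
    # 1. Which mapping (and analyzer) does the field actually have?
    mapping_req = urllib.request.Request(
        f"{base_url}/{index}/_mapping/field/{field}", method="GET")
    # 2. How does that field's analyzer tokenize a sample path?
    analyze_req = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=json.dumps({"field": field, "text": sample_path}).encode("utf-8"),
        headers=headers, method="POST")
    # 3. Which terms were indexed for this field in one real document?
    termvectors_req = urllib.request.Request(
        f"{base_url}/{index}/_termvectors/{doc_id}",
        data=json.dumps({"fields": [field]}).encode("utf-8"),
        headers=headers, method="POST")
    return [mapping_req, analyze_req, termvectors_req]


def send_all(reqs: List[urllib.request.Request]) -> None:
    # Requires a reachable cluster; not called in this sketch.
    for req in reqs:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url)
            print(json.dumps(json.load(resp), indent=2))


requests_to_send = build_diagnostic_requests(
    "http://localhost:9200", "my-index", "1", "source.file.path", "/home/bob")
```

Against a running cluster, send_all(requests_to_send) prints each response. If the mapping shows a different analyzer, or the _analyze output lacks the basename token, the problem lies in the mapping rather than in the query.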