These files represent a series, so the next file name in the same series (sequence) might be abc-01_10, the one after that abc-01_15, and so on. From the file name I can tell that these files belong to the sequence abc-01.
Files of another sequence have a different prefix, for example bsa-02_00. In general, the first part (some letters, a hyphen, some numbers) is the sequence name, and the rest (an underscore followed by some numbers) is an id for the file within the sequence.
What I want to do is search for files belonging to the same sequence without adding an extra field to my index: if I search in Kibana's search bar for abc-01_*, it should return all files of the sequence abc-01.
However, the "-" in the file name breaks this search, and I get everything that starts with abc rather than only abc-01. How can I solve this? Should I create a custom analyzer for the file name, or is there a better way to go?
Hello @M.alsioufi - your suspicion is right, it's more of an Elasticsearch question. In any case please check out this discussion as it might help with your problem.
I would indeed recommend parsing this out at index time and storing it separately. Doing this at query time is likely to scale badly, and you could end up seeing bad performance for larger data volumes.
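To illustrate what parsing at index time could look like, here is a minimal Python sketch that splits a file name into its sequence name and file id before the document is sent to Elasticsearch. The field names `sequence` and `file_id`, the function name `split_file_name`, and the regular expression are assumptions made for this example, based only on the naming pattern described in the question (letters, a hyphen, digits, an underscore, digits); adjust them to your real data.

```python
import re

# Assumed naming scheme: <letters>-<digits>_<digits>, e.g. "abc-01_10".
FILE_NAME_PATTERN = re.compile(r"(?P<sequence>[A-Za-z]+-\d+)_(?P<file_id>\d+)")


def split_file_name(file_name: str) -> dict:
    """Return a document body with the sequence and file id stored separately.

    The keys "sequence" and "file_id" are hypothetical field names.
    """
    match = FILE_NAME_PATTERN.fullmatch(file_name)
    if match is None:
        raise ValueError(f"unexpected file name format: {file_name!r}")
    return {
        "file_name": file_name,
        "sequence": match.group("sequence"),
        "file_id": match.group("file_id"),
    }


# Build the document that would be indexed for one file.
doc = split_file_name("abc-01_10")
print(doc)
```

If the hypothetical `sequence` field is mapped as `keyword`, its value is stored unanalyzed, so an exact-match search such as `sequence:"abc-01"` in the Kibana search bar should return only the files of that sequence.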
But this is a simple text search on a field of type text, which is the main task of Elasticsearch. I would think it should not be hard to do at query time, or is it?