Thank you, but edge_ngram won't help me, because hyphens can occur multiple times, anywhere in my texts.
When I indexed my texts, I set up the mapping like so:
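(The original mapping didn't survive in this thread; a hypothetical reconstruction, in which the index, analyzer, and field names are all assumptions, might look like this:)

```
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "code_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "vendor_code": {
        "type": "text",
        "analyzer": "code_analyzer"
      }
    }
  }
}
```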
Did you try my example with hyphens?
If so, please share what works and what does not as a full example that can be run in the Kibana Dev Console, like the one I provided.
Yes, it works for the "U-298" example. However, if I index text like "aaaaaa-uuuuuu", the engine can't even find the "aaaaa" (or "uuu") part of it. =(
I just want to be able to delete certain symbols (-_.,) from a string, and then match a query (also without such symbols) starting from any position in my string.
Yes, and that needs whitespace to work, which is why I suggested replacing the characters with a space rather than removing them.
When you remove the characters, aaaaa-uuuuu is tokenised as aaaaauuuuu, which means you cannot search for either component. If you instead replace them with a space, the whitespace tokenizer produces the tokens aaaaa and uuuuu.
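A minimal, runnable sketch of that difference, assuming made-up names for the index, char filters, and analyzers; it can be pasted into the Kibana Dev Console as-is:

```
PUT test-hyphens
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_symbols": {
          "type": "pattern_replace",
          "pattern": "[-_.,]",
          "replacement": ""
        },
        "symbols_to_space": {
          "type": "pattern_replace",
          "pattern": "[-_.,]",
          "replacement": " "
        }
      },
      "analyzer": {
        "remove_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": ["strip_symbols"]
        },
        "space_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": ["symbols_to_space"]
        }
      }
    }
  }
}

POST test-hyphens/_analyze
{
  "analyzer": "remove_analyzer",
  "text": "aaaaa-uuuuu"
}

POST test-hyphens/_analyze
{
  "analyzer": "space_analyzer",
  "text": "aaaaa-uuuuu"
}
```

The first _analyze call returns the single token aaaaauuuuu; the second returns the two tokens aaaaa and uuuuu.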
If aaaaa-uuuuu is tokenised as aaaaauuuuu, I can search for any fragment such as aaa, uuu, or aauu, can't I?
If I get aaaaa and uuuuu separately, I won't be able to find aauu, for example.
I just want to discard certain characters from vendor codes, because most people don't type them, but I want to return results whether they search for aaaaa, uuuuu, or aauu. Only the order matters.
P.S. As a next step, I'm going to extend this feature by allowing a fuzziness of 1-2 characters.
Not necessarily. It depends on the mapping of the field. You could find substrings like in your example, but that would require a wildcard query, which is one of the most expensive and inefficient query types you can use in Elasticsearch.
If this is how you want to query your data, you might want to look into the wildcard field type in order to make these queries more efficient.
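For illustration, a minimal sketch of that approach, assuming the symbols (-_.,) are stripped out before indexing (for example, in the application or an ingest pipeline) and using made-up index and field names:

```
PUT vendor-codes
{
  "mappings": {
    "properties": {
      "code": { "type": "wildcard" }
    }
  }
}

# Index the code with the symbols already removed
PUT vendor-codes/_doc/1
{
  "code": "aaaaauuuuu"
}

# A leading-and-trailing wildcard matches any in-order fragment
GET vendor-codes/_search
{
  "query": {
    "wildcard": {
      "code": { "value": "*aauu*" }
    }
  }
}
```

Because the wildcard field type is designed for patterns with leading wildcards, a query like *aauu* should run considerably more efficiently against it than the same wildcard query against a keyword or text field.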