We have Elasticsearch 1.5 set up with a very simple mapping to perform full-text
search in our docs (https://docs.giantswarm.io/). When searching for
"swarmvars" we get no hits, although "swarmvars.json" appears in documents.
The field "text" is used as a catch-all field for all searchable content
(title, document body, keywords). Here is the mapping:
When using the "english" analyzer on the text "Text containing
swarmvars.json and more", the result is these tokens:
text
contain
swarmvars.json
more
Having the token "swarmvars.json" is fine. What I need are two additional
tokens "swarmvars" and "json". How can I achieve that?
I was looking into creating a custom tokenizer, but I was unable to get it
to work (errors when applying the settings), and I could not find an
example, no matter how hard I searched.
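One approach that looks like it should work here (a sketch, not tested against this exact setup): put a word_delimiter token filter with preserve_original enabled into a custom analyzer, so that "swarmvars.json" is emitted both whole and split into its parts. The names "split_on_dots" and "text_analyzer" below are invented for the example, and the filter chain only approximates the built-in "english" analyzer:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "split_on_dots": {
          "type": "word_delimiter",
          "preserve_original": true
        }
      },
      "analyzer": {
        "text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["split_on_dots", "lowercase", "stop", "porter_stem"]
        }
      }
    }
  }
}
```

In 1.x, analysis settings like this can only be applied to a closed index, so the index would need to be closed, updated, reopened, and the documents reindexed; afterwards the _analyze API should show "swarmvars.json", "swarmvars", and "json" for the sample text.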
The standard one is meant for English text, which means that a dot needs to have a space after it in order to be considered a break between two tokens.
Yes. Because « Hello. How are you? » is a sentence that can be broken into « hello », « how », « are », « you ».
But in « I paid it 2.50 euros », I would most likely keep « 2.50 » as a whole token.
On Friday, 29 May 2015 at 11:02:25 UTC+2, David Pilato wrote:
Yes. Because « Hello. How are you? » is a sentence that can be broken into
« hello », « how », « are », « you ».
But in « I paid it 2.50 euros », I would most likely keep « 2.50 » as a
whole token.
So far, so easy. And my question now is: from a text "foo.bar", how can I
generate ALL of the following tokens: "foo.bar", "foo", and "bar"?
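For what it's worth, the splitting that Lucene's word_delimiter filter (with preserve_original enabled) applies to such a token can be approximated locally. A rough sketch only — the real filter also splits on case transitions and letter/digit boundaries, which this ignores:

```python
import re

def word_delimiter(token, preserve_original=True):
    """Roughly mimic the word_delimiter token filter: split a token
    on non-alphanumeric characters, optionally keeping the original."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if len(parts) <= 1:
        return [token]  # nothing to split on
    return ([token] if preserve_original else []) + parts

print(word_delimiter("foo.bar"))  # ['foo.bar', 'foo', 'bar']
```

Applied to the earlier example, "swarmvars.json" yields the original plus "swarmvars" and "json", which is exactly the set of tokens asked for.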