On Tue, 2013-02-26 at 12:09 +0100, Per Ekman wrote:
> Alright, that is pretty much what we've done so far, but I'm looking
> at getting "bro", "f", "jump"... into the index, instead of the
> endings,
You specified that you wanted ngrams of the last two characters, which
is why I set "side" to "back".
> And possibly the original words as well.
Just make the edge ngrams long enough.
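To see why "long enough" front-side grams also cover the original words, here is a minimal sketch in plain Python. It only approximates what an edge n-gram filter emits per token and is not the Lucene implementation; the function names `edge_ngrams` and `analyze` and the default gram sizes are made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the front edge n-grams of one token (illustration only)."""
    # Never slice past the end of the token; a token shorter than
    # min_gram produces no grams at all.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=1, max_gram=10):
    """Lowercase, split on whitespace, and map each word to its grams."""
    return {
        word: edge_ngrams(word, min_gram, max_gram)
        for word in text.lower().split()
    }


if __name__ == "__main__":
    for word, grams in analyze("brown fox jumped").items():
        print(word, grams)
```

With `max_gram` at least as large as the longest word, the last gram of each word is the word itself, so "bro", "f", "jump" and the full words all end up among the emitted grams.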
You may want to use a multi-field to have one field indexed with (e.g.)
the standard analyzer and another indexed with edge-ngrams. You can then
query both of them in a single query, giving a different boost to each
clause.
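A rough sketch of what that could look like, written as Python dicts that serialise to the JSON bodies you would send with curl. Everything named here is an assumption for illustration: the type name `doc`, the field `title`, the sub-field `grams`, the `front_grams` filter and analyzer, and the boost values. The `multi_field` mapping syntax is the one from the Elasticsearch releases current at the time of this thread, and later versions may need a different mapping.

```python
import json


def index_body():
    """Settings and mapping: 'title' indexed twice (hypothetical names)."""
    return {
        "settings": {"analysis": {
            "filter": {"front_grams": {
                "type": "edge_ngram", "side": "front",
                "min_gram": 1, "max_gram": 20,
            }},
            "analyzer": {"front_grams": {
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "front_grams"],
            }},
        }},
        "mappings": {"doc": {"properties": {"title": {
            "type": "multi_field",
            "fields": {
                # Default sub-field: whole words via the standard analyzer.
                "title": {"type": "string", "analyzer": "standard"},
                # Extra sub-field: front edge n-grams of each word.
                "grams": {"type": "string", "analyzer": "front_grams"},
            },
        }}}},
    }


def query_body(text, exact_boost=2.0, gram_boost=1.0):
    """One bool query with two clauses, boosted differently."""
    return {"query": {"bool": {"should": [
        {"match": {"title": {"query": text, "boost": exact_boost}}},
        {"match": {"title.grams": {"query": text, "boost": gram_boost}}},
    ]}}}


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=3))
    print(json.dumps(query_body("jumpd"), indent=3))
```

The exact-word clause gets the higher boost so that correctly spelled matches rank first, while the gram clause still lets a query with a wrong ending match on the shared prefix.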
clint
On Tue, Feb 26, 2013 at 12:02 PM, Clinton Gormley <clint@traveljury.com> wrote:
On Tue, 2013-02-26 at 02:45 -0800, Per Ekman wrote:
> Hi
>
> We are discussing building an index where possible misspellings at the
> end of a word are getting hits.
>
> We were looking at using the EdgeNGram and making ngrams of the last
> two characters, but that gives us an index of just the 2-character
> variations of the word endings.
>
> How would we best do this? Is it possible to configure the inverse of
> that? Should we tokenize it with a regexp? Any other ideas?
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "filter" : {
            "end_grams" : {
               "max_gram" : 2,
               "side" : "back",
               "min_gram" : 2,
               "type" : "edge_ngram"
            }
         },
         "analyzer" : {
            "end_grams" : {
               "filter" : [
                  "standard",
                  "lowercase",
                  "stop",
                  "end_grams"
               ],
               "tokenizer" : "standard"
            }
         }
      }
   }
}
'
curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=The+quick+brown+fox+jumped+over+the+lazy+dog&analyzer=end_grams'
# {
#    "tokens" : [
#       {
#          "end_offset" : 9,
#          "position" : 1,
#          "start_offset" : 7,
#          "type" : "word",
#          "token" : "ck"
#       },
#       {
#          "end_offset" : 15,
#          "position" : 2,
#          "start_offset" : 13,
#          "type" : "word",
#          "token" : "wn"
#       },
#       {
#          "end_offset" : 19,
#          "position" : 3,
#          "start_offset" : 17,
#          "type" : "word",
#          "token" : "ox"
#       },
#       {
#          "end_offset" : 26,
#          "position" : 4,
#          "start_offset" : 24,
#          "type" : "word",
#          "token" : "ed"
#       },
#       {
#          "end_offset" : 31,
#          "position" : 5,
#          "start_offset" : 29,
#          "type" : "word",
#          "token" : "er"
#       },
#       {
#          "end_offset" : 40,
#          "position" : 6,
#          "start_offset" : 38,
#          "type" : "word",
#          "token" : "zy"
#       },
#       {
#          "end_offset" : 44,
#          "position" : 7,
#          "start_offset" : 42,
#          "type" : "word",
#          "token" : "og"
#       }
#    ]
# }
clint
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.