After a bit of playing I did this:
$ curl -XPUT 'http://127.0.0.1:9200/test/website4/_mapping?pretty=1' -d '
{
  "website" : {
    "properties" : {
      "uri" : {
        "type" : "string",
        "include_in_all" : 0,
        "index_analyzer" : "ascii_ngram",
        "search_analyzer" : "ascii_std"
      }
    }
  }
}
'
{
  "ok" : true,
  "acknowledged" : true
}
$ curl -XPOST 'http://127.0.0.1:9200/test/website4?pretty=1' -d '
{
  "uri" : "http://www.heise.de"
}
'
{
  "ok" : true,
  "_index" : "test",
  "_type" : "website4",
  "_id" : "j3KS0Py1TWC7JjnOWeeCIg",
  "_version" : 1
}
$ curl -XPOST 'http://127.0.0.1:9200/test/_refresh'
{"ok":true,"_shards":{"total":10,"successful":5,"failed":0}}
$ curl -XGET 'http://127.0.0.1:9200/test/website4/_search?pretty=1' -d '
{
  "query" : {
    "field" : {
      "uri" : "heis"
    }
  }
}
'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.28602687,
    "hits" : [ {
      "_index" : "test",
      "_type" : "website4",
      "_id" : "j3KS0Py1TWC7JjnOWeeCIg",
      "_score" : 0.28602687,
      "_source" : {
        "uri" : "http://www.heise.de"
      }
    } ]
  }
}
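Putting a refresh between indexing and searching makes it work; a freshly indexed document only becomes visible to search after a refresh.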
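As for why "heis" finds the document at all: assuming ascii_ngram is an ngram-based analyzer, as the name suggests, every substring of the URI within the gram length limits gets indexed as its own term, while the search analyzer leaves the query term whole. Here is a minimal Python sketch of that idea, not of what Elasticsearch does internally. The function names and the gram limits of 2 and 15 are my assumptions; the real limits are whatever ascii_ngram is configured with in the index settings.

```python
def ngrams(text, min_gram, max_gram):
    """Return all character n-grams of text with min_gram <= length <= max_gram."""
    text = text.lower()
    grams = set()
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n over the text; the range is empty if n > len(text).
        for start in range(len(text) - n + 1):
            grams.add(text[start:start + n])
    return grams


def matches(indexed_value, query, min_gram=2, max_gram=15):
    """Hypothetical stand-in for the lookup: the query term is kept whole
    (as a non-ngram search analyzer would keep it) and must equal one of
    the grams produced from the indexed value. Gram limits are assumed."""
    return query.lower() in ngrams(indexed_value, min_gram, max_gram)


if __name__ == "__main__":
    print(matches("http://www.heise.de", "heis"))               # True
    print(matches("http://www.mylatestwebsite.com", "latest"))  # True
    print(matches("http://www.foobar.com", "foo"))              # True
    print(matches("http://www.foobar.com", "bar.org"))          # False
```

Note that a query longer than the maximum gram length, or shorter than the minimum, can never match in this scheme, so the limits are worth choosing with the expected search terms in mind.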
On Sun, May 8, 2011 at 4:17 PM, hukl jpbader@gmail.com wrote:
Hey there,
I have already spent half the day in the IRC channel, and karmi as well
as clintongormley were really helpful, but in the end my example still
does not work as expected.
What I want to achieve:
I have docs which include a uri field like
{ uri : "http://www.foobar.com" }
And I want to be able to find that doc by searching for "foo"
{ uri : "http://www.mylatestwebsite.com" }
And I want to be able to find it by searching for "latest"
Now, the standard analyzer / tokenizer doesn't split the domain at the
dots: there is no whitespace, so the dots are considered part of the
token. Also, it's quite common for the words in a domain not to be
separated by anything at all, like in the one above.
Now collin, karmi and kimchy suggested using ngram for this, and collin
even provided an example, which unfortunately did not work. So I
produced a minimal example based on collin's, which I would expect to
work but doesn't, and I'd love to hear any suggestions on how to
make this work.
My ES session for this problem looks like this:
https://gist.github.com/gists/961418
Kind regards, John
--
Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy