Hi Alex,
html_strip is a char filter; I was referring to standard_html_strip, which
I believe is an analyzer that uses html_strip.
Anyway, I tried your suggestion.
$ curl -XPUT http://localhost:9200/foo/bar/_mapping -d '{
"bar":
{ "properties" : {
"body": {"type":"string", "analyzer":"strip_html_analyzer" }
}
}
}'
{"ok":true,"acknowledged":true}
$ curl -XPUT localhost:9200/foo/bar/1 -d '{
"body" : "
"
}
'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":1}
$ curl -XGET localhost:9200/foo/bar/_search?q=color
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.06780553,"hits":[{"_index":"foo","_type":"bar","_id":"1","_score":0.06780553,
"_source" : {
"body" : "
"
}
}]}}
Calling the _analyze handler directly seems to work fine:
$ curl -XPOST localhost:9200/foo/_analyze?analyzer=strip_html_analyzer -d '
hello world
'
{"tokens":[{"token":"hello","start_offset":17,"end_offset":22,"type":"","position":1},{"token":"world","start_offset":23,"end_offset":28,"type":"","position":2}]}
The list of tokens does not include the HTML tags.
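The offsets in that _analyze output point into the original input, markup included, even though the tags themselves produce no tokens. A rough Python approximation (not Elasticsearch's code) of a char filter followed by a tokenizer shows the idea: tags, including their attributes, are dropped before tokenizing, but each token keeps offsets into the unstripped text. The regex tag matching, the \w+ word splitting and the sample input are simplifying assumptions; the real html_strip filter also decodes entities and handles many more cases, and the standard tokenizer is more elaborate.

```python
import re

TAG = re.compile(r"<[^>]*>")  # naive: any <...> run counts as a tag
WORD = re.compile(r"\w+")     # naive stand-in for the standard tokenizer


def strip_tags(text):
    """Drop tags; return the remaining text plus, for each kept character,
    its offset in the original input (roughly what a char filter tracks)."""
    kept, offsets = [], []
    last = 0
    for m in TAG.finditer(text):
        kept.extend(text[last:m.start()])
        offsets.extend(range(last, m.start()))
        last = m.end()
    kept.extend(text[last:])
    offsets.extend(range(last, len(text)))
    return "".join(kept), offsets


def analyze(text):
    """Tokenize the stripped text, reporting offsets in the original text."""
    stripped, offsets = strip_tags(text)
    tokens = []
    # Positions start at 1, matching the _analyze output above.
    for position, m in enumerate(WORD.finditer(stripped), start=1):
        tokens.append({
            "token": m.group(),
            "start_offset": offsets[m.start()],
            "end_offset": offsets[m.end() - 1] + 1,
            "position": position,
        })
    return tokens


# Hypothetical input: "hello" gets offsets 3..8 and "world" 31..36;
# no tokens are produced for "font", "color" or "red".
print(analyze('<b>hello</b> <font color="red">world</font>'))
```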
Am I still doing something wrong?
Thanks,
On Sunday, June 24, 2012 4:32:25 AM UTC-4, Alexander Reelsen wrote:
Hi
there are two bugs in your configuration. You are treating the html_strip
char filter as an analyzer, which does not work, and you are setting the
mapping incorrectly.
Put this in your elasticsearch.yml:
index:
  analysis:
    analyzer:
      default:
        type: standard
      strip_html_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard]
        char_filter: html_strip
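If you would rather not touch elasticsearch.yml, the same analyzers can also be supplied as index settings when a new index is created (for example with a PUT to http://localhost:9200/foo before putting the mapping). This Python sketch only builds that JSON body from the YAML above; the exact settings layout is an assumption to check against your Elasticsearch version, and analysis settings generally cannot be changed on an index that is already open. The function name is hypothetical.

```python
import json


def strip_html_index_settings():
    """Hypothetical helper: index-creation body mirroring the YAML above."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {"type": "standard"},
                    "strip_html_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["standard"],
                        "char_filter": ["html_strip"],
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON is what would go after -d in the index-creation PUT.
    print(json.dumps(strip_html_index_settings(), indent=2))
```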
Then set the mapping correctly (you need to add the type as well as set
the right analyzer):
curl -XPUT http://localhost:9200/foo/bar/_mapping -d '{
"bar":
{ "properties" : {
"body": {"type":"string", "analyzer":"strip_html_analyzer" }
}
}
}'
If you check the mapping with a GET on the above URL, you will also see that
it is empty with your current configuration, so the mapping was not applied
(and no error message came back, which is not too nice...).
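Because a mapping that was not applied fails silently, it can help to check the GET response programmatically. Below is a small, hypothetical Python helper (not part of any Elasticsearch client) that looks for a field in the parsed JSON returned by a GET on the _mapping URL. The response layout has varied between Elasticsearch versions, so instead of assuming a fixed path it searches every "properties" object; the sample responses in the usage lines are illustrative, not captured output.

```python
import json


def find_field_mapping(mapping, field):
    """Depth-first search for `field` under any "properties" object.

    Returns the field's mapping dict, or None if the field is absent,
    e.g. because the PUT _mapping was silently not applied."""
    if isinstance(mapping, dict):
        props = mapping.get("properties")
        if isinstance(props, dict) and field in props:
            return props[field]
        for value in mapping.values():
            found = find_field_mapping(value, field)
            if found is not None:
                return found
    return None


# Usage with an illustrative response body:
raw = '{"bar":{"properties":{"body":{"type":"string","analyzer":"strip_html_analyzer"}}}}'
body = find_field_mapping(json.loads(raw), "body")
print(body["analyzer"] if body else "mapping for 'body' not found")
```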
Now index your document again and try searching for "strong" and "test";
it should work.
--Alexander