I also used Clint's example, mapped it to a document, and searched the field,
but I am still getting HTML in the query results. Here is my code. I
appreciate the help.
//Index settings (custom analyzer)
PUT /foo/
{
"settings": {
"index" : {
"analysis" : {
"analyzer" : {
"test_1" : {
"char_filter" : [
"html_strip"
],
"tokenizer" : "standard"
}
}
}
}
}
}
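To see what `test_1` does to a value before it is indexed, here is a rough Python approximation. The char filter runs first on the raw string, then the standard tokenizer splits the result. Because `test_1` defines no token filters, the tokens keep their original case. `html_strip`, `standard_tokenize` and `analyze_test_1` are hypothetical helper names, and Python's `HTMLParser` is only a stand-in for Lucene's HTML-stripping char filter. The sample input with `<b>` tags is made up, since the markup in this thread appears to have been lost when the email was rendered.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_strip(text):
    """Rough stand-in for the html_strip char filter (not Lucene's implementation)."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


def standard_tokenize(text):
    """Crude approximation of the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def analyze_test_1(text):
    """Char filter first, then tokenizer; no token filters, so no lowercasing."""
    return standard_tokenize(html_strip(text))


if __name__ == "__main__":
    # Hypothetical input; the original markup did not survive in the thread.
    print(analyze_test_1("The <b>quick</b> &amp; brown fox"))
    # ['The', 'quick', 'brown', 'fox']
```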
//Mapping
PUT /foo/foo_type/_mapping
{
"foo_type":{
"properties" : {
"title": {
"type":"string",
"index": "analyzed",
"analyzer":"test_1"
}
}
}
}
GET /foo/foo_type/_mapping
//Response
{
"foo": {
"mappings": {
"foo_type": {
"properties": {
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"title": {
"type": "string",
"analyzer": "test_1"
}
}
}
}
}
}
//Index a document
PUT /foo/foo_type/1
{
"date" : "2009-11-15T14:12:12",
"title" : "The quick & brown fox"
}
//Search
GET /foo/_search?pretty=true
{
"fields": ["title"],
"query": {
"query_string": {
"query": "brown",
"analyzer": "test_1"
}
}
}
//Results: the HTML tags are still there
"hits": [
{
"_index": "foo",
"_type": "foo_type",
"_id": "1",
"_score": 0.076713204,
"fields": {
"title": [
"The quick & brown fox"
]
}
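A hedged explanation of why the hits still show markup: as far as I know, an analyzer (char filters included) only changes the terms written to the inverted index and the terms a query is matched against. The document's `_source` is kept exactly as it was sent, and since `title` is not mapped with `"store": true`, the values returned under `fields` are read from that unmodified `_source`. Below is a Python toy model of that split. `ToyIndex` and `analyze` are hypothetical names, and the regex tag stripping is a crude stand-in for the real `html_strip` char filter, not Elasticsearch's implementation.

```python
import html
import re
from collections import defaultdict


def analyze(text):
    """Crude analyzer sketch: drop tags, decode entities, split into words."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    return re.findall(r"\w+", html.unescape(without_tags))


class ToyIndex:
    """Toy model: analyzed terms go into postings, the raw text is kept as source."""

    def __init__(self):
        self.source = {}                  # doc id -> original, unmodified text
        self.postings = defaultdict(set)  # term -> set of doc ids

    def index(self, doc_id, text):
        self.source[doc_id] = text           # stored exactly as sent
        for term in analyze(text):
            self.postings[term].add(doc_id)  # only the analyzed form is searchable

    def search(self, query):
        # The query goes through the same analyzer, then is matched term by term.
        hits = set()
        for term in analyze(query):
            hits |= self.postings.get(term, set())
        # What comes back is the stored source, markup included.
        return [self.source[doc_id] for doc_id in sorted(hits)]


idx = ToyIndex()
idx.index(1, "The <b>quick</b> &amp; brown fox")  # hypothetical markup
print(idx.search("brown"))
```

So a match on "brown" shows the indexed terms are clean, while the returned text still carries the original markup. If the returned text itself must be free of HTML, one straightforward option is to clean it before the document is sent to Elasticsearch.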
On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
Have you checked Clint's example?
HTML Strip charfilter test for ElasticSearch · GitHub
Jörg
On Thu, Aug 7, 2014 at 8:23 PM, IronMike <sabda...@gmail.com> wrote:
I would like to strip HTML tags at index time. Here is a simple example I
have tried so far, but it doesn't seem to strip the HTML tags. Any ideas what's missing?
//Settings & mappings
POST twitter
{
"mappings": {
"tweet" : {
"properties" : {
"message" : {
"type" : "string",
"analyzer": "strip_html_analyzer"
},
"date" : {
"type" : "date"
},
"name" : {
"type" : "string"
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"strip_html_analyzer":{
"type":"custom",
"tokenizer":"standard",
"filter":"standard",
"char_filter":"my_html"
}
},
"char_filter": {
"my_html":{
"type":"html_strip"
}
}
}
}
}
//Index a document
PUT /twitter/tweet/1
{
"name" : "mike",
"date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch, This is an html test"
}
//Query result for "html": I expected this query to return nothing, since the tag is supposed to be stripped
"hits": {
"total": 1,
"max_score": 0.11626227,
"hits": [
{
"_index": "twitter",
"_type": "tweet",
"_id": "1",
"_score": 0.11626227,
"fields": {
"message": [
"trying out Elasticsearch, This is an html test"
]
},
"highlight": {
"message": [
"trying out Elasticsearch, This is an html test"
]
}
}
]
}
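On the second example, two hedged observations. As far as I know, the `highlight` section is also built from the stored `_source` text, so it shows whatever markup was sent. And the char filter removes tags, not the text between them: if the original message wrapped the word in a tag such as `<b>html</b>` (the actual markup did not survive in this thread, so that input is hypothetical), the word "html" is still text content and becomes a token, which would explain the hit. The Python sketch below uses a crude regex in place of the real `html_strip` char filter; `strip_and_tokenize` is a hypothetical name.

```python
import html
import re


def strip_and_tokenize(text):
    """Crude sketch of strip_html_analyzer: drop tags, decode entities, split into words."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    return re.findall(r"\w+", html.unescape(without_tags))


# Hypothetical inputs: the thread's original markup did not survive email rendering.
wrapped = "trying out Elasticsearch, This is an <b>html</b> test"
tag_only = "trying out Elasticsearch, This is an <html> test"

print("html" in strip_and_tokenize(wrapped))   # True: the word is text content
print("html" in strip_and_tokenize(tag_only))  # False: a bare tag name is removed
```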
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a831f6f4-b47c-4c35-a40b-058e3c1b1043%40googlegroups.com.