Hi!
I have a piece of text, let's say:
"<p><br/>台灣人,奧運直播,使用PPStream,(PPS網路電視),觀看同步奧運實況</b>!"
I managed to index this text without the HTML tags by using the "html_strip" char filter. Now I'm trying to also store this text without the HTML tags:
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "ch_analyzer": {
          "tokenizer": "icu_tokenizer",
          "char_filter": [ "html_strip" ]
        }
      }
    }
  },
  "mappings": {
    "qa": {
      "properties": {
        "comment_desc": {
          "type": "text",
          "analyzer": "ch_analyzer"
        },
        "article_title": {
          "type": "text",
          "analyzer": "ch_analyzer"
        },
        "article_desc": {
          "type": "text",
          "analyzer": "ch_analyzer"
        }
      }
    },
    "sport": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ch_analyzer"
        },
        "content": {
          "type": "text",
          "analyzer": "ch_analyzer"
        }
      }
    }
  }
}
These are my mappings and settings. What should I change so that the text is stored without its HTML tags? Is that even possible, or should I preprocess my documents before indexing them?
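To make the preprocessing option concrete, here is a minimal sketch of what I have in mind, assuming the documents pass through a Python script before being sent to Elasticsearch. The helper name remove_html_tags is made up for illustration, and the actual indexing call is left out. It uses only the standard library and simply drops every tag, keeping the text between them:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def remove_html_tags(raw):
    """Hypothetical helper: return `raw` with all HTML tags removed."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


raw = "<p><br/>台灣人,奧運直播,使用PPStream,(PPS網路電視),觀看同步奧運實況</b>!"

# "content" is the field from the "sport" mapping above; this dict is what
# would be sent as the document body instead of the raw HTML string.
doc = {"content": remove_html_tags(raw)}
print(doc["content"])
```

With this approach the stored document would already be tag-free, so the "html_strip" char filter in the analyzer would only act as a safety net.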