index:
    analysis:
        analyzer:
            descriptionAnalyzer:
                type: custom
                tokenizer: standard
                filter: standard
                char_filter: html_strip
And in the mappings I pointed the field I wanted at this analyzer:
"description" : { "type" : "string", "index" : "analyzed",
"analyzer" : "descriptionAnalyzer" }
I confirmed that it was being used after indexing a few docs.
What do you mean that you get HTML in your search results? You get them as
part of the _source? If so, then it makes sense, since the _source is just
the document you indexed.
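
To make the distinction concrete, here is a toy model in Python of what happens at index time. It is only a sketch: the regex-based `analyze` function is a crude stand-in for html_strip plus the standard tokenizer, not how Elasticsearch implements them, and the names `analyze` and `index_document` are made up for illustration.

```python
import re

def analyze(text):
    """Crude stand-in for html_strip + standard tokenizer:
    replace tags with spaces, then split into word tokens."""
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"\w+", without_tags)

def index_document(doc):
    """Return (indexed terms, stored _source).
    Only the terms pass through analysis; _source is kept as submitted."""
    terms = analyze(doc["description"])
    return terms, doc

doc = {"description": "<p>Fast <b>red</b> car</p>"}
terms, source = index_document(doc)
print(terms)                  # ['Fast', 'red', 'car']
print(source["description"])  # <p>Fast <b>red</b> car</p>
```

Searches match against the terms, while a hit returns the `_source`, which is why the markup still shows up in results.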
I submitted something like this a few months back. The html_strip character filter only removes the HTML from what goes into the index, not from the stored _source / value.
We use JSoup to strip the HTML on the client side before indexing.
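
For anyone who wants the same client-side approach without JSoup (which is a Java library), a minimal sketch using only Python's standard `html.parser` could look like the following. The helper names `strip_html` and `prepare_for_indexing` are hypothetical, and this is not a substitute for a real HTML sanitizer: for example, the contents of script and style elements would be kept as text.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html):
    """Return the text content of `html` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements is not glued together.
    return " ".join(" ".join(parser.parts).split())

def prepare_for_indexing(doc, field="description"):
    """Return a copy of `doc` with HTML removed from `field`."""
    cleaned = dict(doc)
    cleaned[field] = strip_html(doc[field])
    return cleaned

doc = {"description": "<p>Fast <b>red</b> car &amp; trailer</p>"}
print(prepare_for_indexing(doc))  # {'description': 'Fast red car & trailer'}
```

The cleaned copy is what gets sent to the index, so both the indexed terms and the stored `_source` are free of markup.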
I spoke too soon, seems that I had a brain fart. After posting the
message I started playing with the analyzer, and indeed the analyzer
does its job just right. I gisted it here for the record.
For a moment I thought that playing with the analyzer + setting
store=yes would also get rid of the HTML in my _source, which would have
been a quick and dirty way to remove unwanted formatting for my view.
But indeed, that doesn't make any sense.