Hi! I'm new to ElasticSearch and having trouble trying to get the
standard_html_strip analyzer to work.
As a simplified example of what I'm trying to do, take a look at
In that example, I create a mapping with a single string field "body"
using the standard_html_strip analyzer. Then I index one document
with a "strong" tag in the body.
When I do a search for "strong", I expect that I'll get zero hits, but
instead I get a hit.
I'm using the latest stable release (0.18.5).
Thanks for any help or advice!
I tried adding the suggested configuration to elasticsearch.yml, but
it still seems that HTML isn't being stripped.
I'm stumped enough to give up on this approach for now and extract the
plain text from my HTML fields before indexing.
I still wish I knew how to make this work. I like the idea of
offloading the HTML stripping from my application.
Just to make sure: the HTML stripping provided as part of the analysis
process will not cause the markup to be stripped from the _source. Only the
indexed terms will have the HTML stripped.
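To illustrate the distinction between the stored source and the indexed terms, here is a rough Python sketch. It is not Elasticsearch code and does not reproduce what the standard_html_strip analyzer does internally. The helper names (TagStripper, strip_html, analyze) and the sample document are made up for illustration, and the tokenizer is a simple lowercase word split.

```python
from html.parser import HTMLParser
import re


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags, never for the tags themselves.
        self.chunks.append(data)


def strip_html(text):
    """Return the text content of an HTML fragment."""
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


def analyze(text):
    """Stand-in for an HTML-stripping analyzer: strip tags, lowercase, split into words."""
    return re.findall(r"\w+", strip_html(text).lower())


# The stored document keeps its markup untouched (assumed sample document).
source = {"body": "Some <strong>bold</strong> text"}

# Only the terms that would go into the index are derived from stripped text.
terms = analyze(source["body"])

print(source["body"])  # markup is still present in the source
print(terms)           # ['some', 'bold', 'text']
```

With stripping applied at analysis time, a search for "strong" would have no term to match, while the document returned in a hit would still contain the original tag.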
Right. I'm not surprised to see the html markup in the stored values.
But I am surprised to get a search hit for "strong" when that string
only occurs as an element name.
I need to strip HTML before inserting into source and just hooked into
the filter directly in my app. I did this because I needed to
guarantee the highlight blurbs I returned contained valid HTML.
However, it is probably a relatively common case for users to want to
strip HTML before adding to source. Might be a good feature?
Stripping HTML before storing the source on the elasticsearch side might get
into an area where elasticsearch starts doing things that are outside its
realm, which complicates it.