HTML-stripped highlighted text from an HTML content field

Hey,

You could use the analyze API with the html_strip char_filter to get the
extracted text back in parts; see the gist "HTML Strip charfilter test for
ElasticSearch" on GitHub. However, Elasticsearch does not store the text
without the HTML anywhere as a complete block that you could read out. If
you want that, you would need to strip the HTML yourself before indexing.
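
To make the "strip it before indexing" idea concrete, here is a rough sketch in plain Java. It is not Elasticsearch's html_strip implementation, only a naive regex-based approximation. The class name, the helper names and the second field name "myfield_text" are made up for illustration; a real HTML parser would be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: strips HTML on the client side before a document is indexed.
final class HtmlPreIndexStripper {
    // Naive patterns (assumption: input is reasonably well-formed HTML).
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Removes script/style blocks and tags, decodes a few common entities,
    // and collapses runs of whitespace into single spaces.
    static String stripHtml(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // last, so "&amp;lt;" is not decoded twice
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Builds the source map for one document: the raw HTML stays in "myfield",
    // the stripped copy goes into the hypothetical field "myfield_text".
    static Map<String, Object> buildDocument(String html) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("myfield", html);
        doc.put("myfield_text", stripHtml(html));
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Listing</title></head>"
                + "<body><p>A <b>house</b> by the sea</p></body></html>";
        // prints: Listing A house by the sea
        System.out.println(buildDocument(html).get("myfield_text"));
    }
}
```

The resulting map could be passed as the document source when indexing. Searching and highlighting on the stripped field instead of the raw one should then return fragments without markup.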

The char_filter is basically there to make sure that a search for 'title'
will not match every web page that contains a '<title>' tag.
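
A small illustration of that point, again in plain Java rather than through Elasticsearch itself: the tokenizer and the tag removal below are crude stand-ins for an analyzer and a char filter, and all names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: which terms end up in the index with and without tag removal.
final class TagTokenDemo {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Lowercased runs of letters/digits, roughly what a simple analyzer would emit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Crude stand-in for a char filter: blanks out anything that looks like a tag.
    static String dropTags(String html) {
        return html.replaceAll("(?s)<[^>]+>", " ");
    }

    public static void main(String[] args) {
        String html = "<title>Beach house</title>";
        System.out.println(tokens(html));           // prints: [title, beach, house, title]
        System.out.println(tokens(dropTags(html))); // prints: [beach, house]
    }
}
```

Without the tag removal, the tag name itself becomes a searchable term, so a query for 'title' would hit this document even though the visible text never mentions it.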

Not a hundred percent sure if this was your question, so feel free to ask
further and tell me where I might have misunderstood you.

--Alex

On Thu, Dec 19, 2013 at 8:55 PM, Adolfo Rodriguez <pellyadolfo@yahoo.es> wrote:

Hi, I searched the documentation and the internet but could not find any
accurate information on this.

I have a highlight query which is working properly:

SearchResponse response = getClient().prepareSearch()
        .setIndices("myindex")
        .setTypes("mytype")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders
                .boolQuery()
                .should(QueryBuilders.matchQuery("myfield", "house")))
        .addHighlightedField("myfield", 250, 1)
        .setFrom(0)
        .setSize(25)
        .execute()
        .actionGet();

The query fetches results from myfield, which contains indexed HTML
content. The highlighted result contains HTML tags, and I would like to
strip the HTML out of the response. I found the HTML Strip Char Filter
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html)
but do not know the syntax for adding it as a request analyzer in Java.

I have found examples in Java that create indices including the analyzer
(http://jaibeermalik.wordpress.com/2013/03/26/elasticsearch-text-analysis-for-content-enrichment/),
but none that include the analyzer in a Java request, which the documentation
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html)
says is possible:

The index analysis module acts as a configurable registry of Analyzers
that can be used in order to both break indexed (analyzed) fields when a
document is indexed and process query strings

Any pointer to an example would be much appreciated.

Thanks.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/d83716b0-1461-4796-9d03-b7d7cb268ef7%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.
