Using Tika's HtmlMapper in the attachment plugin


(elyrank) #1

Hi,

I read that Tika can discard certain elements through the isDiscardElement
method of HtmlMapper:
http://tika.apache.org/1.2/api/org/apache/tika/parser/html/HtmlMapper.html#isDiscardElement(java.lang.String)

I was wondering whether this could be exposed as a property of the
attachment plugin, so that certain HTML elements are discarded at index time.

For example, when indexing HTML files that all share the same menu or
headers, those repeated elements could be left out of the indexed content.
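
To make the request concrete, here is a rough sketch of the kind of mapper I have in mind. It is only an illustration: HtmlMapperSketch is a local stand-in that mirrors the three methods of Tika's HtmlMapper so the snippet compiles without Tika on the classpath, and DiscardingHtmlMapper and its constructor argument are hypothetical names, not anything the attachment plugin provides today. I am also assuming that element names may arrive in either case, so the sketch normalizes them.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Local stand-in mirroring the three methods of Tika's
// org.apache.tika.parser.html.HtmlMapper (assumption: used here only so
// the sketch is self-contained).
interface HtmlMapperSketch {
    String mapSafeElement(String name);
    boolean isDiscardElement(String name);
    String mapSafeAttribute(String elementName, String attributeName);
}

// Hypothetical mapper that discards a configurable set of element names.
final class DiscardingHtmlMapper implements HtmlMapperSketch {
    private final Set<String> discarded = new HashSet<String>();

    DiscardingHtmlMapper(String... elementNames) {
        for (String name : elementNames) {
            // Store names upper-cased so lookups are case-insensitive.
            discarded.add(name.toUpperCase(Locale.ENGLISH));
        }
    }

    @Override
    public String mapSafeElement(String name) {
        // Pass every element through unchanged, lower-cased.
        return name.toLowerCase(Locale.ENGLISH);
    }

    @Override
    public boolean isDiscardElement(String name) {
        // True for any element whose name was configured for discarding.
        return discarded.contains(name.toUpperCase(Locale.ENGLISH));
    }

    @Override
    public String mapSafeAttribute(String elementName, String attributeName) {
        // Pass every attribute through unchanged, lower-cased.
        return attributeName.toLowerCase(Locale.ENGLISH);
    }
}
```

When calling Tika directly, a mapper implementing the real HtmlMapper interface would be registered with parseContext.set(HtmlMapper.class, mapper). Note that isDiscardElement only receives the element name, so this approach can drop something like every nav or header element, but it cannot single out one particular div by its id or class.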

--
Thanks,
Elyran



(system) #2