You should check the attachment type:
Note that as of 0.12.0 the plugin needs to be installed in extracted form.
The best option is to use the bin/plugin script to install plugins (or you can
do it manually: just create a "plugins" directory in ES HOME, unpack the
particular plugin into this folder, and you are done ... start ES).
But this way you will not get the raw HTML in _source (it will be kept in
base64 form). So you can either try to decode it from the result hits on the
client side, or extract the raw HTML before indexing, escape it to make it
valid JSON (shouldn't be that hard), and use the html_strip filter (for more
details see:
http://github.com/elasticsearch/elasticsearch/issues/issue/315). However, I
have not tried it myself yet.
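
To make the two options a bit more concrete, here is a rough Python sketch. It only shows the encoding side and does not talk to a real cluster. The field name "file", the sample HTML, and the helper names are all made up for illustration; they are not part of the plugin or of ES itself.

```python
import base64
import json


def encode_attachment(html: str) -> str:
    """Base64-encode raw HTML, the form an attachment-style field would hold."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


def decode_attachment(encoded: str) -> str:
    """Option 1: recover the raw HTML from a base64 value on the client side."""
    return base64.b64decode(encoded).decode("utf-8")


def html_as_json_document(html: str, title: str) -> str:
    """Option 2: put the raw HTML into a JSON document as a plain string.

    json.dumps escapes quotes, backslashes and control characters, so the
    result is valid JSON regardless of the markup inside the string.
    """
    return json.dumps({"title": title, "content": html})


raw_html = '<html><body><p class="x">Hello "world"</p></body></html>'

# Hypothetical _source of a hit; "file" is an assumed field name.
hit_source = {"file": encode_attachment(raw_html)}
decoded_html = decode_attachment(hit_source["file"])

# The JSON document round-trips back to the same HTML string.
doc = html_as_json_document(raw_html, "Greeting")
round_tripped = json.loads(doc)["content"]
```

The point of the second helper is that HTML never has to be valid JSON by itself: it only has to be a properly escaped string value inside a JSON object, and any JSON library will do that escaping for you.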
On Mon, Oct 25, 2010 at 4:29 PM, Albin Stigo firstname.lastname@example.org wrote:
I have a bunch of html documents that I would like to index (around
3000, so not so many). I put the title as well as some other metadata
in separate properties but I would like to make the content searchable
as well, and I would also like to be able to display the original
document... and I would like to do this over JSON... But:
"JSON does not look like XML, so HTML text fed to a JSON parser will
produce an error."
So I'm having problems parsing my hits back, so...
How do you guys solve this... do you strip the HTML out of the
document and only index the plain text content and then pull the
original from another database (based on an indexed id) or are there
Sorry for the rather long post.