I have been trying to use the attachment plugin to index HTML documents.
To start building an understanding, I want to do a couple of things:
a) control a mapping, and b) control what gets stored and what does not.
This seems like it should be trivial, but after several hours of searching
and experimenting with various mapping definitions, nothing seems to have
affected the index.
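
One way to see whether a mapping change reached the index is to read the
mapping back and compare it with what was sent. This is only a sketch: it
assumes the index and type names used in this post (`myindex`, `htmldoc`)
and a node on localhost:9200, and `get_mapping` is a helper name made up
for illustration.

```shell
# Hypothetical helper: fetch the mapping the index actually holds for the
# "htmldoc" type, pretty-printed so it can be compared with the request.
get_mapping() {
  curl -XGET 'http://localhost:9200/myindex/htmldoc/_mapping?pretty=true'
}
```

Calling `get_mapping` after each PUT should show which settings were
accepted and which were dropped.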
I would like to store the Base64-decoded document, NOT the Base64-encoded
document, and strip out the markup using the char_filter.
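
For context, the attachment type takes the field value as Base64 text, so
the encoding happens on the client before the request is sent. A minimal
sketch of that step follows, assuming the `file` field from my mapping; the
sample HTML, the document id of 1, and the `index_doc` helper are made up
for illustration.

```shell
# Hypothetical sample document; any HTML file would be handled the same way.
html='<html><body><p>Hello, world</p></body></html>'

# Base64-encode the markup and drop the line wrapping some base64 tools add.
encoded=$(printf '%s' "$html" | base64 | tr -d '\n')

# Request body for the "file" attachment field.
doc=$(printf '{"file": "%s"}' "$encoded")

# Hypothetical helper: index the document under an assumed id of 1.
index_doc() {
  curl -XPUT 'http://localhost:9200/myindex/htmldoc/1' -d "$doc"
}
```

Running `index_doc` sends the encoded body; what ends up stored depends on
the mapping, which is the part I am trying to control.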
Here is what I have been using to create my mapping:
curl -XPUT http://localhost:9200/myindex/htmldoc/_mapping -d '{
  "htmldoc": {
    "properties": {
      "_source": {
        "enabled": "false"
      },
      "contents": {
        "type": "string",
        "analyzer": "htmlContentAnalyzer",
        "store": "yes"
      },
      "file": {
        "type": "attachment",
        "fields": {
          "file": {
            "store": "no"
          },
          "date": {
            "store": "yes"
          },
          "author": {
            "store": "yes"
          }
        }
      },
      "header-Connection": {
        "type": "string"
      },
      "header-Content-Length": {
        "type": "string"
      },
      "header-Content-Type": {
        "type": "string"
      },
      "header-Keep-Alive": {
        "type": "string"
      },
      "header-Server": {
        "type": "string"
      },
      "header-Transfer-Encoding": {
        "type": "string"
      },
      "header-Vary": {
        "type": "string"
      }
    }
  }
}'
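
The mapping above refers to `htmlContentAnalyzer`, whose definition is not
shown here. As a rough sketch of what such a definition might look like, the
settings below assume a custom analyzer built on the built-in `html_strip`
char filter; the tokenizer and token filter choices, and the `create_index`
helper, are assumptions. Note that a char filter only changes the tokens
that get indexed, not the value kept for a stored field.

```shell
# Assumed analyzer definition; the real "htmlContentAnalyzer" is not shown
# in this post, so every choice below is illustrative.
settings='{
  "settings": {
    "analysis": {
      "analyzer": {
        "htmlContentAnalyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'

# Hypothetical helper: analyzers are index-level settings, so this sketch
# supplies them when the index is created.
create_index() {
  curl -XPUT 'http://localhost:9200/myindex' -d "$settings"
}
```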
I have the attachment plugin installed, and I know it's installed because
the first time I tried to create this mapping I got a well-documented
exception that basically meant I did not have the attachment plugin
installed. I installed version 1.6 of the plugin because the GitHub site
seemed to indicate that this was the supported version for my ES instance
(0.20.5). After installing the plugin, the exception went away, so I know
the plugin is installed.
For now, when search results are returned, I want some of the HTML to show
in the result set.
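
To make that concrete, here is a sketch of the kind of search I have in
mind. It asks for individual fields rather than the whole source, since a
search can only return stored fields once `_source` is disabled. The field
names `file.date` and `file.author` are assumed from the mapping layout, and
`search_docs` is a helper name made up for illustration.

```shell
# Ask for specific stored fields; the attachment sub-field names here are
# assumptions based on the mapping, not confirmed behavior.
query='{
  "fields": ["contents", "file.date", "file.author"],
  "query": { "match_all": {} }
}'

# Hypothetical helper: run the search against the "htmldoc" type.
search_docs() {
  curl -XGET 'http://localhost:9200/myindex/htmldoc/_search?pretty=true' -d "$query"
}
```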
--mike