I'm trying to make use of the attachments plugin. I've got the
following Mapping:
{
  "docs" : {
    "properties" : {
      "contents" : {
        "type" : "attachment",
        "fields" : {
          "contents" : { "store" : "no" }
        }
      },
      "lastModified" : { "type" : "long", "index" : "analyzed", "store" : "no" }
    }
  }
}
And the following indexing code:
XContentBuilder objectBuilder = jsonBuilder().startObject();
objectBuilder.startObject(Index.CONTENTS);
if (extension.equals("xml")) {
    objectBuilder.field("_content_type", MimeTypes.XML);
} else {
    objectBuilder.field("_content_type", MimeTypes.PLAIN_TEXT);
}
objectBuilder.field("_name", file.getName());
objectBuilder.field("content",
    Base64.encodeBase64(FileUtils.readFileToString(file).getBytes()));
objectBuilder.endObject();
objectBuilder.field(Index.LAST_MODIFIED, file.lastModified());
objectBuilder.endObject();
IndexRequestBuilder setSource = client.prepareIndex(Index.INDEX,
    Index.TYPE, file.getAbsolutePath()).setSource(objectBuilder);
setSource.execute().actionGet();
But when I look at the mapping on the server I see:
{
  doc: {
    properties: {
      lastModified: {
        index: "analyzed"
        type: "long"
      }
      contents: {
        path: "full"
        type: "attachment"
        fields: {
          author: {
            type: "string"
          }
          title: {
            type: "string"
          }
          keywords: {
            type: "string"
          }
          contents: {
            type: "string"
          }
          date: {
            format: "dateOptionalTime"
            type: "date"
          }
          content_type: {
            type: "string"
          }
        }
      }
    }
  }
}
Basically, I don't really want to store the contents, just index the
documents and be able to search on them. I'm indexing files that are
on the computer already so I don't need the contents, and in fact it's
taking up a ton of space to have the contents in there.
Another question: the contents in the results seem to be just the
base64 string. Is that correct, or am I doing something wrong?
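For reference, this is how the returned value could be checked against the original file text. It is only a minimal sketch, assuming UTF-8 text and using the JDK's java.util.Base64 rather than the commons-codec call in my indexer; the class and helper names (Base64RoundTrip, encode, decode) are made up for illustration and are not part of my code or of the plugin.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {
    // Hypothetical helper: Base64-encode raw bytes into a String.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Hypothetical helper: decode a Base64 value back to text (assumes UTF-8 content).
    public static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);          // aGVsbG8=
        System.out.println(decode(encoded));  // hello
    }
}
```

If decoding the "content" value from a hit gives back the file text, then what is being returned is just the encoded input I sent, not something extracted by the plugin.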
I'm using this as a local machine file search mechanism for a large
art / document tree that each user has locally on their machines.
My results look like this (sorry for the redactions, it's proprietary info):
{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1663,"max_score":1.0,"hits":[{"_index":"docs","_type":"doc","_id":"the_art_redacted","_score":1.0,"fields":{"contents":{"content":"REALLY_LONG_BASE64_STRING","_name":"the_file_name_redacted","_content_type":"text/plain"}}}]}}
Any additional explanation of attachments will be quite helpful.
Thanks,
Mike