I am trying to index a PDF attachment. I am using Elasticsearch 0.17.0 (built from source). My index and mapping are created like this:
curl -XPOST 10.181.18.66:9200/test -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "docitem" : {
      "properties" : {
        "my_attachment" : { "type" : "attachment" }
      }
    }
  }
}'
I have installed the mapper-attachments and analysis-icu plugins, and the startup log confirms that both are loaded:
[2011-06-02 17:43:45,571][INFO ][plugins ] [Day, Wilbur] loaded [mapper-attachments, analysis-icu], sites []
I then wrote a small Java program to feed in the document of interest. I tried sending raw bytes and also base64-encoded bytes when calling the API. Here is the relevant code:
public void index(Client client, String filename) throws ElasticSearchException, IOException {
    IndexResponse response = client.prepareIndex("test", "docitem", "1")
        .setSource(jsonBuilder()
            .startObject()
                .field("_content_type", "application/pdf")
                .field("_name", filename)
                .field("attachment", getBytes(filename)) // getBase64(filename)
            .endObject()
        )
        .execute()
        .actionGet();
}
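The getBytes and getBase64 helpers are not shown above. Roughly, they do something like the following; this is only a sketch, assuming Java 8 or later for java.util.Base64, and the class name AttachmentEncoding and the toBase64 method are hypothetical names used for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper class; the names here are illustrative assumptions,
// not the code actually used in the program above.
class AttachmentEncoding {

    // Stand-in for getBytes(filename): reads the whole file into memory.
    static byte[] getBytes(String filename) throws IOException {
        return Files.readAllBytes(Paths.get(filename));
    }

    // Stand-in for getBase64(filename): reads the file and base64-encodes it.
    static String getBase64(String filename) throws IOException {
        return toBase64(getBytes(filename));
    }

    // Encodes raw bytes with the standard (non-URL) base64 alphabet, no line breaks.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }
}
```

As a sanity check, the PDF header bytes "%PDF-1.4\n" encode to JVBERi0xLjQK, which is how the stored attachment value in the search response begins.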
When I search for the document, I don't see any attachment metadata being created, and searches for words in the content return no hits. Am I missing something? Here is the search response:
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "docitem",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"_content_type":"application/pdf","_name":"test.pdf","attachment":"JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovUHJvZHVjZXIgKEFwYWN