Hello,
I'm running an instance of Elasticsearch 1.3.2 on Ubuntu Server 14.04 on an
iMac. I have the mapper-attachments plugin installed, along with elasticsearch gui,
which I'm using for my front end.
It's possible that I am missing something. Here is everything I've
tried so far:
I installed the mapper-attachments plugin.
Then I created the index with this mapping:
curl -XPUT 'http://localhost:9200/historicdata' -d '{"mappings":{"docs":{"properties":{"content":{"type":"attachment"}}}}}'
Next, I use a PHP script to walk the directory and convert the documents and
their contents to base64:
<?php
$root = '/home/aharmon/test';
// Walk $root recursively, visiting children before their parent directories.
$iters = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root),
    RecursiveIteratorIterator::CHILD_FIRST
);
try {
    foreach ($iters as $fullFileName => $iter) {
        $base64 = base64_encode($iter);
        $indexarray = array("File" => $base64);
        $jsonarray = json_encode($indexarray);
        // Append one JSON object per entry to data.json.
        file_put_contents("/home/aharmon/data.json", $jsonarray, FILE_APPEND);
    }
} catch (UnexpectedValueException $e) {
    printf("Directory [%s] contained a directory we can not recurse into", $root);
}
?>
Then I take my data.json file and turn it into a request body for the bulk API:
{"index": {"_index": "historicdata", "_type": "docs" } }
{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIFN1bW1hcnkgYnkgVmVudWUucGRm"}
{"index": {"_index": "historicdata", "_type": "docs" } }
{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIE1lZGlhIFBsYW4gU3VtbWFyeS54bHM="}
{"_index": "historicdata", "_type": "docs" } }
{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIFN1bW1hcnkgYnkgVmVudWUueGxz"}
{"_index": "historicdata", "_type": "docs" } }
{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0FnZW5jaWVzIE1hc3RlciBMaXN0Lnhsc3g="}
This is saved separately under the name bulk-requests.
Then I run this command:
curl -s -XPOST localhost:9200/_bulk --data-binary @bulk-requests; echo
I got a success message back, so I assume it is all indexed.
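The bulk API reports failures per item, so a reply that looks successful does not by itself show that every document went in. A minimal sketch for checking the top-level "errors" flag of a bulk response; the helper name bulk_has_errors and the file name bulk-response.json are assumptions, and the pattern assumes the compact JSON that Elasticsearch returns by default:

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if a bulk response body reports item failures.
# Assumes compact JSON, i.e. no whitespace between "errors" and its value.
bulk_has_errors() {
    printf '%s' "$1" | grep -q '"errors":true'
}

# Usage sketch (bulk-response.json is an assumed file name):
#   curl -s -XPOST localhost:9200/_bulk --data-binary @bulk-requests > bulk-response.json
#   if bulk_has_errors "$(cat bulk-response.json)"; then echo "some items failed"; fi
```

If the flag is true, the "items" array in the same response carries the per-document error details.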
Then I run this command:
curl -XGET 'http://localhost:9200/historicdata/docs/_search' '{"fields": [ "content.content_type" ], "query":{"match":{"content.content_type":"text plain"}}}'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":2,"max_score":1.0,"hits":[
{"_index":"historicdata","_type":"docs","_id":"LMkqzKbyWTGffNtr1mGPZA","_score":1.0,"_source":{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIFN1bW1hcnkgYnkgVmVudWUucGRm"}},
{"_index":"historicdata","_type":"docs","_id":"GBEIWECwRgiUbYB6pnq7dQ","_score":1.0,"_source":{"File":"L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIE1lZGlhIFBsYW4gU3VtbWFyeS54bHM="}}
]}}
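To see what was actually stored, one of the File values can be decoded outside PHP. A minimal sketch, assuming GNU coreutils base64 as shipped with Ubuntu (decode_value is a hypothetical helper name); the string is the first File value from the bulk file above:

```shell
#!/bin/sh
# Hypothetical helper: decode one base64 string and print the raw result.
# Assumes GNU coreutils, where -d means "decode".
decode_value() {
    printf '%s' "$1" | base64 -d
}

decode_value 'L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIFN1bW1hcnkgYnkgVmVudWUucGRm'
echo
```

For that value it prints /home/aharmon/test/A Plus - Summary by Venue.pdf, which is the path of the file and not the contents of the PDF.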
So it is indexing the documents and the search works, but the content isn't
being decoded from base64. Maybe there is a general rule about base64 that I
don't know and that is assumed? I have followed the documentation on GitHub
and on Elasticsearch's site religiously. Also, when I decode the base64 within
the PHP script before I put it into the JSON array, it all comes back null. These
are .xlsx, .xls, and .pdf documents.
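For comparison, here is a minimal shell sketch that encodes the bytes of each file and stores the result in the content field, which is the field the mapping declares as an attachment. The helper names bulk_lines_for and build_bulk are hypothetical, base64 -w 0 assumes GNU coreutils, and this has not been run against the cluster described here, so treat it as an illustration and not a confirmed fix:

```shell
#!/bin/sh
# Hypothetical helper: print one bulk action line and one source line for a file.
# Assumes GNU coreutils base64; -w 0 disables line wrapping so the value stays on one line.
bulk_lines_for() {
    printf '{"index":{"_index":"historicdata","_type":"docs"}}\n'
    printf '{"content":"%s"}\n' "$(base64 -w 0 "$1")"
}

# Hypothetical helper: emit bulk lines for every regular file under a directory.
# File names containing newlines are not handled.
build_bulk() {
    find "$1" -type f | while IFS= read -r f; do
        bulk_lines_for "$f"
    done
}

# Usage sketch with the paths from this post:
#   build_bulk /home/aharmon/test > bulk-requests
#   curl -s -XPOST localhost:9200/_bulk --data-binary @bulk-requests; echo
```

Base64 output contains only letters, digits, "+", "/" and "=", so it can be placed inside a JSON string without escaping.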
Thanks for your help, guys. It is greatly appreciated.
Let me know if you need any more information than what I have provided.