I have seen that once. The common error is that you don’t start with a clean
index and mapping definition.
You should check that your mapping is the right one.
If your field is analyzed as a String, it won’t work.
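A minimal sketch of that check, assuming the index and type names used in this thread (`test` / `attachment`) and a node on the default `http://localhost:9200`. The helper names are hypothetical, and the mapping body is an assumption based on the attachment plugin documentation of that era, so adjust it to your own setup.

```shell
#!/usr/bin/env bash
# Sketch only: index/type names follow the thread; host default is an assumption.
host="${host:-http://localhost:9200}"

# Print a mapping that declares "file" as an attachment field.
# ASSUMPTION: mirrors the attachment plugin docs; not taken from this thread.
attachment_mapping() {
  cat <<'EOF'
{
  "attachment" : {
    "properties" : {
      "file" : { "type" : "attachment" }
    }
  }
}
EOF
}

# Drop the index and recreate it with the mapping BEFORE indexing anything,
# so dynamic mapping never gets the chance to type "file" as a plain string.
reset_index() {
  curl -s -X DELETE "${host}/test"
  curl -s -X PUT "${host}/test"
  curl -s -X PUT "${host}/test/attachment/_mapping" -d "$(attachment_mapping)"
}

# Read a mapping response on stdin; succeed only if a field has type "attachment".
# Hypothetical helper: a crude grep, not a real JSON parser.
mapping_is_attachment() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"attachment"'
}
```

To inspect a live index, something like `curl -s "${host}/test/attachment/_mapping" | mapping_is_attachment && echo "attachment mapping in place"` should do. If the check fails, the field was most likely mapped dynamically as a string.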
OK, if I encode using this instead, elasticsearch seems to accept the
document OK:
coded=$(base64 -w 0 fn6742.pdf | perl -pe 's/\n/\\n/g')
...but when I search I get zero hits, same as when I tried indexing a simple
string in one of my previous posts.
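For reference, the encoding step can be sketched without relying on the GNU-only `-w 0` flag by stripping line breaks after encoding. `make_attachment_json` is a hypothetical helper name, and the single `file` field simply follows the JSON shown elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Write a one-line JSON document whose "file" field holds the Base64 of a file.
# make_attachment_json is a hypothetical helper, not part of the gist.
make_attachment_json() {
  local input="$1"
  printf '{"file":"'
  # Strip CR/LF so the value stays on one line even if base64 wraps its output.
  base64 < "$input" | tr -d '\r\n'
  printf '"}\n'
}
```

Usage would look like `make_attachment_json fn6742.pdf > json.file`, followed by the same `curl -X POST "${host}/test/attachment/" -d @json.file` call as in the script.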
It really looks like the attachment plugin isn't working any magic with the
document - is there any way to determine this?
On Thursday, October 18, 2012 8:48:04 AM UTC+1, cocowalla wrote:
I get the same Unexpected end-of-input in VALUE_STRING\n error when running
in Windows (which is the only option available to me) 
Would you be able to attach a copy of the json.file generated by the script?
I'd like to compare it to the output I get when running the script, as I
think the problem may be in the Base64 encoding of the PDF file on Windows.
On Wednesday, October 17, 2012 7:02:29 PM UTC+1, David Pilato wrote:
I just tested the gist and everything is working fine with ES 0.19.10 and
attachment 1.6.0.
I updated the gist a little here with the installation process of ES and
plugin: https://gist.github.com/3907010
I ran it on a Linux VM (Ubuntu) under Windows, which is better than using
Cygwin (IMHO).
Much appreciated, thank you 
On Thursday, October 11, 2012 8:59:17 PM UTC+1, David Pilato wrote:
Sorry. I didn't find spare time to work on it today.
I will try to test it before Monday.
On 11 Oct 2012 at 10:01, David Pilato da...@pilato.fr wrote:
As far as I remember, I was able to play it in the past.
I will try it again in some hours and will report back here.
Stay tuned 
On 11 October 2012 at 09:49, cocowalla colin.an...@googlemail.com wrote:
Just tried it with an older version of the attachment plugin, 1.4.0 (which
uses an older version of Tika), and got the same result 
Any ideas how I can try to diagnose the problem? Do the steps in the guide
work for anyone else?
On Wednesday, October 10, 2012 3:37:54 PM UTC+1, cocowalla wrote:
Hi David,
This was the query used in the sample, so I had expected it to work:
{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "elephant"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}
If I remove the "fields", I get the same result. Same if I try highlighting
the "title" field.
I just tried indexing a fixed string, instead of actually doing it
'properly' and indexing a PDF - so the JSON looked like this:
"file" : "VGVzdA=="
Note that 'VGVzdA==' is just 'Test' Base64 encoded. And it gives me the same
result; ElasticSearch accepts it, and nothing is logged in the log file. So
when searching for anything, like 'elephant' it always gives me this same
document. If I actually search for 'Test', it does the same, and there is no
highlighting like there is in the example.
{
  "query" : {
    "query_string" : {
      "query" : "elephant"
    }
  },
  "highlight" : {
    "file" : {}
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1,
    "hits" : [ {
      "_index" : "test",
      "_type" : "attachment",
      "_id" : "AuR9XczdSlSLsW1mY0ZbFA",
      "_score" : 1,
      "_source" : {
        "file" : "VGVzdA=="
      }
    } ]
  }
}
Is there anything I can configure to get more out of the logs to try and
find out what is wrong?
On Wednesday, October 10, 2012 2:25:23 PM UTC+1, David Pilato wrote:
What you see is not what you get 
_source will always contain your document as you sent it to ES.
Can you remove "fields" in your query?
I'm wondering if you can highlight a field ("file") that you don't ask for
(only "title")?
On 10 October 2012 at 14:12, cocowalla <colin.an...@googlemail.com> wrote:
Hmm, that online decoder would not decode the base64 encoded file (I had
attached it to my first post).
If I encode using this command instead (which outputs without wrapping
lines), that online decoder will decode it fine:
base64 -w 0 fn6742.pdf
ElasticSearch also seems to hoover up the file OK, but when I try searching
with this query:
{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "elephant"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}
I always get the document returned, without any highlighting, regardless of
what query I use (note it is "elephant" above!). Here is what the result looks
like:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1,
    "hits" : [ {
      "_index" : "test",
      "_type" : "attachment",
      "_id" : "-NEqgDIcTIy403EWQ4uwVQ",
      "_score" : 1,
      "_source" : {
If I look using the browser in ElasticSearch Head, I see that the document
only has these fields:
file (which is a string containing the Base64 encoded file)
It's as if it hasn't been processed by the attachment plugin at all, but
there is nothing in the log file. Any ideas on where to go next with this?
On Wednesday, October 10, 2012 12:42:30 PM UTC+1, David Pilato wrote:
Sorry for my previous answer. I did not see that you had encoded it in Base64.
That said, does your json.file look correct?
I mean: are you able to decode it? http://decode.urih.com/
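The same check can be done locally instead of with an online decoder. This is a rough sketch, assuming json.file is a single-line document with one `file` field as in this thread. `decode_file_field` is a hypothetical helper and the `sed` extraction is deliberately naive.

```shell
#!/usr/bin/env bash
# Print the decoded bytes of the "file" value in a one-line JSON document.
# decode_file_field is a hypothetical helper; the sed extraction assumes the
# whole document sits on one line and the value has a closing quote.
decode_file_field() {
  local encoded
  encoded=$(sed -n 's/.*"file"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1")
  # Nothing extracted usually means a wrapped value or a missing closing quote.
  [ -n "$encoded" ] || return 1
  printf '%s' "$encoded" | base64 --decode
}
```

For a PDF, `decode_file_field json.file | head -c 4` should print `%PDF`. A non-zero exit status suggests the JSON itself is malformed, which would fit an "Unexpected end-of-input in VALUE_STRING" error.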
On 10 October 2012 at 13:15, David Pilato <da...@pilato.fr> wrote:
You have to encode your file in base64 and put the encoded string in a
See mapper attachment plugin docs.
On 10 Oct 2012 at 12:27, cocowalla <colin.an...@googlemail.com> wrote:
I'm running on Windows, and using Cygwin I've been trying the attachment
plugin, and have tried using the supplied example script
https://gist.github.com/1075067.
Everything is fine up until:
curl -X POST "${host}/test/attachment/" -d @json.file
Which gives this error:
"error" : "MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source:
[B@195fb8e; line: 1, column: 1020531]]; " , "status" : 400
Looking in elasticsearch.log I see:
[2012-10-10 11:10:36,239][DEBUG][action.index] [Ahura] [test][0], node[sZrRrdUZQASK0wZD1AcU3Q], [P], s[STARTED]: Failed to execute [index {[test][attachment][IU5tOrzySKylO-ZBNQO_Og], source[{"file" : "JVBERi0...BASE64 CROPPED FOR BREVITY!"]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:509)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:438)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:288)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:210)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected end-of-input in VALUE_STRING
 at [Source: [B@195fb8e; line: 1, column: 1020531]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1284)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:588)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:521)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:515)
    at org.elasticsearch.common.jackson.core.base.ParserBase.loadMoreGuaranteed(ParserBase.java:432)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._decodeBase64(UTF8StreamJsonParser.java:2875)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:406)
    at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1029)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:138)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:276)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:598)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:494)
    ... 8 more
I've also attached the json.file that is generated by the script.
Any ideas what could be wrong?
