Sending Attachments: Unexpected end-of-input in VALUE_STRING


(cocowalla) #1

I'm running on Windows, and using Cygwin I've been trying the attachment
tutorial (http://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html),
and have tried using the supplied example script (https://gist.github.com/1075067).

Everything is fine up until:

curl -X POST "${host}/test/attachment/" -d @json.file

Which gives this error:

{
"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source: [B@195fb8e; line: 1, column: 1020531]]; ","status":400
}

Looking in elasticsearch.log I see:

[2012-10-10 11:10:36,239][DEBUG][action.index ] [Ahura] [test][0], node[sZrRrdUZQASK0wZD1AcU3Q], [P], s[STARTED]: Failed to execute [index {[test][attachment][IU5tOrzySKylO-ZBNQO_Og], source[{"file":"JVBERi0...BASE64 CROPPED FOR BREVITY!"]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:509)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:438)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:288)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:210)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected end-of-input in VALUE_STRING
 at [Source: [B@195fb8e; line: 1, column: 1020531]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1284)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:588)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:521)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:515)
    at org.elasticsearch.common.jackson.core.base.ParserBase.loadMoreGuaranteed(ParserBase.java:432)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._decodeBase64(UTF8StreamJsonParser.java:2875)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:406)
    at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1029)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:138)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:276)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:598)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:494)
    ... 8 more

I've also attached the json.file that is generated by the script.

Any ideas what could be wrong?

--


(David Pilato) #2

You have to encode your file in base64 and put the encoded string in a field.

See mapper attachment plugin docs.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(David Pilato) #3

Sorry for my previous answer. I did not see that you had already encoded the
file in base64.

That said, does your json.file look correct?
I mean: are you able to decode it? http://decode.urih.com/

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(cocowalla) #4

Hmm, that online decoder would not decode the base64 encoded file (I had
attached it to my first post).

If I encode using this command instead (which outputs without wrapping
lines), that online decoder will decode it fine:

base64 -w 0 fn6742.pdf
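
For anyone who would rather build the request body in a script than on the command line, here is a minimal sketch in Python (not part of the tutorial's script). It assumes only the standard library; `build_attachment_body` and the sample bytes are made-up names for illustration. `base64.b64encode` does not insert line breaks, so the result is comparable to `base64 -w 0`, and `json.dumps` takes care of quoting the string.

```python
import base64
import json


def build_attachment_body(raw_bytes: bytes) -> str:
    """Return a JSON document with the bytes base64-encoded on a single line."""
    # b64encode never inserts line breaks, similar to `base64 -w 0`.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({"file": encoded})


# Stand-in for the PDF contents; a real script would read the file in binary mode.
sample = b"%PDF-1.4 hypothetical sample bytes " * 10
body = build_attachment_body(sample)

# The body is one line of valid JSON and decodes back to the original bytes.
assert "\n" not in body
assert base64.b64decode(json.loads(body)["file"]) == sample
```

The resulting string could then be written to json.file and posted with the same curl command as before.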

ElasticSearch also seems to hoover up the file OK, but when I try searching
using:

{
    "fields": [
        "title"
    ],
    "query": {
        "query_string": {
            "query": "elephant"
        }
    },
    "highlight": {
        "fields": {
            "file": {}
        }
    }
}

The document is always returned, without any highlighting, regardless of
what query I use (note that it is "elephant" above!). Here is what the result
looks like:

{
    took: 1
    timed_out: false
    _shards: {
        total: 1
        successful: 1
        failed: 0
    }
    hits: {
        total: 1
        max_score: 1
        hits: [
            {
                _index: test
                _type: attachment
                _id: -NEqgDIcTIy403EWQ4uwVQ
                _score: 1
                _source: {
                    file: BASE64-FILE-CONTENTS-HERE
                }
            }
        ]
    }
}

If I look using the browser in ElasticSearch Head, I see that the document
only has these fields:

_index
_type
_id
_score
file (which is a string containing the Base64 encoded file)

It's as if it hasn't been processed by the attachment plugin at all, but
there is nothing in the log file. Any ideas on where to go next with this?

--


(David Pilato) #5

What you see is not what you get :wink:

_source will always contain your document as you sent it to ES.
Can you remove "fields" in your query?
I'm wondering if you can highlight a field ("file") that you don't ask for (only
"title")?
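
As a rough sketch of that suggestion, the request body with the top-level "fields" entry removed could be built like this in Python. The variable name `search_body` is only illustrative, and whether this changes the highlighting result is exactly what remains to be tested.

```python
import json

# Same query as in the previous message, with the top-level "fields"
# restriction removed and the highlight on "file" kept as it was.
search_body = {
    "query": {"query_string": {"query": "elephant"}},
    "highlight": {"fields": {"file": {}}},
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(search_body, indent=2))
```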

David.

--


(Christian Th.) #6

I had problems using the attachment plugin, too. I think it was because I
hadn't used it the right way, but I couldn't find the right documentation.
So I decided not to use it.
If you are a Java developer, you can use Tika to extract the content on the
client side. Then you can send the file as base64, along with the extracted
content, to ElasticSearch.
This works for me. I like to have more control over the process of
extracting, storing or splitting the file over indices. If the returned
base64 file string (on the client side) contains "\r\n", you have to strip it
out yourself.

If you decide to go this way I can give you further hints.
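
To illustrate the "\r\n" point, here is a small sketch. It is written in Python rather than Java purely for brevity, and `unwrap_base64` is a hypothetical helper name, not something from Tika or ElasticSearch. It imitates an encoder that wraps its output with Windows line endings and then strips the line breaks before the string would be placed in a JSON field.

```python
import base64


def unwrap_base64(wrapped: str) -> str:
    """Remove CR/LF line wrapping from a base64 string."""
    return wrapped.replace("\r", "").replace("\n", "")


# encodebytes wraps its output with "\n" every 76 characters; swapping in
# "\r\n" imitates an encoder that emits Windows line endings.
payload = bytes(range(256))
wrapped = base64.encodebytes(payload).decode("ascii").replace("\n", "\r\n")

clean = unwrap_base64(wrapped)

# The cleaned string has no line breaks and still decodes to the same bytes.
assert "\r" not in clean and "\n" not in clean
assert base64.b64decode(clean) == payload
```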

--


(David Pilato) #7

Heya,

I have been using the attachment plugin in production for more than 6 months
and it works like a charm.
I also use it embedded in the scrutmydocs.org project. It's open source, so you
can look at how we use it: http://www.scrutmydocs.org/

Of course, if you want finer control, you can use Tika as Christian described.

What I miss in the attachment plugin is content-type autodetection. I tried
to work on it here:
https://github.com/dadoonet/elasticsearch-mapper-attachments/tree/mimetype_detection
but with no success :frowning:

My 2 cents
David.

--


(cocowalla) #8

Hi David,

This was the query used in the sample, so I had expected it to work:

{
    "fields": [
        "title"
    ],
    "query": {
        "query_string": {
            "query": "elephant"
        }
    },
    "highlight": {
        "fields": {
            "file": {}
        }
    }
}

If I remove the "fields", I get the same result. Same if I try highlighting
the "title" field.

I just tried indexing a fixed string, instead of actually doing it
'properly' and indexing a PDF - so the JSON looked like this:

{
"file": "VGVzdA=="
}

Note that 'VGVzdA==' is just 'Test' Base64 encoded. It gives me the same
result: ElasticSearch accepts it, and nothing is logged in the log file.
When I search for anything, like 'elephant', it always returns this same
document. If I actually search for 'Test', it does the same, and there is no
highlighting like there is in the example
(http://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html).
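
For reference, the encoding of the test string can be double-checked with a couple of lines of Python (standard library only; this is just a sanity check of the payload, not of the plugin):

```python
import base64

# Encode the literal test string and decode the value used in the document.
encoded = base64.b64encode(b"Test").decode("ascii")
decoded = base64.b64decode("VGVzdA==")

print(encoded)  # VGVzdA==
print(decoded)  # b'Test'
```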

Query:
{
    "query": {
        "query_string": {
            "query": "elephant"
        }
    },
    "highlight": {
        "file": {}
    }
}

Result:

{
    took: 0
    timed_out: false
    _shards: {
        total: 1
        successful: 1
        failed: 0
    }
    hits: {
        total: 1
        max_score: 1
        hits: [
            {
                _index: test
                _type: attachment
                _id: AuR9XczdSlSLsW1mY0ZbFA
                _score: 1
                _source: {
                    file: VGVzdA==
                }
            }
        ]
    }
}

Is there anything I can configure to get more out of the logs to try and
find out what is wrong?

On Wednesday, October 10, 2012 2:25:23 PM UTC+1, David Pilato wrote:

What you see is not what you get :wink:

_source will always contain your document as you sent it to ES.
Can you remove "fields" in your query?
I'm wondering if you can highlight a field ("file") that you don't ask
for (only "title")?

David.

Le 10 octobre 2012 à 14:12, cocowalla <colin.an...@googlemail.com<javascript:>>
a écrit :

Hmm, that online decoder would not decode the base64 encoded file (I had
attached it to my first post).

If I encode using this command instead (which outputs without wrapping
lines), that online decoder will decode it fine:

base64 - w 0 fn6742 . pdf

ElasticSearch also seems to hoover up the file OK, but when I try
searching using:

{
"fields" : [
"title"
],
"query" : {
"query_string" : {
"query" : "elephant"
}
},
"highlight" : {
"fields" : {
"file" : {}
}
}
}

I always get returned the document, without any highlighting, regardless
of what query I use (not it is "elphant" above!). Here is what the result
look like:

{
took : 1
timed_out : false
_shards : {
total : 1
successful : 1
failed : 0
}
hits : {
total : 1
max_score : 1
hits : [
{
_index : test
_type : attachment
_id : - NEqgDIcTIy403EWQ4uwVQ
_score : 1
_source : {
file : BASE64 - FILE - CONTENTS - HERE
}
}
]
}
}

If I look using the browser in ElasticSearch Head, I see that the document
only has these fields:

_index
_type
_id
_score
file (which is a string containing the Base64 encoded file)

It's as if it hasn't been processed by the attachment plugin at all, but
there is nothing in the log file. Any ideas on where to go next with this?

On Wednesday, October 10, 2012 12:42:30 PM UTC+1, David Pilato wrote:

Sorry for my previous answer. I did not see that you have encoded in
base 64 before.

That said, does you json.file looks correct?
I mean: are you able to decode it? http://decode.urih.com/

Le 10 octobre 2012 à 13:15, David Pilato < da...@pilato.fr> a écrit :

You have to encode your file in base64 and put the encoded string in a
field.

See mapper attachment plugin docs.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 10 oct. 2012 à 12:27, cocowalla < colin.an...@googlemail.com> a
écrit :

I'm running on Windows, and using Cygwin I've been trying the attachment
tutorialhttp://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html,
and have tried using the supplied example scripthttps://gist.github.com/1075067.

Everything is fine up until:

curl - X POST "${host}/test/attachment/" - d @json . file

Which gives this error:

{
"error" : "MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source:
[B@195fb8e; line: 1, column: 1020531]]; " , "status" : 400
}

Looking in elasticsearch.log I see:

[2012-10-10 11:10:36,239][DEBUG][action.index] [Ahura] [test][0], node[sZrRrdUZQASK0wZD1AcU3Q], [P], s[STARTED]: Failed to execute [index {[test][attachment][IU5tOrzySKylO-ZBNQO_Og], source[{"file":"JVBERi0...BASE64 CROPPED FOR BREVITY!"]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:509)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:438)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:288)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:210)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected end-of-input in VALUE_STRING
at [Source: [B@195fb8e; line: 1, column: 1020531]
at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1284)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:588)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:521)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:515)
at org.elasticsearch.common.jackson.core.base.ParserBase.loadMoreGuaranteed(ParserBase.java:432)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._decodeBase64(UTF8StreamJsonParser.java:2875)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:406)
at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1029)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:138)
at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:276)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:598)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:494)
... 8 more

I've also attached the json.file that is generated by the script.

Any ideas what could be wrong?

--

<json.file>

--

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--



(cocowalla) #9

Just tried it with an older version of the attachment plugin, 1.4.0 (which
uses an older version of Tika), and got the same result :frowning:

Any ideas how I can try to diagnose the problem? Do the steps in the guide
work for anyone else?

On Wednesday, October 10, 2012 3:37:54 PM UTC+1, cocowalla wrote:

Hi David,

This was the query used in the sample, so I had expected it to work:

{
"fields": [
"title"
],
"query": {
"query_string": {
"query": "elephant"
}
},
"highlight": {
"fields": {
"file": {}
}
}
}

If I remove the "fields", I get the same result. Same if I try
highlighting the "title" field.

I just tried indexing a fixed string, instead of actually doing it
'properly' and indexing a PDF - so the JSON looked like this:

{
"file": "VGVzdA=="
}

Note that 'VGVzdA==' is just 'Test' Base64 encoded. And it gives me the
same result; ElasticSearch accepts it, and nothing is logged in the log
file. So when searching for anything, like 'elephant' it always gives me
this same document. If I actually search for 'Test', it does the same, and
there is no highlighting like there is in the example
(http://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html).

Query:
{
"query": {
"query_string": {
"query": "elephant"
}
},
"highlight": {
"file": {}
}
}

Result:

{
took: 0
timed_out: false
_shards: {
total: 1
successful: 1
failed: 0
}
hits: {
total: 1
max_score: 1
hits: [
{
_index: test
_type: attachment
_id: AuR9XczdSlSLsW1mY0ZbFA
_score: 1
_source: {
file: VGVzdA==
}
}
]
}
}

Is there anything I can configure to get more out of the logs to try and
find out what is wrong?
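
The two query bodies quoted in this message differ in one detail: the first lists "file" under highlight.fields, while the second puts "file" directly under "highlight". A minimal sketch that always builds the first form, which is the shape of the tutorial's sample query; the function name is hypothetical, and nothing in the thread confirms that this difference explains the missing highlights:

```python
def build_search_body(term, highlight_fields, return_fields=None):
    """Build a query_string search body with highlighting nested under 'fields'."""
    body = {
        "query": {"query_string": {"query": term}},
        # Fields to highlight are listed under highlight.fields.
        "highlight": {"fields": {name: {} for name in highlight_fields}},
    }
    if return_fields:
        body["fields"] = list(return_fields)
    return body
```

Calling build_search_body("elephant", ["file"], ["title"]) yields the same structure as the sample query, ready to be serialised with json.dumps and sent to the _search endpoint.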

On Wednesday, October 10, 2012 2:25:23 PM UTC+1, David Pilato wrote:

What you see is not what you get :wink:

_source will always contain your document as you sent it to ES.
Can you remove "fields" in your query?
I'm wondering if you can highlight a field ("file") that you don't ask
for (only "title")?

David.

On 10 October 2012 at 14:12, cocowalla colin.an...@googlemail.com wrote:

Hmm, that online decoder would not decode the base64 encoded file (I had
attached it to my first post).

If I encode using this command instead (which outputs without wrapping
lines), that online decoder will decode it fine:

base64 -w 0 fn6742.pdf
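
For anyone without a base64 that supports -w 0, the same unwrapped encoding can be produced with the Python standard library. A minimal sketch, assuming the document has a single "file" field as in this thread; the function name is hypothetical:

```python
import base64
import json


def build_attachment_payload(data, field="file"):
    """Return a one-line JSON document with `data` Base64-encoded under `field`."""
    # base64.b64encode never inserts line breaks, so the value stays on one line.
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({field: encoded})
```

Reading the PDF with open("fn6742.pdf", "rb").read() and writing the returned string to json.file gives a payload whose Base64 value contains no newlines.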

ElasticSearch also seems to hoover up the file OK, but when I try
searching using:

{
"fields" : [
"title"
],
"query" : {
"query_string" : {
"query" : "elephant"
}
},
"highlight" : {
"fields" : {
"file" : {}
}
}
}

I always get the document returned, without any highlighting, regardless
of what query I use (note it is "elphant" above!). Here is what the result
looks like:

{
took : 1
timed_out : false
_shards : {
total : 1
successful : 1
failed : 0
}
hits : {
total : 1
max_score : 1
hits : [
{
_index : test
_type : attachment
_id : -NEqgDIcTIy403EWQ4uwVQ
_score : 1
_source : {
file : BASE64-FILE-CONTENTS-HERE
}
}
]
}
}

If I look using the browser in ElasticSearch Head, I see that the
document only has these fields:

_index
_type
_id
_score
file (which is a string containing the Base64 encoded file)

It's as if it hasn't been processed by the attachment plugin at all, but
there is nothing in the log file. Any ideas on where to go next with this?
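
One possibility worth ruling out (an assumption, not something confirmed in this thread) is that the "file" field was never mapped as type "attachment", in which case it would be indexed as an ordinary string. A minimal sketch that inspects a mapping document already fetched from the server; the helper names are hypothetical and the mapping shape {type: {"properties": {field: {"type": ...}}}} is assumed:

```python
def field_type(mapping, doc_type, field):
    """Return the declared type of `field` in `doc_type`, or None if it is absent."""
    properties = mapping.get(doc_type, {}).get("properties", {})
    return properties.get(field, {}).get("type")


def is_attachment_field(mapping, doc_type="attachment", field="file"):
    """True when the field is declared with the attachment type."""
    return field_type(mapping, doc_type, field) == "attachment"
```

If this returns False for the mapping of the test index, the plugin would never have been asked to parse the field.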

On Wednesday, October 10, 2012 12:42:30 PM UTC+1, David Pilato wrote:

Sorry for my previous answer. I did not see that you had already encoded
the file in Base64.

That said, does your json.file look correct?
I mean: are you able to decode it? http://decode.urih.com/
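
The same check can be done locally instead of with an online decoder. A minimal sketch, assuming json.file holds a JSON object with a Base64 string under "file"; the function names are hypothetical, and the PDF test only looks at the leading "%PDF-" bytes:

```python
import base64
import binascii
import json


def decode_attachment(payload_text, field="file"):
    """Parse the JSON payload and strictly Base64-decode `field`.

    Raises ValueError if the JSON is invalid or the value is not clean Base64.
    """
    document = json.loads(payload_text)  # JSONDecodeError is a ValueError
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(document[field], validate=True)
    except binascii.Error as exc:
        raise ValueError("field %r is not valid Base64: %s" % (field, exc)) from exc


def looks_like_pdf(data):
    """True when the decoded bytes start with the PDF header."""
    return data.startswith(b"%PDF-")
```

A raw line break inside the string value makes json.loads fail, so a payload with wrapped Base64 is reported as invalid rather than silently accepted.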

On 10 October 2012 at 13:15, David Pilato < da...@pilato.fr> wrote:

You have to encode your file in base64 and put the encoded string in a
field.

See mapper attachment plugin docs.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(David Pilato) #10

As far as I remember, I was able to get it working in the past.
I will try it again in a few hours and report back here.

Stay tuned :wink:


(David Pilato) #11

Sorry, I didn't find any spare time to work on it today.
I will try to test it before Monday.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(cocowalla) #12

Much appreciated, thank you :smiley:

Le 10 oct. 2012 à 12:27, cocowalla < colin.an...@googlemail.com> a
écrit :

I'm running on Windows, and using Cygwin I've been trying the attachment
tutorialhttp://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html,
and have tried using the supplied example scripthttps://gist.github.com/1075067.

Everything is fine up until:

curl - X POST "${host}/test/attachment/" - d @json . file

Which gives this error:

{
"error" : "MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source:
[B@195fb8e; line: 1, column: 1020531]]; " , "status" : 400
}

Looking in elasticsearch.log I see:

[ 2012 - 10 - 10 11 : 10 : 36 , 239 ][ DEBUG ][ action . index
] [ Ahura ] [ test ][ 0 ], node [ sZrRrdUZQASK0wZD1AcU3Q ], [ P ],
s [ STARTED ]: Failed to execute [ index {[ test ][ attachment ][
IU5tOrzySKylO - ZBNQO_Og ], source [{ "file" : "JVBERi0...BASE64 CROPPED
FOR BREVITY!" ]
org . elasticsearch . index . mapper . MapperParsingException : Failed to
parse
at org . elasticsearch . index . mapper . DocumentMapper . parse (
DocumentMapper . java : 509 )
at org . elasticsearch . index . mapper . DocumentMapper . parse (
DocumentMapper . java : 438 )
at org . elasticsearch . index . shard . service . InternalIndexShard
. prepareCreate ( InternalIndexShard . java : 288 )
at org . elasticsearch . action . index . TransportIndexAction .
shardOperationOnPrimary ( TransportIndexAction . java : 210 )
at org . elasticsearch . action . support . replication . TransportShardReplicationOperationAction$AsyncShardOperationAction
. performOnPrimary ( TransportShardReplicationOperationAction . java : 532
)
at org . elasticsearch . action . support . replication . TransportShardReplicationOperationAction$AsyncShardOperationAction$1
. run ( TransportShardReplicationOperationAction . java : 430 )
at java . util . concurrent . ThreadPoolExecutor . runWorker (
ThreadPoolExecutor . java : 1110 )
at java . util . concurrent . ThreadPoolExecutor$Worker . run (
ThreadPoolExecutor . java : 603 )
at java . lang . Thread . run ( Thread . java : 722 )
Caused by : org . elasticsearch . common . jackson . core .
JsonParseException : Unexpected end - of - input in VALUE_STRING
at [ Source : [ B@195fb8e ; line : 1 , column : 1020531 ]
at org . elasticsearch . common . jackson . core . JsonParser .
_constructError ( JsonParser . java : 1284 )
at org . elasticsearch . common . jackson . core . base .
ParserMinimalBase . _reportError ( ParserMinimalBase . java : 588 )
at org . elasticsearch . common . jackson . core . base .
ParserMinimalBase . _reportInvalidEOF ( ParserMinimalBase . java : 521 )
at org . elasticsearch . common . jackson . core . base .
ParserMinimalBase . _reportInvalidEOF ( ParserMinimalBase . java : 515 )
at org . elasticsearch . common . jackson . core . base . ParserBase .
loadMoreGuaranteed ( ParserBase . java : 432 )
at org . elasticsearch . common . jackson . core . json .
UTF8StreamJsonParser . _decodeBase64 ( UTF8StreamJsonParser . java : 2875
)
at org . elasticsearch . common . jackson . core . json .
UTF8StreamJsonParser . getBinaryValue ( UTF8StreamJsonParser . java : 406
)
at org . elasticsearch . common . jackson . core . JsonParser .
getBinaryValue ( JsonParser . java : 1029 )
at org . elasticsearch . common . xcontent . json . JsonXContentParser
. binaryValue ( JsonXContentParser . java : 138 )
at org . elasticsearch . index . mapper . attachment .
AttachmentMapper . parse ( AttachmentMapper . java : 276 )
at org . elasticsearch . index . mapper . object . ObjectMapper .
serializeValue ( ObjectMapper . java : 598 )
at org . elasticsearch . index . mapper . object . ObjectMapper .
parse ( ObjectMapper . java : 459 )
at org . elasticsearch . index . mapper . DocumentMapper . parse (
DocumentMapper . java : 494 )
... 8 more

I've also attached the file.json that is generated by the script.

Any ideas what could be wrong?

--

<json.file>

--

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

--


(David Pilato) #13

Hi,

I just tested the gist and everything works fine with ES 0.19.10 and
attachment plugin 1.6.0.

I updated the gist a little to add the installation process for ES and the
plugin. The new version is here: https://gist.github.com/3907010

I ran it on a Linux VM (Ubuntu) under Windows, which is better than using
Cygwin (IMHO).

HTH

David



(cocowalla) #14

I get the same "Unexpected end-of-input in VALUE_STRING" error when running
on Windows (which is the only option available to me) :frowning:

Would you be able to attach a copy of the json.file generated by the
script? I'd like to compare it to the output I get when running the script,
as I think the problem may be in the Base64 encoding of the PDF file on
Windows.
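
One way to test that suspicion before comparing files is to count line-break characters in the request body. The sketch below is only an illustration: `wrapped.json` and `cleaned.json` are hypothetical file names, and the CRLF-wrapped payload is an assumed example of what a wrapping encoder might produce, not the actual json.file from this thread.

```shell
#!/bin/sh
# Hypothetical sample: a JSON body whose Base64 payload was wrapped with
# a Windows-style CRLF line ending (an assumption for illustration).
printf '{"file":"VGVz\r\ndA=="}' > wrapped.json

# Count carriage returns and line feeds in the file. A body whose payload
# is a single unbroken string should report 0 for both.
cr_count=$(tr -cd '\r' < wrapped.json | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < wrapped.json | wc -c | tr -d ' ')
echo "CR=$cr_count LF=$lf_count"

# Strip both characters to get a single-line body for comparison.
tr -d '\r\n' < wrapped.json > cleaned.json
```

If either count is non-zero for the real json.file, the encoder's line wrapping is a likely suspect, though that alone does not prove it is the cause of the parse error.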



(cocowalla) #15

OK, if I encode using this instead, elasticsearch seems to accept the
document OK:

coded=$(base64 -w 0 fn6742.pdf | perl -pe 's/\n/\\n/g')

...*but* when I search I get zero hits, same as when I tried indexing a
simple string in one of my previous posts.

It really looks like the attachment plugin isn't working any magic with the
document - is there any way to determine this?
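
For reference, the encoding step described above can be sketched end to end as follows. This is a minimal sketch, not the gist's script: `sample.bin` is a hypothetical stand-in for the PDF, and it assumes GNU coreutils `base64`, where `-w 0` turns off line wrapping and `-d` decodes.

```shell
#!/bin/sh
# Hypothetical stand-in for the PDF; any file is handled the same way.
printf 'Test' > sample.bin

# GNU coreutils base64: -w 0 disables line wrapping, so the output
# contains no line breaks that could split the JSON string.
coded=$(base64 -w 0 sample.bin)

# Write the document body; printf avoids shell-specific echo escapes.
printf '{"file":"%s"}' "$coded" > json.file

# Sanity checks before posting: the body has no line breaks, and the
# payload decodes back to the original bytes.
lines=$(wc -l < json.file | tr -d ' ')
decoded=$(printf '%s' "$coded" | base64 -d)
echo "newlines=$lines decoded=$decoded"
```

The resulting json.file can then be posted with the same command used earlier in the thread, `curl -X POST "${host}/test/attachment/" -d @json.file`. This only checks that the body is well formed; it says nothing about whether the attachment plugin extracts the content.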

On Thursday, October 18, 2012 8:48:04 AM UTC+1, cocowalla wrote:

I get the same Unexpected end-of-input in VALUE_STRING\n error when
running in Windows (which is the only option available to me) :frowning:

Would you be able to attach a copy of the json.file generated by the
script? I'd like to compare it to the output I get when running the script,
as I think the problem may be in the Base64 encoding of the PDF file on
Windows.

On Wednesday, October 17, 2012 7:02:29 PM UTC+1, David Pilato wrote:

Hi,

I just tested the gist and everything is working fine with ES 0.19.10 and
attachment 1.6.0.

I updated the gist a little here with the installation process of ES and
plugin: https://gist.github.com/3907010

I ran it on a Linux VM (ubuntu) under windows which is better than using
cygwin (IMHO).

HTH

David

De : elasti...@googlegroups.com [mailto:elasti...@googlegroups.com] De
la part de
cocowalla
Envoyé : jeudi 11 octobre 2012 22:22
À : elasti...@googlegroups.com
Objet : Re: Sending Attachments: Unexpected end-of-input in
VALUE_STRING

Much appreciated, thank you :smiley:

On Thursday, October 11, 2012 8:59:17 PM UTC+1, David Pilato wrote:

Sorry. I didn't find spare time to work on it today.

I will try to test it before monday.

--

David :wink:

Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 11 oct. 2012 à 10:01, David Pilato da...@pilato.fr a écrit :

As far as I remember, I was able to play it in the past.

I will try it again in some hours and will report back here.

Stay tuned :wink:

Le 11 octobre 2012 à 09:49, cocowalla colin.an...@googlemail.com a
écrit :

Just tried it with an older version of the attachment plugin, 1.4.0
(which uses an older version of Tika), and got the same result :frowning:

Any ideas how I can try to diagnose the problem? Do the steps in the
guide work for anyone else?

On Wednesday, October 10, 2012 3:37:54 PM UTC+1, cocowalla wrote:

Hi David,

This was the query used in the sample, so I had expected it to work:

{
"fields" : [
"title"
],
"query" : {
"query_string" : {
"query" : "elephant"
}
},
"highlight" : {
"fields" : {
"file" : {}
}
}
}

If I remove the "fields", I get the same result. Same if I try
highlighting the "title" field.

I just tried indexing a fixed string, instead of actually doing it
'properly' and indexing a PDF - so the JSON looked like this:

{
"file" : "VGVzdA=="
}

Note that 'VGVzdA==' is just 'Test' Base64 encoded. And it gives me the
same result; ElasticSearch accepts it, and nothing is logged in the log
file. So when searching for anything, like 'elephant' it always gives me
this same document. If I actually search for 'Test', it does the same, and
there is no highlighting like there is in the examplehttp://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html.

Query:

{
"query" : {
"query_string" : {
"query" : "elephant"
}
},
"highlight" : {
"file" : {}
}
}

Result:

{
took : 0
timed_out : false
_shards : {
total : 1
successful : 1
failed : 0
}
hits : {
total : 1
max_score : 1
hits : [
{
_index : test
_type : attachment
_id : AuR9XczdSlSLsW1mY0ZbFA
_score : 1
_source : {
file : VGVzdA ==
}
}
]
}
}

Is there anything I can configure to get more out of the logs to try and
find out what is wrong?

On Wednesday, October 10, 2012 2:25:23 PM UTC+1, David Pilato wrote:

What you see is not what you get :wink:

_source will always contain your document as you sent it to ES.

Can you remove "fields" in your query?

I'm wondering if you can highlight a field ("file") that you don't ask
for (only "title")?

David.

Le 10 octobre 2012 à 14:12, cocowalla < colin.an...@googlemail.com> a
écrit :

Hmm, that online decoder would not decode the base64 encoded file (I had
attached it to my first post).

If I encode using this command instead (which outputs without wrapping
lines), that online decoder will decode it fine:

base64 - w 0 fn6742 . pdf

ElasticSearch also seems to hoover up the file OK, but when I try
searching using:

{
"fields" : [
"title"
],
"query" : {
"query_string" : {
"query" : "elephant"
}
},
"highlight" : {
"fields" : {
"file" : {}
}
}
}

I always get returned the document, without any highlighting, regardless
of what query I use (not it is "elphant" above!). Here is what the result
look like:

{
took : 1
timed_out : false
_shards : {
total : 1
successful : 1
failed : 0
}
hits : {
total : 1
max_score : 1
hits : [
{
_index : test
_type : attachment
_id : - NEqgDIcTIy403EWQ4uwVQ
_score : 1
_source : {
file : BASE64 - FILE - CONTENTS - HERE
}
}
]
}
}

If I look using the browser in ElasticSearch Head, I see that the
document only has these fields:

_index
_type
_id
_score
file (which is a string containing the Base64 encoded file)

It's as if it hasn't been processed by the attachment plugin at all, but
there is nothing in the log file. Any ideas on where to go next with this?

On Wednesday, October 10, 2012 12:42:30 PM UTC+1, David Pilato wrote:

Sorry for my previous answer. I did not see that you have encoded in base
64 before.

That said, does you json.file looks correct?

I mean: are you able to decode it? http://decode.urih.com/

Le 10 octobre 2012 à 13:15, David Pilato < da...@pilato.fr> a écrit :

You have to encode your file in base64 and put the encoded string in a
field.

See mapper attachment plugin docs.

--

David :wink:

Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 10 oct. 2012 à 12:27, cocowalla < colin.an...@googlemail.com> a
écrit :

I'm running on Windows, and using Cygwin I've been trying the attachment
tutorialhttp://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html,
and have tried using the supplied example scripthttps://gist.github.com/1075067.

Everything is fine up until:

curl - X POST "${host}/test/attachment/" - d @json . file

Which gives this error:

{
"error" : "MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected end-of-input in VALUE_STRING\n at [Source:
[B@195fb8e; line: 1, column: 1020531]]; " , "status" : 400
}


(David Pilato) #16

Here it is: https://gist.github.com/3907010#file_attachment_file.json

David

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of cocowalla
Sent: Thursday, 18 October 2012 09:48
To: elasticsearch@googlegroups.com
Subject: Re: Sending Attachments: Unexpected end-of-input in VALUE_STRING

I get the same Unexpected end-of-input in VALUE_STRING\n error when running
in Windows (which is the only option available to me) :frowning:

Would you be able to attach a copy of the json.file generated by the script?
I'd like to compare it to the output I get when running the script, as I
think the problem may be in the Base64 encoding of the PDF file on Windows.



(David Pilato) #17

I have seen that once. The common error is that you don’t start with a clean
index and mapping definition.

You should check that your mapping is the right one:
http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html

If your field is analyzed as a String, it won’t work.

HTH

David.

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of cocowalla
Sent: Thursday, 18 October 2012 10:18
To: elasticsearch@googlegroups.com
Subject: Re: Sending Attachments: Unexpected end-of-input in VALUE_STRING

OK, if I encode using this instead, elasticsearch seems to accept the
document OK:

coded=`base64 -w 0 fn6742.pdf | perl -pe 's/\n/\\n/g'`

...but when I search I get zero hits, same as when I tried indexing a simple
string in one of my previous posts.

It really looks like the attachment plugin isn't working any magic with the
document - is there any way to determine this?
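
For anyone who wants to rule out the encoding step on Windows, here is a minimal sketch in Python rather than the gist's shell script (assuming Python is available on the box). It builds the request body with the Base64 value on a single line, which is what `base64 -w 0` produces. The function name `build_attachment_body` is made up for illustration; only the `file` field name comes from this thread.

```python
import base64
import json


def build_attachment_body(data: bytes) -> str:
    """Return a JSON document with the raw bytes Base64-encoded in "file"."""
    # b64encode never inserts line breaks, so the value stays on one line.
    encoded = base64.b64encode(data).decode("ascii")
    # json.dumps takes care of quoting and escaping the string value.
    return json.dumps({"file": encoded})


if __name__ == "__main__":
    # Tiny stand-in payload; for a real PDF, read the bytes with open(path, "rb").
    print(build_attachment_body(b"Test"))  # prints {"file": "VGVzdA=="}
```

Writing the returned string to json.file should give a body that curl can send with `-d @json.file`. This only takes the encoding out of the equation; it says nothing about why the search returns no hits.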



(cocowalla) #18

OK, I've now tried it with your json file, and still get the same results :frowning:

Here is the mapping (which looks OK, AFAICS):

$ curl -XGET 'http://localhost:9200/test/_mapping?pretty=1'

{
  "test" : {
    "default" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : { }
    },
    "attachment" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : {
        "file" : {
          "type" : "attachment",
          "path" : "full",
          "fields" : {
            "file" : {
              "type" : "string",
              "store" : "yes",
              "term_vector" : "with_positions_offsets"
            },
            "author" : {
              "type" : "string"
            },
            "title" : {
              "type" : "string",
              "store" : "yes"
            },
            "name" : {
              "type" : "string"
            },
            "date" : {
              "type" : "date",
              "format" : "dateOptionalTime"
            },
            "keywords" : {
              "type" : "string"
            },
            "content_type" : {
              "type" : "string"
            }
          }
        }
      }
    }
  }
}
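
The "is the field really mapped as an attachment, not a plain string" check that David suggested can also be done programmatically. A rough sketch in Python, assuming the `_mapping` response has already been parsed into a dict with the same shape as the output above (index name, then type name, then `properties`). The helper name `attachment_fields` is hypothetical.

```python
def attachment_fields(mapping: dict, index: str, doc_type: str) -> list:
    """List the properties of index/doc_type whose mapping type is "attachment"."""
    properties = mapping.get(index, {}).get(doc_type, {}).get("properties", {})
    return sorted(
        name for name, spec in properties.items()
        if spec.get("type") == "attachment"
    )


if __name__ == "__main__":
    # Trimmed-down copy of the mapping shape shown in this post.
    sample = {
        "test": {
            "attachment": {
                "properties": {
                    "file": {"type": "attachment", "path": "full"}
                }
            }
        }
    }
    print(attachment_fields(sample, "test", "attachment"))  # prints ['file']
```

An empty list would mean that no field of that type is mapped as `attachment`, which is the situation David warns about.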



(David Pilato) #19

Yes. It looks OK.

I suspect something is wrong with your curl command under Cygwin.

I suppose that your real use case is to talk to ES from something other than curl?
Maybe you can try running your code from somewhere other than Cygwin (JVM, PHP, ...). I suppose that you will use a real UNIX box in production?

May I suggest that you try on your target machine or your target platform (JVM, ...)?
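
One way to take curl and Cygwin out of the picture is to build the same POST from a small script. Below is a sketch in Python using only the standard library (Python is just an example here, not one of the platforms named above). The `/test/attachment/` path is the one from the curl command earlier in the thread and `http://localhost:9200` is the host used for the mapping request. The function name `build_index_request` is made up, and the `Content-Type` header is an assumption, since the original curl command does not set one.

```python
import urllib.request


def build_index_request(host: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a POST of a JSON body to <host>/test/attachment/."""
    return urllib.request.Request(
        url=f"{host}/test/attachment/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="POST",
    )


if __name__ == "__main__":
    req = build_index_request("http://localhost:9200", '{"file": "VGVzdA=="}')
    # Nothing is sent here; urllib.request.urlopen(req) would perform the call.
    print(req.get_method(), req.full_url)
```

If the document indexed this way behaves the same as the one sent with curl, the curl command is probably not the culprit.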

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(cocowalla) #20

I tried using a Win32 build of cygwin and got the same result. I then tried
from C# code and got the same result :frowning:

I also tried using ElasticSearch Head, and still got the same result :frowning:

We are strictly Microsoft, server-wise, so we would need to use a Microsoft
server in production too.

I'm really not sure what else there is to try here though?
