Problem with Attachment-Plugin and Highlighting


(maximilian.brodhun) #1

Hello dear all,

I didn't find much about this topic, so I don't know how well known it is.

I have an index where I store text in a field of type attachment.

Normal search works really well! But when I use highlighting on this
field, I get an "error:500" in some cases. The error message indicates
that there is a problem with UTF-8 decoding of the Base64-encoded text.

I don't know whether there is anything I can do, but I want to report
this problem; maybe other users have the same one.
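
For context, an attachment field receives its content as Base64-encoded bytes in the indexing request. Below is a minimal sketch of that encoding step in Python (standard library only), assuming the source is UTF-8 text. The helper name build_attachment_doc is hypothetical, and the field name ftattach is the one from the mapping posted later in this thread.

```python
import base64
import json


def build_attachment_doc(raw_bytes: bytes, field: str = "ftattach") -> str:
    """Return a JSON document whose attachment field holds Base64 text."""
    # Base64 output is pure ASCII, so it is safe inside a JSON string.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


# Sample text with non-ASCII characters, encoded as UTF-8 before Base64.
sample = "Die Hände dir zu reichen, schauert's den Reinen".encode("utf-8")
doc = build_attachment_doc(sample)

# Round-trip check: decoding should give back the original UTF-8 text.
decoded = base64.b64decode(json.loads(doc)["ftattach"]).decode("utf-8")
print(decoded)
```

If the round trip fails on the client side, the problem is in the encoding step rather than in the search or highlighting request.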

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Is there any way to reproduce your error?

Could you gist a curl recreation?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



(maximilian.brodhun) #3

Thanks for the reply.

Yes, I can reproduce the error.

The mapping part of the attachment field is:

        "ftattach": {
            "fields": {
                "author": {
                    "type": "string"
                },
                "content_type": {
                    "type": "string"
                },
                "date": {
                    "format": "dateOptionalTime",
                    "type": "date"
                },
                "ftattach": {
                    "store": "yes",
                    "term_vector": "with_positions_offsets",
                    "type": "string"
                },
                "keywords": {
                    "type": "string"
                },
                "name": {
                    "type": "string"
                },
                "title": {
                    "store": "yes",
                    "type": "string"
                }
            },
            "path": "full",
            "type": "attachment"
        },

And my query is (sorry for the German part in the query):

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "match" : {
      "ftattach" : {
        "query" : "\"Die Hände dir zu reichen, schauert's den Reinen\"",
        "type" : "phrase",
        "slop" : 0
      }
    }
  },
  "fields" : [ "textgridUri", "title" ],
  "highlight" : {
    "fields" : {
      "ftattach" : {
        "fragment_size" : 100
      }
    }
  }
}
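
Quotation marks inside the phrase need escaping for the request body to stay valid JSON. One way to avoid escaping by hand is to let a JSON library serialize the body. Below is a sketch in Python (standard library only) that reuses the query syntax from this post; the function name build_search_body is hypothetical.

```python
import json


def build_search_body(phrase: str) -> str:
    """Serialize the phrase query with highlighting on the ftattach field."""
    body = {
        "from": 0,
        "size": 10,
        "query": {
            "match": {
                "ftattach": {"query": phrase, "type": "phrase", "slop": 0}
            }
        },
        "fields": ["textgridUri", "title"],
        "highlight": {"fields": {"ftattach": {"fragment_size": 100}}},
    }
    # ensure_ascii=False keeps umlauts readable; inner quotes are escaped.
    return json.dumps(body, ensure_ascii=False, indent=2)


body = build_search_body('"Die Hände dir zu reichen, schauert\'s den Reinen"')
print(body)
```

The printed text can be saved to a file and sent with curl using -d @file, which also sidesteps shell quoting problems with the apostrophe.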



(maximilian.brodhun) #4

Thanks for the reply.

Yes, I can reproduce the error.

The mapping part of the attachment field is:

        "ftattach": {
            "fields": {
                "author": {
                    "type": "string"
                },
                "content_type": {
                    "type": "string"
                },
                "date": {
                    "format": "dateOptionalTime",
                    "type": "date"
                },
                "ftattach": {
                    "store": "yes",
                    "term_vector": "with_positions_offsets",
                    "type": "string"
                },
                "keywords": {
                    "type": "string"
                },
                "name": {
                    "type": "string"
                },
                "title": {
                    "store": "yes",
                    "type": "string"
                }
            },
            "path": "full",
            "type": "attachment"
        },

And my query is (sorry for the German part in the query):

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "match" : {
      "ftattach" : {
        "query" : "\"Die Hände dir zu reichen, schauert's den Reinen\"",
        "type" : "phrase",
        "slop" : 0
      }
    }
  },
  "fields" : [ "title" ],
  "highlight" : {
    "fields" : {
      "ftattach" : {
        "fragment_size" : 100
      }
    }
  }
}



(David Pilato) #5

And could you gist a full curl recreation? See http://www.elasticsearch.org/help/
It could help a lot to reproduce your issue.

Thanks



(Sentient6) #6

Thanks for your help.

I made a GIST, hope it is helpful:


(maximilian.brodhun) #7

Thanks for your help.

I made a GIST, hope it is helpful:



(maximilian.brodhun) #8

Just to make it complete:

I'm using ElasticSearch version 0.90.3 and Attachment-Plugin version 1.9.0.



(David Pilato) #9

Heya

I tried to play with your example but no luck.
First I had to modify your script as it contains some errors.
Second, I don't get any result.

Could you check it and update your gist (based on this fork: https://gist.github.com/dadoonet/58e6787b78621790fd36)?



(maximilian.brodhun) #10

Sorry for the errors in my script.

Your modified script searches for the query term in the field
"ftattach", but it indexes the field "mdattach". I know that the field
ftattach is really big, but I need it to be that big.

I fixed this and made a new gist, hopefully without errors this time.

The error message is the same every time:

{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {StackOverflowError[null]}]",
  "status" : 500
}

And ElasticSearch throws this exception:

java.lang.StackOverflowError
    at java.util.LinkedList.removeFirst(LinkedList.java:266)
    at java.util.LinkedList.pop(LinkedList.java:799)
    at org.apache.lucene.search.vectorhighlight.XFieldPhraseList.extractPhrases(XFieldPhraseList.java:81)
    at org.apache.lucene.search.vectorhighlight.XFieldPhraseList.extractPhrases(XFieldPhraseList.java:99)
    .......

With the root logger in DEBUG mode, ElasticSearch also logs:

[2013-09-16 15:39:13,202][DEBUG][indices.memory ] [Joe Cartelli] recalculating shard indexing buffer (reason=active/inactive[false] created/deleted[true]), total is [99mb] with [3] active shards, each shard set to indexing=[33mb], translog=[64kb]
[2013-09-16 15:39:13,202][DEBUG][index.engine.robin ] [Joe Cartelli] [testindex][0] updating index_buffer_size from [64mb] to [33mb]

So maybe the file is too big for the memory?



(David Pilato) #11

You're right.

I updated the gist here: https://gist.github.com/dadoonet/58e6787b78621790fd36 using ftattach everywhere.
You can update that gist with your actual binary document.



(maximilian.brodhun) #12

Hello again,

that's really interesting.
If I run the script with the content of the field ftattach inlined in the
script, the query gives no results, and the execution reports that there
are too many arguments for the field ftattach.

If I index the document with

    curl -XPUT "${host}/testindex/metadata/1?refresh" -d @faustattach.json

the result is the same as before. Highlighting throws the error:

java.lang.StackOverflowError
    at java.util.LinkedList.removeFirst(LinkedList.java:266)
    at java.util.LinkedList.pop(LinkedList.java:799)
    at org.apache.lucene.search.vectorhighlight.XFieldPhraseList.extractPhrases(XFieldPhraseList.java:81)
    at org.apache.lucene.search.vectorhighlight.XFieldPhraseList.extractPhrases(XFieldPhraseList.java:99)

Another thing seems strange to me: if I change the query slightly and
delete the word "Die", it works. I made a gist for this:

https://gist.github.com/MaxBro/cccddc6539992b4adeeb

Maybe it is some UTF-8 error, I guess.
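
Since dropping a single word changes the outcome, one way to narrow this down is to generate every variant of the phrase with one word removed and run each against the index. Below is a sketch in Python (standard library only) that only builds the request bodies and sends nothing; the helper names leave_one_out and highlight_query are hypothetical.

```python
import json


def leave_one_out(phrase: str):
    """Yield (dropped_word, shortened_phrase) for every word in the phrase."""
    words = phrase.split()
    for i, word in enumerate(words):
        yield word, " ".join(words[:i] + words[i + 1:])


def highlight_query(phrase: str) -> str:
    """Build the phrase query with highlighting on the ftattach field."""
    body = {
        "query": {
            "match": {
                "ftattach": {"query": phrase, "type": "phrase", "slop": 0}
            }
        },
        "fields": ["title"],
        "highlight": {"fields": {"ftattach": {"fragment_size": 100}}},
    }
    return json.dumps(body, ensure_ascii=False)


phrase = "Die Hände dir zu reichen, schauert's den Reinen"
variants = list(leave_one_out(phrase))
for dropped, shortened in variants:
    # Each body can be saved to a file and sent with curl -d @file.
    print(dropped, "->", highlight_query(shortened))
```

Comparing which variants return highlights and which return the 500 error shows whether the failure depends on one specific term or on the phrase length.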



(David Pilato) #13

I'm afraid your gist does not have content in https://gist.github.com/MaxBro/cccddc6539992b4adeeb#file-faustattach-json.



(maximilian.brodhun) #14

OK, that's also strange.

I added the file's content directly to the source code of the script (even
if that is kind of weird). I hope you can see it now.