[ES 5.0] Simple Query String - highlight issue

(evert) #1

I have some PDF files indexed, and I am using simple_query_string to take advantage of its operators (+, -, |, etc.).

So here is my search script:

    'stored_fields' => ['name', 'author', 'editor', 'url'],
    'query' => [
        'simple_query_string' => [
            'query' => $search_text,
            'fields' => ['attachment.content^5', '_all'],
            'default_operator' => 'and',
        ],
    ],
    'highlight' => [
        'number_of_fragments' => $fragnumber,
        'fragment_size' => $fragsize,
        'pre_tags' => [],  // tag values are missing from the original post
        'post_tags' => [], // tag values are missing from the original post
        'fields' => [
            // Empty object: highlight this field with default settings.
            'attachment.content' => new \stdClass(),
        ],
    ],

The example in the docs uses double quotes (") in the query.

When I search for a term without double quotes, as in: paulo de tarso, it returns the results with highlights as expected.

If I use double quotes, as in: "paulo de tarso", the results come back without any highlights.

Has anyone faced this same issue?


(evert) #2

Could it be because of the stop word "de" in the Portuguese language?

Shouldn't the quotes mean that no word is ignored?

(evert) #3

For the record, the problem was that my attachment.content field was mapped with this option:

"term_vector" : "with_positions_offsets",

Somehow it does not work with that type of query. I am not sure whether this is a bug or expected behavior.


(David Pilato) #4

Hi @evert

What is wrong with using this option? I mean, this is what we advise at https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/mapper-attachments-highlighting.html

Can you reproduce the problem entirely using a pure REST script?
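A reproduction along these lines could be sketched as follows. The snippet only builds the REST requests and prints them as JSON, so it runs without a cluster or a client library. The index name `books`, the type name `doc`, the helper `buildSearch`, and the sample document are assumptions for illustration, not details from this thread; the mapping uses only the `term_vector` setting quoted above.

```php
<?php
// Hypothetical names: the index "books", the type "doc" and the sample text
// are placeholders for illustration, not taken from the thread.
$index = 'books';
$type  = 'doc';

// Build a simple_query_string search that highlights attachment.content.
function buildSearch(string $searchText): array
{
    return [
        'query' => [
            'simple_query_string' => [
                'query'            => $searchText,
                'fields'           => ['attachment.content'],
                'default_operator' => 'and',
            ],
        ],
        'highlight' => [
            'fields' => ['attachment.content' => new \stdClass()],
        ],
    ];
}

$requests = [
    // 1. Create the index with the term_vector option under suspicion.
    ['PUT', "/$index", [
        'mappings' => [
            $type => [
                'properties' => [
                    'attachment' => [
                        'properties' => [
                            'content' => [
                                'type'        => 'text',
                                'term_vector' => 'with_positions_offsets',
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ]],
    // 2. Index one document and refresh so it is searchable right away.
    ['PUT', "/$index/$type/1?refresh=true", [
        'attachment' => ['content' => 'A letter from paulo de tarso to the church'],
    ]],
    // 3. Unquoted search.
    ['POST', "/$index/_search", buildSearch('paulo de tarso')],
    // 4. Quoted phrase search.
    ['POST', "/$index/_search", buildSearch('"paulo de tarso"')],
];

// Print each request as "METHOD path" followed by its JSON body.
foreach ($requests as [$method, $path, $body]) {
    echo $method, ' ', $path, PHP_EOL;
    echo json_encode($body, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL, PHP_EOL;
}
```

Comparing the `highlight` section of the responses to requests 3 and 4 would show whether the quoted phrase loses its highlights with this mapping alone, before any analyzer is added.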

(evert) #5

Hello @dadoonet,

I can send you a full report later this weekend, but here it is in a few words: when the attachment.content field uses the brazilian analyzer together with this option, my search does not return any highlights, even when I pass the simple query in highlight_query.

This is what happened.
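For reference, a `highlight_query` can be set per highlighted field, where it replaces the main query for highlighting purposes only. A minimal sketch of such a request body, assuming the same field name as in the search script above; the helper name `buildHighlightQueryBody` and the sample search text are illustrative only, and this is not necessarily the exact request that was used:

```php
<?php
// Illustrative helper (not from the thread): builds a search body whose
// highlighter runs its own simple_query_string via highlight_query.
function buildHighlightQueryBody(string $searchText): array
{
    $simpleQuery = [
        'simple_query_string' => [
            'query'            => $searchText,
            'fields'           => ['attachment.content'],
            'default_operator' => 'and',
        ],
    ];

    return [
        'query'     => $simpleQuery,
        'highlight' => [
            'fields' => [
                'attachment.content' => [
                    // Used only to decide what gets highlighted.
                    'highlight_query' => $simpleQuery,
                ],
            ],
        ],
    ];
}

echo json_encode(buildHighlightQueryBody('"paulo de tarso"'), JSON_PRETTY_PRINT), PHP_EOL;
```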

I will report this issue with full examples and a reproduction test, so you guys can check.

I am a little exhausted after 4 days of research and tests. I have read ALL the documentation so far, and I will have to reindex all my data (500 PDF books); the site goes into production next week.


(David Pilato) #6

Maybe this is related to the analyzer. Did you check exactly what the _analyze API produces?
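Such a check could look like the sketch below, which only prints the request to send. It assumes the built-in `brazilian` analyzer mentioned above is the one configured on attachment.content; with a custom analyzer, the request would go to the index's own `_analyze` endpoint instead. The response lists each token with its position, which shows whether "de" survives analysis.

```php
<?php
// Request for the _analyze API. The analyzer name comes from the thread;
// the text is the phrase from the first post.
$method = 'GET';
$path   = '/_analyze';
$body   = [
    'analyzer' => 'brazilian',
    'text'     => 'paulo de tarso',
];

// Print the request as "METHOD path" followed by its JSON body.
echo $method, ' ', $path, PHP_EOL;
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```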

(evert) #7

Hello @dadoonet,

I just saw your message today. Merry Christmas!

I will check that and reply.

I am preparing a GitHub issue, so you guys can check everything.


(evert) #8

Hi @dadoonet,

I opened an issue with the most complete sample I could put together. Please let me know if I can improve it.

