[ES 5.0] Simple Query String - highlight issue

I have some PDF files indexed and I am using simple_query_string so that I can use its very nice operators (+, -, |, etc.).

Here is my search request:

[
    // Return only these stored fields with each hit
    'stored_fields' => ['name', 'author', 'editor', 'url'],
    'query' => [
        'simple_query_string' => [
            'query' => $search_text,
            // Boost matches in the extracted PDF content over _all
            'fields' => ['attachment.content^5', '_all'],
            'default_operator' => 'and',
        ],
    ],
    'highlight' => [
        'number_of_fragments' => $fragnumber,
        'fragment_size' => $fragsize,
        'pre_tags' => ['<em><mark>'],
        'post_tags' => ['</mark></em>'],
        'fields' => [
            // Empty object: use the default highlight settings for this field
            'attachment.content' => new \stdClass(),
        ],
    ],
]

The example in the docs uses double quotes (") in the query.

When I search for a term without double quotes, e.g. paulo de tarso, it returns the results with the highlights, as expected.

If I use double quotes, e.g. "paulo de tarso", the results come back without any highlights.

Has anyone faced this same issue?

Thanks!

Could it be because "de" is a stop word in Portuguese?

Shouldn't the quotes mean that no word is ignored?

For the record, the problem was that my attachment.content field was mapped with the option:

"term_vector" : "with_positions_offsets",

which somehow does not work with that type of query. I am not sure whether this is a bug or expected behavior.
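For anyone who cannot remap and reindex right away, one thing that may be worth trying is forcing the plain highlighter on that field. When a field has term vectors with positions and offsets, Elasticsearch picks the fast vector highlighter by default, and the highlight request accepts a per-field type setting to override that. The sketch below is only an assumption about a possible workaround, not a confirmed fix for this issue; the value of $search_text is a hypothetical example.

<?php
// Hypothetical input; in the real script this comes from the user.
$search_text = '"paulo de tarso"';

$body = [
    'query' => [
        'simple_query_string' => [
            'query' => $search_text,
            'fields' => ['attachment.content^5', '_all'],
            'default_operator' => 'and',
        ],
    ],
    'highlight' => [
        'fields' => [
            'attachment.content' => [
                // Assumption: bypass the fast vector highlighter that is
                // selected automatically when term vectors are stored.
                'type' => 'plain',
            ],
        ],
    ],
];

// Print the JSON body so it can be compared with a plain REST request.
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;

If the phrase query is highlighted with the plain highlighter but not with the default one, that would narrow the problem down to the highlighter choice rather than the query itself.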

Cheers!

Hi @evert

What is wrong with using this option? I mean, this is what we advise at https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/mapper-attachments-highlighting.html

Can you reproduce the problem entirely using a pure REST script?

Hello @dadoonet,

I can send you a full report later this weekend, but here it is in a few words: when the attachment.content field uses the brazilian analyzer together with this option, my search does not return any highlights, even when I put the simple query in highlight_query.

This is what happened.

I will report this issue with all the examples and a reproduction test, so you guys can check.

I am a little exhausted after 4 days of research and tests. I have read ALL the documentation so far, and I will have to reindex all my data (500 PDF books); the site goes into production next week.

Cheers!

Maybe this is related to the analyzer. Did you check exactly what is produced by running the _analyze API?
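As a rough sketch of that check, the snippet below builds an _analyze request for the built-in brazilian analyzer and prints it as a curl command, so it can be run without any client library. The host http://localhost:9200 and the sample text are assumptions; adjust them to your cluster.

<?php
// Request body for the _analyze API: which analyzer to run and on what text.
$analyzeBody = [
    'analyzer' => 'brazilian',
    'text' => 'paulo de tarso',
];

// Assumed local node; replace with your own host and port.
$url = 'http://localhost:9200/_analyze?pretty';

// Build a copy-pasteable curl command with safely quoted arguments.
$command = sprintf(
    "curl -XGET %s -H 'Content-Type: application/json' -d %s",
    escapeshellarg($url),
    escapeshellarg(json_encode($analyzeBody))
);

echo $command, PHP_EOL;

The response lists each token with its position. If "de" is removed as a stop word, it will be missing from that list, which matters for phrase queries and for highlighters that rely on positions and offsets.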

Hello @dadoonet,

I just saw your message today. Merry Christmas!

I will check that and reply as well.

I am preparing to file this as an issue on GitHub, so you guys can check everything.

Thanks!

Hi @dadoonet,

I opened an issue with the most complete sample I could put together. Please let me know if I can improve it.

Cheers!
