evert
(evert)
December 12, 2016, 6:57pm
1
I have some PDF files indexed, and I am using simple_query_string so that I can use its very nice operators such as +, -, and |.
So here is my search script:
[
    'stored_fields' => ['name', 'author', 'editor', 'url'],
    'query' => [
        'simple_query_string' => [
            'query' => $search_text,
            // boost matches in the PDF content over the catch-all _all field
            'fields' => ['attachment.content^5', '_all'],
            'default_operator' => 'and',
        ],
    ],
    'highlight' => [
        'number_of_fragments' => $fragnumber,
        'fragment_size' => $fragsize,
        'pre_tags' => ['<em><mark>'],
        'post_tags' => ['</mark></em>'],
        'fields' => [
            // empty object: default highlight settings for this field
            'attachment.content' => new \stdClass(),
        ],
    ],
]
The example in the docs uses double quotes (") in the query.
When I search for a term without double quotes, such as: paulo de tarso, it returns the results with the highlights, as expected.
If I use double quotes, as in: "paulo de tarso", it does not show the highlights.
Has anyone faced this same issue?
Thanks!
evert
(evert)
December 12, 2016, 6:58pm
2
Could it be because "de" is a stop word in Portuguese?
Shouldn't the quotes mean that no word gets ignored?
evert
(evert)
December 22, 2016, 12:57am
3
For the record, the problem was that my attachment.content field was mapped with the option:
"term_vector" : "with_positions_offsets",
which somehow does not work with that type of query. I am not sure whether this is a bug or the expected behavior.
Cheers!
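For readers following along, here is a minimal sketch of what a mapping with that option might look like. It assumes attachment.content is an ordinary text field nested under an attachment object and that it uses the brazilian analyzer mentioned later in this thread; the poster's actual mapping is not shown in the thread, so treat every other detail as a guess.

```php
<?php
// Sketch only: a guess at the kind of mapping described in the post.
// Assumption: attachment.content is a plain text field; the real mapping
// is not shown in the thread.
$mapping = [
    'properties' => [
        'attachment' => [
            'properties' => [
                'content' => [
                    'type' => 'text',
                    'analyzer' => 'brazilian',
                    // the option the poster identified as the trigger
                    'term_vector' => 'with_positions_offsets',
                ],
            ],
        ],
    ],
];

// Print the mapping as JSON so it can be inspected or sent with any HTTP client.
echo json_encode($mapping, JSON_PRETTY_PRINT), PHP_EOL;
```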
dadoonet
(David Pilato)
December 22, 2016, 10:06am
4
Hi @evert
What is wrong with using this option? I mean, this is what we advise at https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/mapper-attachments-highlighting.html
Can you reproduce the problem entirely using a pure REST script?
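One possible starting point for such a script is to turn the PHP array from the first post into the JSON body that a plain REST call to _search would carry. The sketch below does only that, for the two inputs compared in the thread. The function name buildSearchBody and the fragment settings are hypothetical; the poster's real values are not given.

```php
<?php
// Sketch only: builds the JSON search body for the unquoted and quoted inputs.
// buildSearchBody and the fragment values are hypothetical, for illustration.
function buildSearchBody(string $searchText): array
{
    return [
        'stored_fields' => ['name', 'author', 'editor', 'url'],
        'query' => [
            'simple_query_string' => [
                'query' => $searchText,
                'fields' => ['attachment.content^5', '_all'],
                'default_operator' => 'and',
            ],
        ],
        'highlight' => [
            'number_of_fragments' => 3, // hypothetical value
            'fragment_size' => 150,     // hypothetical value
            'pre_tags' => ['<em><mark>'],
            'post_tags' => ['</mark></em>'],
            'fields' => [
                // an empty stdClass is encoded as {} rather than []
                'attachment.content' => new \stdClass(),
            ],
        ],
    ];
}

// The two inputs from the first post: plain terms and a quoted phrase.
foreach (['paulo de tarso', '"paulo de tarso"'] as $searchText) {
    echo json_encode(
        buildSearchBody($searchText),
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
    ), PHP_EOL;
}
```

Each printed body can then be pasted into a REST client, which keeps the PHP client library out of the reproduction.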
evert
(evert)
December 22, 2016, 10:28am
5
Hello @dadoonet ,
I can send you a full report later this weekend, but here it is in a few words: when the attachment.content field uses the brazilian analyzer together with this option, my search does not return the highlights, even when I put the simple query in the highlight_query.
This is what happened.
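As an illustration of the highlight_query attempt described above, here is a minimal sketch of a highlight section that passes its own query to the highlighter. It assumes "the simple query" means a simple_query_string query on the same field; the thread does not show the exact request that was tried.

```php
<?php
// Sketch only: a highlight section with a per-field highlight_query.
// Assumption: the highlight query is a simple_query_string on the same field.
$highlight = [
    'pre_tags' => ['<em><mark>'],
    'post_tags' => ['</mark></em>'],
    'fields' => [
        'attachment.content' => [
            'highlight_query' => [
                'simple_query_string' => [
                    'query' => '"paulo de tarso"',
                    'fields' => ['attachment.content'],
                ],
            ],
        ],
    ],
];

// Print the section as JSON so it can be dropped into a search body.
echo json_encode(
    ['highlight' => $highlight],
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
), PHP_EOL;
```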
I will report this issue with all the examples and a test to reproduce it, so you guys can check.
I am a little exhausted after 4 days of research and tests. I have read ALL the documentation so far, and I will have to reindex all my data (500 PDF books); the site goes into production next week.
Cheers!
dadoonet
(David Pilato)
December 22, 2016, 10:40am
6
Maybe this is something related to the analyzer. Did you check exactly what is produced by running the _analyze API?
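A minimal sketch of such a check, assuming the built-in brazilian analyzer and the phrase from the first post: the script only prints the JSON body for an _analyze request, and the tokens that come back (including whether "de" survives) would need to be read from the actual response.

```php
<?php
// Sketch only: builds the JSON body for a request to the _analyze API.
// The analyzer name and the sample text are taken from the thread.
$body = [
    'analyzer' => 'brazilian',
    'text' => 'paulo de tarso',
];

// Print the body so it can be sent with any HTTP client.
echo json_encode($body, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Comparing the returned tokens and their positions with the quoted phrase is one way to see whether stop word removal is involved.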
evert
(evert)
December 25, 2016, 2:17pm
7
Hello @dadoonet ,
I just saw your message today. Merry Christmas!
I will check that and reply as well.
I am preparing to submit this as an issue on GitHub, so you guys can check everything.
Thanks!
evert
(evert)
December 25, 2016, 10:34pm
8
Hi @dadoonet ,
I opened an issue with the most complete sample I could put together. Please let me know if I can improve it.
Cheers!