have bold, italic and uppercase texts (HTML) have higher scores?
in to matches in 1 phrase. So when searching for "fear of dark" it will also
match if fear and dark appear in different sentences ("Then it became dark.
He had fear of cats."). Is there a way to force the result to be in the same
sentence?
multi-word phrase search (the closer words appear the higher the scoring?)
Thanks,
Yannick
Could somebody help me with my Monday questions? Even if you only know the
answer to 1 of them it would be very helpful.
Thanks,
Yannick
From: Yannick Smits [mailto:mailinglists@goyaweb.nl]
Sent: Monday, 20 June 2011 23:15
To: users@elasticsearch.com
Subject: boost, scoring and phrase search
Hi Yannick
On Mon, 2011-06-20 at 23:15 +0200, Yannick Smits wrote:
Yes. Look at the Query DSL docs - most queries take a 'boost'
parameter.
instance have bold, italic and uppercase texts (HTML) have higher
scores?
I don't think you can (at least not currently)
results in to matches in 1 phrase. So when searching for "fear of
dark" it will also match if fear and dark appear in different
sentences ("Then it became dark. He had fear of cats."). Is there a
way to force the result to be in the same sentence?
I think you mean a 'text' query, not a 'text_phrase' query. A text
query will match text that contains the same words. A text_phrase query
will find text that includes exactly the same phrase (ignoring stop
words like 'of').
You can set the slop factor for text_phrase queries so that the words
don't have to be right next to each other.
multi-word phrase search (the closer words appear the higher the
scoring?)
With the text_phrase query, and slop, yes: proximity is taken into
account.
clint
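For reference, a minimal sketch of what a text_phrase search body with slop might look like. It only builds and prints the JSON, so it runs without a cluster. The helper name phrase_query and the field name "text" are made up for illustration, and the "query"/"slop" keys are an assumption about the text query syntax of this era; check the Query DSL docs for your version before relying on them.

```python
import json


def phrase_query(field, phrase, slop=0):
    """Build a search body for a text_phrase query on one field.

    Assumption: the query accepts an object with "query" and "slop" keys.
    """
    return {
        "query": {
            "text_phrase": {
                field: {"query": phrase, "slop": slop}
            }
        }
    }


# slop=2 lets the words sit up to two positions apart from the exact phrase.
body = phrase_query("text", "fear of dark", slop=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d against the _search endpoint, the same way as the other examples in this thread.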
Hi Clinton,
- I had a look at the docs but could not find a way to specify a boost value at query time based on the index the documents are in, only as a static configuration setting (the indices_boost documentation page). What am I missing?
- Could you think of a strategy to simulate such behavior? For example, extracting the bold words and saving them with a higher boost in a different field, without breaking the highlighting feature?
- Yes, I'm using text_phrase. But I would still like to know whether it is possible to have it look only within a single sentence for the specified terms, instead of matching across multiple sentences.
Thanks,
Yannick
-----Original Message-----
From: Clinton Gormley [mailto:clinton@iannounce.co.uk]
Sent: Wednesday, 22 June 2011 13:41
To: users@elasticsearch.com
Subject: Re: boost, scoring and phrase search
Hi Yannick
- I had a look at the docs but could not find a way to specify a
boost value based on indices the documents are in, at query time, only
as a configuration/static
(the indices_boost documentation page). What am I missing?
The page you link to above does not say that indices_boost is static -
it is a parameter that you can pass to any search query.
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
   "query" : {
      "match_all" : {}
   },
   "indices_boost" : {
      "index_foo" : 10,
      "index_bar" : 1
   }
}
'
- could you think of a strategy to simulate such a behavior? Like
extracting the bold words and saving them with a higher boost to a
different field or something without screwing up the highlighting
feature?
This would require some work on the client side. You could have two
fields: 'text' and 'important_text'. 'text' would contain all of the
text, and 'important_text' just the bits inside the bold and italic
tags. Then you can use a bool query to boost anything found in
important_text, but only do the highlighting on the 'text' field.
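A rough sketch of the client-side work described above, under a few assumptions: the set of tags treated as important, the helper names build_document and build_search, and the boost value of 2.0 are all made up for illustration, and whether the text query accepts a per-clause "boost" should be checked against the docs for your version. The code only builds the document and the search body; it does not talk to a cluster.

```python
import json
from html.parser import HTMLParser

# Tags whose content counts as "important"; an assumption for this sketch.
EMPHASIS_TAGS = {"b", "strong", "i", "em"}


class EmphasisExtractor(HTMLParser):
    """Collects all text, and separately the text inside emphasis tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of emphasis tags currently open
        self.all_text = []
        self.important = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.all_text.append(data)
        if self.depth > 0:
            self.important.append(data)


def build_document(html):
    """Split HTML into the two fields: 'text' and 'important_text'."""
    parser = EmphasisExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join("".join(parser.all_text).split()),
        "important_text": " ".join(" ".join(parser.important).split()),
    }


def build_search(words, boost=2.0):
    """Require a match on 'text', boost matches in 'important_text',
    and highlight only the 'text' field."""
    return {
        "query": {
            "bool": {
                "must": [{"text": {"text": words}}],
                "should": [
                    {"text": {"important_text": {"query": words, "boost": boost}}}
                ],
            }
        },
        "highlight": {"fields": {"text": {}}},
    }


doc = build_document("<p>He had a <b>fear of the dark</b> as a child.</p>")
search = build_search("fear dark")
print(json.dumps(doc, indent=2))
print(json.dumps(search, indent=2))
```

Because the 'should' clause is optional, documents that only match in 'text' are still returned; a match in 'important_text' just raises the score.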
- yes, I'm using text_phrase. But still I would like to know if it is
possible to have it look only within the phrase for the specified
terms instead of matching over multiple phrases.
ES has no concept of sentences. I thought of possibly breaking up the
content into individual sentences, eg:
[ 'The quick brown fox', 'jumped over the lazy dog']
but it looks like ES just concatenates these values anyway:
[Wed Jun 22 16:22:35 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XPOST 'http://127.0.0.1:9200/foo/bar?pretty=1' -d '
{
   "text" : [
      "The quick brown fox",
      "jumped over the lazy dog"
   ]
}
'
[Wed Jun 22 16:22:35 2011] Response:
{
   "ok" : true,
   "_index" : "foo",
   "_id" : "-Dt7zDUCQKauV69L_32w9g",
   "_type" : "bar",
   "_version" : 1
}
[Wed Jun 22 16:22:49 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XGET 'http://127.0.0.1:9200/foo/_search?pretty=1' -d '
{
   "query" : {
      "text_phrase" : {
         "text" : "brown jumped"
      }
   }
}
'
[Wed Jun 22 16:22:49 2011] Response:
{
   "hits" : {
      "hits" : [],
      "max_score" : null,
      "total" : 0
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 2
}
[Wed Jun 22 16:22:54 2011] Protocol: http, Server: 192.168.5.103:9200
curl -XGET 'http://127.0.0.1:9200/foo/_search?pretty=1' -d '
{
   "query" : {
      "text_phrase" : {
         "text" : "fox jumped"
      }
   }
}
'
[Wed Jun 22 16:22:54 2011] Response:
{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "text" : [
                  "The quick brown fox",
                  "jumped over the lazy dog"
               ]
            },
            "_score" : 0.23013961,
            "_index" : "foo",
            "_id" : "-Dt7zDUCQKauV69L_32w9g",
            "_type" : "bar"
         }
      ],
      "max_score" : 0.23013961,
      "total" : 1
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 2
}
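To make the result above easier to reason about, here is a toy model (not how ES or Lucene is actually implemented) of why "fox jumped" matches while "brown jumped" does not. It assumes the values of the array are tokenized one after another with a single running position counter, which is what the experiment suggests. The whitespace tokenizer and the function names are stand-ins for illustration only.

```python
def tokenize(value):
    # Lowercase and split on whitespace; a stand-in for a real analyzer.
    return value.lower().split()


def index_tokens(values):
    """Concatenate the tokens of all values, as if they were one text."""
    tokens = []
    for value in values:
        tokens.extend(tokenize(value))
    return tokens


def phrase_matches(values, phrase):
    """True if the phrase tokens appear at consecutive positions (slop 0)."""
    tokens = index_tokens(values)
    wanted = tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


sentences = ["The quick brown fox", "jumped over the lazy dog"]
print(phrase_matches(sentences, "brown jumped"))  # False: "fox" sits in between
print(phrase_matches(sentences, "fox jumped"))    # True: adjacent across the boundary
```

In this model the boundary between the two array values leaves no gap in the positions, so a phrase can span it, which is consistent with the two searches above.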