Regarding Boosting


(andavar) #1

Dear ElasticSearch Users,
What is the difference between index-time boosting and query-time boosting?
I am using query-time boosting. In some scenarios it works fine, but in most scenarios it doesn't work.

My query is:
{
  "field": {
    "my_field": {
      "query": "dept1|Executive||dept2|Director||dept3|Individual",
      "analyzer": "myPipeAnalyzer",
      "boost": 1.4
    }
  }
}

myPipeAnalyzer Settings:

{
  "index": {
    "analysis": {
      "analyzer": {
        "myPipeAnalyzer": {
          "type": "pattern",
          "flags": "DOTALL",
          "lowercase": "true",
          "pattern": "\\|\\|",
          "stopwords": "none"
        }
      }
    }
  }
}
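To see what the field query above actually searches for, here is a rough Python approximation of how myPipeAnalyzer splits the query string. It assumes the pattern analyzer only splits on the regex (a literal "||") and, because "lowercase" is enabled, lowercases each token; the helper name pipe_analyze is hypothetical.

```python
import re

# Hypothetical stand-in for myPipeAnalyzer: split on the literal "||"
# (the regex \|\|), lowercase each token and drop empty tokens.
PIPE_PATTERN = re.compile(r"\|\|")


def pipe_analyze(text):
    return [tok.lower() for tok in PIPE_PATTERN.split(text) if tok]


print(pipe_analyze("dept1|Executive||dept2|Director||dept3|Individual"))
# ['dept1|executive', 'dept2|director', 'dept3|individual']
```

Each token still contains a single pipe, so under this assumption the query can only match if my_field was indexed with an analyzer that produced the same tokens. If the field was indexed with a different analyzer, these tokens may never match, which could be one reason the boost seems to have no effect.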

Could you please explain why the boosting doesn't work?

Thanks & Regards,
Andavar


(Clinton Gormley) #2

On Tue, 2012-09-11 at 06:36 -0700, andavar wrote:

> Dear ElasticSearch Users,
> What is difference between index time boosting and Query time
> boosting.

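It's just what it says. Index-time boosting is set when the document is
indexed and is stored in the index, so changing it means reindexing.
Query-time boosting is set in the query, so it can be changed with every
search.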
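The distinction can be pictured with a toy model. This is only an illustration, not Lucene's real scoring formula: an index-time boost is fixed per document when it is indexed, while a query-time boost is chosen per request. The names index_doc and search are hypothetical.

```python
# Toy model only: real Lucene scoring has many more factors.
index = {}  # doc id -> (text, index_time_boost), fixed at index time


def index_doc(doc_id, text, index_boost=1.0):
    # Changing index_boost later means indexing the document again.
    index[doc_id] = (text, index_boost)


def search(term, query_boost=1.0):
    # query_boost can differ on every request without touching the index.
    scores = {}
    for doc_id, (text, index_boost) in index.items():
        if term in text.split():
            scores[doc_id] = 1.0 * index_boost * query_boost
    return scores


index_doc("a", "director", index_boost=2.0)
index_doc("b", "director")
print(search("director"))                   # {'a': 2.0, 'b': 1.0}
print(search("director", query_boost=1.4))  # {'a': 2.8, 'b': 1.4}
```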

> Here I am using query time boosting. Some scenario it will work fine. But
> most scenario it wont work.

It's not obvious from your example what it is you want to achieve and
why you think it's not working.

clint


--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Regarding-Boosting-tp4022608.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--


(andavar) #3

Thanks, Clint.

My query is:
{
  "from": 0,
  "size": 25,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "bool": {
              "should": [
                {
                  "query_string": {
                    "query": "test",
                    "default_operator": "and"
                  }
                }
              ]
            }
          },
          "should": [
            {
              "term": {
                "myField1": {
                  "value": true,
                  "boost": 2.4
                }
              }
            },
            {
              "field": {
                "myField2": {
                  "query": "dept1|Executive||dep2|Director/Manager||dept3|Individual",
                  "analyzer": "myPipeAnalyzer",
                  "boost": 1.4
                }
              }
            }
          ]
        }
      },
      "filter": {
        "and": {
          "filters": [
            {
              "term": {
                "myField3": true
              }
            },
            {
              "query": {
                "field": {
                  "myField4": "ACT"
                }
              }
            }
          ]
        }
      }
    }
  }
}

In my query, the "should" clauses on myField1 and myField2 (highlighted in my original post) are the relevance-related part. In some documents myField1 is true, and myField2 has a value like "dept1|Executive||dep2|Director/Manager||dept3|Individual".

I am searching with the query above, and I expect the results in three tiers:

  1. Documents where both myField1 and myField2 have those values should come first, because I applied boosts of 2.4 and 1.4.
  2. Documents where only one of those fields matches should come next.
  3. Documents where neither field matches should come last.

The query does not produce this ordering.

Is there another way to achieve it?

Thanks,


(Clinton Gormley) #4

Hi Andavar

> There are 3 scenario fetching the result.
>
>   1. If document have Myfield1 and myField2 with that value means, that
>     documents comes top of the result. Because I applied boost 2 & 1.4.
>   2. If any one of that having means, it comes to next in result.
>   3. These fields values different means, it comes to last.
>
> These scenario wont work for that query.

I'd propose doing it somewhat differently.

  1. myField1-4 are really filters, not queries, so store the data
    correctly and search them using filters. Filters perform better than
    queries and can be cached.

By "store the data correctly" I mean, e.g., setting the mapping for
myField2 to:
{ "type": "string", "index": "not_analyzed" }

which means that it indexes the exact value that you pass in
(e.g. "dept2|Director/Manager") instead of analyzing the text and storing
the terms that come from the analysis,
e.g. ["dept2", "director", "manager"].
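A small Python sketch of the difference between the two kinds of field. The analyzed case is only an approximation of a standard-analyzer-like field (split on non-alphanumeric runs, then lowercase), not Elasticsearch's actual analyzer code.

```python
import re

value = "dept2|Director/Manager"

# not_analyzed: the whole value is indexed as one exact term.
not_analyzed_terms = [value]

# Rough approximation of an analyzed field: split on runs of
# non-alphanumeric characters and lowercase each piece.
analyzed_terms = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", value) if t]

print(not_analyzed_terms)          # ['dept2|Director/Manager']
print(analyzed_terms)              # ['dept2', 'director', 'manager']
print(value in not_analyzed_terms) # True: an exact-value lookup can match
print(value in analyzed_terms)     # False: the full value was never indexed
```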

You can index your docs with, e.g.:
{
  "myField2": "dept2|Director/Manager"
}

Or even multiple values:
{
  "myField2": ["dept1|Executive", "dept2|Director/Manager"]
}

You don't need to use complicated pipe encoding hacks.

To run filters on that field, use a term or terms filter rather than
a query (see below).
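As a rough model of how term and terms filters behave on a not_analyzed, possibly multi-valued field, here is a Python sketch: matching is on whole, case-sensitive values, and a terms filter matches if any stored value equals any of the listed values. The sample documents (including "dept4|Intern") and the helper names are hypothetical.

```python
# Hypothetical sample docs; myField2 may hold one value or a list.
docs = {
    1: {"myField2": ["dept1|Executive", "dept2|Director/Manager"]},
    2: {"myField2": "dept3|Individual"},
    3: {"myField2": "dept4|Intern"},
}


def values(doc, field):
    v = doc.get(field, [])
    return v if isinstance(v, list) else [v]


def term_filter(field, value):
    # Exact, case-sensitive match against any stored value.
    return [d for d, doc in docs.items() if value in values(doc, field)]


def terms_filter(field, wanted):
    # Matches if ANY stored value equals ANY of the wanted values.
    return [d for d, doc in docs.items() if set(values(doc, field)) & set(wanted)]


print(term_filter("myField2", "dept3|Individual"))   # [2]
print(term_filter("myField2", "dept3|individual"))   # [] (no analysis, so case matters)
print(terms_filter("myField2", ["dept1|Executive", "dept3|Individual"]))  # [1, 2]
```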

(My solution below probably won't work yet, because your data needs to
be reindexed after you map the fields correctly.)

  2. The 'boost' values that you are using are not absolute. Instead,
    they get factored into the "relevance" score that Elasticsearch
    calculates, which also depends on other factors, e.g. how common the
    term 'Executive' is. That's not what you want. You're just saying "If a
    document has this value, then sort it higher".
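To illustrate why a boost is not absolute, here is a simplified Python model (not Lucene's full formula) in which a matching clause contributes boost × idf, using the classic Lucene-style idf = 1 + ln(numDocs / (docFreq + 1)). The index size and document frequencies are hypothetical.

```python
import math

# Simplified model: a matching clause contributes boost * idf.
# Real Lucene scoring also involves tf, norms, queryNorm and coord.


def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1000  # hypothetical index size

# Hypothetical frequencies: myField1=true is very common,
# while the myField2 value is rare.
common = 2.4 * idf(NUM_DOCS, doc_freq=900)
rare = 1.4 * idf(NUM_DOCS, doc_freq=10)

print(f"myField1 clause (boost 2.4): {common:.2f}")
print(f"myField2 clause (boost 1.4): {rare:.2f}")
```

In this model the rare myField2 clause contributes roughly three times as much as the myField1 clause despite its lower boost, so the ordering depends on term statistics rather than on the boost values alone.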

So a better way to do that is to use the custom_filters_score query:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query": {
    "custom_filters_score": {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "term": {
                  "myField3": true
                }
              },
              {
                "term": {
                  "myField4": "ACT"
                }
              }
            ]
          },
          "query": {
            "query_string": {
              "query": "test",
              "default_operator": "and"
            }
          }
        }
      },
      "score_mode": "total",
      "filters": [
        {
          "boost": 2.4,
          "filter": {
            "term": {
              "myField1": true
            }
          }
        },
        {
          "boost": 1.4,
          "filter": {
            "terms": {
              "myField2": [
                "dept1|Executive",
                "dept2|Director/Manager",
                "dept3|Individual"
              ]
            }
          }
        }
      ]
    }
  }
}
'
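A Python sketch of how "score_mode": "total" is expected to rank documents. It assumes the boosts of all matching filters are summed into a factor that multiplies the wrapped query's score, with the factor left at 1.0 when no filter matches. The sample documents (including "dept9|Other") are hypothetical.

```python
# Assumed model of custom_filters_score with score_mode "total".
WANTED_DEPTS = {"dept1|Executive", "dept2|Director/Manager", "dept3|Individual"}

FILTERS = [
    (2.4, lambda doc: doc.get("myField1") is True),                     # term filter
    (1.4, lambda doc: bool(WANTED_DEPTS & set(doc.get("myField2", [])))),  # terms filter
]


def total_score(doc, query_score=1.0):
    matched = [boost for boost, matches in FILTERS if matches(doc)]
    factor = sum(matched) if matched else 1.0
    return query_score * factor


docs = {
    "both": {"myField1": True, "myField2": ["dept2|Director/Manager"]},
    "field1_only": {"myField1": True, "myField2": ["dept9|Other"]},
    "field2_only": {"myField1": False, "myField2": ["dept1|Executive"]},
    "neither": {"myField1": False, "myField2": []},
}

ranking = sorted(docs, key=lambda name: total_score(docs[name]), reverse=True)
print(ranking)  # ['both', 'field1_only', 'field2_only', 'neither']
```

With equal base query scores this gives the three tiers from the question. Under the assumed multiplication, large differences in query_string scores could still blur the tiers.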

I'd have given you a gist showing all the steps, including the mapping
etc, but you only included partial information in your email. The more
info you give, the easier it is to answer a question. The best way is
to provide a complete recreation in curl that we can just copy and paste
to run.

See http://www.elasticsearch.org/help for info about better ways to ask
the question.

hope this helps

clint

Note: "myField1" etc. are bad names for fields. Give them meaningful
names, so that we (and you) don't have to guess what they mean.

--


(andavar) #5

Thanks, Clinton.

