Need suggestion: How to boost specific documents for a given search term


(Xudong You) #1

Hi ES experts,

I need your help with index design for a real scenario. It might be a long
question, so let me try to explain it as concisely as possible.

We are building a search engine to provide site search for our customers.
The documents in the index look something like this:

{ "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1",
"Description":"The description of doc 1", ... }
{ "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2",
"Description":"The description of doc 2", ... }
{ "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3",
"Description":"The description of doc 3", ... }
...

For each query, the returned hits are sorted by relevance by default, but
our customer also wants to boost specific documents for certain keywords.

They will give us a boosting configuration XML that lists, for each keyword,
the document paths to boost, in order:

keyword1: http://www.foo.com/doc/abc/1
keyword2: http://www.foo.com/doc/abc/2 http://www.foo.com/doc/abc/1
keyword3: http://www.foo.com/doc/abc/3 http://www.foo.com/doc/abc/2 http://www.foo.com/doc/abc/1

That means that if a user searches for "keyword1", the top hit should be the
document whose Path field value is "http://www.foo.com/doc/abc/1",
regardless of that document's relevance score. Similarly, for a search on
"keyword3", the top 3 hits should be
"http://www.foo.com/doc/abc/3", "http://www.foo.com/doc/abc/2" and
"http://www.foo.com/doc/abc/1", in that order.

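To make the required ordering concrete, here is a minimal Python sketch of the intended result, not of the index design itself. The `BOOST_CONFIG` dictionary and the `expected_order` function are hypothetical names, and the sketch assumes that configured documents are always shown first, even when they would not otherwise match well.

```python
# Hypothetical in-memory form of the boosting configuration:
# keyword -> document paths in the order they must appear.
BOOST_CONFIG = {
    "keyword1": ["http://www.foo.com/doc/abc/1"],
    "keyword2": [
        "http://www.foo.com/doc/abc/2",
        "http://www.foo.com/doc/abc/1",
    ],
    "keyword3": [
        "http://www.foo.com/doc/abc/3",
        "http://www.foo.com/doc/abc/2",
        "http://www.foo.com/doc/abc/1",
    ],
}


def expected_order(keyword, relevance_sorted_paths, config=BOOST_CONFIG):
    """Return the configured paths first, then the remaining hits by relevance."""
    pinned = config.get(keyword, [])
    pinned_set = set(pinned)
    # Keep the relevance order for every hit that is not explicitly boosted.
    rest = [p for p in relevance_sorted_paths if p not in pinned_set]
    return list(pinned) + rest


# Hits as a plain relevance sort might return them (made-up order).
hits_by_relevance = [
    "http://www.foo.com/doc/abc/2",
    "http://www.foo.com/doc/abc/3",
    "http://www.foo.com/doc/abc/1",
]
ordered = expected_order("keyword1", hits_by_relevance)
```

For "keyword1" the sketch moves document 1 to the top and leaves the other hits in relevance order; a keyword with no configuration leaves the list unchanged.
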
To satisfy this special requirement, my design is to first invert the
original boosting XML into the following format:

<keyword value="keyword3" rank="9900" />

Then add a nested field "Boost", containing a list of keyword/rank pairs, to
each document, as in the following example:
{
  "Boost": [
    { "keyword":"keyword1", "rank": 10000},
    { "keyword":"keyword2", "rank": 9900},
    { "keyword":"keyword3", "rank": 9800}
  ],
  "Path":"http://www.foo.com/doc/abc/1",
  "Title":"Title 1",
  "Description":"The description of doc 1",
  ...
}

{
  "Boost": [
    { "keyword":"keyword2", "rank": 10000},
    { "keyword":"keyword3", "rank": 9900}
  ],
  "Path":"http://www.foo.com/doc/abc/2",
  "Title":"Title 2",
  "Description":"The description of doc 2",
  ...
}

{
  "Boost": [
    { "keyword":"keyword3", "rank": 10000}
  ],
  "Path":"http://www.foo.com/doc/abc/3",
  "Title":"Title 3",
  "Description":"The description of doc 3",
  ...
}
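
The inversion that produces these "Boost" lists can be sketched in Python as follows. This is only an illustration: `invert_boost_config`, `TOP_RANK` and `RANK_STEP` are hypothetical names, and the rule "10000 for the first path, 100 less for each later position" is inferred from the example values above rather than stated anywhere.

```python
from collections import defaultdict

TOP_RANK = 10000  # assumed rank for the first path listed under a keyword
RANK_STEP = 100   # assumed amount subtracted for each later position


def invert_boost_config(config):
    """Turn keyword -> ordered paths into path -> list of {keyword, rank}."""
    boosts_by_path = defaultdict(list)
    for keyword, paths in config.items():
        for position, path in enumerate(paths):
            rank = TOP_RANK - RANK_STEP * position
            boosts_by_path[path].append({"keyword": keyword, "rank": rank})
    return dict(boosts_by_path)


# The customer's configuration, already parsed into a dictionary.
config = {
    "keyword1": ["http://www.foo.com/doc/abc/1"],
    "keyword2": [
        "http://www.foo.com/doc/abc/2",
        "http://www.foo.com/doc/abc/1",
    ],
    "keyword3": [
        "http://www.foo.com/doc/abc/3",
        "http://www.foo.com/doc/abc/2",
        "http://www.foo.com/doc/abc/1",
    ],
}
boosts = invert_boost_config(config)
```

Each value in `boosts` is the list that would be stored in the "Boost" field of the document with that Path.
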

Then, at query time, use a nested query to get the rank value of each matched
document for the given search keyword, and use a score script to adjust the
relevance score by that rank value. Since the rank values from the boosting XML
are much larger than a normal relevance score (generally less than 5), the
documents configured in the boosting XML for the given keyword should end up
with the top adjusted scores.
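
One possible shape for such a query, as a rough sketch in Python that only builds the request body. It makes several assumptions: "Boost" is mapped as a nested type, "Boost.keyword" is indexed without analysis so that a term query matches it exactly, and the text search runs over Title and Description. It also swaps the score script for a `field_value_factor` function that reads "Boost.rank" directly. The function name `build_boosted_query` is hypothetical, and the exact syntax depends on the Elasticsearch version, so it should be checked against the reference for the version in use.

```python
def build_boosted_query(keyword, text_fields=("Title", "Description")):
    """Build a search body that scores by text relevance plus configured rank."""
    # Ordinary relevance over the text fields (assumed field list).
    relevance = {
        "multi_match": {"query": keyword, "fields": list(text_fields)}
    }
    # Nested clause: matches only Boost entries for this keyword and
    # replaces their score with the stored rank value.
    rank_boost = {
        "nested": {
            "path": "Boost",
            "score_mode": "max",
            "query": {
                "function_score": {
                    "query": {"term": {"Boost.keyword": keyword}},
                    "field_value_factor": {"field": "Boost.rank"},
                    "boost_mode": "replace",
                }
            },
        }
    }
    # A document matching either clause is returned; scores from the
    # matching clauses are combined by the bool query.
    return {"query": {"bool": {"should": [relevance, rank_boost]}}}


body = build_boosted_query("keyword3")
```

With this shape, a document without a Boost entry for the keyword is scored by relevance alone, while a configured document also receives its rank. Because the ranks are 100 apart and relevance is generally below 5, the configured order should be preserved. Note that a configured document matches through the nested clause even if its text does not match the keyword.
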

Would this design work well? Any suggestions for a better design?

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3eb89d0f-b9a4-4d84-bc04-e0c764b9e314%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Xudong You) #2

Does anyone have a good suggestion?

On Tuesday, April 28, 2015 at 11:01:40 PM UTC+8, Xudong You wrote:

