Making order count


(evil faery) #1

Hello everyone.

I'm new to ES. Could you tell me whether I can do the following with it?

Case: I have a number of lists like:
#1 {item1: "aaa", item2:"bbb", item3:"ccc"}
#2 {item1: "bbb", item2:"ccc", item3:"aaa"}
#3 {item1: "vvv", item2:"aaa", item3:"zzz"}
#4 {item1: "ttt", item2:"www", item3:"xxx"}

I want to find lists that contain aaa, bbb, ccc. The tricky part is
that I want to take order into consideration as well, so that even if
I need lists with aaa, bbb, ccc, sometimes I would want to get #1 as
the most relevant result, and sometimes #2.

Thanks.


(Karussell) #2

I would index the doc as you wrote it (3 fields: item1, item2 and
item3) and then give every field a different boost depending
on your needs:

item1:text1^boost1 AND item2:text2^boost2 AND item3:text3^boost3 ...
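
A minimal sketch of how such a query string could be assembled, assuming the fields really are named item1, item2, ... and that the string is handed to a Lucene-style query parser. The helper name `field_boost_query` and the boost values are made up for illustration.

```python
def field_boost_query(values, boosts, operator="AND"):
    """Build a Lucene-style query string with one boosted clause per field.

    values[i] is searched in field item<i+1> with boost boosts[i].
    The field naming (item1, item2, ...) follows the example documents.
    """
    if len(values) != len(boosts):
        raise ValueError("need exactly one boost per value")
    clauses = [
        f"item{position}:{value}^{boost}"
        for position, (value, boost) in enumerate(zip(values, boosts), start=1)
    ]
    return f" {operator} ".join(clauses)


# Hypothetical boosts: a match in the first position counts the most.
print(field_boost_query(["aaa", "bbb", "ccc"], [3, 2, 1]))
```

With AND, a list must match in every position; passing `operator="OR"` would also let partially matching lists through.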

Regards,
Peter.

--
http://jetwick.com Personalized Twitter Search


(evil faery) #3

Hi Peter

Thanks for your response. I understood your idea, but I'm afraid I
didn't fully specify what I need in the original post.

The thing is that even though I'm interested in order, it is
secondary. So if I search the above lists for aaa, bbb, ccc, I want to
get #1, #2 and #3.

#3 is in the results because it has "aaa" (with the lowest score),
#2 because it has all of them (bbb, ccc, aaa), and #1 is rated highest
because it has all of them too AND in the correct order.

Nick.


(cwho) #4

I've done something similar in Lucene before. You want to use one
field for the list, e.g.:

item: "aaa bbb ccc"
item: "bbb aaa ccc"
item: "vvv aaa zzz"

with an analyzer (e.g. whitespace) so each term gets indexed (or you
can try ES lists, which I haven't tried).

Your query for "aaa bbb ccc" with order weighting will look like:

(aaa OR bbb OR ccc) OR "aaa bbb ccc"

The first part matches anything that has at least one of the three terms, with
higher relevance for more matches, while the second part boosts the
ones that have them in the correct contiguous order. You can use a
fuzzy (sloppy) phrase query, i.e. a phrase with a slop value such as
"aaa bbb ccc"~2, instead of an exact phrase query in the second part if
you want some form of noncontiguous order matching.
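
As a rough sketch, the query string above could be generated like this in Python. The function name `order_weighted_query` and its `slop` and `phrase_boost` parameters are assumptions for illustration; the `~N` (slop) and `^N` (boost) suffixes are Lucene query-parser syntax, so check how your version parses them.

```python
def order_weighted_query(terms, slop=0, phrase_boost=None):
    """Build '(t1 OR t2 ...) OR "t1 t2 ..."' in Lucene query syntax.

    The OR group matches any list sharing at least one term; the quoted
    phrase adds score when the terms appear in the queried order.
    """
    any_term = "(" + " OR ".join(terms) + ")"
    phrase = '"' + " ".join(terms) + '"'
    if slop > 0:
        phrase += f"~{slop}"          # allow gaps between the phrase terms
    if phrase_boost is not None:
        phrase += f"^{phrase_boost}"  # weight order matches more heavily
    return f"{any_term} OR {phrase}"


print(order_weighted_query(["aaa", "bbb", "ccc"]))
print(order_weighted_query(["aaa", "bbb", "ccc"], slop=2, phrase_boost=2))
```

The first call reproduces the query shown above; the second is the sloppy variant with an extra boost on the phrase part.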


(evil faery) #5

Thanks for your suggestion. However, that does not solve the case where
the order match is only partial.

For example, I search for aaa, bbb, ccc and have these lists:

#5 zzz, aaa, bbb
#6 aaa, zzz, bbb

In this case I would want #6 to be ranked higher, because "aaa" is
in the same position as in the query: first place.


(cwho) #6

Arguably, in the above case aaa and bbb are closer together in #5, so it
should rank higher.

If you want to weight specific relations, try a bigram query.

Now put a "start of line" token at the start (others may have
experience using term vectors to get the same result),
so #5 is "start zzz aaa bbb"
and #6 is "start aaa zzz bbb".

Then your bigram search for "aaa bbb ccc" is:
aaa OR bbb OR ccc OR ("start aaa") OR ("aaa bbb") OR ("bbb ccc")

Now, in the above case you've specified that you want aaa to be in the
first position more than you want aaa to be next to bbb. This can be
tweaked by boosting the various bigram terms:
aaa OR bbb OR ccc OR ("start aaa")^1.5 OR ("aaa bbb") OR ("bbb ccc")

Now the "start aaa" bigram term is weighted up, so results where aaa
is the first term rank higher. Do experiment on your own to find
the fit for your case, but you get the drift...
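
To make the effect of the boost concrete, here is a small Python sketch that builds the bigram query and ranks #5 and #6 with a toy score. The toy score simply adds up the boosts of the clauses that match; it is an assumption for illustration and is not Lucene's real scoring. All function names are hypothetical.

```python
START = "start"  # sentinel token from the post; must not collide with a real term


def bigram_clauses(terms, start_boost=1.0):
    """Return (tokens, boost) pairs: single terms plus adjacent bigrams."""
    sequence = [START] + list(terms)
    clauses = [((term,), 1.0) for term in terms]
    for index, pair in enumerate(zip(sequence, sequence[1:])):
        # Only the first bigram ("start <first term>") gets the extra boost.
        clauses.append((pair, start_boost if index == 0 else 1.0))
    return clauses


def to_query_string(clauses):
    """Render the clauses in the Lucene-style syntax used in this thread."""
    parts = []
    for tokens, boost in clauses:
        if len(tokens) == 1:
            parts.append(tokens[0])
        else:
            text = '("' + " ".join(tokens) + '")'
            parts.append(text if boost == 1.0 else f"{text}^{boost}")
    return " OR ".join(parts)


def toy_score(doc_terms, clauses):
    """Sum the boosts of all matching clauses (toy model, not real Lucene scoring)."""
    sequence = [START] + list(doc_terms)
    adjacent = set(zip(sequence, sequence[1:]))
    score = 0.0
    for tokens, boost in clauses:
        matched = tokens[0] in doc_terms if len(tokens) == 1 else tokens in adjacent
        if matched:
            score += boost
    return score


query = ["aaa", "bbb", "ccc"]
doc5 = ["zzz", "aaa", "bbb"]
doc6 = ["aaa", "zzz", "bbb"]

plain = bigram_clauses(query)
boosted = bigram_clauses(query, start_boost=1.5)

print(to_query_string(boosted))
print("unboosted:", toy_score(doc5, plain), toy_score(doc6, plain))
print("boosted:  ", toy_score(doc5, boosted), toy_score(doc6, boosted))
```

In this toy model #5 and #6 tie without the boost (#5 matches "aaa bbb", #6 matches "start aaa"), and #6 pulls ahead once "start aaa" is boosted. Real scores will differ, so treat it only as a way to reason about which clauses match.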

