Sorting the data based on frequency of term existing in all documents


(narinder.izap) #1

Hi Shay,

I am having a problem with sorting, and I need to know whether my
sorting requirement can be implemented in ES at all.

The scenario is:

I have an index called relations that stores the relationship between
two entities.

The structure of the documents is:

{
guid_1:11,
relationship:"fan_of",
guid_2:"12"
}

{
guid_1:11,
relationship:"follower_of",
guid_2:"12"
}

{
guid_1:11,
relationship:"fan_of",
guid_2:"13"
}

{
guid_1:22,
relationship:"fan_of",
guid_2:"12"
}

{
guid_1:22,
relationship:"fan_of",
guid_2:"14"
}

These documents store the relations. The first document says that
guid_1 is a fan of guid_2. Similarly, the second document says that guid_1
is a follower of guid_2.

Now I need to find all the fan_of relation documents for guid_1 = 11, which
are the documents with guid_2 = 12 and 13. The document with guid_2 = 12
should be sorted first and then guid_2 = 13, because 12 has two fans (11
and 22) while 13 has only one.

In other words, I need to search the fan_of relations of a user, sorted by
the number of fans each guid_2 has. Is there any way to do this?

Please give your suggestion.

Narinder Kaur

--


(phill) #2

This is really a kind of n:n cross-linking table. That is great in a
database, where you can join and do counts etc., but it is not a perfect
fit for a document retrieval system: there is no way to have two 'foreign
keys' in an ES document. Documents are most often structured (hence the
use of JSON) or made up of many fields with various tokenizing and parsing
characteristics. So I think you'll have to do any such merging in the
client that calls ES.
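
A minimal sketch of what such client-side merging could look like, assuming
the relation documents have already been fetched from ES into a plain list
(the function name fan_targets_by_popularity is made up for illustration):

```python
from collections import Counter

# The five relation documents from the first post, as Python dicts.
# Assumption: in practice these would come from one or more ES searches.
relations = [
    {"guid_1": 11, "relationship": "fan_of", "guid_2": "12"},
    {"guid_1": 11, "relationship": "follower_of", "guid_2": "12"},
    {"guid_1": 11, "relationship": "fan_of", "guid_2": "13"},
    {"guid_1": 22, "relationship": "fan_of", "guid_2": "12"},
    {"guid_1": 22, "relationship": "fan_of", "guid_2": "14"},
]


def fan_targets_by_popularity(docs, guid_1):
    """Return the guid_2 values that guid_1 is a fan of, most fans first."""
    fan_docs = [d for d in docs if d["relationship"] == "fan_of"]
    # Count how many fan_of documents point at each guid_2, over all users.
    fan_counts = Counter(d["guid_2"] for d in fan_docs)
    # The targets of this particular user.
    targets = {d["guid_2"] for d in fan_docs if d["guid_1"] == guid_1}
    # Descending fan count; ties are broken by guid_2 so the order is stable.
    return sorted(targets, key=lambda g: (-fan_counts[g], g))


print(fan_targets_by_popularity(relations, 11))  # prints ['12', '13']
```

This only works comfortably while the relation set is small enough to pull
into the client; for larger data the counting would have to move elsewhere.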

This is another example where some de-normalizing is probably part of
the solution.

Do you really want to know who is the most followed and who has the most
fans?

If so, this is not a sort, it is a count. In SQL you would do a grouping
and a count and then sort on that count, but that's not very much in the
style of document retrieval.
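
For comparison, here is roughly what the relational version looks like,
sketched with Python's built-in sqlite3 module and an in-memory table. The
table and column names simply mirror the documents in the first post and are
assumptions, as is the helper name fan_counts:

```python
import sqlite3

# In-memory table mirroring the relation documents from the first post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations (guid_1 INTEGER, relationship TEXT, guid_2 TEXT)"
)
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?)",
    [
        (11, "fan_of", "12"),
        (11, "follower_of", "12"),
        (11, "fan_of", "13"),
        (22, "fan_of", "12"),
        (22, "fan_of", "14"),
    ],
)


def fan_counts(connection):
    """Group fan_of rows by target and sort by the number of fans."""
    return connection.execute(
        """
        SELECT guid_2, COUNT(*) AS fans
        FROM relations
        WHERE relationship = 'fan_of'
        GROUP BY guid_2
        ORDER BY fans DESC, guid_2
        """
    ).fetchall()


print(fan_counts(conn))  # prints [('12', 2), ('13', 1), ('14', 1)]
```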

-Paul

--


(narinder.izap) #3

I need to find the entities that I am a fan of, and that result set should
be sorted by each entity's number of fans in descending order.

--


(phill) #4

This is another case where a working example, or at least an example of the
results you expect, would be useful.
For example, you could use your original "relation" documents and
invent some "people" to go with them.
I assume guid_1 etc. refer to some other type of document (aka
another table) like "people".
Then describe all the results you need, regardless of how you get them.

But I'd look into denormalizing across a bunch of child documents,
with different child documents for
fan and its opposite fan_of (there is that denormalizing!), and others
for follower and its opposite follower_of. Then I could imagine one
query to get some person docs with the most fan children. Then
another query, which uses the parent IDs found in query 1 and the
opposite relationship, fan_of, would find all the people documents who are
the fans.

But I'd have to test if the queries were fast enough for the job.
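
A rough sketch of a two-query flow, with several assumptions stated up
front: it works on the original flat relation documents rather than on
parent/child documents, it assumes a newer Elasticsearch version that
supports terms aggregations, and it assumes relationship and guid_2 are
indexed as exact (not analyzed) values. The function names are made up for
illustration, and the code only builds the request bodies; sending them to
ES is left out.

```python
import json


def targets_query(guid_1):
    """Query 1: the fan_of documents of one user, giving their guid_2 targets."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"relationship": "fan_of"}},
                    {"term": {"guid_1": guid_1}},
                ]
            }
        }
    }


def fan_count_query(guid_2_values):
    """Query 2: count fan_of documents per target found by query 1."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"relationship": "fan_of"}},
                    {"terms": {"guid_2": guid_2_values}},
                ]
            }
        },
        "aggs": {
            "fans_per_target": {
                # One bucket per requested target (at least one bucket).
                "terms": {"field": "guid_2", "size": max(len(guid_2_values), 1)}
            }
        },
    }


# For guid_1 = 11, query 1 would return targets "12" and "13".
print(json.dumps(fan_count_query(["12", "13"]), indent=2))
```

A terms aggregation returns its buckets ordered by document count,
descending, by default, so the bucket order of query 2 would already be the
"most fans first" order asked for.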

-Paul

--

