A request for help with how to solve my seemingly tricky problem in ElasticSearch!


(James-2) #1

Hi all!

I'm trying to solve a problem of searching across 2 'entities' and sorting
and paging the results. I have been unable to come up with a solution so
far, so I thought I would ask here, and would greatly appreciate anyone's
help!

I have considered storing these entities in various ways: as separate
documents (using a parent/child relationship), as a combined document, and
as nested documents. In each case I hit problems!

The complexity is around how we want to group, sort, page and count the
results.

I have tried to outline the problem below.

Please let me know what you think!

Cheers, James.

curl -XPUT localhost:9200/jobs/company/1 -d'{
"name": "Company A",
"location": "America"
}'

curl -XPUT localhost:9200/jobs/company/2 -d'{
"name": "Company B",
"location": "Britain"
}'

curl -XPUT localhost:9200/jobs/employee/1 -d'{
"name": "Alice",
"job_title": "Astronaut"
}'

curl -XPUT localhost:9200/jobs/employee/2 -d'{
"name": "Bob",
"job_title": "Builder"
}'

====

Alice works for Company A and Company B

Bob works for Company B

The relationship between Employee and Company is many-to-many.

====

Example search: Employees that are Astronauts that work for Companies in
America or Britain

Desired result count: 1 (Only 1 Alice, despite the fact she works for 2
Companies)

Desired display:

Company A
Alice
Company B
Alice

So the result count is the count of unique Employees, and each page should
display a fixed number of Employees, but the results are grouped by Company
and primarily sorted by properties of Company (name, location etc.)

Example search: All

Company A
Alice
Company B
Alice
Bob

--


(David Pilato) #2

Hi James,

Here is what I think about your use case. It seems that you are thinking
relationally in a NoSQL world.

Your point of interest seems to be Employees. So let's index employees.

curl -XPUT localhost:9200/jobs/employee/1 -d'{
  "name": "Alice",
  "job_title": "Astronaut",
  "companies": [
    {
      "name": "Company A",
      "location": "America"
    },
    {
      "name": "Company B",
      "location": "Britain"
    }
  ]
}'

curl -XPUT localhost:9200/jobs/employee/2 -d'{
  "name": "Bob",
  "job_title": "Builder",
  "companies": [
    {
      "name": "Company B",
      "location": "Britain"
    }
  ]
}'

So now you can find all employees in America who are Astronauts with a
bool query on job_title and companies.location.
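
The bool query described above could look roughly like the sketch below. It is written in Python only to build and print the JSON request body. The helper name build_employee_query is made up for illustration, and the match clauses assume an ElasticSearch version that supports the match query and the default (analyzed) mapping for both fields. The printed body would be sent to something like localhost:9200/jobs/employee/_search.

```python
import json


def build_employee_query(job_title, locations):
    """Build a search body: job_title must match AND at least one location must match.

    Hypothetical helper for illustration; field names follow the documents
    indexed in this thread.
    """
    # One clause per location; a bool query made only of "should" clauses
    # requires at least one of them to match.
    location_clauses = [{"match": {"companies.location": loc}} for loc in locations]
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"job_title": job_title}},
                    {"bool": {"should": location_clauses}},
                ]
            }
        }
    }


body = build_employee_query("Astronaut", ["America", "Britain"])
print(json.dumps(body, indent=2))
```

Each matching employee is a single document, so the hit count is the number of unique employees, which is the count James asked for.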

Does it make sense for your use case?

HTH
David.

In reply to James's message of 14 October 2012, 16:09.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(James-2) #3

Hi David,

Firstly, thanks very much for taking the time to reply; I really appreciate
it! :)

If I do as you've suggested, then the result count is as desired, but I
cannot see how to do pagination, as we want to group Employees by Company,
sort those Companies by Company properties (e.g. name), and then display
pages of results where each page has a fixed number of Employees on it.
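
To make the requirement concrete, here is a minimal client-side sketch in Python of the grouping, sorting and paging described above, run over in-memory copies of the two employee documents. It is not an ElasticSearch feature: the function names group_rows and page are invented for this example, employee names are assumed to be unique, and a "page" is assumed to hold a fixed number of (company, employee) rows.

```python
def group_rows(employees, sort_key="name"):
    """Flatten employees into (company name, employee name) rows.

    Rows are sorted by the chosen company property first, then by company
    name and employee name, so employees end up grouped under their company.
    """
    rows = []
    for emp in employees:
        for company in emp["companies"]:
            rows.append((company[sort_key], company["name"], emp["name"]))
    rows.sort()
    return [(company_name, emp_name) for _, company_name, emp_name in rows]


def page(rows, page_number, page_size):
    """Return one page of rows; page_number starts at 1."""
    start = (page_number - 1) * page_size
    return rows[start:start + page_size]


company_a = {"name": "Company A", "location": "America"}
company_b = {"name": "Company B", "location": "Britain"}
employees = [
    {"name": "Alice", "job_title": "Astronaut", "companies": [company_a, company_b]},
    {"name": "Bob", "job_title": "Builder", "companies": [company_b]},
]

rows = group_rows(employees)
unique_count = len({emp["name"] for emp in employees})  # count of unique employees

for company_name, emp_name in page(rows, 1, 2):
    print(company_name, "/", emp_name)
```

This only works when all matching employees fit in client memory, which is exactly the limitation that makes the question hard to answer on the server side.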

What would you suggest?

Thanks, James.

--


(David Pilato) #4

Grouping is not yet available. See https://github.com/elasticsearch/elasticsearch/pull/2326

So what I would probably do is run a first query with a terms facet on company to get all the companies, and then run a multi search with one query per company.
See http://www.elasticsearch.org/guide/reference/api/multi-search.html

It will cost you two requests, but I expect the number of companies will not grow very fast, so you could cache the company list on the client side.
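
The two requests could be built along the lines of the Python sketch below. It only constructs request bodies and does not talk to a server. The function names, the facet name "companies" and the page size are made up for illustration. It assumes the facets API as it existed at the time of this thread, and it assumes companies.name is mapped as not_analyzed so that the facet and the term clause see whole names such as "Company A" rather than individual tokens.

```python
import json


def build_company_facet_body(query, max_companies=100):
    """Request 1: no hits, just a terms facet listing the company names."""
    return {
        "size": 0,
        "query": query,
        "facets": {
            "companies": {
                "terms": {"field": "companies.name", "size": max_companies}
            }
        },
    }


def build_msearch_body(query, company_names, page_size=10):
    """Request 2: a multi search body with one search per company.

    Each search is a header line followed by a body line. The headers are
    left empty on the assumption that index and type are given in the URL,
    e.g. localhost:9200/jobs/employee/_msearch.
    """
    lines = []
    for name in sorted(company_names):  # sort companies on the client
        lines.append(json.dumps({}))
        lines.append(json.dumps({
            "size": page_size,
            "query": {
                "bool": {"must": [query, {"term": {"companies.name": name}}]}
            },
        }))
    return "\n".join(lines) + "\n"  # the body must end with a newline


base_query = {"match": {"job_title": "Astronaut"}}
facet_body = build_company_facet_body(base_query)
# In practice the names would be read from the facet response of request 1.
msearch_body = build_msearch_body(base_query, ["Company B", "Company A"])
print(json.dumps(facet_body))
print(msearch_body)
```

One caveat: with companies stored as a plain array of objects, the fields of different companies are flattened together, so a condition on companies.location and a condition on companies.name can be satisfied by two different companies of the same employee. If that matters, a nested mapping for companies would be needed.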

HTH

David ;)

In reply to James's message of 15 October 2012, 12:56.

--

