Call for ideas


(Philippe Vaucher) #1

Hello,

I'm currently using elasticsearch for an intranet whose main goal is to
manage users and invoices.
I created an index for the users, with their invoices stored as an array of
objects containing the invoice information.

I'm able to search for users just fine, but I wonder how I am supposed to
search in the invoices. For example, suppose I want to search for all invoices
with a value between 0 and 500 whose user name is "frank" (Lucene
syntax):

"name:frank AND invoices.value:[0 TO 500]"

This "works" in the sense that it returns all users named frank who
have at least one invoice with a value between 0 and 500, but the returned
documents also include those users' other invoices, the ones over 500.

Is there a way to filter the list of invoices so that only the matching ones
are returned?
Or should I create two indexes, one for the users and one for the
invoices? But doesn't that duplicate a lot of data?
Or is there a way to represent/query relational data in elasticsearch
by indexing it differently?
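To make the behaviour concrete, here is a toy in-memory model in Python 3.9+ (not elasticsearch code; the sample data and the function names `search_users` and `keep_matching_invoices` are hypothetical). Because the user document is the unit of search, a user matches when any of its invoices is in range and the whole document comes back, so trimming the invoice list has to happen afterwards, for example on the client.

```python
# Toy model of the behaviour described above; this is NOT elasticsearch code.
# With invoices stored as a plain array of objects inside each user document,
# the unit of search (and of what gets returned) is the whole user document.
from typing import Any

User = dict[str, Any]

users: list[User] = [
    {"name": "frank", "invoices": [{"value": 120}, {"value": 900}]},
    {"name": "frank", "invoices": [{"value": 1500}]},
    {"name": "alice", "invoices": [{"value": 50}]},
]


def search_users(docs: list[User], name: str, lo: float, hi: float) -> list[User]:
    """Mimic `name:frank AND invoices.value:[lo TO hi]`: a user matches if ANY
    of its invoices is in range, and the whole user (all invoices) is returned."""
    return [
        u for u in docs
        if u["name"] == name and any(lo <= inv["value"] <= hi for inv in u["invoices"])
    ]


def keep_matching_invoices(hits: list[User], lo: float, hi: float) -> list[User]:
    """Client-side post-filter: drop the out-of-range invoices from each hit."""
    return [
        {**u, "invoices": [inv for inv in u["invoices"] if lo <= inv["value"] <= hi]}
        for u in hits
    ]


hits = search_users(users, "frank", 0, 500)
print(hits)                                  # the 900 invoice is still included
print(keep_matching_invoices(hits, 0, 500))  # only in-range invoices remain
```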

Thanks,
Philippe


(David Pilato) #2

Yes. Duplicate data.
If you are searching invoices, you should index invoices (as a type, not as an index).
In each invoice, put everything you know about the user and query on that.
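A minimal sketch of this denormalization, modelled in plain Python 3.9+ rather than against a real cluster. The field names `user_id`, `user_name`, `number` and `value`, and the helpers `denormalize` and `search_invoices`, are illustrative assumptions. Each invoice becomes its own document carrying a copy of its user's fields, so a query on invoices returns individual invoices only.

```python
# Hypothetical sketch of "duplicate data": turn each user with an embedded
# invoice array into one flat invoice document per invoice, copying the user
# fields into it. Field names (user_id, user_name, number, value) are assumptions.
from typing import Any

Doc = dict[str, Any]


def denormalize(users: list[Doc]) -> list[Doc]:
    """Return one invoice document per invoice, each carrying its user's data."""
    invoice_docs: list[Doc] = []
    for user in users:
        for inv in user["invoices"]:
            invoice_docs.append({"user_id": user["id"], "user_name": user["name"], **inv})
    return invoice_docs


def search_invoices(docs: list[Doc], user_name: str, lo: float, hi: float) -> list[Doc]:
    """Equivalent of `user_name:frank AND value:[lo TO hi]` on invoice documents:
    each hit is a single invoice, so out-of-range invoices never come back."""
    return [d for d in docs if d["user_name"] == user_name and lo <= d["value"] <= hi]


users = [
    {"id": 1, "name": "frank", "invoices": [{"number": "A-1", "value": 120},
                                            {"number": "A-2", "value": 900}]},
    {"id": 2, "name": "alice", "invoices": [{"number": "B-1", "value": 50}]},
]

invoice_docs = denormalize(users)
print(search_invoices(invoice_docs, "frank", 0, 500))
```

The cost is that a change to a user (say, a renamed user) has to be re-applied to every invoice document that copied it.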

My 2 cents.

David :wink:
Twitter : @dadoonet / @elasticsearchfr

On 23 May 2012 at 18:44, Philippe Vaucher philippe.vaucher@gmail.com wrote:



(Philippe Vaucher) #3

> Yes. Duplicate data.
> If you are searching invoices, you should index invoices (as a type, not as
> an index).
> In each invoice, put everything you know about the user and query on that.

Well, sometimes I'll be searching users and sometimes I'll be searching
invoices. I think creating two indexes makes more sense, especially if I
want to list each invoice's properties.

Anyway, so be it! Thank you for confirming that elasticsearch relations are
one-way only, that its model is really a tree, and thus that you need to
create multiple indexes.

Philippe


(David Pilato) #4

Hi Philippe,

Just a few things I would like to add.

I did not say that you cannot create relations. You can do it with the
parent/child and nested concepts, but I really prefer to avoid that
complexity if it's only to save some disk space...

I understood that you also want to manage your users.

So you can have one index named myindex with two types: invoice and user.
You could also have an index invoice with a type invoice and an index
user with a type user (this is what you are talking about, I think).

I prefer the first option because I only have to manage my index settings
once, but it really depends on your use case. If you want a different
number of replicas for each type, you will have to split them into separate
indices.
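As a rough illustration, here are the two layouts written as Python dicts shaped like create-index request bodies. This assumes the pre-7.x format in which one index could hold several mapping types (types were later deprecated and removed) and the old `string` field type; the field names and replica counts are hypothetical.

```python
# Hypothetical create-index bodies for the two layouts, in the old format where
# an index could hold several mapping types. Field names are assumptions.
import json

user_mapping = {"properties": {"name": {"type": "string"}}}
invoice_mapping = {"properties": {"user_id": {"type": "long"},
                                  "value": {"type": "double"}}}

# Option 1: one index "myindex" holding both types; settings are managed once.
myindex_body = {
    "settings": {"number_of_replicas": 1},
    "mappings": {"user": user_mapping, "invoice": invoice_mapping},
}

# Option 2: one index per type, so each index can have its own replica count.
user_index_body = {
    "settings": {"number_of_replicas": 2},
    "mappings": {"user": user_mapping},
}
invoice_index_body = {
    "settings": {"number_of_replicas": 1},
    "mappings": {"invoice": invoice_mapping},
}

# These would be sent as the bodies of PUT /myindex, or of PUT /user and PUT /invoice.
print(json.dumps(myindex_body, indent=2))
```

The replica count is an index-level setting, which is why option 2 is needed when the two kinds of document should be replicated differently.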

Cheers
David.

On 24 May 2012 at 15:31, Philippe Vaucher philippe.vaucher@gmail.com
wrote:


--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(jschelle-2) #5

You could also use a nested query if your invoices are objects inside users (mapped with the nested type).

http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html
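A sketch of what this could look like, written as Python dicts for the mapping and the query body. The type name `user`, the field names, and the helper `frank_with_small_invoice` are assumptions, and exact query syntax varies between elasticsearch versions. Note that the hits are still user documents whose `_source` contains all their invoices; later elasticsearch releases added an `inner_hits` option to report which nested invoices matched.

```python
# Hypothetical sketch of the nested approach: the mapping declares "invoices"
# as a nested field (a plain object array is not enough for a nested query),
# and the query combines a user-level condition with a nested range condition.
import json

user_mapping = {
    "user": {
        "properties": {
            "invoices": {
                "type": "nested",
                "properties": {"value": {"type": "double"}},
            }
        }
    }
}


def frank_with_small_invoice(lo: float, hi: float) -> dict:
    """Users named frank with at least one invoice whose value is in [lo, hi]."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": "name:frank"}},
                    {
                        "nested": {
                            "path": "invoices",
                            "query": {"range": {"invoices.value": {"gte": lo, "lte": hi}}},
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(frank_with_small_invoice(0, 500), indent=2))
```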

You could also have user be a type and invoice be a type and simply have
the invoice type contain a field like userId. This way only the userId is
duplicated and not all the data.

On Thursday, May 24, 2012 9:31:54 AM UTC-4, Philippe Vaucher wrote:



(Philippe Vaucher) #6

> I did not say that you cannot create relations. You can do it with the
> parent/child and nested concepts, but I really prefer to avoid that
> complexity if it's only to save some disk space...

I see, so I have two solutions:

  1. Create a user type that has all the invoices as properties, and
    create an invoice type that has its user information as properties. This
    duplicates data, but then things are simple: if I want users I search in the
    users, and if I want invoices I search in the invoices.
  2. Create one user type that has all the invoices as nested
    properties, then use a special search to tell elasticsearch to search for and
    return invoices instead of users. This doesn't duplicate data, but it's
    more complicated to set up.

> So you can have one index named myindex with two types: invoice and user.
>
> You could also have an index invoice with a type invoice and an index
> user with a type user (this is what you are talking about, I think).

Yes, I realised that I had confused index and type, since in my setup they're
named the same :)

Thanks!
Philippe


(Philippe Vaucher) #7

> You could also use a nested query if your invoices are objects inside
> users
> http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html

Thanks for this interesting alternative.

> You could also have user be a type and invoice be a type and simply have
> the invoice type contain a field like userId. This way only the userId is
> duplicated and not all the data.

Hmm, good idea, though it requires a bit more work than simply duplicating the
data, because I'd have to load the results from the database instead of just
using the elasticsearch results.
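The extra work Philippe mentions amounts to an application-side join. Here is a minimal Python 3.9+ sketch; the dict `users_db` stands in for the application database, the invoice hits are assumed to carry a `user_id` field, and all names are hypothetical. Collecting the distinct ids first means each user is loaded once, however many invoices refer to it.

```python
# Hypothetical sketch of the userId approach: invoice hits carry only user_id,
# so user details are looked up afterwards. A dict stands in for the database.
from typing import Any

Doc = dict[str, Any]

# Stand-in for the application's database of users, keyed by id.
users_db: dict[int, Doc] = {
    1: {"id": 1, "name": "frank", "email": "frank@example.com"},
    2: {"id": 2, "name": "alice", "email": "alice@example.com"},
}

# Stand-in for invoice hits returned by a search on the invoice type.
invoice_hits: list[Doc] = [
    {"number": "A-1", "value": 120, "user_id": 1},
    {"number": "B-1", "value": 50, "user_id": 2},
    {"number": "A-3", "value": 300, "user_id": 1},
]


def attach_users(hits: list[Doc], db: dict[int, Doc]) -> list[Doc]:
    """Join each invoice hit with its user record, loading each user only once."""
    needed = {h["user_id"] for h in hits}
    users = {uid: db[uid] for uid in needed if uid in db}  # one batched lookup
    return [{**h, "user": users.get(h["user_id"])} for h in hits]


for row in attach_users(invoice_hits, users_db):
    print(row["number"], row["user"]["name"])
```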

Philippe

