Should I even try to use Elasticsearch to search across many-to-many database relationships?

I thought I'd run this by the group to see if anyone had any input.

We are looking to improve the search capabilities of our web app, which is
a large set of forms backed by a large schema with many relationships. We
currently use Hibernate, so we're database-agnostic, and so far we've just
used DB queries. But now we'd like to start using synonyms and soundex
searches for names, as well as search the full text of some documents and
a few database columns containing large bodies of text for reports.
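
For readers unfamiliar with soundex: it reduces a name to its first letter plus three digits so that similar-sounding names collide. As far as I know, Elasticsearch exposes this through its phonetic analysis plugin rather than the core; the sketch below is just the classic American Soundex algorithm in Python, written out to show what the index would end up comparing, not how the plugin implements it.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter followed by three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = [first]
    prev = codes.get(first, "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        # Skip vowels/H/W (no code) and adjacent letters sharing a code.
        if code and code != prev:
            result.append(code)
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = code
    # Pad with zeros and cut to four characters.
    return ("".join(result) + "000")[:4]
```

So "Robert" and "Rupert" both become `R163`, which is what lets a search for one find the other.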

I'm pretty sure these requirements can be met with the database's own
capabilities, but obviously that would mean a different implementation per
DB. Also, Hibernate doesn't seem to support these advanced features, so we'd
be going back to native queries, with multiple implementations for those as
well.

I've done a lot of reading on Elasticsearch, and am just about to start a
proof of concept to see how hard it will end up being. The real question is:
with so many interconnected entities, does a search index really make
sense? We'd have to do a lot of housekeeping to keep the indexes up to
date. The biggest example of the problem is addresses. An address can be
linked to about 40 different entities. So if we wanted to search on a
name, for example, and find all of that person's previous addresses, we'd
have to store the addresses in every name document. Then if an address
changed, we'd have to find all names that had that address and update them,
and do the same for the other 40 or so entities.
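
A minimal sketch of the housekeeping described above, assuming addresses are copied (denormalized) into each person document. A plain dict stands in for the index and all names are hypothetical; against a real cluster the scan would be a query on the embedded address id, and each touched document would be re-sent, for example through the bulk API.

```python
# Hypothetical stand-in for a search index: doc id -> document.
# Each person document embeds full copies of its addresses.
person_index = {
    "p1": {"name": "Alice Smith",
           "addresses": [{"address_id": 10, "street": "1 Main St"}]},
    "p2": {"name": "Bob Jones",
           "addresses": [{"address_id": 10, "street": "1 Main St"},
                         {"address_id": 11, "street": "9 Elm Ave"}]},
}


def update_embedded_address(index, address_id, new_fields):
    """Rewrite every document that embeds a copy of the given address.

    Returns the ids of the documents that would have to be reindexed.
    """
    touched = []
    for doc_id, doc in index.items():
        changed = False
        for addr in doc.get("addresses", []):
            if addr["address_id"] == address_id:
                addr.update(new_fields)
                changed = True
        if changed:
            touched.append(doc_id)
    return touched
```

The same pass would be needed for every entity type that embeds addresses, which is exactly the cost the question is worried about.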

So, as you can see and probably already know, many-to-many relationships
where neither side owns the other seem pretty hard to search across. If you
have any thoughts or experience you could share, I'd appreciate it
greatly. Alternatives are welcome as well.

Steve

--

Hi Steve,

I came to Elasticsearch with exactly the same needs.

I was using Hibernate, and my first thought was to use Hibernate Search for my
project. But Hibernate Search is not easy to use in a multi-node environment.
Then I found Elasticsearch, which covered all my needs and much more!

The most difficult part for me is to stop thinking relationally in this NoSQL
world.
So, to answer your questions: just design your documents the way you want to
search for them.

The address problem is a common use case.

Imagine that you are designing a search engine for invoices. When an invoice is
generated, you want to take a snapshot that won't change. So the address is part
of your invoice document. In fact, you are not searching for addresses but for
invoices!

So, index what you are searching for, with all the elements you need inside it
to perform the search. Think of a document as your top-level entity in Hibernate
(with all the collections needed for search). You can also flatten your
elements and ignore non-searchable fields.

But in some use cases you still need "relations" between documents. For those,
you can look at the parent/child feature.

At first, I was just using Jackson to convert my Hibernate entities directly to
JSON and push them into ES, but I eventually switched to dedicated Search
objects that were serialized with Jackson and sent to ES.
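
A sketch of the "dedicated search object" approach, written with Python dataclasses and `json` rather than Java and Jackson; the class and field names are hypothetical. The entities keep their back-reference (a bidirectional link of the kind that tends to trip up direct JSON serialization of ORM entities), while the search object holds only flat, searchable values.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:
    """Hypothetical entity with a back-reference to its residents."""
    city: str
    residents: list = field(default_factory=list, repr=False)


@dataclass
class Person:
    """Hypothetical entity; addresses link back to people, forming a cycle."""
    id: int
    first_name: str
    last_name: str
    addresses: List[Address] = field(default_factory=list)


@dataclass
class PersonSearchDoc:
    """Search-only view: flat, searchable values and no object links."""
    person_id: int
    full_name: str
    cities: List[str] = field(default_factory=list)


def to_search_doc(person: Person) -> PersonSearchDoc:
    # Copy only what the search needs; de-duplicate and sort the cities.
    return PersonSearchDoc(
        person_id=person.id,
        full_name=f"{person.first_name} {person.last_name}",
        cities=sorted({a.city for a in person.addresses}),
    )


def to_json(doc: PersonSearchDoc) -> str:
    # The search object has no cycles, so it serializes cleanly.
    return json.dumps(asdict(doc), sort_keys=True)
```

The extra class costs a little mapping code, but it decouples the index format from the entity model.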

Does it help?
David.

In reply to: thatguy1177@gmail.com, 11 January 2013 at 03:48

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Thanks for your input. Unfortunately, my data is not like invoices, which
are static. Both sides of the data can change, and neither side owns the
relationship. So I'm not sure the parent/child relationship would be
appropriate, especially since each person can have multiple addresses.

Another common example would be authors and books. Authors have many books,
and books have many authors. If you had a typo in a book title, you would
have to reindex the book and all of its authors.
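
One way to make the fan-out in the author/book example cheaper to locate, sketched under the assumption that author documents embed book titles: keep a reverse lookup from book id to the author documents that embed it (in practice the database join table could play this role). All names are hypothetical, and the book's own document would also need reindexing.

```python
from collections import defaultdict

# Hypothetical author documents that embed copies of their books' titles.
author_docs = {
    "a1": {"name": "Ann", "books": [{"book_id": 1, "title": "Serch Engines"}]},
    "a2": {"name": "Ben", "books": [{"book_id": 1, "title": "Serch Engines"},
                                    {"book_id": 2, "title": "Databases"}]},
}


def build_reverse_lookup(docs):
    """Map each book id to the set of author doc ids that embed it."""
    lookup = defaultdict(set)
    for doc_id, doc in docs.items():
        for book in doc["books"]:
            lookup[book["book_id"]].add(doc_id)
    return lookup


def fix_book_title(docs, lookup, book_id, title):
    """Correct a title everywhere it is embedded; return the docs to reindex."""
    to_reindex = sorted(lookup.get(book_id, ()))
    for doc_id in to_reindex:
        for book in docs[doc_id]["books"]:
            if book["book_id"] == book_id:
                book["title"] = title
    return to_reindex
```

This does not remove the reindexing, it only avoids scanning every author document to find out which ones need it.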

In reply to: David Pilato <david@pilato.fr>, Jan 11, 2013 3:11 AM

--

Caveat: I'm not an ES expert =)

I imagine you've read this link, but in case you haven't, here is a good
tutorial on ES parent/child mapping:
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/

Honestly, I've had a lot of problems with the parent/child mappings. They
work great... until they don't. I found it more flexible to split docs
into separate types. My experience predates the has_parent filter, so this
may be outdated advice.

A somewhat kludgy fix is to move addresses into their own type, and
associate IDs between the entity and the address. You would then search
the original entity (say, search for a name) and perform a secondary query
against the address type to retrieve all the previous addresses. Updating
either an address or an entity then requires updating only that doc, not
every reference to it. It's basically mimicking the parent/child
relationship, except that it provides more flexibility since the docs are in
their own types. It does, however, require more queries to get the data you
need.
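
A minimal sketch of the separate-types approach, with two dicts standing in for the person and address types and a substring test standing in for a real full-text query (all names hypothetical). In Elasticsearch the second step would typically be a multi-get or an `ids` query on the address type.

```python
# Hypothetical stand-ins for two separate document types, linked only by ids.
people = {
    "p1": {"name": "Alice Smith", "address_ids": ["a10", "a11"]},
    "p2": {"name": "Bob Jones", "address_ids": ["a10"]},
}
addresses = {
    "a10": {"street": "1 Main St"},
    "a11": {"street": "9 Elm Ave"},
}


def search_people(name_term):
    """First query: match person docs by name (substring test stands in for full text)."""
    term = name_term.lower()
    return [pid for pid, doc in people.items() if term in doc["name"].lower()]


def previous_addresses(person_id):
    """Second query: fetch the referenced address docs by id."""
    return [addresses[aid] for aid in people[person_id]["address_ids"]]


def update_address(address_id, street):
    """An address change touches exactly one document."""
    addresses[address_id]["street"] = street
```

The trade-off is visible in the shape of the code: updates stay local, but every lookup of "a person with their addresses" costs two round trips.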

Alternatively (this is just spitballing now), perhaps index only the things
you know will need searching, and simply store a database ID in each
document. You can use ES to search, then perform a DB query for all the
relational values that don't need search capability.
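
A sketch of the "search in ES, load from the DB" split, using an in-memory SQLite database from the Python standard library as the relational side. The `search_ids` function is a hypothetical stand-in for an Elasticsearch query whose documents carry only searchable text plus the row's database id; `load_rows` then fetches everything else and keeps the search engine's ordering.

```python
import sqlite3


def search_ids(index, term):
    """Hypothetical search step: ids of matching docs, most occurrences first."""
    term = term.lower()
    scored = [(doc["text"].lower().count(term), doc["db_id"]) for doc in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [db_id for _, db_id in scored]


def load_rows(conn, ids):
    """Fetch the relational data for the given ids, in search-hit order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name, notes FROM person WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    # IN (...) gives no ordering guarantee, so restore the hit order here.
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]
```

Keeping the hit order in application code matters because a SQL `IN (...)` query returns rows in no guaranteed order.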

Curious to see what the experts have to say...I'm still new to these more
complicated setups.

-Zach

In reply to: Steve Miller, Friday, January 11, 2013 4:02:40 AM UTC-5

--

Thanks for your thoughts. Yes, I've read that article, and it seems to me
that your approach, mimicking foreign key relationships, is probably the
only way to handle it. But I'm still holding out hope.

Steve

--