Hierarchical Documents

Hi,

I am developing an application where I need to store, search, and
manipulate complex hierarchical documents. A sample document can be seen
here:

A complex hierarchical JSON document · GitHub

This document describes the schema of a relational database and has an
attribute called "tables" which is a list of dictionaries describing the
tables in the database. Each dictionary has an attribute called "column"
which is a list of dictionaries describing the columns in that table.

I want to use ES as the backend to store this document and present
different views for it.

An example of a view would be a grid displaying the 'name' and 'numRows'
fields for all tables.
Another example would be a grid displaying the 'name', 'type' and
'remarks' fields for all the columns in the document, segmented by the
table they belong to.
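
To make the two views concrete, here is a minimal Python sketch of how they could be derived from one such document on the client side. Only the key names ("tables", "column", 'name', 'numRows', 'type', 'remarks') come from the description above; the sample values and the function names table_grid and column_grid are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample document; only the key names follow the description.
schema_doc = {
    "tables": [
        {"name": "orders", "numRows": 1200, "column": [
            {"name": "id", "type": "INTEGER", "remarks": "primary key"},
            {"name": "placed_at", "type": "TIMESTAMP", "remarks": ""},
        ]},
        {"name": "customers", "numRows": 300, "column": [
            {"name": "email", "type": "VARCHAR", "remarks": "unique"},
        ]},
    ],
}


def table_grid(doc):
    """Rows for the first view: one (name, numRows) pair per table."""
    return [(t["name"], t["numRows"]) for t in doc.get("tables", [])]


def column_grid(doc):
    """Rows for the second view: (name, type, remarks) grouped by table name."""
    grouped = defaultdict(list)
    for t in doc.get("tables", []):
        for c in t.get("column", []):
            grouped[t["name"]].append((c["name"], c["type"], c.get("remarks", "")))
    return dict(grouped)
```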

If I store this information as a single monolithic document then I lose the
ability to use ES to compute facets like how many columns of type 'X' exist.
This approach also means that each view will have to pre-process the
retrieved document before displaying it because the hierarchical
information has to be flattened out.
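
As a rough illustration of that pre-processing, the sketch below counts columns per type in application code, which is the kind of work a facet would otherwise do. The sample values and the name count_column_types are hypothetical; only the key names follow the description above.

```python
from collections import Counter

# Hypothetical sample document; only the key names follow the description.
schema_doc = {
    "tables": [
        {"name": "orders", "numRows": 1200, "column": [
            {"name": "id", "type": "INTEGER", "remarks": "primary key"},
            {"name": "placed_at", "type": "TIMESTAMP", "remarks": ""},
        ]},
        {"name": "customers", "numRows": 300, "column": [
            {"name": "id", "type": "INTEGER", "remarks": "primary key"},
            {"name": "email", "type": "VARCHAR", "remarks": "unique"},
        ]},
    ],
}


def count_column_types(doc):
    """Flatten the tables -> column nesting and count columns per type."""
    return Counter(
        col["type"]
        for table in doc.get("tables", [])
        for col in table.get("column", [])
    )
```

Every view that needs such a count would have to fetch the whole document and repeat this walk, since the nesting hides the individual columns from the search engine.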

Alternatively, I can split the information into various document types, e.g.
a column type, a table type, etc., which contain the information in a manner
that maps naturally to the views.
The drawback of this approach is that a particular value could be stored in
more than one document and an update will necessarily mean updating
multiple documents to keep the information synchronized.
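
The trade-off can be sketched in Python as follows. This is only an illustration under assumptions: the "table" field copied into each column document, the sample values, and the names split_schema and rename_table are all hypothetical, and nothing here talks to ES.

```python
# Hypothetical sample document; only the key names follow the description.
schema_doc = {
    "tables": [
        {"name": "orders", "numRows": 1200, "column": [
            {"name": "id", "type": "INTEGER", "remarks": "primary key"},
            {"name": "placed_at", "type": "TIMESTAMP", "remarks": ""},
        ]},
        {"name": "customers", "numRows": 300, "column": [
            {"name": "email", "type": "VARCHAR", "remarks": "unique"},
        ]},
    ],
}


def split_schema(doc):
    """Split one nested document into flat table docs and column docs."""
    table_docs, column_docs = [], []
    for table in doc.get("tables", []):
        table_docs.append({"name": table["name"], "numRows": table["numRows"]})
        for col in table.get("column", []):
            # The table name is duplicated into every column document.
            column_docs.append({
                "table": table["name"],
                "name": col["name"],
                "type": col["type"],
                "remarks": col.get("remarks", ""),
            })
    return table_docs, column_docs


def rename_table(table_docs, column_docs, old_name, new_name):
    """Rename a table everywhere; return how many documents were rewritten."""
    touched = 0
    for d in table_docs:
        if d["name"] == old_name:
            d["name"] = new_name
            touched += 1
    for d in column_docs:
        if d["table"] == old_name:
            d["table"] = new_name
            touched += 1
    return touched
```

Renaming a table with N columns rewrites N + 1 documents in this layout, which is the synchronization cost described above.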

I'm hoping someone from the community can provide some guidance with tips
and best practices for dealing with this type of use case. If my use case
is too much of an abuse of ES, then perhaps somebody can suggest
alternative architectures, like using CouchDB as the backing store and
integrating with ES via a river for search.

Thanks,
Shaq

You will have to experiment with it. Look into parent/child relationships;
you might also need to denormalize your data.

On Wed, Jun 13, 2012 at 10:47 PM, Salman Haq salman.haq@gmail.com wrote:
