Reviewing ES for parent/child aggregation


(ripplekhera) #1

I am reviewing ES to replace our current homegrown indexing setup, built
with Riak + Riak Search. Our current setup is not exactly an epic fail, but it
is definitely not giving us what we need. And of course, I am trying to build
this while a live application already exists, so it had to be done
yesterday.

My major requirements are:

  1. Modeling a one-to-many relationship. Think of it as Book + Paragraph.
    Books have attributes such as Author, PublishTime (year+month), Title, and
    Amazon link. Paragraphs belong to a book and have words, a numerical score,
    and start and end positions within the book.

Book (Author, Date, Title, Url, Content)
|
|--> Paragraph (Words, Score, StartPosition, EndPosition)

This is not the true model but a very close analogy.
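
To make the analogy concrete, here is a minimal sketch of the model as plain Python dataclasses. The field names and the sample values are only illustrative (they follow the Book/Paragraph analogy, not our real schema), and the "YYYY-MM" string for the publish time is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    words: List[str]       # words appearing in the paragraph
    score: float           # numerical score attached to the paragraph
    start_position: int    # offset of the paragraph start within the book
    end_position: int      # offset of the paragraph end within the book


@dataclass
class Book:
    author: str
    publish_time: str      # assumed "YYYY-MM" format (year + month)
    title: str
    url: str
    content: str
    paragraphs: List[Paragraph] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
book = Book(
    author="Some Author",
    publish_time="2012-03",
    title="Some Title",
    url="http://example.com/some-title",
    content="full text of the book",
    paragraphs=[
        Paragraph(words=["whale", "sea"], score=2.0, start_position=0, end_position=120),
        Paragraph(words=["whale"], score=4.0, start_position=121, end_position=300),
    ],
)
```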

  2. Performing computations (such as the average score for words) based on
    date, with the result grouped by book. So essentially, it's aggregating
    the score from the paragraphs and grouping by book. Also finding "other"
    words that are significant based on the current date and word query.

  3. Scale: Currently there are millions of 'book' objects, which could grow
    to billions or more. We could have to report on 10 million book objects at
    one time, which could involve aggregating 100 million paragraph objects
    using some computation. I am hoping to use statistical facets and/or
    scripting to move this to ES instead of transporting it to the app level
    and doing aggregations using Java lists.
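
For reference, the kind of computation I mean can be sketched at the app level roughly as below. This is only an illustration in Python (our app does this with Java lists); the function name `average_score_by_book`, the flat record layout, and the "YYYY-MM" date strings are assumptions made for the example, not our real code.

```python
from collections import defaultdict


def average_score_by_book(paragraphs, word, start, end):
    """Average paragraph score per book, for paragraphs that contain `word`
    and whose book was published within [start, end] ("YYYY-MM" strings)."""
    totals = defaultdict(lambda: [0.0, 0])  # book_id -> [score sum, count]
    for p in paragraphs:
        # "YYYY-MM" strings compare correctly as plain strings.
        if start <= p["publish_time"] <= end and word in p["words"]:
            acc = totals[p["book_id"]]
            acc[0] += p["score"]
            acc[1] += 1
    return {book_id: total / count for book_id, (total, count) in totals.items()}


# Hypothetical flattened paragraph records, each carrying its book's fields.
sample = [
    {"book_id": "b1", "publish_time": "2012-03", "words": ["whale", "sea"], "score": 2.0},
    {"book_id": "b1", "publish_time": "2012-03", "words": ["whale"], "score": 4.0},
    {"book_id": "b2", "publish_time": "2012-05", "words": ["whale"], "score": 1.0},
    {"book_id": "b3", "publish_time": "2011-01", "words": ["whale"], "score": 9.0},
    {"book_id": "b2", "publish_time": "2012-05", "words": ["ship"], "score": 5.0},
]

result = average_score_by_book(sample, "whale", "2012-01", "2012-12")
print(result)
```

At 100 million paragraphs this loop is exactly the part I would like to push down into ES rather than run in the app.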

My main questions:

  1. Do you think ES is a good fit for search+compute? Will the statistical
    facet work for my requirements?

  2. Should I model the data as Parent/Child or Nested Documents? I need to
    search by Parent, but aggregate by Child. Based on my reading of the
    reference material and this forum, that is not possible using the
    Parent/Child schema: the only thing available is the has_child filter,
    which lets you filter by the children but not do computations on them.

  3. Will ES be able to handle this scale?

  4. Are the computations I need to perform possible using the statistical
    facets and/or scripting?
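
To make question 2 concrete, this is roughly how I understand the two mapping options, written as Python dicts that would be serialized to JSON. It is a sketch based on my reading of the reference material, so the field names and types are assumptions and the details may well be off; please correct me where they are.

```python
import json

# Fields shared by both options (names are illustrative, following the analogy).
paragraph_properties = {
    "words": {"type": "string"},
    "score": {"type": "double"},
    "start_position": {"type": "integer"},
    "end_position": {"type": "integer"},
}
book_properties = {
    "author": {"type": "string"},
    "publish_time": {"type": "date", "format": "yyyy-MM"},
    "title": {"type": "string"},
    "url": {"type": "string"},
}

# Option A: nested documents. Paragraphs live inside the book document.
nested_mapping = {
    "book": {
        "properties": dict(
            book_properties,
            paragraphs={"type": "nested", "properties": paragraph_properties},
        )
    }
}

# Option B: parent/child. Paragraphs are separate documents whose mapping
# points at the parent type through the _parent field.
parent_child_mapping = {
    "book": {"properties": book_properties},
    "paragraph": {
        "_parent": {"type": "book"},
        "properties": paragraph_properties,
    },
}

print(json.dumps(nested_mapping, indent=2))
print(json.dumps(parent_child_mapping, indent=2))
```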

Please do get back to me quickly, so I know I am making the right design
decisions from the get-go.


(system) #2