Organize categorized data, how many indices, how many types?

Hi,

A total beginner question:
I have documents whose data differs in only a few fields, depending on the
category each doc is assigned to.
I am now thinking of creating one type per category. Each type would then
hold mostly redundant data plus the data that is specific to that category.

Depending on 2 other criteria, I would need to spread this bunch of types
over almost 10 indices.
I would only ever search docs against one index and my intended
"category-type".

Is this the right approach, if it is possible to say anything about this in
general?
Would it be better to organize the data as nested data or something?
Are there any common limits on the number of indices / types?

Thx for help.
Tom

--

Hi Tom,

If you go to http://www.elasticsearch.org/ and type in "parent" in the
search field at the top, that may be what you are looking for.
Oh, you can also try:
http://search-lucene.com/?q=parent&fc_project=ElasticSearch&fc_type=web+site

Otis

ELASTICSEARCH Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Service

(In reply to Tom's post of Sunday, January 20, 2013 11:43:16 AM UTC-5.)

--

Hi Otis,

Thanks for the hints, but here is what I failed to describe above: it is a
one-to-many relation between a doc and its categories, so a doc would need
multiple "parents"...

(In reply to Otis Gospodnetic's post of Monday, January 21, 2013 04:17:08 UTC+1.)

--

On 1/20/2013 7:31 PM, Tom wrote:

Hi Otis,

Thanks for the hints, but here is what I failed to describe above: it is a
one-to-many relation between a doc and its categories, so a doc would need
multiple "parents"...

Then maybe you should think of the categories as "tags": any one
document can have a field called "tags" with 0, 1, or many values.
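
As a minimal sketch of the tags idea: the documents, the field names, and the helper functions below are all hypothetical, and the sketch assumes "tags" is indexed as exact values rather than analyzed text. The query body uses the bool/filter form from later Elasticsearch releases; 2013-era versions expressed the same thing as a "filtered" query wrapping a term filter.

```python
# Hypothetical documents: "tags" holds 0, 1, or many category names.
docs = [
    {"id": 1, "title": "Doc A", "tags": ["shoes", "sale"]},
    {"id": 2, "title": "Doc B", "tags": ["shoes"]},
    {"id": 3, "title": "Doc C", "tags": []},
]


def tag_query(tag):
    """Build a search body that keeps only documents carrying `tag`.

    Assumption: "tags" is stored as exact, non-analyzed values, so a
    term filter compares whole tag names.
    """
    return {"query": {"bool": {"filter": [{"term": {"tags": tag}}]}}}


def matches(doc, tag):
    """Local stand-in for the term filter: true if any tag value equals `tag`."""
    return tag in doc.get("tags", [])


# A document matches as soon as one of its tag values matches.
shoe_ids = [d["id"] for d in docs if matches(d, "shoes")]

print(tag_query("shoes"))
print(shoe_ids)
```

The point of the design is that a document is stored once and simply lists every category it belongs to, instead of being copied into one type per category.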

On your problem of various types with more or fewer optional fields: ES
is document-oriented and can handle documents that have all kinds of
variations of extra and missing fields. There is no overhead when a
document doesn't have some of the fields.

Why 10 indices? Are you sure?
ES supports different types (like different tables in an RDB) all in one
index. You can even search all types at the same time, using variations
like sharing names for equivalent fields and searching an "_all" field
when appropriate.
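
A rough sketch of how that looks on the request side, for Elasticsearch versions that still have mapping types and the "_all" field (both were removed in later major releases). The index name "docs" and the type names are made up for illustration, and the helpers only build the path and body; they do not talk to a cluster.

```python
def search_path(index, types=None):
    """Return the REST path for a search request.

    With no types given, the search covers every type in the index;
    with a list of types, only those types are searched.
    """
    if types:
        return "/{}/{}/_search".format(index, ",".join(types))
    return "/{}/_search".format(index)


def all_field_query(text):
    """Build a search body that matches `text` against the catch-all "_all" field."""
    return {"query": {"match": {"_all": text}}}


print(search_path("docs"))                     # /docs/_search
print(search_path("docs", ["shoes", "hats"]))  # /docs/shoes,hats/_search
print(all_field_query("red sneakers"))
```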

I hope that helps,

-Paul

--

Hi Paul, hi guys,

I will try to clarify it a bit by being more concrete:

Preconditions:

  • I need to find documents that are mainly organized by 3 criteria:
    -> contexts
    -> categories
    -> languages

  • These are all in n:n relations with docs in the RDBMS

  • Searching will always run against a strict combination of:
    1 context + 1 language

  • While searching against context + language I need to retrieve:
    -> all docs belonging to a specific category, together with their
       category-specific doc data
       (simple example: a field holding a category-specific sorting value)
    -> category-specific facets to generate filters from
    -> ...

Because of that I thought about this structure in Elasticsearch:

  • Build indices over context and language
    => I always search against this combination

  • Build a doc type for each category in the given index
    => I can retrieve all docs for a specific category, and I index docs per
    category with (of course a lot of redundant and so actually sharable data,
    but also with) category-specific data (simple example again: a category
    sorting field)

To blow up this scenario, imagine:

  • 10 contexts

  • 5 languages

  • 50 categories

  • 5000 docs

This approach would result in:

  • 50 indices,
    each holding 50 category types with max 5000 docs in each category
    = max 5,000 docs per category type, max 250,000 docs per context-lang index
    (with a mighty bunch of redundancy)

Would this generally be possible that way (performance and so on)?
Any suggestions on building a better structure are appreciated. :)

Thanks for ur help!!
Regards
Tom
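
The numbers in this scenario can be checked with a few lines of arithmetic. This is only a worst-case sketch: it assumes every doc belongs to every context, language, and category, which is the upper bound described above. The last figure is an extra assumption, not something stated in the thread: it shows the per-index count if each doc were stored once with a multi-valued "tags" field instead of once per category type.

```python
contexts = 10
languages = 5
categories = 50
docs = 5000

# One index per (context, language) combination.
indices = contexts * languages

# One type per category inside each index.
types_per_index = categories
types_total = indices * types_per_index

# Worst case: every doc is indexed once in every category type.
max_docs_per_type = docs
max_docs_per_index = types_per_index * max_docs_per_type
max_docs_total = indices * max_docs_per_index

# Assumption for comparison: one copy per doc per index, categories kept as tags.
max_docs_per_index_with_tags = docs

print("indices:", indices)
print("types in total:", types_total)
print("max docs per index:", max_docs_per_index)
print("max docs in total:", max_docs_total)
print("max docs per index with tags:", max_docs_per_index_with_tags)
```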

--

no recommendations...?

(Following up on Tom's post of Monday, January 21, 2013 10:24:41 UTC+1.)

--