A total beginner question:
I have documents whose data differ in just a few fields, depending on the
category each document is assigned to.
Now I am thinking of creating a type for each category. The types would of
course hold mostly redundant data, but also some category-specific data.
Depending on 2 other criteria, I would need to spread this bunch of types
over almost 10 indices.
I would only ever search docs against one index and my intended
"category type".
Is this the right approach, if it is possible to say anything about this in
general?
Would it be better to organize the data as nested documents or something similar?
Are there any common limits on the number of indices / types?
On Sunday, January 20, 2013 11:43:16 AM UTC-5, Tom wrote:
Hi,
[...]
Thanks for the hints, but here is what I failed to describe above: it is a one-to-many
relation between a doc and its categories, so a doc would need multiple
"parents"...
On Monday, January 21, 2013 04:17:08 UTC+1, Otis Gospodnetic wrote:
[...]
Then maybe you should think of the categories as "tags": any one
document can have a field called "tags" with 0, 1 or many values.
On your problem of various types with more or fewer optional fields: ES
is document-oriented and can handle documents with all kinds of
variations of extra and missing fields. There is no overhead when a
document doesn't have some of the fields.
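To make the "tags" idea concrete, here is a minimal pure-Python sketch (not the Elasticsearch API) of how a term filter treats a multi-valued field. It assumes the tag values are indexed as exact keyword values. The sample documents and the `isbn` field are hypothetical. A document matches if any of its tag values equals the filter value, and a document that lacks the field never matches.

```python
from typing import Any

# Hypothetical sample documents: "tags" holds 0, 1 or many values, and an
# optional field such as "isbn" is simply absent where it does not apply.
DOCS: list[dict[str, Any]] = [
    {"id": 1, "title": "Intro to search", "tags": ["books", "search"], "isbn": "123"},
    {"id": 2, "title": "Search appliance", "tags": ["hardware", "search"]},
    {"id": 3, "title": "Untagged note", "tags": []},
]


def term_matches(doc: dict[str, Any], field: str, value: Any) -> bool:
    """Mimic exact-value term matching: a scalar must equal the value,
    an array matches if any element equals it, a missing field never matches."""
    raw = doc.get(field)
    if raw is None:
        return False
    values = raw if isinstance(raw, list) else [raw]
    return value in values


def filter_docs(docs: list[dict[str, Any]], field: str, value: Any) -> list[int]:
    """Return the ids of all documents whose field matches the value."""
    return [d["id"] for d in docs if term_matches(d, field, value)]


print(filter_docs(DOCS, "tags", "search"))  # [1, 2]
print(filter_docs(DOCS, "isbn", "123"))     # [1]
```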
Why 10 indices? Are you sure?
ES supports different types (like different tables in an RDB), all in one
index. You can even search all types at the same time, for example by
sharing names for equivalent fields and searching the "_all" field
when appropriate.
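Here is a small sketch of the request paths this implies. It assumes a pre-6.0 Elasticsearch, where one index could hold several mapping types. Indices created in 6.x allow only a single type, and types were later removed altogether. The index and type names are hypothetical.

```python
from typing import Optional, Sequence


def search_path(index: str, types: Optional[Sequence[str]] = None) -> str:
    """Build the URL path for a search request against one index.

    Without types, the search covers every type in the index; with types,
    it is restricted to the comma-separated list (multi-type indices only).
    """
    if not types:
        return f"/{index}/_search"
    return f"/{index}/{','.join(types)}/_search"


print(search_path("catalog"))                      # /catalog/_search
print(search_path("catalog", ["books", "music"]))  # /catalog/books,music/_search
```

If equivalent fields share a name across types (say, `title` in both `books` and `music`), one query against the index-level path can match documents of every type.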
I will try to clarify it a bit by being more concrete:
Preconditions:
I need to find documents mainly organized by 3 criteria:
-> contexts
-> categories
-> languages
These are all in n:n relations with the docs in the RDBMS.
Searching will always run against a strict combination of:
1 context + 1 language
While searching against context + language, I need to retrieve:
all docs belonging to a specific category, including category-specific
doc data
(simple example: a field holding a category-specific sorting value)
category-specific facets to generate filters from
...
Because of that, I thought about this structure in Elasticsearch:
Build indices over context and language
=> I always search against this combination
Build a doc type for each category in a given index
=> I can retrieve all docs for a specific category, and index the docs per
category with category-specific data (simple example again: a category
sorting field), alongside a lot of data that is of course redundant and
could actually be shared
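The redundancy in this design comes from the n:n relations. Each source document needs its own copy in every (context + language index, category type) combination it belongs to. Below is a minimal sketch of that fan-out. It assumes a hypothetical "<context>_<language>" index naming scheme and uses made-up context, language and category names.

```python
from itertools import product


def fan_out(doc_id: str, contexts: list[str], languages: list[str],
            categories: list[str]) -> list[tuple[str, str, str]]:
    """Return one (index, type, doc_id) target per copy that the
    index-per-context+language, type-per-category design would store."""
    return [
        (f"{ctx}_{lang}", cat, doc_id)
        for ctx, lang, cat in product(contexts, languages, categories)
    ]


copies = fan_out("doc-42", ["shop", "blog"], ["en", "de"], ["cars", "bikes", "boats"])
print(len(copies))  # 12  (2 contexts x 2 languages x 3 categories)
print(copies[0])    # ('shop_en', 'cars', 'doc-42')
```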
To scale up this scenario, imagine:
10 contexts
5 languages
50 categories
5,000 docs
This approach would result in:
50 indices (10 contexts × 5 languages)
each holding 50 category types, with at most 5,000 docs in each category type
= max 5,000 docs per category type, max 250,000 docs per context-language index
(with a mighty bunch of redundancy)
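The figures above can be checked with a quick calculation. This is a worst-case sketch that assumes every one of the 5,000 documents belongs to every context, language and category. Real overlap would be lower.

```python
contexts, languages, categories, docs = 10, 5, 50, 5000

indices = contexts * languages                   # one index per context + language pair
category_types = indices * categories            # one type per category in every index
max_docs_per_index = categories * docs           # every doc copied into every category type
max_total_copies = indices * max_docs_per_index  # summed over all indices
copies_per_doc = max_total_copies // docs        # worst-case copies of one source doc

print(indices, category_types, max_docs_per_index, max_total_copies, copies_per_doc)
# 50 2500 250000 12500000 2500
```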
Would this generally be possible that way (performance and so on)?
Any suggestions for building a better structure are appreciated.
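For comparison, here is a hedged sketch of the same search against a single index. It combines the "tags" suggestion above with context and language as ordinary filter fields. It assumes a recent Elasticsearch with bool `filter` clauses and terms aggregations; 2013-era releases used `filtered` queries and facets instead. All field names are hypothetical: `context`, `language` and `categories` are keyword fields, and a `category_sort` object holds one sort value per category, e.g. `{"categories": ["cars", "bikes"], "category_sort": {"cars": 3, "bikes": 17}}`.

```python
import json


def build_search_body(context: str, language: str, category: str,
                      facet_fields: list[str]) -> dict:
    """Request body for one context + language + category search.

    Hypothetical fields: "context", "language", "categories" (exact values)
    and "category_sort.<category>" (the per-category sort value).
    """
    return {
        "query": {
            "bool": {
                # Filter clauses only restrict the result set; they do not score.
                "filter": [
                    {"term": {"context": context}},
                    {"term": {"language": language}},
                    {"term": {"categories": category}},
                ]
            }
        },
        # Sort by this category's own ordering; docs without a value go last,
        # and "unmapped_type" avoids an error if no doc has this field yet.
        "sort": [{
            f"category_sort.{category}": {
                "order": "asc", "missing": "_last", "unmapped_type": "long",
            }
        }],
        # One terms aggregation per facet field, to generate filters from.
        "aggs": {field: {"terms": {"field": field}} for field in facet_fields},
    }


body = build_search_body("shop", "en", "cars", ["brand", "color"])
print(json.dumps(body, indent=2))
```

Whether this beats one index per context + language depends on data volume and update patterns. Its main effect is that each document is stored once rather than once per category.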
On Monday, January 21, 2013 10:24:41 UTC+1, Tom wrote:
Hi Paul, hi guys,
[...]