Multiple Types within an Index


(Paulo Tioseco) #1

Hi All!

I'm just learning Elasticsearch and am currently going through the guide. It
seems you can create multiple types within an index.

What is the use case of having multiple types within an index?

Is it common to have multiple types within an index?

For example, if I have user and pet documents which have completely
different mappings, do I put them in the same index but in separate types?
Or do I create a separate index for each?

Thanks,

Paulo

--


(David Pilato) #2

If you have a pet store, I suggest creating an index petstore and storing in it
everything related to your pet store (users, pets, invoices, ...).
If you need to analyze the log files of your pet store's web server, create
another index logs and store everything related to logs in it (Apache logs, SQL
logs, ES logs, ...).

That said, you may also want to plan for scalability. You can start the year
2013 with a petstore2013 index with only 1 shard and 1 replica and, if you see
that you have to hold more and more documents, create a new index petstore2014
with 10 shards (and add an alias petstore on top of both).
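
A minimal sketch of that yearly layout, written as Python dictionaries that mirror the JSON bodies of the create-index and _aliases REST calls. The helper names are made up for illustration, and the replica count for petstore2014 is an assumption, since only its shard count is given above.

```python
import json

def index_settings(shards, replicas):
    """Body for PUT /<index>; the shard count is chosen when the index is created."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}

def alias_actions(alias, indices):
    """Body for POST /_aliases that points one alias at several indices."""
    return {"actions": [{"add": {"index": name, "alias": alias}}
                        for name in indices]}

# Hypothetical request plan: (HTTP method, path, JSON body).
plan = [
    ("PUT", "/petstore2013", index_settings(1, 1)),
    ("PUT", "/petstore2014", index_settings(10, 1)),  # replica count assumed
    ("POST", "/_aliases",
     alias_actions("petstore", ["petstore2013", "petstore2014"])),
]

for method, path, body in plan:
    print(method, path, json.dumps(body))
```

With the alias in place, a search sent to petstore covers both yearly indices, so clients do not need to know how the data is split.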

As you can see, design is very flexible.

HTH
David

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Paulo Tioseco) #3

Thanks for the reply, David.

It's much clearer now. By putting multiple types in one index, I can
do a search across all types.

So thinking in terms of relational databases and tables, indexes are
like databases and types are like tables. Except that types are
"schema-less." Is this a correct analogy?

Paulo

--


(David Pilato) #4

Yes, it makes sense. Think of a type as the top-level entity when you store an object (as with a Hibernate merge with the Cascade All option, for example).
BTW, with Elasticsearch you can search across all indexes...

--
David :wink:

--


(Jörg Prante) #5

Hi,

Having index types with different mappings allows you to organize different
data types, data sources, or different data models in a single index.

Imagine building a library: a library has books and serials, but also CDs or
DVDs, or even full-text files online, all in different languages, and all
with different media identifier numbers (ISBN/EAN, ISSN, CD-Item-No/UPCs,
URLs).

The challenge is to build a single index with the media stock item number
as the primary key and with the different media identifiers in an
identifier field with a comfortable identifier search.

Instead of creating different identifier field types in each document
class, like you would probably do in a relational database world, you can
create different index types in Elasticsearch and put the identifier
numbers in a commonly named field "identifier". This is the power of
"mapping" in Elasticsearch.

As an example, let's assume three different identifier types. Books have
ISBN/EAN, CDs have UPCs, and online videos have URLs.

index/type/id = library/books/1

{
  "identifier" : "978-1-933988-17-7"
}

index/type/id = library/cds/2

{
  "title" : "The first CD ever",
  "identifier" : "042280001124"
}

index/type/id = library/videos/3

{
  "creator" : "Simon Willnauer",
  "title" : "Lucene 4 Performance Tuning",
  "identifier" : "http://vimeo.com/55163540"
}
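
As a rough sketch, the three documents above could be loaded in one request with the bulk API, whose body alternates an action line and a source line. This assumes an Elasticsearch version that still supports mapping types (the _type key); the function name is made up for illustration.

```python
import json

# The three example documents as (index, type, id, source).
docs = [
    ("library", "books", "1", {"identifier": "978-1-933988-17-7"}),
    ("library", "cds", "2", {"title": "The first CD ever",
                             "identifier": "042280001124"}),
    ("library", "videos", "3", {"creator": "Simon Willnauer",
                                "title": "Lucene 4 Performance Tuning",
                                "identifier": "http://vimeo.com/55163540"}),
]

def bulk_body(docs):
    """Build a newline-delimited body for POST /_bulk."""
    lines = []
    for index, doc_type, doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))   # what to do and where
        lines.append(json.dumps(source))   # the document itself
    return "\n".join(lines) + "\n"         # the bulk body ends with a newline

body = bulk_body(docs)
print(body)
```

All three documents land in the same index and share the field name identifier, even though they belong to different types.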

Assuming you have many analyzers in the Elasticsearch mapping plus other
methods to process such identifiers according to their type (for example,
auto-adding tags for the ID scheme, removing hyphens from ISBNs, extracting
IDs from video URLs, looking up persistent identifiers, whatever), the
advantage should be obvious.

Later while searching, you can direct your search client to the index
"library", and all searches to the "identifier" field will be mapped
correctly to the effective mapping of the underlying identifier field.

You can even arrange searches to different objects in different index types.

And this approach scales. Adding a new media type to the library is easy.
Just add a new index type, set up new analyzers in the mapping, and you're
done.

So, a type is not just a table in the sense of a relational database, it is
more like a virtual table or a materialized view.

Another example:

Index types can also be organized on demand. In a scientific article
database, you want to add articles on different academic subjects or
fields of study. Scientists like to perform limited searches within
contributions in their field without using facets, but if you ask them more
firmly, they also want to search all articles. Now, you could add a
custom field to your documents, or you could make use of Elasticsearch and
create an index type for each field of study. A search UI can direct the field
of study to the index type much more elegantly through REST addressing than by
managing an optional filter term in the query to select the requested field
of study.
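
To make the two options concrete, here is a small sketch that builds both requests. The index name articles, the type physics, and the custom field field_of_study are hypothetical, and the second variant assumes the filtered query that Elasticsearch releases of that period provided.

```python
import json

def search_by_type(index, doc_type, text):
    """Select the field of study through the URL: /<index>/<type>/_search."""
    path = "/{}/{}/_search".format(index, doc_type)
    body = {"query": {"query_string": {"query": text}}}
    return path, body

def search_by_field(index, field_of_study, text):
    """Select the field of study through an extra term filter in the query."""
    path = "/{}/_search".format(index)
    body = {"query": {"filtered": {
        "query": {"query_string": {"query": text}},
        # hypothetical custom field carried by every document
        "filter": {"term": {"field_of_study": field_of_study}},
    }}}
    return path, body

for path, body in (search_by_type("articles", "physics", "neutrino"),
                   search_by_field("articles", "physics", "neutrino")):
    print("POST", path, json.dumps(body))
```

In the first variant the query body stays the same for every field of study and only the path changes; dropping the type from the path searches all articles.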

Note, internally in a document, you can refer to a field named "_type",
because under the hood, Elasticsearch handles the index type like an
ordinary non-analyzed field. It is just exposed to the API.

Best regards,

Jörg

--


(Jan Fiedler) #6

Just a clarifying question after all the praise on types: How about the
Lucene internal memory structures like the infamous field cache (used
heavily for facet calculation)? Assume I have a single index with types A
and B and run faceting on a field of type A only (because type B does not
have that field at all). Doesn't this mean that the Lucene FieldCache still
has one huge array representing the facet value of all docs of the entire
index even though it only holds null values for all the documents of type
B? Assuming that there are many more instances of type B than A, isn't that
quite a waste of resources (in terms of RAM)?
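
A back-of-the-envelope version of that concern, as a sketch only. It takes the premise of the question at face value (one array slot per document in the index, whether or not the document has the field) and assumes 4 bytes per slot; neither number is measured, and the document counts are invented.

```python
def field_cache_estimate(docs_with_field, docs_without_field, bytes_per_slot=4):
    """Estimate array size under the 'one slot per document' assumption."""
    total_docs = docs_with_field + docs_without_field
    total_bytes = total_docs * bytes_per_slot      # whole array
    used_bytes = docs_with_field * bytes_per_slot  # slots that hold a value
    return total_bytes, used_bytes, used_bytes / total_bytes

# Invented example: 1 million documents of type A, 99 million of type B.
total_bytes, used_bytes, used_fraction = field_cache_estimate(1000000, 99000000)
print(total_bytes, used_bytes, used_fraction)
```

Under these assumptions only a small fraction of the array carries facet values, which is the waste the question is asking about.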

Coming back to the 'when to use types' question: Doesn't this mean that one
should use types within a single index for very uniform documents only
(like books, CDs, DVDs)?

--


(Karel Minařík-2) #7

Jörg Prante wrote:

Assume you have many analyzers in the Elasticsearch mapping plus other
methods to process such identifiers accordingly to their type (for example,
auto-adding tags for the ID scheme, removing hyphens from ISBNs, extracting
IDs from video URLs, looking up persistent identifiers, whatever), the
advantage should be obvious.

Note, though, that you can't set, e.g., different analyzers for different
types in the same index (unless this behaviour has changed).

Karel

--


(Karel Minařík-2) #8

Jan Fiedler wrote:

Coming back to the 'when to use types' question: Doesn't this mean that
one should use types within a single index for very uniform documents only
(like books, cds, dvd) ?

Yes, that's the original idea, see:
http://www.elasticsearch.org/blog/2010/02/12/yourdatayoursearch.html

Karel

--


(phill) #9

While technically you can't set one analyzer for one type, it might be
worth summarizing what you can do.
You can set:

  • an analyzer for a field
  • a default analyzer for the index (thus all types in the index)
  • a different analyzer per document at indexing time.

For a field, you define the analyzer in the mapping.
analyzer: "The analyzer used to analyze the text contents when
analyzed during indexing and when searching using a query string.
Defaults to the globally configured analyzer."
-- http://www.elasticsearch.org/guide/reference/mapping/core-types.html

You can define the default for an index in the "index-module".
"The default logical name allows one to configure an analyzer that
will be used both for indexing and for searching APIs. The
default_index logical name can be used to configure a default analyzer
that will be used just when indexing, and the default_search can be
used to configure a default analyzer that will be used just when searching."
-- http://www.elasticsearch.org/guide/reference/index-modules/analysis/

You can pick an analyzer dynamically at index time with the _analyzer
meta-field.
"The _analyzer mapping allows to use a document field property as the
name of the analyzer that will be used to index the document."
-- http://www.elasticsearch.org/guide/reference/mapping/analyzer-field.html

The only application I can see for this last one is as one solution for
multi-language support.

While you can't set one value for all fields, you can define an analyzer
(often no analyzer) on every field of every type to configure each
field. To make this easier, you can set the common default for all, then
define only those that are different.
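
A small sketch of that "common default, then exceptions" setup, written as the body of a create-index request. The type and field names are taken from the library example earlier in the thread; the "string" field type and the choice of the standard and keyword analyzers are assumptions about an Elasticsearch version of that period. The analyzer_for helper is not part of Elasticsearch; it only spells out the fallback rule.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Index-wide default, used by any field without its own analyzer.
                "default": {"type": "standard"},
            }
        }
    },
    "mappings": {
        "books": {
            "properties": {
                "title": {"type": "string"},  # falls back to the default
                "identifier": {"type": "string", "analyzer": "keyword"},
            }
        }
    },
}

def analyzer_for(body, doc_type, field):
    """Illustrative helper: name the analyzer a field ends up with."""
    props = body["mappings"][doc_type]["properties"]
    return props[field].get("analyzer", "default")

print(json.dumps(index_body, indent=2))
print(analyzer_for(index_body, "books", "title"))
print(analyzer_for(index_body, "books", "identifier"))
```

Only the fields that differ from the default need an explicit analyzer entry.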

-Paul

--


(Roland Pirklbauer) #11

On Saturday, 5 January 2013 at 12:16:39 UTC+1, Jörg Prante wrote:

Later while searching, you can direct your search client to the index
"library", and all searches to the "identifier" field will be mapped
correctly to the effective mapping of the underlying identifier field.

Hello,

This is exactly what I need (query the identifier for all types), but I'm
not able to form the correct query. Can someone give me a short example of
what such a query should look like?

My best guess would be:

{
  "query_string" : {
    "fields" : ["books.identifier", "cds.identifier", "videos.identifier"],
    "query" : "123"
  }
}

But if a new type is added, I must update the query. In my use case the
types are dynamic, so I don't know the possible types right now (though all
types will have a set of common fields), so this is not a good option for
me.

Thanks a lot
Roland

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a959aa1b-0187-4e0b-81e2-6c8879adc9fe%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Roland Pirklbauer) #12

The answer is quite simple. You can just leave the type out of the field
name, so the above example should look like:

{
  "query" : {
    "term" : { "identifier": "123" }
  }
}

Hope this can help others with the same simple problem.
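
For anyone who wants to see the full request, here is a minimal sketch that pairs the type-free term query with a search path. The function name is made up, and the optional type list in the path assumes an Elasticsearch version that still supports mapping types.

```python
import json

def identifier_search(index, identifier, types=None):
    """Build (path, body) for a term search on the shared identifier field."""
    parts = [index]
    if types:
        parts.append(",".join(types))  # optional: restrict to some types
    parts.append("_search")
    path = "/" + "/".join(parts)
    body = {"query": {"term": {"identifier": identifier}}}
    return path, body

# All types in the index, including types added later:
print(identifier_search("library", "123"))
# Only books and CDs:
print(identifier_search("library", "123", types=["books", "cds"]))
```

Because the query names only the field, it keeps working when new types with an identifier field are added. Note that a term query is not analyzed, so the value has to match the indexed token exactly.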

--


(David Pilato) #13

That said, if you use your identifier as the document ID, I wonder whether it would be more efficient to use the multi-get API in that case.
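
A sketch of what such a multi-get request body could look like, using the index/type/id triples from the library example earlier in the thread. The function name is made up, and the _type key assumes an Elasticsearch version that still supports mapping types.

```python
import json

def mget_body(keys):
    """Body for POST /_mget: one entry per (index, type, id) triple."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": doc_id}
                     for index, doc_type, doc_id in keys]}

body = mget_body([
    ("library", "books", "1"),
    ("library", "cds", "2"),
    ("library", "videos", "3"),
])
print(json.dumps(body, indent=2))
```

This fetches documents directly by ID instead of running a search, which only helps when the value you look up really is the document ID.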

--
David :wink:

--


(Roland Pirklbauer) #14

Hello,

The example is a little bit confusing. The fields in question are not the ID; they are, for example, the subject, body, and creation time of Outlook elements. The type is used to distinguish between, for example, mails, contacts and appointments.

--

