Would it be a bad idea to define a universal type containing thousands,
perhaps tens of thousands, of fields within a single elasticsearch Mapping?
The documents I store for a given index would use a subset of the fields in
the mapping but for management reasons it would be convenient to have a
universal mapping type that I could use for all indexes.
I have never explored the impact on performance with thousands of
types (not even hundreds), but you can override the default behavior
of types or use dynamic templates to define a universal type.
It depends on whether you want to override the defaults based on type
(string, boolean, etc.) or on field name. I believe I have read at some
point that too many types do have an impact on performance.
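As a rough illustration of the dynamic-template approach, a template applied through the `_default_` mapping could look something like the sketch below. This assumes the pre-1.0 mapping syntax current at the time of this thread, and the template name `strings_not_analyzed` is just made up for the example:

```json
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_not_analyzed": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
```

Any new string field matching the template would then be mapped consistently without having to list every field in advance.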
Each field comes with an overhead of memory usage (on the Lucene level).
Theoretically it's possible, but you will need to check the memory usage
(sadly, the memory used by Lucene is not exposed, so it can't be reported
by ES).
On Wednesday, May 9, 2012 1:55:32 AM UTC-7, kimchy wrote:
Each field comes with an overhead of memory usage (on the Lucene level).
Theoretically it's possible, but you will need to check the memory usage
(sadly, the memory used by Lucene is not exposed, so it can't be reported
by ES).
How many fields is excessive? Currently I'm storing syslog and apache
logs, but I'm considering letting the development teams log anything in
JSON format, which would let them create any fields they please.
Is 100 fields too many? Or 1000? Any guidance would be appreciated.
Nice to meet you again! Although you already got some of my opinion on
the subject on the Logstash ML, here's a second shot:
On Mon, Feb 11, 2013 at 7:22 PM, Bruce Lysik blysik@yahoo.com wrote:
How many fields is excessive? Currently I'm storing syslog and apache
logs, but I'm considering letting the development teams log anything in
JSON format, which would let them create any fields they please.
In production you probably want to have control over those fields, instead
of relying on ES to detect field types for you. One reason is that if
the first log in an index accidentally contains an integer in a field that
would normally be a string, you'll get indexing errors for pretty much all
the other logs that day.
How many fields do you expect to get in total?
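To illustrate the point about pinning down field types up front, an explicit mapping for a syslog type might look like the sketch below (the field names here are hypothetical, and the syntax is the string/not_analyzed style of the ES versions current in this thread):

```json
{
  "syslog": {
    "properties": {
      "timestamp": { "type": "date" },
      "host":      { "type": "string", "index": "not_analyzed" },
      "severity":  { "type": "integer" },
      "message":   { "type": "string" }
    }
  }
}
```

With this in place, a log that sends an unexpected type into one of these fields fails loudly on its own, instead of silently locking in a wrong auto-detected type for the rest of the index.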
Is 100 fields too many? Or 1000? Any guidance would be appreciated.
Maybe someone else can pop in with some benchmarks, but 100 or even 1000
fields doesn't sound like too many to me.
Of course, the definition of excessive will depend on quite a few factors,
like:
- how much memory you have
- how many mapping types you have per index (you'll have to add them up to
  get the total number of fields per Lucene index)
- what search performance you're expecting (which in turn depends on how
  much data you have in an index, how many indices, shards...)
So if you want to make sure, you can run a performance test with your
worst-case scenario and see if all goes well. I'd also recommend monitoring
your cluster during the test, to see what limits you're approaching. There
are quite a lot of nice tools out there for monitoring ES, one of which is
ours:
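As a starting point for such a worst-case test, here's a small sketch that generates a deliberately wide mapping body you could PUT to a throwaway test index before indexing sample data. The field names (`field_0000`, ...), the type name `logs`, and the 10,000-field count are all just assumptions for illustration:

```python
import json

def build_wide_mapping(num_fields):
    """Build a mapping body with num_fields explicit string fields.

    The field names are made up; the point is only to produce a
    worst-case-width mapping for a load test.
    """
    properties = {
        "field_%04d" % i: {"type": "string", "index": "not_analyzed"}
        for i in range(num_fields)
    }
    return {"mappings": {"logs": {"properties": properties}}}

mapping = build_wide_mapping(10000)
# Serialize to see how large the request body gets.
print(len(json.dumps(mapping)))
```

From there you'd index a representative volume of documents against that mapping and watch heap usage and query latency as you scale the field count up.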