Thousands of fields in a mapping type?

Would it be a bad idea to define a universal type containing thousands,
perhaps tens of thousands, of fields within a single elasticsearch Mapping?

The documents I store for a given index would use a subset of the fields in
the mapping but for management reasons it would be convenient to have a
universal mapping type that I could use for all indexes.

I have never explored the impact on performance of thousands of
types (not even hundreds), but you can override the default behavior
of types or use dynamic templates to define a universal type:

http://www.elasticsearch.org/guide/reference/index-modules/mapper.html
http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html

It depends on whether you want to override the defaults based on value
type (string, boolean, etc.) or on field name. I believe I have read at
some point that too many types do have an impact on performance.
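As a sketch of the dynamic-template approach, the snippet below builds the JSON body for a "universal" mapping type that matches fields by detected type or by name pattern, instead of declaring thousands of fields up front. All type, template, and field names here are made up for illustration:

```python
import json

# Hypothetical "universal" mapping type relying on dynamic templates.
# Instead of enumerating every field, rules describe how any new field
# should be mapped when it first appears in a document.
universal_mapping = {
    "universal_type": {
        "dynamic_templates": [
            {
                # Any field detected as a string is indexed verbatim
                # (not_analyzed), i.e. exact-match only.
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            },
            {
                # Fields whose names end in "_ts" are mapped as dates.
                "timestamps": {
                    "match": "*_ts",
                    "mapping": {"type": "date"},
                }
            },
        ]
    }
}

print(json.dumps(universal_mapping, indent=2))
```

A body like this would be sent with the put-mapping API; the point is that one small set of rules can cover an open-ended set of fields.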

--
Ivan

On Sun, May 6, 2012 at 11:53 AM, aaron atdixon@gmail.com wrote:

Would it be a bad idea to define a universal type containing thousands,
perhaps tens of thousands, of fields within a single elasticsearch Mapping?

The documents I store for a given index would use a subset of the fields in
the mapping but for management reasons it would be convenient to have a
universal mapping type that I could use for all indexes.

Each field comes with some memory overhead at the Lucene level, so
theoretically it's possible, but you will need to keep an eye on memory
(sadly, the memory used by Lucene is not exposed in ES's stats).

On Sun, May 6, 2012 at 9:53 PM, aaron atdixon@gmail.com wrote:

Would it be a bad idea to define a universal type containing thousands,
perhaps tens of thousands, of fields within a single elasticsearch Mapping?

The documents I store for a given index would use a subset of the fields
in the mapping but for management reasons it would be convenient to have a
universal mapping type that I could use for all indexes.

On Wednesday, May 9, 2012 1:55:32 AM UTC-7, kimchy wrote:

Each field comes with some memory overhead at the Lucene level, so
theoretically it's possible, but you will need to keep an eye on memory
(sadly, the memory used by Lucene is not exposed in ES's stats).

How many fields is excessive? Currently I'm storing syslog and Apache
logs, but I'm considering letting the development teams log anything in
JSON format, which would let them create any fields they please.

Is 100 fields too many? Or 1000? Any guidance would be appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello Bruce,

Nice to meet you again :) Although you already got some of my opinions on
the subject on the Logstash ML, here's a second shot:

On Mon, Feb 11, 2013 at 7:22 PM, Bruce Lysik blysik@yahoo.com wrote:

On Wednesday, May 9, 2012 1:55:32 AM UTC-7, kimchy wrote:

Each field comes with some memory overhead at the Lucene level, so
theoretically it's possible, but you will need to keep an eye on memory
(sadly, the memory used by Lucene is not exposed in ES's stats).

How many fields is excessive? Currently I'm storing syslog and Apache
logs, but I'm considering letting the development teams log anything in
JSON format, which would let them create any fields they please.

In production you probably want to have control over those fields, instead
of relying on ES to detect field types for you. One reason is that if the
first log in an index accidentally contains an integer in a field that
would normally be a string, you'll get indexing errors for pretty much all
the other logs that day.
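This failure mode can be shown with a toy model (this is not ES code, just an illustration of the behavior): dynamic detection locks in the type of the first value it sees for a field, and later documents with a conflicting type are rejected.

```python
# Toy model of dynamic type detection: the first document to touch a
# field decides that field's type for the whole index.
detected_types = {}

def index_doc(doc):
    for field, value in doc.items():
        kind = type(value).__name__
        if field not in detected_types:
            detected_types[field] = kind      # first document wins
        elif detected_types[field] != kind:
            raise ValueError(
                "field '%s' was mapped as %s, got %s"
                % (field, detected_types[field], kind))

index_doc({"status": 200})    # first log of the day: an integer
index_doc({"status": 404})    # same type, indexed fine

try:
    index_doc({"status": "OK"})   # a string now -> rejected
except ValueError as e:
    print("indexing error:", e)
```

Defining the mapping explicitly up front avoids leaving this decision to whichever document happens to arrive first.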

How many fields do you expect to get in total?

Is 100 fields too many? Or 1000? Any guidance would be appreciated.

Maybe someone else can chime in with some benchmarks, but 100 or even
1000 fields doesn't sound like too many to me.

Of course, the definition of excessive will depend on quite a few factors,
such as:

  • how much memory you have
  • how many mapping types you have per index (you'll have to add them up to
    get the total number of fields per Lucene index)
  • what search performance you're expecting (which in turn depends on how
    much data you have in an index, how many indices, shards...)
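The second point above is just arithmetic: all mapping types in an index share one underlying Lucene index, so their fields add up (with same-named fields counted once). A back-of-the-envelope sketch, with made-up type and field names:

```python
# Hypothetical mapping types in one index; same-named fields across
# types map to the same Lucene field, so count the union, not the sum.
mapping_types = {
    "syslog": ["timestamp", "host", "severity", "message"],
    "apache": ["timestamp", "client_ip", "verb", "path", "status", "bytes"],
}

total_fields = len({f for fields in mapping_types.values() for f in fields})
print(total_fields)  # 9: "timestamp" is shared between the two types
```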

So if you want to make sure, you can run a performance test with your
worst-case scenario and see if all goes well. I'd also recommend monitoring
your cluster during the test, to see which limits you're approaching. There
are quite a lot of nice tools out there for monitoring ES, one of which is
ours:

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

Best regards,
Radu
