I'd like not to use analysis across my schema to save a bit of CPU (I know
the penalty this inflicts on searching). Right now I set "index":
"not_analyzed" per field, but this is cumbersome.
It feels like you're almost defeating the whole purpose of using
Elasticsearch with this approach! Is it really that much of a problem?
On 19 December 2014 at 08:15, Eran Duchan pavius@gmail.com wrote:
I'd like not to use analysis across my schema to save a bit of CPU (I know
the penalty this inflicts on searching). Right now I set "index":
"not_analyzed" per field but this is cumbersome.
We use Elasticsearch to index our structured analytics data. We chose it
for a few reasons:
- All fields are indexed, so we can search by any field or combination
  of fields, including nested fields
- Flexible, built-in geospatial searches
- It can scale with our data, which grows at ~100M documents a day
- It's pretty much a generic datastore (though not the source of truth)
While we do have quite a few string fields in our data, these are mostly
enumeration values ("connected", "not connected"). In preliminary tests
we've found that disabling analysis (per field) saves ~5% CPU.
Not a huge amount, but every bit helps.
OK, that makes a bit more sense, but it seems the amount of CPU you will
save isn't worth the effort.
You could create an index template that matches fields with the pattern "*"
and sets "index": "not_analyzed"; that'd be easiest.
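A rough sketch of what such a template body might look like, built in Python so it can be printed as JSON. It assumes the 1.x-era mapping syntax (the "template" pattern key, the "_default_" mapping and "dynamic_templates"); the function name and the "strings_not_analyzed" label are made up for illustration. It also restricts the match to string fields, since those are the only ones that get analyzed.

```python
import json


def build_not_analyzed_template(index_pattern="*"):
    """Return an index template body that maps every new string field
    as not_analyzed. The dynamic template name is a hypothetical label."""
    return {
        # Which indices the template applies to ("*" means all of them).
        "template": index_pattern,
        "mappings": {
            # "_default_" applies to every document type in matching indices.
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            "match": "*",  # any field name
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                            },
                        }
                    }
                ]
            }
        },
    }


template_body = build_not_analyzed_template()
print(json.dumps(template_body, indent=2))
```

The printed JSON would then be sent as the body of a PUT to /_template/<some name>. Fields that already have an explicit mapping are not affected, and the template only applies to indices created after it is registered.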