Trouble:
Guid is not one of the supported data
types: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
As a result, a guid is indexed by default as an analyzed string, and the
dashes '-' break the long string into separate terms.
Search and aggregations then match and bucket on fragments of the guid
rather than the whole value, unless the Id field is mapped as not analyzed.
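To make the problem concrete, here is a rough Python sketch of what happens to a guid in an analyzed string field. It is only an approximation of the default analysis (split on non-alphanumeric characters, then lowercase), not the actual Lucene tokenizer, and the function name and sample guid are made up for illustration.

```python
import re


def approximate_standard_tokens(text):
    """Approximate default analysis: split on non-alphanumerics, lowercase.

    This is an illustration only, not the real Lucene standard tokenizer.
    """
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


# Hypothetical sample value.
guid = "3F2504E0-4F89-11D3-9A0C-0305E82C3301"
print(approximate_standard_tokens(guid))
# → ['3f2504e0', '4f89', '11d3', '9a0c', '0305e82c3301']
```

A term query or terms aggregation on such a field sees five separate terms instead of one guid, which is why the results look wrong.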
An alternative is to strip the dashes from the document, but that approach
requires custom serialization/deserialization for the Guid data type
in .NET using the popular JSON parser http://james.newtonking.com/json.
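For illustration, the dash-stripping round trip looks roughly like this. The sketch is in Python using the standard uuid module, not the .NET/Json.NET converter the post refers to, and the helper names are hypothetical. Note that it also normalizes the value to lowercase.

```python
import uuid


def to_dashless(value):
    """Serialize a guid string as 32 lowercase hex characters, no dashes."""
    return uuid.UUID(value).hex


def from_dashless(value):
    """Restore the canonical dashed form from the 32-character form."""
    return str(uuid.UUID(hex=value))


# Hypothetical sample value.
stored = to_dashless("3F2504E0-4F89-11D3-9A0C-0305E82C3301")
restored = from_dashless(stored)
```

Every guid property would have to pass through a conversion like this on the way in and on the way out, which is the extra effort described above.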
The challenge is that Guid is widely used in our application and the field
name is not always 'Id'. Mapping every guid has proven quite time-consuming,
and if a mapping is missed the data needs to be re-indexed.
Request/Question:
Is there a less labor-intensive approach?
Are there plans to support guid as a core type?
The slight trouble with that approach is the ongoing mapping changes: the
mapping has to be updated every time we add another Id (guid field) to the
JSON document, and the document changes quite often.
One of the reasons I fell in love with Elasticsearch was its somewhat
schema-less approach.
The situation with Guid breaks this idealistic model: every new field of
type guid needs a not-analyzed mapping, and if one is missed, the type has
to be reindexed after deleting and re-adding the mapping.
That's why I was wondering whether there are plans to add native support
for guid.
On Thursday, April 3, 2014 5:26:00 PM UTC-4, Binh Ly wrote:
Brian is right, if you map your field as not_analyzed, then you can do
exact case-sensitive matches on it, as well as term facets/aggregations and
sorts. This applies to fields like IDs, Guids, or anything you can think of
that you don't want tokenized:
Even if ES supported a specific GUID field type, it could still fail
to be auto-detected.
I would think that you could detect when your data added a new guid field
much more reliably than ES could auto-detect it.
Note that you can easily update the mappings of an existing index in
non-breaking ways, and one of these valid ways is to add a field that
didn't exist before.
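As a sketch of that idea (detect new guid fields on the application side, then add a mapping for them), something like the following could work. It is only an illustration under some assumptions: it looks at top-level fields only, the helper names, type name, and sample document are hypothetical, and the mapping body uses the "string" type with "index": "not_analyzed" as discussed in this thread. The resulting body would be sent through the put mapping API.

```python
import json
import re

# Matches the canonical 8-4-4-4-12 hex form of a guid.
GUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def find_guid_fields(doc):
    """Return the top-level field names whose value looks like a guid."""
    return sorted(
        name for name, value in doc.items()
        if isinstance(value, str) and GUID_RE.match(value)
    )


def mapping_for_new_guid_fields(doc, already_mapped, type_name):
    """Build a mapping body for guid fields that are not mapped yet.

    Returns None when there is nothing new to map.
    """
    new_fields = [f for f in find_guid_fields(doc) if f not in already_mapped]
    if not new_fields:
        return None
    properties = {
        f: {"type": "string", "index": "not_analyzed"} for f in new_fields
    }
    return {type_name: {"properties": properties}}


# Hypothetical document and type name.
doc = {
    "Id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
    "CustomerRef": "9b2e7a10-1c2d-4e3f-8a9b-0c1d2e3f4a5b",
    "Name": "Acme",
}
body = mapping_for_new_guid_fields(doc, already_mapped={"Id"}, type_name="order")
print(json.dumps(body, indent=2))
```

Running the check before indexing means the mapping is added before the first document with the new field arrives, so no reindex is needed.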
I, too, liked ES's schema-less approach, which made it easier to dive
right in and learn. But as time went on, I finally locked down ES
to never automatically create an index, and to never automatically map a
field that doesn't already have an existing mapping. Combined with the cool
ability to add mappings for new fields to an existing index, these make it
easy to reliably catch new unexpected fields and then add the mappings for
them without the chance of ES dynamically creating an incompatible mapping.
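The post does not say which settings were used for this lock-down, so the following is only one possible way to express it, written as Python dicts that would be serialized to JSON or YAML. It assumes the node-level setting action.auto_create_index and the per-type mapping option "dynamic": "strict", which rejects documents containing unmapped fields. The helper name and type name are hypothetical.

```python
import json

# Assumed node-level setting (elasticsearch.yml): never auto-create an index.
NODE_SETTINGS = {"action.auto_create_index": False}


def strict_type_mapping(type_name, properties):
    """Build a mapping that rejects any field not explicitly mapped."""
    return {type_name: {"dynamic": "strict", "properties": properties}}


# Hypothetical type with a single guid field.
mapping = strict_type_mapping(
    "order",
    {"Id": {"type": "string", "index": "not_analyzed"}},
)
print(json.dumps(mapping, indent=2))
```

With a strict mapping, a document carrying a new, unmapped guid field fails to index instead of silently getting an analyzed string mapping, which is the "reliably catch new unexpected fields" behavior described above.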
Note that the auto-detection issue is the same whether ES supports a "guid"
field type or whether you need to be a little more wordy and specify a
"string" type that is indexed but not_analyzed. If you make ES guess, it
can still guess wrong and define the new fields as "string" but with the
standard analyzer.