Add core type - guid


(Vladimir Khazin) #1

Sample data:

"_source": {
  "Id": "ca23459f-cc96-46cb-8ae8-509368467670",
  "Title": "TPTest Scaling 10:3"
}

Trouble:
Guid is not one of the supported core data
types: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
As a result, a guid is indexed by default as an analyzed string, and the
dashes '-' break the long string into separate terms.
Searches and aggregations produce poor results unless the Id field is
mapped as not analysed.
An alternative is to strip the dashes from the document, but that approach
requires custom serialization/deserialization for the Guid data type in
.NET with the popular JSON parser http://james.newtonking.com/json.
The challenge is that Guid is widely used in our application and the field
name is not always 'Id'. Mapping every guid field has proven quite
time-consuming, and if a mapping is missed, the data needs to be re-indexed.
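
To make the problem concrete, here is a minimal Python sketch. It assumes that an analyzed string field behaves roughly like "lowercase, then split on anything that is not a letter or digit"; the real analyzer is more involved, and `approx_standard_tokens` is a hypothetical helper, not an Elasticsearch API. It also shows the dash-stripping alternative with Python's standard `uuid` module rather than .NET.

```python
import re
import uuid


def approx_standard_tokens(text):
    """Rough stand-in for an analyzed string field (an assumption, not the
    real analyzer): lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


guid = "ca23459f-cc96-46cb-8ae8-509368467670"

# With dashes, the value falls apart into five separate terms.
tokens = approx_standard_tokens(guid)

# Without dashes, the same value survives as a single term.
dashless = uuid.UUID(guid).hex

# The dashed form can be rebuilt from the dashless one when reading back.
restored = str(uuid.UUID(dashless))

print(tokens)
print(approx_standard_tokens(dashless))
print(restored == guid)
```

Under that assumption, an exact-match lookup for the full guid has nothing to match against in the analyzed case, because only the five fragments exist as terms.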

Request/Question:
Is there a less labor-intensive approach?
Are there plans to support guid as a core type?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bee7866e-2c0a-4ae1-9b48-0b17da2931ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Brian Yoder) #2

Perhaps index the Id field but do not analyze it? Then it will be indexed
and queried intact, as-is.

Brian



(Binh Ly-2) #3

Brian is right, if you map your field as not_analyzed, then you can do
exact case-sensitive matches on it, as well as term facets/aggregations and
sorts. This applies to fields like IDs, Guids, or anything you can think of
that you don't want tokenized:

{
  "mappings": {
    "doc": {
      "properties": {
        "Id": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
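
If the guid fields happen to follow a naming convention, a dynamic template might save mapping each one by hand. The Python sketch below only builds the request body. The `*Id` pattern, the template name `guid_like_strings`, and the helper name are assumptions for illustration, and the template matches on field name and detected JSON type, not on whether the value looks like a guid.

```python
import json


def id_dynamic_template(name_pattern="*Id"):
    """Build an index-creation body whose dynamic template maps matching
    string fields as not_analyzed. The name pattern is hypothetical."""
    return {
        "mappings": {
            "doc": {
                "dynamic_templates": [
                    {
                        "guid_like_strings": {
                            # Field-name wildcard; adjust to the real convention.
                            "match": name_pattern,
                            # Only apply to values detected as JSON strings.
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                            },
                        }
                    }
                ]
            }
        }
    }


body = id_dynamic_template()
print(json.dumps(body, indent=2))
```

Fields whose names do not fit the pattern would still need an explicit mapping like the one above.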



(Vladimir Khazin) #4

Thank you Brian and Binh for your comments!

The slight trouble with that approach is the ongoing mapping changes: every
time we add another Id (guid field) to the json document the mapping must
change, and the document structure changes quite often.
One of the reasons I fell in love with ElasticSearch was its somewhat
schema-less approach.
The situation with Guid breaks this idealistic model: it requires adding a
not-analysed mapping for every new field of type guid and, if one has been
missed, reindexing the type after deleting and re-adding the mapping.

That's why I was wondering whether there are plans to add native support
for guid.

In reply to Binh Ly's message of Thursday, April 3, 2014 5:26:00 PM UTC-4.



(Brian Yoder) #5

Vladimir,

Even if ES supported a specific GUID field type, it could still fail
to be auto-detected.

I would think that you could detect when your data added a new guid field
much more reliably than ES could auto-detect it.
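
As a rough illustration of detecting guid fields on the application side, the Python sketch below walks a document before indexing and reports string fields that parse as guids but are not yet in a known set of mapped fields. The helper names, the `Owner.OwnerId` field, and its value are hypothetical, and lists are not handled.

```python
import uuid


def is_guid(text):
    """True if text is a guid in the canonical dashed form."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


def find_guid_fields(doc, prefix=""):
    """Return dotted paths of string values that look like guids.
    Nested objects are followed; lists are ignored in this sketch."""
    found = set()
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            found |= find_guid_fields(value, path + ".")
        elif isinstance(value, str) and is_guid(value):
            found.add(path)
    return found


doc = {
    "Id": "ca23459f-cc96-46cb-8ae8-509368467670",
    "Title": "TPTest Scaling 10:3",
    # Hypothetical nested guid field that has no mapping yet.
    "Owner": {"OwnerId": "00000000-0000-0000-0000-000000000001"},
}
mapped_fields = {"Id"}

unmapped = find_guid_fields(doc) - mapped_fields
print(sorted(unmapped))
```

A check like this could run before each index request, so a new guid field is caught before any document containing it is indexed.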

Note that you can easily update the mappings of an existing index in
non-breaking ways, and one of these valid ways is to add a field that
didn't exist before.

I, too, liked ES's schema-less approach which made it easier to dive
directly into and learn. But as time went on, I have finally locked down ES
to never automatically create an index, and to never automatically map a
field that doesn't already have an existing mapping. Combined with the cool
ability to add mappings for new fields to an existing index, these make it
easy to reliably catch new unexpected fields and then add the mappings for
them without the chance of ES dynamically creating an incompatible mapping.
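
One way such a lock-down could look is sketched below in Python. The post does not say which settings were actually used, so this is an assumption: a mapping with "dynamic" set to "strict" so that unmapped fields are rejected, plus a separate put-mapping body that adds one new not_analyzed field. The helper names and the `ParentId` field are hypothetical. Automatic index creation is a separate node-level setting (`action.auto_create_index: false`) and is not shown in the code.

```python
import json

# Shared definition for guid-like fields, as in the mapping earlier in the thread.
NOT_ANALYZED = {"type": "string", "index": "not_analyzed"}


def strict_index_body(type_name, guid_fields):
    """Index-creation body that rejects documents containing unmapped fields."""
    return {
        "mappings": {
            type_name: {
                "dynamic": "strict",
                "properties": {name: dict(NOT_ANALYZED) for name in guid_fields},
            }
        }
    }


def add_field_body(type_name, field_name):
    """Put-mapping body that adds one new field to an existing type."""
    return {type_name: {"properties": {field_name: dict(NOT_ANALYZED)}}}


create_body = strict_index_body("doc", ["Id"])
update_body = add_field_body("doc", "ParentId")  # hypothetical new guid field

print(json.dumps(create_body, indent=2))
print(json.dumps(update_body, indent=2))
```

With a strict mapping, a document carrying an unexpected field fails at index time, which is the signal to send the second body before retrying.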

Note that the auto-detection issue is the same whether ES supports a "guid"
field type or whether you need to be a little more wordy and specify a
"string" type that is indexed but not_analyzed. If you make ES guess, it
can still guess wrong and define the new fields as "string" but with the
standard analyzer.

Brian



(Vladimir Khazin) #6

Hard to argue with your comments and experience.

Thank you for the feedback!

In reply to InquiringMind's (brian.from.fl@gmail.com) message of Fri, Apr 4, 2014 at 4:59 PM.

--
Sincerely yours,
Vlad Khazin


