Elasticsearch-Hadoop: EsOutputFormat and the 'date' type


(Telax) #1

Hello,

I'm interested in using the EsOutputFormat class in a Hadoop MapReduce
task.
During experimentation I noticed that there is no direct handling for
'date' objects.
My data contains a number of 'date' fields which must be carried over into
the Elasticsearch index. However, I am currently unable to index those
fields as type 'date'; instead they are simply submitted into the index
as 'string' type.
Using templates, I have tried defining dynamic_date_formats as well as
explicitly specifying a date type and format mapping in a dynamic template
which matches on the names of those fields which should be 'date' types.
In either case, fields indexed into my Elasticsearch cluster which
should be recognized as 'date' types are only mapped as strings.

Here is an example template similar to that with which I have been
experimenting.

{
  "template" : "index-name-*",
  "mappings" : {
    "_default_" : {
      "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"],
      "dynamic_templates" : [
        { "date_field_template" : {
            "match" : "date_*",
            "mapping" : {
              "type" : "date",
              "format" : "yyyy-MM-dd hh:mm"
            }
          }
        }
      ]
    }
  }
}

Any help on this issue would be greatly appreciated.
Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/10ff4422-ccdb-4fc0-8ccb-34b4b5e5180a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

Make sure the template actually matches. This is not always obvious, but it is easy to test. First, check your
template: after defining it, send a request with a sample payload to see whether the doc gets created with the
expected mapping. A common mistake is defining the template after the index is created, which makes it useless; the
template is applied when the index is created (and thus becomes part of its mapping).
Second, if the mapping appears correct, double-check your es-hadoop configuration and potentially turn on logging to see
the payload sent by es-hadoop to Elasticsearch.
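
The check described above could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch under stated assumptions, not part of es-hadoop: the cluster address `http://localhost:9200`, the template name `date_template`, the index name `index-name-test`, the type name `doc`, and the field `date_created` are all hypothetical. The endpoints are the Elasticsearch 1.x-era REST calls that were current for this thread; later versions changed the template and mapping-type APIs.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumed cluster address

# Template in the shape discussed in this thread (names are illustrative).
TEMPLATE = {
    "template": "index-name-*",
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {"date_field_template": {
                    "match": "date_*",
                    "mapping": {"type": "date", "format": "yyyy-MM-dd hh:mm"},
                }}
            ]
        }
    },
}


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def verification_requests(index="index-name-test", doc_type="doc"):
    """Requests in the order that matters: template first, then the index."""
    return [
        # 1. Register the template BEFORE the index exists.
        build_request("PUT", "/_template/date_template", TEMPLATE),
        # 2. Index a sample document; this creates the index from the template.
        build_request("PUT", f"/{index}/{doc_type}/1",
                      {"date_created": "2014-07-01 11:09"}),
        # 3. Read back the mapping and look at the type of date_created.
        build_request("GET", f"/{index}/_mapping"),
    ]


def send(request):
    """Send one request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Against a test cluster, calling `send(r)` for each request returned by `verification_requests()` and inspecting the last response should show whether `date_created` was mapped as `date` or as `string`.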

Hope this helps,

On 7/1/14 11:09 PM, Telax wrote:

[...]

--
Costin



(Telax) #3

Hi Costin, thank you for your reply.

My issue actually came down to the ordering of my dynamic templates. I had a
'match:*' catch-all as the first dynamic template, which disabled norms.
Although this template didn't explicitly define a type for any matched
field, it would automatically set the 'date' field to a string type. The
"date_" template would then match but fail to set the type to date, as it
had already been defined. Simply reordering my dynamic templates so that the
date matcher came before the catch-all solved the issue :)
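
The ordering effect described above can be illustrated with a toy model in Python. This is only a sketch of "first matching dynamic template wins", not Elasticsearch's actual implementation: the function `resolve_mapping` is hypothetical, the catch-all template body (including the norms setting) is an assumption since it was not posted, and real dynamic templates support further options such as `unmatch` and `match_mapping_type` that are ignored here.

```python
from fnmatch import fnmatchcase


def resolve_mapping(field_name, dynamic_templates, default_type="string"):
    """Return (template_name, mapping) for the FIRST template that matches.

    A template that does not set a type falls back to default_type,
    mimicking a string value that is dynamically mapped as a string.
    """
    for entry in dynamic_templates:
        (name, spec), = entry.items()  # each entry holds exactly one template
        if fnmatchcase(field_name, spec["match"]):
            mapping = dict(spec["mapping"])
            mapping.setdefault("type", default_type)
            return name, mapping
    return None, {"type": default_type}


# Assumed shape of the catch-all template that disabled norms.
catch_all = {"catch_all": {
    "match": "*",
    "mapping": {"norms": {"enabled": False}},
}}
date_template = {"date_field_template": {
    "match": "date_*",
    "mapping": {"type": "date", "format": "yyyy-MM-dd hh:mm"},
}}

# Catch-all first: the date template is never reached.
broken_order = [catch_all, date_template]
# Date matcher first: date_* fields get the date mapping.
fixed_order = [date_template, catch_all]

print(resolve_mapping("date_created", broken_order))
print(resolve_mapping("date_created", fixed_order))
```

With the catch-all listed first, `date_created` resolves to the catch-all and ends up typed as a string; with the date matcher listed first, it resolves to `date_field_template` and keeps the `date` type, while other fields still fall through to the catch-all.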
On 10 Jul 2014 22:52, "Costin Leau" costin.leau@gmail.com wrote:

[...]


