Hive to Elasticsearch parsing exception

Hello Everyone,

I'm loading data from a Hive table (0.13) into Elasticsearch (1.4.4).
With the auto-create index option turned on, I don't face any problems and
I can see all the data in ES.

However, I get the following error when I create the index manually.

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [MapperParsingException[failed to parse]; nested: NumberFormatException[For input string: "NULL"]; ]]; Bailing out..
    at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
    at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
    at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:152)
    at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:146)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:621)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:262)

To create the index manually, I used the same mappings that the first
auto-create step produced and changed one field to the geo_point type.
Changing that field's type is the only change I made.
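
For illustration only, here is a minimal Python sketch of that mapping change: take the "properties" section of the auto-created mapping and switch a single field to geo_point. The helper name `with_geo_point` and the field names `location` and `price` are hypothetical; the real table's columns are not shown in this thread.

```python
import copy
from typing import Any, Dict


def with_geo_point(properties: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Return a copy of a mapping's "properties" with one field set to geo_point.

    `properties` is assumed to be the "properties" section of the mapping
    that the auto-created index reported.
    """
    if field not in properties:
        raise KeyError("field %r is not in the mapping" % field)
    updated = copy.deepcopy(properties)  # leave the original mapping untouched
    updated[field] = {"type": "geo_point"}
    return updated


# Hypothetical two-field excerpt of an auto-created mapping.
auto_created = {
    "location": {"type": "string"},
    "price": {"type": "double"},
}
manual = with_geo_point(auto_created, "location")
```

Every other field keeps the type that auto-creation chose, so numeric fields stay numeric in the manual mapping.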

The column that I wanted to be a geo field had a few nulls, so I selected
only the rows without nulls, but I still get the same error.

Is there any way to identify which column is causing the issue? There are
about 70 columns in my table.

TL;DR:
Hive table to Elasticsearch.
Auto-create index works fine.
It fails when I manually create the index with almost the same mapping
(except one field changed from string to geo_point).

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAO9TxdO22hy2%3Dcz1S_DJgvtd0rsw%2Bu0WL8SqLFR8GTbbGJr9EQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

The issue is likely caused by the fact that, with your manual mapping, the
"NULL" value is not treated as null but as a string value.
You should be able to get around it by converting "NULL" to a proper NULL
value, which es-hadoop can recognize; additionally, you can 'translate' it
to a default value.
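
As a rough sketch of that conversion (not es-hadoop's own behaviour, and written in Python rather than Hive purely for illustration), the idea is to turn null-like strings into a real null, or into a per-column default, before the row is sent. The function name `normalize_nulls` and the set of null spellings are assumptions; the thread only confirms the literal "NULL".

```python
from typing import Any, Dict, Optional

# Assumption: these are the spellings that stand for "no value" in the data.
# Only the literal "NULL" is confirmed by the error message above.
NULL_TOKENS = {"NULL", "null", "\\N", ""}


def normalize_nulls(
    row: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Replace null-like strings with None, or with a per-column default."""
    defaults = defaults or {}
    cleaned = {}
    for column, value in row.items():
        is_null_string = isinstance(value, str) and value.strip() in NULL_TOKENS
        if value is None or is_null_string:
            # Use the column's default if one was given, else a real null.
            cleaned[column] = defaults.get(column)
        else:
            cleaned[column] = value
    return cleaned


# Hypothetical row with one string "NULL" in a numeric column.
row = {"id": "7", "lat": "NULL", "name": "store"}
as_null = normalize_nulls(row)
as_default = normalize_nulls(row, defaults={"lat": 0.0})
```

With a real null in place of the string, Elasticsearch no longer tries to parse "NULL" as a number for that field.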

As for understanding which field caused the exception, unfortunately
Elasticsearch doesn't provide enough information about this yet, but it
should. Can you please raise a quick issue on es-hadoop about this?
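
Until the error message names the field, one workaround is to check a sample of rows on the client side against the mapping. The Python sketch below is only an illustration under assumptions: `find_unparseable` is a hypothetical helper, the rows are plain dictionaries exported from the table, and `properties` is the "properties" section of the manual mapping. It performs a loose check (anything `float()` accepts passes), so it is not the same parser Elasticsearch uses.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Core numeric field types in Elasticsearch 1.x mappings.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


def find_unparseable(
    rows: Iterable[Dict[str, Any]],
    properties: Dict[str, Dict[str, Any]],
) -> List[Tuple[int, str, Any]]:
    """Report (row number, column, value) for values that are not numbers
    in columns the mapping declares as numeric."""
    problems = []
    for row_number, row in enumerate(rows):
        for column, value in row.items():
            field_type = properties.get(column, {}).get("type")
            if field_type not in NUMERIC_TYPES or value is None:
                continue  # only non-null values in numeric fields are checked
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append((row_number, column, value))
    return problems


# Hypothetical sample: three columns instead of the real table's ~70.
properties = {
    "id": {"type": "long"},
    "price": {"type": "double"},
    "name": {"type": "string"},
}
rows = [
    {"id": "1", "price": "9.5", "name": "NULL"},
    {"id": "NULL", "price": "NULL", "name": "x"},
]
problems = find_unparseable(rows, properties)
```

Here the string "NULL" in `name` is ignored because that field is mapped as a string, while the same value in `id` and `price` is reported.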

Thanks,

(In reply to P lva ruvikal@gmail.com, Thu, Mar 12, 2015 at 10:12 PM.)

Ignoring both null values and "null strings" worked.

I will open an issue about this.

Thanks a lot, Costin.

(In reply to Costin Leau costin.leau@gmail.com, Fri, Mar 13, 2015 at 12:08 AM.)