Elasticsearch is saving decimal field as text


(sami) #1

I can not get it to work , I tried BIGINT, DOUBLE, DECIMAL(6,1) but elasticsearch is showing data type as "text" except for the first field "txn_id" , strange ?

+---------------------+------------+--------------------+--+
| col_name | data_type | comment |
+---------------------+------------+--------------------+--+
| txn_id | bigint | from deserializer |
| txn_process_date | timestamp | from deserializer |
| transp_id | string | from deserializer |
| ext_plaza_id | string | from deserializer |
| transp_cur_bal | bigint | from deserializer |
| ext_lane_id | string | from deserializer |
| ext_date_time | timestamp | from deserializer |
| toll_amt_charged | bigint | from deserializer |
| toll_amt_collected | bigint | from deserializer |
| toll_amt_full | bigint | from deserializer |
| ent_plaza_id | string | from deserializer |
| ent_date_time | timestamp | from deserializer |
| ent_lane_id | string | from deserializer |
+---------------------+------------+--------------------+--+

      "toll_amt_full" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },

      "txn_id" : {
        "type" : "long"
      },
      "txn_process_date" : {
        "type" : "date"
      }

The data in the fields is a numeric type and I found a document on the web saying that elasticsearch for hadoop will set the type as "long/double" if the data is integer or real .

https://www.elastic.co/guide/en/elasticsearch/hadoop/master/mapping.html

0: jdbc:hive2://hadoop2:10000/default> select toll_amt_collected,toll_amt_full from pa_lane_txn_orc limit 5;
+---------------------+----------------+--+
| toll_amt_collected | toll_amt_full |
+---------------------+----------------+--+
| 79 | 106 |
| 106 | 132 |
| 106 | 132 |
| 102 | 125 |
| 106 | 125 |

I changed the fields containing numbers to 'FLOAT' but only one is converted to float the rest are still text ?

0: jdbc:hive2://hadoop2:10000/default> select toll_amt_full, toll_amt_collected, toll_amt_charged from pa_lane_txn_orc limit 10;
+----------------+---------------------+-------------------+--+
| toll_amt_full | toll_amt_collected | toll_amt_charged |
+----------------+---------------------+-------------------+--+
| 106 | 79 | 79 |
| 132 | 106 | 106 |
| 132 | 106 | 106 |
| 125 | 102 | 106 |

     "toll_amt_charged" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "toll_amt_collected" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "toll_amt_full" : {
        "type" : "float"
      },
      "transp_id" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },

I discovered some more strange behavior , it sometimes create the fields as text and sometimes as FLOAT ? whats going on here .

      "toll_amt_charged" : {
        "type" : "float"
      },
      "toll_amt_collected" : {
        "type" : "float"
      },
      "toll_amt_full" : {
        "type" : "float"
      },
      "transp_id" : {

(James Baiera) #2

@aliyesami ES-Hadoop relies on Elasticsearch's built in dynamic mapping sense functionality to assign field types. If you are having a hard time with field assignments, I would advise using something like an index template to map out your fields ahead of time.


(sami) #3

hi James but why would elasticsearch change mapping dynamically if data is not changing ? what other factors can influence its decision to change the mapping ?

thanks


(James Baiera) #4

Elasticsearch will only set the mappings when you are creating a new index. ES-Hadoop attempts to serialize simple numeric types in JSON as numbers and everything else as a JSON string or object. After that, it's up to Elasticsearch to assign the field mappings. If Elasticsearch mis-predicts the field type that you were hoping for, the advice that we give is often to either create your indices and mappings before hand or to use index templates for your jobs to use when creating an index. There shouldn't be any fields changing once they are set in the mapping for an index.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.