Any issue store integer values in text field

Hi,

I have type text field, the value for this field sometimes comes in as integer, it is store in ES as integer. any issue when querying this field later on? will the integer value be indexed as string value by ES?

thanks,
Shan

The only issue is the very first document's detected "type".
People often store documents in monthly or some sort of time base fashion. So if the very first document for the new (say monthly) index shows up as an integer, that month's field will be automatically mapped to integer.
This will create an issue for subsequent documents with "text" value. Those documents will not be stored.

The solution is to use mapping to control what that field's type is. In your case should be "text" since it could be mixed. And because it's a mixed typed field, your only solution is "text".
There's no issue storing integer as text cuz you can index & search text without issue.

In general, if a field will only be integer, then storing it as integer will be more efficient than text.
Just use a mapping to fix fields you care about. It's deterministic so you don't have to worry about this particular field being accidentally mapped to integer and cause write to drop bunch of documents.

We had an issue with auto type detection between integer and float. We needed float but documents showing up with integer causing the auto detection to restrict mapping to integer only. Learned the hard way.

1 Like

Just to make sure I understand it correctly. right now, I have my field mapped as type text; the data for this field could be 123 or "123", internally ES will store and index 123 as text, both cases will make no difference for search, correct?

correct. You must use "text" since that field is mixed. You have no choice.
It won't impact searching.
You just need to ensure "order" when doing comparison.
Is "123" still greater than "2" for example. I don't remember.

We don't do comparison that much so it's not an issue for us storing number as text. And we often do so when it makes sense. Storing text has the benefit of capturing unforeseen values like "null", "N/A", or "None", etc.
We sometime can't guarantee upstream data format; therefore, storing as text has such benefit of capturing bad values and ask the engineers to fix their payload, etc...
If we just force to a restrictive format, the end result is the possibility of losing payload (ES won't store it) and you will not know until the customer is complaining.

Just a real deployment experience to share with you.

I did some testing, I have a field 'age' with type integer, both {"age":21}, {"age":"21"} got indexed fine, I got {"age":21}, {"age":"21"} in search results. I think even the mapping of this field is integer, ES does not store string value as integer, is it?
this document {"age":"21w"} will be rejected, which is expected.

I suppose they are obtained from _source. _source stores just the same as the input. You may check by _field the way elasticsearch index them.

PUT /test_integer/
{
  "mappings": {
    "properties": {
      "age": {
        "type":"integer"
      }
    }
  }
}

POST /test_integer/_bulk
{"index":{}}
{"age":"12"}
{"index":{}}
{"age":12}

GET /test_integer/_search?filter_path=hits.hits._source,hits.hits.fields
{
  "fields": [
    "age"
  ]
}
{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "age" : "12"
        },
        "fields" : {
          "age" : [
            12
          ]
        }
      },
      {
        "_source" : {
          "age" : 12
        },
        "fields" : {
          "age" : [
            12
          ]
        }
      }
    ]
  }
}

Thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.