Some documents not respecting mapping


(Matt Hughes) #1

I created a document type with the following mapping:

{
  "access" : {
    "properties" : {
      "body_bytes_sent" : {
        "type" : "integer"
      },
      "protocol" : {
        "type" : "string"
      },
      "request_method" : {
        "type" : "string"
      },
      "request_time" : {
        "type" : "float"
      },
      "status" : {
        "type" : "string"
      },
      "timestamp" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "url" : {
        "type" : "string"
      }
    }
  }
}
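For reference, a mapping like this is applied through the put-mapping endpoint (`PUT /{index}/{type}/_mapping`), with the type name as the top-level key of the body. The Python sketch below only builds that request with the standard library and does not send it. The node address `http://localhost:9200` and the helper name `build_put_mapping_request` are assumptions. The `string` type and the per-index mapping type reflect the Elasticsearch versions of this era; later releases replaced `string` with `text`/`keyword` and eventually removed mapping types.

```python
import json
import urllib.request

# Assumption: a node listening locally; "nginx" and "access" come from the post.
ES_URL = "http://localhost:9200"

# The mapping from the post, expressed as a Python dict.
ACCESS_MAPPING = {
    "access": {
        "properties": {
            "body_bytes_sent": {"type": "integer"},
            "protocol": {"type": "string"},
            "request_method": {"type": "string"},
            "request_time": {"type": "float"},
            "status": {"type": "string"},
            "timestamp": {"type": "date", "format": "dateOptionalTime"},
            "url": {"type": "string"},
        }
    }
}


def build_put_mapping_request(base_url, index, doc_type, mapping):
    """Build (but do not send) a PUT request carrying the type's mapping."""
    body = json.dumps(mapping).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_mapping",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_put_mapping_request(ES_URL, "nginx", "access", ACCESS_MAPPING)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it to the node.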

I have two documents of this type:

{
  "_id": "734pPtfbT8iv3rWAo_IzBg",
  "_index": "nginx",
  "_source": {
    "body_bytes_sent": 78,
    "protocol": "HTTP/1.1",
    "request_method": "GET",
    "request_time": 0.035000000000000003,
    "status": "200",
    "timestamp": "2013-10-07T21:36:01+00:00",
    "url": "redacted"
  },
  "_type": "access",
  "_version": 1,
  "exists": true
}

You can see that body_bytes_sent and request_time both appear to follow
the mapping, as an int and a float. But this other document has everything
as strings:

{
  "_id": "qAtMiBxpQs6uLdv61Uznpw",
  "_index": "nginx",
  "_source": {
    "body_bytes_sent": "475",
    "protocol": "HTTP/1.1",
    "request_method": "GET",
    "request_time": "0.088",
    "status": "200",
    "timestamp": "2013-10-07T21:43:36+00:00",
    "url": "redacted"
  },
  "_type": "access",
  "_version": 1,
  "exists": true
}

Now when I try to get the average request time, I get all kinds of problems
(ClassCastException). I thought that mappings were immutable on a type once
they were set. How can I make sure my data always has the correct type?
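One way to make sure `_source` always carries the mapped types is to convert the values on the client before indexing. This is a minimal Python sketch, assuming each document is built as a dict before it is sent; `NUMERIC_FIELDS` and `normalize` are hypothetical names, not part of any Elasticsearch client. Only the two numeric fields from the mapping are converted; `status` is mapped as a string, so it is left alone.

```python
import json

# Hypothetical table mirroring the mapping: integer -> int, float -> float.
# Fields not listed here are passed through unchanged.
NUMERIC_FIELDS = {
    "body_bytes_sent": int,
    "request_time": float,
}


def normalize(doc):
    """Return a copy of doc with numeric fields converted per the mapping.

    Raises ValueError if a value cannot be converted, so bad data is caught
    on the client instead of after it has been indexed.
    """
    out = dict(doc)  # shallow copy; the caller's dict is not modified
    for field, convert in NUMERIC_FIELDS.items():
        if out.get(field) is not None:
            out[field] = convert(out[field])
    return out


# The string-valued document from the post.
raw = {
    "body_bytes_sent": "475",
    "protocol": "HTTP/1.1",
    "request_method": "GET",
    "request_time": "0.088",
    "status": "200",
    "timestamp": "2013-10-07T21:43:36+00:00",
    "url": "redacted",
}

# body_bytes_sent becomes the integer 475 and request_time the float 0.088.
print(json.dumps(normalize(raw)))
```

Converting on the client means the stored `_source` and the indexed value agree, whatever the indexing side does with strings.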

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Matt Hughes) #2

Strangely, after several hours all documents started being stored in the
correct format. I guess this resolved itself, but some documents from
yesterday are still all strings.
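To track down the leftover documents from yesterday that still hold strings, one option is to check the type of each value in `_source` on the client side. This sketch assumes the hits have already been fetched (for example from a search response) and parsed with `json.loads`; `string_typed_ids` and `NUMERIC_FIELD_NAMES` are hypothetical names, and the two hits are trimmed copies of the documents from the first post.

```python
# Hypothetical: the numeric fields from the mapping whose _source type we check.
NUMERIC_FIELD_NAMES = ("body_bytes_sent", "request_time")


def string_typed_ids(hits, fields=NUMERIC_FIELD_NAMES):
    """Return the _id of every hit whose _source stores a numeric field as a string."""
    return [
        hit["_id"]
        for hit in hits
        if any(isinstance(hit["_source"].get(f), str) for f in fields)
    ]


# Trimmed copies of the two documents from the first post.
hits = [
    {"_id": "734pPtfbT8iv3rWAo_IzBg",
     "_source": {"body_bytes_sent": 78, "request_time": 0.035000000000000003}},
    {"_id": "qAtMiBxpQs6uLdv61Uznpw",
     "_source": {"body_bytes_sent": "475", "request_time": "0.088"}},
]

print(string_typed_ids(hits))  # ['qAtMiBxpQs6uLdv61Uznpw']
```

The returned ids are the candidates to fix up and reindex.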

On Tuesday, October 8, 2013 9:53:56 AM UTC-4, Matt Hughes wrote:



(simonw-2) #3

Hmm, what you are seeing is the "_source", which is an unprocessed copy of
what you sent to ES. Even if you send the value as a string, we index it as
a float (per the mapping). The source will still contain the string, because
we don't process it. Is it possible that you were sending these fields as
strings and then switched over to floats?

simon
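The distinction described above, between a `_source` stored exactly as sent and a value indexed according to the mapping, can be pictured with a toy model. This Python sketch is purely illustrative and is not how Elasticsearch works internally; `MAPPING_TYPES` and `index_document` are hypothetical names, and only the two numeric fields from the thread are modelled.

```python
import copy

# Toy model only: NOT Elasticsearch internals.
# Hypothetical table of mapped types for the two numeric fields.
MAPPING_TYPES = {"body_bytes_sent": int, "request_time": float}


def index_document(source):
    """Return (stored_source, indexed_fields) for one document."""
    stored = copy.deepcopy(source)  # _source is kept exactly as it was sent
    indexed = {
        field: convert(source[field])  # indexed value is parsed per the mapping
        for field, convert in MAPPING_TYPES.items()
        if field in source
    }
    return stored, indexed


docs = [
    {"body_bytes_sent": 78, "request_time": 0.035000000000000003},
    {"body_bytes_sent": "475", "request_time": "0.088"},
]
results = [index_document(d) for d in docs]

# Type of request_time in the stored source vs. the indexed value.
for stored, indexed in results:
    print(type(stored["request_time"]).__name__, type(indexed["request_time"]).__name__)

# An average computed from the indexed side sees only floats.
avg = sum(indexed["request_time"] for _, indexed in results) / len(results)
print(avg)
```

In the model, the stored copy of the second document still reports `str`, while anything computed from the indexed side only sees numbers.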

On Tuesday, October 8, 2013 4:46:37 PM UTC+2, Matt Hughes wrote:


