Find missing documents in index by id field

I've imported over 250 million documents into Elasticsearch from Oracle via Logstash, and 45 documents haven't been imported: the count in ES and the count in Oracle are not the same. I want to find out which documents are missing. My doc_id is a unique value that always increases by 1, so if my first document is 100000001, the next one is 100000002, and so on. The idea is to walk the range by increasing the starting number by 1 and print the doc_ids of the missing documents. I'm not sure I'm using the correct script, because my doc_id is a number but it's mapped as a string, so I also need to convert it to a number. Here's what I tried (Elasticsearch 2.3):

GET /my_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "find_missing_ids": {
      "histogram": {
        "field": "my_id",  #text type - need to convert to Int/long
        "interval": 1,
        "min_doc_count": 0
      },
      "aggs": {
        "remove_existing_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": {
              "inline": "count == 0",
              "lang": "expression"
            }
          }
        }
      }
    }
  }
}

How can I cast the my_id field to a number? When I try to add this to the script:

"script": {
              "inline": "if (doc.containsKey('my_id')) { if (doc['my_id.value'] != null) {	return Integer.parseInt(doc['my_id'].value) }}", 
              "lang": "expression"

I get

{
"type": "json_parse_exception",
"reason": "json_parse_exception: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value\n at [Source: [B@18d4b515; line: 17, column: 100]"
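
Reading the message literally, the parser is complaining about a raw tab character (code 9) inside a JSON string value, which matches the tab after the opening brace in the "inline" script above. A minimal sketch of that rule, using Python's json module as a stand-in (it is not the parser Elasticsearch uses, and the helper name parses is made up for illustration):

```python
import json

# In Python source, "\t" becomes a real tab (code 9) inside the JSON text.
raw_tab = '{"inline": "a\tb"}'
# Here the JSON text contains a backslash followed by "t", i.e. an escaped tab.
escaped = '{"inline": "a\\tb"}'


def parses(text: str) -> bool:
    """Return True if text is accepted as JSON by Python's strict parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(raw_tab))   # raw control character inside a string value
print(parses(escaped))   # same content, but escaped
```

This only speaks to the parse error. Whether the script itself would run once the tab is removed is a separate question.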

The mapping for the id field is:

           "my_id": {
            "type": "string",
            "index": "not_analyzed"
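
For reference, the "increase by 1 and print what is missing" idea can also be sketched on the client side, independent of any aggregation. This is only a sketch under stated assumptions: the ids are already available as an iterable of strings in ascending numeric order (how they are fetched from the index is not shown), and the function name missing_ids and its parameters are made up for illustration.

```python
from typing import Iterable, Iterator


def missing_ids(sorted_ids: Iterable[str], start: int, end: int) -> Iterator[int]:
    """Yield every integer in [start, end] that does not appear in sorted_ids.

    Assumption: sorted_ids holds the ids as strings in ascending numeric
    order, so only one expected value has to be kept in memory.
    """
    expected = start
    for raw in sorted_ids:
        current = int(raw)          # the ids are stored as strings
        if current < expected:
            continue                # duplicate, or below the range of interest
        if current > end:
            break                   # past the range of interest
        yield from range(expected, current)   # everything skipped is missing
        expected = current + 1
    yield from range(expected, end + 1)       # tail of the range, if any


sample = ["100000001", "100000002", "100000005"]
print(list(missing_ids(sample, 100000001, 100000006)))
# prints [100000003, 100000004, 100000006]
```

If all ids have the same number of digits, sorting them as strings gives the same order as sorting them as numbers, which is what the ordering assumption above relies on.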
