I've imported over 250 million documents into Elasticsearch from Oracle via Logstash, and 45 documents haven't been imported: the count in ES and the count in Oracle are not the same. I want to check which documents are missing. My doc_id is a unique parameter that always increases by 1, so if my first document was 100000001, the next one is 100000002, and so on. The idea is to find the missing doc_ids by starting at the first number, incrementing by 1, and printing the doc_id of every missing document. I'm not sure I'm using the correct script, because my doc_id is a number but is mapped as text, so I also need to convert it to a number. Here's what I tried (Elasticsearch is 2.3):
GET /my_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "find_missing_ids": {
      "histogram": {
        "field": "my_id", #text type - need to convert to Int/long
        "interval": 1,
        "min_doc_count": 0
      },
      "aggs": {
        "remove_existing_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": {
              "inline": "count == 0",
              "lang": "expression"
            }
          }
        }
      }
    }
  }
}
How can I cast the my_id field to a number? When I try to add this to the script:
"script": {
"inline": "if (doc.containsKey('my_id')) { if (doc['my_id.value'] != null) { return Integer.parseInt(doc['my_id'].value) }}",
"lang": "expression"
I get
{
"type": "json_parse_exception",
"reason": "json_parse_exception: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value\n at [Source: [B@18d4b515; line: 17, column: 100]"
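The message mentions character code 9, which is a tab, so the request body may contain a literal tab inside the inline string. As a small illustration of that JSON rule, here is a sketch in Python. It uses Python's own json module, not the parser Elasticsearch uses, and the names raw_body, escaped_body and parses are made up for this example:

```python
import json

# "\t" in this Python literal becomes a real tab (character code 9)
# inside the JSON string value, which strict JSON does not allow.
raw_body = '{"inline": "count\t== 0"}'


def parses(text):
    """Return True if text is valid JSON under strict parsing."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Replace the real tab with the two characters backslash + t,
# which is the escaped form JSON accepts.
escaped_body = raw_body.replace("\t", "\\t")

print(parses(raw_body))      # False: unescaped control character
print(parses(escaped_body))  # True: tab is escaped
```

If this is the cause here, replacing tabs in the script string with spaces (or escaping them) should get past the parse error. Whether the script then does what is intended is a separate question.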
The mapping for the id field is:
"my_id": {
"type": "string",
"index": "not_analyzed"
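
For comparison, the "increase by 1 and report what is missing" idea can also be sketched outside Elasticsearch. This is only a rough sketch in Python, assuming the my_id values have already been exported as strings in ascending numeric order (for example with a sorted scroll). The function name missing_ids and the sample values are hypothetical:

```python
def missing_ids(sorted_id_strings):
    """Yield every integer missing between consecutive IDs.

    Assumes the input is already in ascending numeric order.
    """
    previous = None
    for raw in sorted_id_strings:
        current = int(raw)  # my_id is stored as a string, so convert it here
        if previous is not None:
            # The range is empty when current == previous + 1 (no gap).
            yield from range(previous + 1, current)
        previous = current


# Hypothetical sample: 100000003, 100000004 and 100000007 are absent.
sample = ["100000001", "100000002", "100000005", "100000006", "100000008"]
print(list(missing_ids(sample)))  # [100000003, 100000004, 100000007]
```

Because it is a generator, it never holds all 250 million IDs in memory at once. It only sees gaps between IDs that exist, so documents missing before the first or after the last exported ID would need a separate check against the Oracle range.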