Delete by query is not deleting all the docs, and the index doc count stays the same


(kish) #1

I tried to delete 2000 docs from a particular index. The request completed, but when I list the indices, the docs count is unchanged.

root@kishan> curl -XPOST 'localhost:9200/cscfcounter/cscfcounter/_delete_by_query?routing=1&pretty' -H 'Content-Type: application/json' -d'
{
  "size" : 2000,
  "sort" : [
        { "date" : {"order" : "asc"}}
    ]
  
}'
{
  "took" : 61,
  "timed_out" : false,
  "total" : 3,
  "deleted" : 3,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
root@kishan> curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   cscfcounter       MWNXpegeTuacRlK2bMpbHg   5   1      20943            0     11.6mb          5.8mb
root@kishan>

(kish) #2

One sample:

root@khasimtestnode-oam01> curl -XGET 'localhost:9200/cscfcounter/_search?pretty' -H 'Content-Type: application/json' -d'                    
{
  "size" : 2,           
  "sort" : [
{ "date" : {"order" : "asc"}}        
]    
}'  | grep  "date"
          "date" : "2018-04-09T14:19:07.195777239+05:30"
          "date" : "2018-04-09T14:19:17.195406349+05:30"
root@khasimtestnode-oam01> curl -XPOST 'localhost:9200/cscfcounter/_delete_by_query?routing=1&pretty' -H 'Content-Type: application/json' -d'
{
  "size" : 2,
  "sort" : [
        { "date" : {"order" : "asc"}}
    ]
                  
}'
{
  "took" : 21,
  "timed_out" : false,
  "total" : 2,
  "deleted" : 2,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
root@khasimtestnode-oam01> curl -XGET 'localhost:9200/cscfcounter/_search?pretty' -H 'Content-Type: application/json' -d'                    
{
  "size" : 2,
  "sort" : [
{ "date" : {"order" : "asc"}}        
]    
}'  | grep  "date"
          "date" : "2018-04-09T14:19:07.195777239+05:30"
          "date" : "2018-04-09T14:19:17.195406349+05:30"

As the date field shows, the same oldest documents are still present after the deletion.


(David Pilato) #3

Are you trying to remove your entire index here?

If so, why not just run:

DELETE cscfcounter

It will be much more efficient.


(kish) #4

No, I do not want to delete my entire index. I just want to delete old entries: in my case, the oldest 100 or 1000 entries every 20 hours.


(David Pilato) #5

IMO you'd be better off creating time-based indices and dropping the old indices directly.

Anyway, when you run your script, are you refreshing the index or waiting a bit before running the next query?


(kish) #6

Yes, I initially tried to do that when creating the index. Could you tell me how to set a condition (max_docs) when creating the index itself? I tried the rollover index API via NewIndicesRolloverService but couldn't get it to work.
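For reference, the rollover API takes its conditions (such as `max_docs` and `max_age`) in the body of `POST /<alias>/_rollover`. As far as I know, in 6.x these conditions are only evaluated when that endpoint is called, not continuously from index creation, so something like a cron job has to call it periodically. The sketch below only builds the JSON body with Go's standard library; `rolloverBody` and the `cscfcounter_write` alias are hypothetical, and the alias is assumed to point at an index named like `cscfcounter-000001`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rolloverConditions mirrors the "conditions" object of the rollover API.
// Conditions left at their zero value are omitted from the JSON.
type rolloverConditions struct {
	MaxAge  string `json:"max_age,omitempty"`
	MaxDocs int64  `json:"max_docs,omitempty"`
}

type rolloverRequest struct {
	Conditions rolloverConditions `json:"conditions"`
}

// rolloverBody returns the JSON body for POST /<write-alias>/_rollover.
func rolloverBody(maxDocs int64, maxAge string) (string, error) {
	b, err := json.Marshal(rolloverRequest{
		Conditions: rolloverConditions{MaxAge: maxAge, MaxDocs: maxDocs},
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := rolloverBody(1000, "20h")
	if err != nil {
		panic(err)
	}
	// "cscfcounter_write" is a hypothetical alias used for all writes.
	fmt.Println("POST /cscfcounter_write/_rollover")
	fmt.Println(body)
}
```

Rollover only starts a new index when a condition is met; the old data still has to be removed by deleting the older indices.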

Yes, I tried refreshing too. For example, in the output below I tried to remove 500 docs, but the output showed only 2** deleted, even though there are 700 docs available.

root@khasimtestnode-oam01> curl -XGET 'localhost:9200/_cat/indices?v&pretty'  | grep  -i cscfcounter
green  open   cscfcounter        un2SueTyQGWBY99sTBhG2g   5   1        846            5    674.2kb        333.7kb
root@khasimtestnode-oam01> curl -XPOST 'localhost:9200/cscfcounter/_delete_by_query?routing=1&pretty' -H 'Content-Type: application/json' -d'
{
  "size" : 500,
  "sort" : [
        { "date" : {"order" : "asc"}}
    ]
  
}'
{
  "took" : 223,
  "timed_out" : false,
  "total" : 144,
  "deleted" : 144,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
root@khasimtestnode-oam01> curl -XGET 'localhost:9200/_cat/indices?v&pretty'  | grep  -i cscfcounter
green  open   cscfcounter        un2SueTyQGWBY99sTBhG2g   5   1        707            0    509.9kb        254.9kb
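Comparing the two `_cat/indices` lines: docs.count went from 846 to 707 (a drop of 139) while the delete reported 144 deleted, and docs.deleted went from 5 to 0. A gap like that would be consistent with new documents being indexed between the two calls, though the thread does not confirm it. A small Go sketch that extracts both columns from such a line, assuming the default `_cat/indices?v` column order shown above; `catCounts` is a hypothetical helper, and `_cat/indices?format=json` would be a sturdier source than splitting text.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// catCounts extracts docs.count and docs.deleted from one data line of
// `_cat/indices?v` output, assuming the column order
// health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
func catCounts(line string) (count, deleted int, err error) {
	f := strings.Fields(line)
	if len(f) < 8 {
		return 0, 0, fmt.Errorf("unexpected _cat/indices line: %q", line)
	}
	if count, err = strconv.Atoi(f[6]); err != nil {
		return 0, 0, err
	}
	if deleted, err = strconv.Atoi(f[7]); err != nil {
		return 0, 0, err
	}
	return count, deleted, nil
}

func main() {
	before := "green  open   cscfcounter        un2SueTyQGWBY99sTBhG2g   5   1        846            5    674.2kb        333.7kb"
	after := "green  open   cscfcounter        un2SueTyQGWBY99sTBhG2g   5   1        707            0    509.9kb        254.9kb"
	c1, d1, err := catCounts(before)
	if err != nil {
		panic(err)
	}
	c2, d2, err := catCounts(after)
	if err != nil {
		panic(err)
	}
	fmt.Printf("docs.count %d -> %d, docs.deleted %d -> %d\n", c1, c2, d1, d2)
}
```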

(David Pilato) #7

What is the output of:

GET cscfcounter/_search?routing=1

(kish) #8

root@kishan> curl -XGET 'localhost:9200/cscfcounter/_search?routing=1&pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 77,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cscfcounter",
        "_type" : "cscfcounter",
        "_id" : "AWKp2jUbB8Zto5JzLVgZ",
        "_score" : 1.0,
        "_source" : {
          "CpuAverageLoad" : 5,
          "HaGroupId" : "1001",
          "LbGroupId" : "",
          "MemFree" : 100,
          "MemUsed" : 0,
          "NodeId" : "khasimtestnode-cscf01",
          "NodeType" : "cscf",
          "State" : "online",
          "Static_limit" : 0,
          "date" : "2018-04-09T15:31:12.731189342+05:30"
        }
      },
      {
        "_index" : "cscfcounter",
        "_type" : "cscfcounter",
        "_id" : "AWKp2qERB8Zto5JzLVge",
        "_score" : 1.0,
        "_source" : {
          "CpuAverageLoad" : 5,
          "HaGroupId" : "1002",
          "LbGroupId" : "",
          "MemFree" : 100,
          "MemUsed" : 0,
          "NodeId" : "khasimtestnode-cscf02",
          "NodeType" : "cscf",
          "State" : "online",
          "Static_limit" : 0,
          "date" : "2018-04-09T15:31:40.370994808+05:30"
        }
      },
      {
        "_index" : "cscfcounter",
        "_type" : "cscfcounter",
        "_id" : "AWKp2zJOPZnjvZWmG7ws",
        "_score" : 1.0,
        "_source" : {
          "CpuAverageLoad" : 4,
          "HaGroupId" : "1001",
          "LbGroupId" : "",
          "MemFree" : 100,
          "MemUsed" : 0,
          "NodeId" : "khasimtestnode-cscf01",
          "NodeType" : "cscf",
          "State" : "online",
          "Static_limit" : 0,
          "date" : "2018-04-09T15:32:17.546556193+05:30"
        }
      },
      ...


(David Pilato) #9

So if you run now:

POST cscfcounter/_delete_by_query?routing=1
{
  "size" : 500,
  "sort" : [
        { "date" : {"order" : "asc"}}
    ]  
}

It should remove 77 hits.

Then if you run:

GET cscfcounter/_search?routing=1

You should get 0 documents back unless you have indexed new data in the meantime.


(kish) #10

Okay, many thanks, I am trying it. But:

  • What is the difference between docs.count from '_cat/indices' and the hit count from 'cscfcounter/_search?routing=1'?
  • Where can I find the API list for doing this in Go? I understand that I missed 'routing' in my delete_by_query. Is it possible to add it using Go syntax?

(kish) #11

Any suggestions?


(David Pilato) #12

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

what is the difference between the doc.count from '_cat/indices' and 'cscfcounter/_search?routing=1'

The latter counts the documents that are available for search in one single shard (the shard that corresponds to routing key 1).
IIRC the former counts the documents in the primary shards of the whole index (all 5 of them here); documents that have been deleted but not yet merged away are reported separately, in docs.deleted.

To add in golang where can i get the API list? and i understand that i have missed to 'routing' the in delete_by_query. So is it possible to add in golang syntax.?

I don't know. I don't know the go client.

In the first place, why are you using routing?


(kish) #13

Okay, why am I using routing?
I saw it in your command:

'POST cscfcounter/_delete_by_query?routing=1'

You used routing=1 there, so I used it too.
If I try to delete without routing=1, as I said, I am unable to delete the number of documents I specify in my request.


(David Pilato) #14

you have used the routing=1 and hence i want to use it.

You were already using routing in the first place. See your first post in this thread:

root@kishan> curl -XPOST 'localhost:9200/cscfcounter/cscfcounter/_delete_by_query?routing=1&pretty' -H 'Content-Type: application/json' -d'
