Deleted documents in search results

Hi,

I have an index, for example "twitter", with 2 shards and 1 replica in a 5-node
cluster (the master node holds no data). Several documents were deleted from the
index like this:

curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

but they still appear in search responses, and it doesn't matter whether I use
a filter or a query. When I try to get a document directly by GET, for example:

curl -XGET 'http://localhost:9200/twitter/tweet/1'

{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "exists" : false
}

So I tried calling several APIs to fix it, but the result is still the same:

curl -XPOST 'http://localhost:9200/twitter/_refresh'
curl -XPOST 'http://localhost:9200/twitter/_flush'
curl -XPOST 'http://localhost:9200/twitter/_cache/clear'
curl -XPOST 'http://localhost:9200/twitter/_optimize?only_expunge_deletes=true'

How can I force Lucene/Elasticsearch to really delete the documents?

Can you help me please? :slight_smile:

--

Can you reproduce it with a curl recreation?
Did you try to delete it, then refresh, then search?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 28 Nov 2012, at 23:02, Petr Jancarik jancarikpetr@gmail.com wrote:


--

So I tried to reindex the deleted document again:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{"something": "aaa"}'

{"ok":true,"_index":"twitter","_type":"entity","_id":"1","_version":2}

After that I performed a search:

curl -XPOST 'http://localhost:9200/twitter/tweet/_search' -d '{"filter": {"term": {"something": "aaa"}}}'

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {"something": "aaa"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {"something": "aaa"}
    } ]
  }
}

and I don't understand at all how it is possible to have 2 totally identical
documents!

curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

{"ok":true,"found":false,"_index":"twitter","_type":"entity","_id":"1","_version":3}

I performed the search again, and only one document was returned. So I tried
this reindex-and-delete process several times, and at one point I could not even
delete the "duplicate" document... :smiley: So I waited a few minutes, tried the
delete again, and the duplicate one was successfully deleted.

On Thursday, November 29, 2012 7:49:00 AM UTC+1, David Pilato wrote:

Can you reproduce it with a curl recreation?
Did you try to delete it, then refresh, then search?


--

But I still can not delete the original document :frowning: Neither clearing the
cache, flushing, nor refreshing the index helps...

On Thursday, November 29, 2012 12:57:40 PM UTC+1, Petr Jancarik wrote:


--

    After that I performed a search:

    curl -XPOST 'http://localhost:9200/twitter/tweet/_search' -d
    '{"filter": {"term": {"something": "aaa"}}}'

    [... response with "total" : 2 and two identical hits for _id "1" snipped ...]

    and I don't understand at all how it is possible to have 2
    totally identical documents!

Did you index the first doc with routing or parent?

Try repeating your search but ask for the _routing:

curl -XGET 'http://127.0.0.1:9200/ia/_search?pretty=1' -d '
{
  "fields" : [
    "_routing"
  ],
  "filter" : {
    "term" : {
      "_id" : 1
    }
  }
}
'

clint
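
(A sketch of the follow-up, in case the search above does show a `_routing` value
on the hits: a delete has to carry that same routing value to reach the shard
that actually holds the document. The routing value "foo" here is hypothetical;
use whatever the search returns.)

```shell
# Hypothetical: suppose the search shows the doc was indexed with routing "foo".

# A delete WITHOUT routing hashes on the id and can go to a different shard,
# leaving the original document untouched:
curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

# A delete WITH the original routing value reaches the shard holding the doc:
curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=foo'
```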

--

Great, it helped!!!

Thanks a lot Clint :slight_smile:
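
For the record, the mechanism behind this: Elasticsearch picks the shard by
hashing the routing value, which defaults to the document id. So a document
indexed with custom routing and later deleted without it can sit on one shard
while the delete goes to another, and a plain re-PUT then creates a second copy
with the same _id. A toy sketch in Python (the hash function here is made up for
illustration, not the one ES actually uses, and the routing value "foo" is
hypothetical):

```python
def toy_hash(s):
    # Simple polynomial string hash; stands in for the real hash ES uses.
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h

def pick_shard(doc_id, num_shards, routing=None):
    # The target shard is chosen from the routing value,
    # which defaults to the document id.
    key = routing if routing is not None else doc_id
    return toy_hash(key) % num_shards

# Index doc "1" with routing "foo" -> it lands on one shard...
print(pick_shard("1", 2, routing="foo"))  # shard 0
# ...but a delete without routing hashes the id and targets the other shard,
# so the original document survives.
print(pick_shard("1", 2))                 # shard 1
```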

On Thursday, November 29, 2012 1:37:08 PM UTC+1, Clinton Gormley wrote:

    Did you index the first doc with routing or parent?

--