Failing to search after removing docs


(Javier Barroso) #1

Hello,

I'm lost with this question.

I'm experimenting with Elasticsearch. I loaded Apache logs through Logstash, and now I want to remove those logs and load them again with a new Logstash configuration.

To remove all the log documents I'm using the following bash script:

# Build one bulk "delete" action per hit from a saved search response.
function formar_bulk_delete
{
        jq -M -c -r '{delete: .hits.hits[]|{_index: ._index, _id: ._id, _type: ._type}}' "$1"
}
i=$(date +%s)
output_search="search_$i.txt"
output_bulk="bulk_$i.txt"
# First page: open a scroll context and fetch 100 matching docs.
curl -s -o "$output_search" -x '' \
        "http://192.168.2.192:9200/_search?scroll=1m&size=100" -d '
{
        "query": {
                "match": {
                        "source": "httpd"
                }
        }
}
'
formar_bulk_delete "$output_search" > "bulk_data_$i.txt"
curl -s -o "$output_bulk" -x '' -XPOST "http://192.168.2.192:9200/_bulk" --data-binary "@bulk_data_$i.txt"
# Next page: read the scroll id from the previous response, then switch to new output files.
id=$(jq -r '._scroll_id' "$output_search")
let i=$i+1
output_search="search_$i.txt"
output_bulk="bulk_$i.txt"
curl -s -x '' -XGET "http://192.168.2.192:9200/_search/scroll" -d '
{
        "scroll": "1m",
        "scroll_id": "'"$id"'"
}
' > "$output_search"
formar_bulk_delete "$output_search" > "bulk_data_$i.txt"
curl -s -o "$output_bulk" -x '' -XPOST "http://192.168.2.192:9200/_bulk" --data-binary "@bulk_data_$i.txt"

I'm not sure why it does not remove all the docs I want. In any case, after running the script I get an error when I search for an item that was removed:

curl -s -x '' http://192.168.2.192:9200/_search -d '{ "query": { "match": { "_id" : "AVMdWYmDtHY6VebGk5W5" } } }' > fallo_al_buscar_item_eliminado-resultado.json
jq '{"_shards failed": ._shards.failed,"_shards successful": ._shards.successful,"shard failure": ._shards.failures[0]}' fallo_al_buscar_item_eliminado-resultado.json
{
  "_shards failed": 198,
  "_shards successful": 1928,
  "shard failure": {
    "shard": 2,
    "index": "logstash-2016.01.15",
    "node": "trVvv6kYQUG6MrSagqFUgQ",
    "reason": {
      "type": "es_rejected_execution_exception",
      "reason": "rejected execution of org.elasticsearch.transport.TransportService$4@7195089f on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@24d59183[Running, pool size = 7, active threads = 7, queued tasks = 1000, completed tasks = 567730]]"
    }
  }
}

After that error, searches work fine again.

Do you have any tips?

I have changed the following parameter so that I can scroll in batches of 400. I'm not sure how to scroll + bulk in larger batches (1000 docs?):
threadpool.bulk.queue_size: 500
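
For reference, a minimal sketch of how the scroll + bulk steps could be repeated until a page comes back empty, with a larger page size. The host and query are taken from the script above; `ES_URL`, `PAGE_SIZE`, `scroll_body` and `delete_all_matching` are hypothetical names, and `curl` and `jq` are assumed to be installed. Nothing is sent until `delete_all_matching` is called.

```shell
#!/bin/sh
ES_URL="http://192.168.2.192:9200"   # host from the script above
PAGE_SIZE=1000                       # assumed batch size, adjust as needed

# Build the JSON body for a follow-up scroll request.
scroll_body() {
    printf '{"scroll":"1m","scroll_id":"%s"}\n' "$1"
}

# Delete every doc matching the query, one scroll page per bulk request.
delete_all_matching() {
    page=$(curl -s -x '' "$ES_URL/_search?scroll=1m&size=$PAGE_SIZE" \
        -d '{"query":{"match":{"source":"httpd"}}}')
    # Stop as soon as a page contains no hits.
    while [ "$(printf '%s' "$page" | jq '.hits.hits | length')" -gt 0 ]; do
        # One bulk "delete" action per hit, sent straight from stdin.
        printf '%s' "$page" |
            jq -c '.hits.hits[] | {delete: {_index: ._index, _id: ._id, _type: ._type}}' |
            curl -s -x '' -o /dev/null -XPOST "$ES_URL/_bulk" --data-binary @-
        # Fetch the next page with the scroll id of the current one.
        id=$(printf '%s' "$page" | jq -r '._scroll_id')
        page=$(curl -s -x '' "$ES_URL/_search/scroll" -d "$(scroll_body "$id")")
    done
}

# Example (not run here): delete_all_matching
```

Larger pages mean fewer round trips but bigger bulk requests, so the batch size is a trade-off to tune against the bulk queue.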

Thank you


(Mark Harwood) #2

I'm not 100% certain what your script is doing, but the error shows that the server is overloaded with pending search requests (1,000 tasks are queued waiting for a free search thread).
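
One plausible reading of the numbers in that response: the search was sent to `/_search` with no index name, so it ran against every shard in the cluster. The counters reported in post #1 add up to at least 2126 shard-level requests for a single search, which is more than a queue with capacity 1000 can hold if most of them land on the same node. The sketch below does that arithmetic and shows a search scoped to one daily index. `shard_requests` and `search_one_index` are hypothetical helper names; the host, index and document id come from post #1.

```shell
#!/bin/sh
ES_URL="http://192.168.2.192:9200"   # host from post #1

# Shard-level requests produced by one search: failed + successful.
shard_requests() {
    echo $(( $1 + $2 ))
}

# Counters from the response quoted in post #1.
shard_requests 198 1928   # prints 2126

# Scope the search to a single index instead of every index in the cluster.
search_one_index() {
    curl -s -x '' "$ES_URL/$1/_search" \
        -d '{ "query": { "match": { "_id": "AVMdWYmDtHY6VebGk5W5" } } }'
}

# Example (not run here): search_one_index "logstash-2016.01.15"
```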

Generally in logging scenarios mass deletion of content is achieved by dropping whole indices rather than individual document deletes. This means that you organise log records into "time-based indices" (e.g. one per day) and use an alias to control what indices are seen as current. Old indices can then be deleted. Check out https://www.elastic.co/guide/en/elasticsearch/guide/master/time-based.html
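
As a rough sketch of the index-dropping approach, assuming the daily `logstash-YYYY.MM.DD` naming seen in the error from post #1: a whole day of logs goes away with one delete-index request instead of thousands of per-document deletes. `daily_index` and `drop_daily_index` are hypothetical helper names, and the host is the one from post #1.

```shell
#!/bin/sh
ES_URL="http://192.168.2.192:9200"   # host from post #1

# Name of the daily Logstash index for a date given as YYYY-MM-DD.
daily_index() {
    printf 'logstash-%s\n' "$(printf '%s' "$1" | tr '-' '.')"
}

# Drop the whole index for that day in a single request.
drop_daily_index() {
    curl -s -x '' -XDELETE "$ES_URL/$(daily_index "$1")"
}

daily_index 2016-01-15   # prints logstash-2016.01.15
# Example (not run here): drop_daily_index 2016-01-15
```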


(Javier Barroso) #3

Thank you!!

But how can I see where those searches come from?

Currently Elasticsearch is only populated by 5 or 6 servers that send their logs.

Now I'm populating it with new logs and removing them with scroll + bulk (my script does that), to test and experiment with the Logstash configuration.

My bulk requests contain only 100 actions each. How can I find out what Elasticsearch is busy with?

Thank you very much. Your help is appreciated


(Javier Barroso) #4

I found how to see what is pending: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-pending.html

I'm not sure whether I can use that API when the queue is full, and I don't know why there are 1000 pending tasks.
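
Note: as far as the documentation describes it, the pending tasks API lists cluster-level changes (index creation, mapping updates and similar) waiting to be applied, not queued searches. Thread pool queues are reported by the node stats API instead. A minimal sketch, assuming `curl` and `jq` are installed; `thread_pool_stats_url` and `search_pool_stats` are hypothetical helper names and the host is the one from post #1.

```shell
#!/bin/sh
ES_URL="http://192.168.2.192:9200"   # host from post #1

# URL of the node stats endpoint restricted to thread pool statistics.
thread_pool_stats_url() {
    printf '%s/_nodes/stats/thread_pool\n' "$ES_URL"
}

# Print active, queued and rejected counts of the search pool, one line per node.
search_pool_stats() {
    curl -s -x '' "$(thread_pool_stats_url)" |
        jq -c '.nodes[] | {name: .name, search: (.thread_pool.search | {active, queue, rejected})}'
}

# Example (not run here): search_pool_stats
```

A `rejected` count that keeps growing on one node would point at the node whose search queue is overflowing.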

Thank you!

