Issue with reindexing


(Niv Penso) #1

Hey guys,

I am using the script below to reindex 115,000 documents. (I am running the
script locally)

<?php
// PHP ReIndexer with Bulk API
require 'vendor/autoload.php';

// we use this function to create the "scan & scroll" search requests because such requests doesn't exist in the ES PHP API.
function curlWrapper($uri, $method, $data = '')
{
    $ch = curl_init($uri);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
    if ($data != '')
        curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

error_reporting(E_ALL);
ini_set( 'display_errors','1');
date_default_timezone_set("UTC");

$ELSEARCH_SERVER = "http://someserver:9200/";
$OLDINDEX = "OldIndex"; //old index
$SECONDINDEX = "NewIndex"; // new index
$TYPE = 'MyType'; // old type
$LOGPATH = '/var/log/elasticsearch/elasticsearch.log';

$clientParams = array();
$clientParams['logging'] = true;
$clientParams['logPath'] = $LOGPATH;
$clientParams['logLevel'] = Psr\Log\LogLevel::INFO;
$clientParams['hosts'] = array ($ELSEARCH_SERVER);
$dstEl = new Elasticsearch\Client($clientParams);

//start the scan request
//We want to find all documents, so we do a simple match_all
$query ='{"query" : {"match_all" : {}}}';

//The scroll=10m param says that this scroll session should be valid for 10 minutes before expiring
//The size=100 param says that 100 results should be returned per scroll
$uri = $ELSEARCH_SERVER.$OLDINDEX."/".$TYPE."/_search?search_type=scan&scroll=10m&size=100";
$response = curlWrapper($uri, 'GET', $query);
$data = json_decode($response);

//total number of documents in the index
$total = $data->hits->total;

//scroll session id, used to request the next batch of data
$scroll_id = $data->_scroll_id;

//The scan request doesn't actually return any data, just a session "scroll id"
//We now query ES and provide this id to start retrieving the data
$uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
$response = curlWrapper($uri, 'GET', $scroll_id);
$data = json_decode($response);

// Initialize bulk insertion parameters.
$bulkInsertParams = array();
$bulkInsertParams['index'] = $SECONDINDEX;
$bulkInsertParams['type'] = $TYPE;

echo date("Y-m-d H:i:s") . ": Start ReIndexing." . PHP_EOL;

//Loop through all the data
while (count($data->hits->hits) > 0)
{
    $bulkInsertParams["body"]=null;
    foreach ($data->hits->hits as $item) // run for each match of the "scan&scroll search".
    {
        $bulkInsertParams["body"][] = array(
            'index' => array(
                '_id' => $item->_id
            )
        );
        $bulkInsertParams["body"][] = array(
            'doc' => $item->_source
        );
    }
    $retVal = $dstEl->bulk($bulkInsertParams);

    //Each scroll request returns another scroll_id which is used to continue
    //scrolling through the data
    $scroll_id = $data->_scroll_id;

    //retrieve the next batch of data - the new session is good for an additional 10m, etc etc
    $uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
    $response = curlWrapper($uri, 'GET', $scroll_id);
    $data = json_decode($response);
}

echo date("Y-m-d H:i:s") . ": DONE!" . PHP_EOL;
?>

Everything seems to work fine, and when I run this query:

GET NewIndex/MyType/_search
{
  "size": 0
}

I get these results (which look good):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 115102,
    "max_score": 0,
    "hits": []
  }
}

But when I query on one of the documents' fields, I get no results, while
the exact same query on the old index returns the expected results.

This is the query (if it helps):

GET NewIndex/MyType/_search
{
  "query": {
    "terms": {
      "doc_type": [
        "user_view"
      ]
    }
  }
}

The results for NewIndex are:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  }
}

While the results for OldIndex are:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 104452,
    "max_score": 0,
    "hits": []
  }
}

I am wondering whether there is something else I should do to get the
documents indexed in Elasticsearch?

Note:
(*) When I get a specific document (by key) from NewIndex, the result
is fine.

Thanks for your help
Niv :slight_smile:
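
One thing that may be worth checking, given that fetching a document by key works while the field query returns nothing: in the script above, each bulk `index` action is followed by `array('doc' => $item->_source)`. With the `index` action, the line after the action is stored as the document itself, so the fields would end up nested under `doc` (for example `doc.doc_type`) rather than at the top level. The `doc` wrapper belongs to the `update` action. Below is a minimal sketch of building the bulk body without the wrapper. The helper name `buildBulkBody` and the sample hit are hypothetical and only for illustration; the sketch assumes hits decoded with `json_decode`, as in the script.

```php
<?php
// Hypothetical helper (not part of the original script): builds the "body"
// array for a bulk request from scroll hits decoded with json_decode().
function buildBulkBody(array $hits)
{
    $body = array();
    foreach ($hits as $item) {
        // Action line: index the document under its original _id.
        $body[] = array('index' => array('_id' => $item->_id));
        // Source line: the document itself, NOT wrapped in a 'doc' key.
        $body[] = $item->_source;
    }
    return $body;
}

// Made-up sample hit, shaped like one entry of $data->hits->hits.
$hits = json_decode('[{"_id": "1", "_source": {"doc_type": "user_view"}}]');
$body = buildBulkBody($hits);

// Print the two bulk lines that would be sent for this hit.
foreach ($body as $line) {
    echo json_encode($line) . PHP_EOL;
}
```

In the reindexing loop, this would amount to replacing the two `$bulkInsertParams["body"][] = ...` assignments with `$bulkInsertParams["body"] = buildBulkBody($data->hits->hits);`. Running the same `terms` query against `doc.doc_type` on NewIndex is one way to confirm whether the documents were in fact indexed with the extra wrapper.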



(system) #2