Hey guys,
I am using the script below to reindex 115,000 documents (I am running the
script locally).
<?php
// PHP ReIndexer with Bulk API
require 'vendor/autoload.php';
// We use this function to create the "scan & scroll" search requests
// because such requests don't exist in the ES PHP API.
function curlWrapper($uri, $method, $data = '')
{
    $ch = curl_init($uri);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
    if ($data != '') {
        curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    }
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
error_reporting(E_ALL);
ini_set( 'display_errors','1');
date_default_timezone_set("UTC");
$ELSEARCH_SERVER = "http://someserver:9200/";
$OLDINDEX = "OldIndex"; //old index
$SECONDINDEX = "NewIndex"; // new index
$TYPE = 'MyType'; // old type
$LOGPATH = '/var/log/elasticsearch/elasticsearch.log';
$clientParams = array();
$clientParams['logging'] = true;
$clientParams['logPath'] = $LOGPATH;
$clientParams['logLevel'] = Psr\Log\LogLevel::INFO;
$clientParams['hosts'] = array ($ELSEARCH_SERVER);
$dstEl = new Elasticsearch\Client($clientParams);
//start the scan request
//We want to find all documents, so we do a simple match_all
$query ='{"query" : {"match_all" : {}}}';
//The scroll=10m param says that this scroll session should be valid for 10
//minutes before expiring
//The size=100 param says that 100 results should be returned per shard per scroll
$uri = $ELSEARCH_SERVER.$OLDINDEX."/".$TYPE.
"/_search?search_type=scan&scroll=10m&size=100";
$response = curlWrapper($uri, 'GET', $query);
$data = json_decode($response);
//total number of documents in the index
$total = $data->hits->total;
//scroll session id, used to request the next batch of data
$scroll_id = $data->_scroll_id;
//The scan request doesn't actually return any data, just a session "scroll id"
//We now query ES and provide this id to start retrieving the data
$uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
$response = curlWrapper($uri, 'GET', $scroll_id);
$data = json_decode($response);
// Initialize bulk insertion parameters.
$bulkInsertParams = array();
$bulkInsertParams['index'] = $SECONDINDEX;
$bulkInsertParams['type'] = $TYPE;
echo date("Y-m-d H:i:s") . ": Start ReIndexing." . PHP_EOL;
//Loop through all the data
while (count($data->hits->hits) > 0)
{
    $bulkInsertParams["body"] = null;
    // run for each match of the "scan&scroll search".
    foreach ($data->hits->hits as $item)
    {
        $bulkInsertParams["body"][] = array(
            'index' => array(
                '_id' => $item->_id
            )
        );
        $bulkInsertParams["body"][] = array(
            'doc' => $item->_source
        );
    }
    $retVal = $dstEl->bulk($bulkInsertParams);
    //Each scroll request returns another scroll_id which is used to continue
    //scrolling through the data
    $scroll_id = $data->_scroll_id;
    //retrieve the next batch of data - the new session is good for an
    //additional 10m, etc etc
    $uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
    $response = curlWrapper($uri, 'GET', $scroll_id);
    $data = json_decode($response);
}
echo date("Y-m-d H:i:s") . ": DONE!" . PHP_EOL;
?>
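To make it easier to see what the script actually sends, here is a minimal sketch of the bulk body that the foreach loop above produces for a single hit. buildBulkBody is a hypothetical helper that only mirrors the loop, and the sample hit is made up:

```php
<?php
// Hypothetical helper (not part of the script above): builds the same
// bulk body as the foreach loop, for a list of decoded hits.
function buildBulkBody(array $hits)
{
    $body = array();
    foreach ($hits as $item) {
        // action line: index the document under its original id
        $body[] = array('index' => array('_id' => $item->_id));
        // source line: the original fields end up under a "doc" key
        $body[] = array('doc' => $item->_source);
    }
    return $body;
}

// Made-up hit, shaped like one entry of $data->hits->hits.
$hit = json_decode('{"_id": "1", "_source": {"doc_type": "user_view"}}');
$body = buildBulkBody(array($hit));

// Prints the line that follows the "index" action for this hit.
echo json_encode($body[1]) . PHP_EOL; // {"doc":{"doc_type":"user_view"}}
```

As far as I understand the bulk API, the line that follows an index action is stored as the document source as-is, so the shape printed here may be relevant to the problem described below.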
Everything seems to work fine, and when I run this query:
GET NewIndex/MyType/_search
{
"size":0
}
I get these results (which look good):
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 115102,
"max_score": 0,
"hits": []
}
}
But when I try to query on one of the documents' fields I get no results,
while the exact same query on the old index returns the expected
results.
This is the query (if it helps):
GET NewIndex/MyType/_search
{
"query": {
"terms": {
"doc_type": [
"user_view"
]
}
}
}
The results on NewIndex are:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
}
}
While the results for OldIndex are:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 104452,
"max_score": 0,
"hits": []
}
}
I am wondering if there is something else that I should do to make the
documents' fields searchable in Elasticsearch?
Note:
(*) When I try to get a specific document (by key) from NewIndex, the result
is fine.
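Since fetching by key works, one check I can think of is to compare the top-level field names of the same document in both indices. This is only a sketch: topLevelFields is a hypothetical helper, and the two sample responses are made up to show the shapes I would compare, not real output from my cluster.

```php
<?php
// Hypothetical helper: returns the sorted top-level field names of a
// decoded GET-by-id response (an object with a "_source" property).
function topLevelFields($doc)
{
    $fields = array_keys(get_object_vars($doc->_source));
    sort($fields);
    return $fields;
}

// Made-up responses. With real data they could come from something like
// json_decode(curlWrapper($ELSEARCH_SERVER.$OLDINDEX."/".$TYPE."/".$id, 'GET')).
$old = json_decode('{"_id":"1","_source":{"doc_type":"user_view","user":"niv"}}');
$new = json_decode('{"_id":"1","_source":{"doc":{"doc_type":"user_view","user":"niv"}}}');

echo implode(',', topLevelFields($old)) . PHP_EOL;
echo implode(',', topLevelFields($new)) . PHP_EOL;
```

If the new index reports only doc where the old one reports doc_type, the field would live at doc.doc_type in NewIndex, which might explain why the terms query on doc_type finds nothing.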
Thanks for your help,
Niv