Hi all,
I'm writing a web crawler in Node.js and indexing the results with
Elasticsearch. However, I've run into a problem where the code hangs at the
indexing call.
Here's how the client is initialised:
var es_client = new elasticsearch.Client({
    host: "localhost:9200",
    log: ['error', 'trace'],
    keepAlive: true,
    sniffOnConnectionFault: true,
    //sniffInterval: 6000,
    sniffOnStart: true,
    maxKeepAliveTime: 600000
});
And here's the indexing API call:
es_client.index({
    index: seedURL,
    type: 'post',
    // The document id is derived from the post itself, so re-indexing an
    // identical post overwrites the same document.
    id: generate_md5(username + "\n" + post_title + "\n" + post_content),
    body: {
        thread_md5: thread_md5,
        thread_title: thread_title,
        thread_url: post_list_page_url,
        post_title: post_title,
        post_order: post_order,
        post_content: post_content,
        timestamp: timestamp,
        username: username
    }
}).then(
    function (resp) {
        console.log("Elasticsearch response to indexing " + post_title
            + "...");
        console.log(resp);
    },
    function (err) {
        console.log("[ERROR] An error occurred whilst indexing: " +
            post_title + "...");
        console.log(err.message);
    }
);
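The generate_md5 helper isn't shown above. As a rough sketch only, assuming it is a thin wrapper around Node's built-in crypto module (the body below is a guess for illustration, not necessarily the actual helper), it would look something like this:

```javascript
var crypto = require('crypto');

// Hypothetical stand-in for the generate_md5 helper used in the index call:
// returns the MD5 digest of a string as 32 lowercase hex characters.
function generate_md5(text) {
    return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Same shape of input as the id in the index call, with made-up values.
var example_id = generate_md5("some_user" + "\n" + "A title" + "\n" + "Some content");
console.log(example_id.length);
```

Since the id depends only on username, title and content, two posts that share all three would map to the same document id.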
I tested the crawler with the indexing call commented out, and it finishes
the crawl without any problem, which suggests the hang is related to the
Elasticsearch indexing call.
I have also looked at the Elasticsearch logs, and no errors were raised.
Lastly, and this could be the best hint yet, every trial run hangs after
exactly 277 documents have been indexed successfully.
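One way to narrow this down might be to count how many index calls have been started versus how many have settled at the moment of the hang. Below is a minimal sketch of that idea; trackInFlight and neverSettles are hypothetical names made up for illustration, and the stand-in function simply returns a promise that never settles in place of the real es_client.index:

```javascript
// Hypothetical diagnostic helper: wraps a promise-returning function and
// counts how many calls have started, settled, and are still pending.
function trackInFlight(fn) {
    var stats = { started: 0, settled: 0, pending: 0 };
    function wrapped() {
        stats.started++;
        stats.pending++;
        function done() {
            stats.settled++;
            stats.pending--;
        }
        var promise = fn.apply(this, arguments);
        // Count the call as settled whether it resolved or rejected.
        promise.then(done, done);
        return promise;
    }
    wrapped.stats = stats;
    return wrapped;
}

// Stand-in for es_client.index (an assumption for this demo): a call whose
// promise never settles, which is what a hung request would look like.
function neverSettles() {
    return new Promise(function () {});
}

var trackedIndex = trackInFlight(neverSettles);
trackedIndex({});
trackedIndex({});
trackedIndex({});
console.log(trackedIndex.stats);
```

In the crawler this would presumably wrap es_client.index.bind(es_client), with the stats logged periodically, to show whether requests stop being issued or stop being answered once 277 documents are in.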
Thoughts?
Cheers,
James