Hello,
We have a small 3-node cluster running Elasticsearch 1.3.6.
I have an index set up with only two fields:
{
  index: index_name,
  body: {
    settings: {
      number_of_shards: 3,
      store: {
        type: :mmapfs
      }
    },
    mappings: {
      mapping_name => {
        properties: {
          :value => {type: 'string', analyzer: 'keyword'},
          :post_ids => {type: 'long', index: 'not_analyzed'}
        }
      }
    }
  }
}
We are basically storing strings along with all the posts they are related to.
The problem is that the data is not stored this way in the database, so I
don't have an id to represent each string, nor do I have all the post_ids
from the start.
So I use the SHA-1 of the string value as the id, and I use a script to append
to the post_ids.
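To make the intended behavior concrete, here is a minimal in-memory sketch of what the upsert plus the append script is meant to do. It does not talk to Elasticsearch at all: the `store` hash, the `apply_update` helper, and the sample values are hypothetical and exist only for illustration.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for the index: document id => source hash.
def apply_update(store, string, post_ids)
  id = Digest::SHA1.hexdigest(string)
  if store.key?(id)
    # Mirrors the script: append the new ids to the existing array.
    store[id][:post_ids] += post_ids
  else
    # Mirrors the upsert: create the document the first time the string is seen.
    store[id] = { value: string, post_ids: post_ids.dup }
  end
  id
end

store = {}
apply_update(store, 'hello', [1, 2])
apply_update(store, 'hello', [2, 3])
```

Both calls resolve to the same id, so the second one appends to the document created by the first. Note that, as in this sketch, a plain append does not remove ids that are already present.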
Here is the code I use to index through the bulk API endpoint:
def index!
  post_ids = Post.where...
  bulk_data = []
  strings.uniq.each do |string|
    # The SHA-1 of the string is used as the document id.
    string_id = Digest::SHA1.hexdigest string
    bulk_data <<
      {
        update:
          {
            _index: 'post_strings',
            _type: 'post_string',
            _id: string_id,
            data: {
              # Append to post_ids if the document exists, otherwise upsert it.
              script: "ctx._source.post_ids += additional_post_ids",
              params: {
                additional_post_ids: post_ids
              },
              upsert: {
                value: string,
                post_ids: post_ids
              }
            }
          }
      }
    # Flush in batches of 100 update actions.
    if bulk_data.count == 100
      $elasticsearch.bulk :body => bulk_data
      bulk_data = []
    end
  end
  # Send whatever is left over.
  $elasticsearch.bulk :body => bulk_data if bulk_data.any?
end
This worked fine for the first 75 million strings, but it kept getting
slower and slower until it reached an indexing rate of only 50 docs per second.
After that the cluster just killed itself because the nodes couldn't talk
to each other.
I'm guessing all the threads were blocked trying to index and the nodes had no
threads available to respond.
At first I thought it was related to the SHA-1 ids not being very
efficient, but in my test with sequential ids it did not get any better.
I'm out of ideas right now. Any help would be greatly appreciated.
Cheers.