The optimal batch size really depends on what you index. Indexing
100 items of 1 MB each is very different from indexing 100 items of
1 KB each. It also depends on how many concurrent clients are issuing
bulk requests.
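Since the cost of a bulk request tracks its payload size more than its item count, one option is to cap batches by bytes as well as by count. The following is only a minimal sketch of that idea: `chunk_by_bytes`, `max_bytes` and `max_items` are hypothetical names, and the default limits are arbitrary starting points to tune, not recommended values.

```python
import json


def chunk_by_bytes(docs, max_bytes=5 * 1024 * 1024, max_items=1000):
    """Yield lists of docs, closing a batch when either limit would be exceeded.

    max_bytes and max_items are illustrative defaults (assumptions), not
    values taken from Elasticsearch itself.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        # Approximate the wire size by the serialized JSON length.
        size = len(json.dumps(doc).encode("utf-8"))
        too_big = batch_bytes + size > max_bytes
        too_many = len(batch) >= max_items
        if batch and (too_big or too_many):
            yield batch
            batch, batch_bytes = [], 0
        # A single oversized doc still gets a batch of its own.
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

With this shape, 100 documents of about 1 KB end up in a handful of requests, while 100 documents of about 1 MB are split into many more, which is the distinction made above.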
On Monday, February 6, 2012 at 2:05 PM, K.B. wrote:
Hello Oren,
I'm having a similar problem: ES is nearly unresponsive while
indexing a large batch insert. In my case it's about 1300 to 1600
items per batch insert, following a full index drop-and-create
cycle. Can you please tell me what batch size worked best for you,
so that you didn't encounter any delays on the system?
Best
KB
On 25 Jan., 07:54, Oren Mazor oren.ma...@gmail.com wrote:
Hi Shay,
Just a follow-up (because I hate it when there is no closure).
I modified my import script to use bulk imports, so instead of 10
insertions a second, I now end up doing one bulk insertion every ten
seconds. I had the interval up to a minute, but I think inserting
600-800 records in one bulk request was causing some problems, so I
shortened the interval.
So far I'm not seeing any serious delays in testing this week, but
tomorrow I'll do some bigger load testing with our big index. It seems
promising at the moment!
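For readers who have not used bulk imports, the request body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below only builds that body and sends nothing. `build_bulk_body` is a hypothetical helper, and including `_type` in the action line is an assumption that matches Elasticsearch releases of that period.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Build a newline-delimited bulk body from (doc_id, source) pairs.

    Hypothetical helper: the index and doc_type names are supplied by the
    caller, and "_type" assumes an Elasticsearch version that still uses types.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(source))  # document source line
    # The bulk format requires the final line to end with a newline too.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be sent as the body of a single request to the `_bulk` endpoint, once per flush interval.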
On Jan 20, 2:26 pm, Shay Banon kim...@gmail.com wrote:
Hard to tell if it's GC. You can monitor it using bigdesk to see how
memory is behaving. Though you say you have a 30 minute "pause",
which is strange. Did you check the refresh stats? Also, when this
happens, can you simply get the relevant new / modified document by id?
On Fri, Jan 20, 2012 at 5:58 PM, Oren Mazor oren.ma...@gmail.com wrote:
Yup. I've done direct queries for a document that should be there, and
even 30 minutes later, it is still not available.
based on the semi-regular pattern of these delays, I'm wondering if
there's some kind of memory or gc issue playing up?
we have two nodes with 16gb/32 on the first, and 10/24 on the second.
On Jan 20, 10:06 am, Shay Banon kim...@gmail.com wrote:
It makes little sense to use query_string as a filter; I suggest you
don't do that. But even when using it as a filter, you should still
see changes. Can you verify it's not the query? I.e., just search for
a recently added document and see if you get it back?
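One way to act on this advice is to move the query_string into the query part of the request and keep only the term as a filter. The sketch below assumes the `filtered` query that Elasticsearch releases of that period provided (later versions replaced it with `bool`). `build_search` is a hypothetical helper, and the field names `SID`, `X` and `Y` are the ones from the query quoted in this thread.

```python
import json


def build_search(sid, query_text):
    """Return a search body with query_string as the query and term as the filter.

    Assumes the older "filtered" query syntax; not the original poster's code.
    """
    return {
        "query": {
            "filtered": {
                # Full-text matching belongs in the query part.
                "query": {
                    "query_string": {
                        "default_operator": "AND",
                        "fields": ["X", "Y"],
                        "query": query_text,
                    }
                },
                # The exact-match restriction stays a filter.
                "filter": {"term": {"SID": sid}},
            }
        },
        "sort": [{"Y": {"order": "desc"}}],
        "size": 1,
    }


if __name__ == "__main__":
    print(json.dumps(build_search(42, "some words"), indent=2))
```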
On Fri, Jan 20, 2012 at 8:07 AM, Oren Mazor oren.ma...@gmail.com wrote:
Also, it's probably worth sharing my frontend's query:
{
  "filter" : {
    "and" : [
      {
        "term" : {
          "SID" : $num
        }
      },
      {
        "query" : {
          "query_string" : {
            "default_operator" : "AND",
            "fields" : ["X", "Y"],
            "query" : "$QUERY"
          }
        }
      }
    ]
  },
  "sort" : [
    {
      "Y" : {
        "order" : "desc"
      }
    }
  ],
  "size" : 1
}
I understand that there is no caching involved with the AND filter,
but sort is a different matter (Y is a date)
On Jan 20, 12:21 am, Oren Mazor oren.ma...@gmail.com wrote:
Yup. I can see an insertion request going into ES (though not the
response, now that I think of it), but running my query shows no
record available for that item.
all of our records are virtually the same size (about 1kb), and the
most insertions we'd be seeing is 10-20 per second. occasionally that
might go up to 50.
how often does refresh happen by default, and how long does it take?
I'm wondering if 10 shards is not enough for the size of our index.
On Jan 18, 4:13 pm, Shay Banon kim...@gmail.com wrote:
Does this happen with search requests, where you see the old data?
By default, elasticsearch refreshes an index every second so that
newly indexed docs (or deletes) become visible. Can you use the index
stats API to see if there was a bump in how long refreshes took
(there are refresh stats there)?
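To spot such a bump, one approach is to take two snapshots of the index stats and compare the refresh counters between them. The sketch below assumes the stats response exposes `refresh.total` and `refresh.total_time_in_millis` under `_all.primaries`. That layout is an assumption to check against the version in use, and both function names are hypothetical.

```python
def refresh_counters(stats):
    """Extract (refresh count, total refresh millis) from a stats response dict.

    Assumes the shape stats["_all"]["primaries"]["refresh"]; verify this
    against the actual response of your Elasticsearch version.
    """
    refresh = stats["_all"]["primaries"]["refresh"]
    return refresh["total"], refresh["total_time_in_millis"]


def avg_refresh_ms_between(before, after):
    """Average milliseconds per refresh between two stats snapshots.

    Returns None when no refresh happened in between.
    """
    count_before, millis_before = refresh_counters(before)
    count_after, millis_after = refresh_counters(after)
    refreshes = count_after - count_before
    if refreshes <= 0:
        return None
    return (millis_after - millis_before) / refreshes
```

Sampling this every minute or so and watching for a sudden jump in the average would show whether refreshes slow down around the time stale results appear.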
On Wed, Jan 18, 2012 at 8:15 AM, Oren Mazor oren.ma...@gmail.com wrote:
Hi all,
We've deployed elasticsearch in our production and we're incredibly
happy with search performance. However, we're seeing occasional
issues where ES seems to return an older version of a record. In some
cases it can take up to half an hour before the proper (latest)
version of a record shows up. We have two nodes with 10 shards each
with one replica, and the index is about 30m records and 25gb in
size, so it's not the smallest.
This is pretty hard to reproduce, so it's relatively hard to test.
But I'd love to hear ideas.
thanks!