I have an ecommerce store with a relatively large index of products. About
10 000 000 products, 15 GB index size. This index gets updated very often,
maybe 100-1000 updates per second.
What I'm trying to do is set up two servers: one for rapidly updating the
index, and another one just for querying. It's fine if the data on the
"querying" server is somewhat stale (up to 4 hours is OK).
However, I can't figure out a way to implement this. When I use replication,
the querying server also suffers from intense write I/O (even on XFS with a
large buffer, or ext4 with commit=60,data=writeback), and sometimes queries
get redirected to the "indexing" server.
So far my best guesses are:

1. Set up both nodes over the same shared FS or Hadoop gateway, write to the
"indexing" node, read from the "querying" node (maybe even disable data
alteration with the new 0.19.2 APIs), and don't let them join into a cluster
(?). Occasionally reboot the "querying" node so it recovers from the gateway
(maybe there is a better way to propagate changes?). Use a "memory" index on
the "querying" node to prevent confusion over changed on-disk data (?).
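A rough sketch of what I mean for option 1, using the 0.19-era `fs` gateway settings in `elasticsearch.yml` (the shared mount path and cluster names here are placeholders, and I'm not sure this is the right way to keep the nodes apart):

```yaml
# "indexing" node - writes its state to the shared gateway
cluster.name: products-indexing        # different name so the nodes never join
gateway.type: fs
gateway.fs.location: /mnt/shared/es-gateway

# "querying" node - same gateway location, separate cluster name,
# so on restart it recovers the last snapshot written by the indexer
# cluster.name: products-querying
# gateway.type: fs
# gateway.fs.location: /mnt/shared/es-gateway
```

The idea would be that a restart of the "querying" node makes it re-read the gateway, which is how stale-but-consistent data would propagate. Whether two clusters can safely share one gateway location like this is exactly the part I'm unsure about.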
2. Set up simple cluster replication with 1 replica, and just use the
"preference=_local" search parameter to query the local replica.
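For option 2, the query side would look something like this (index name `products` is just an example):

```shell
# Ask the node we hit to serve the search from its own local shards,
# rather than round-robining to the "indexing" node's copies
curl -s "http://localhost:9200/products/_search?preference=_local" -d '{
  "query": { "match": { "name": "widget" } }
}'
```

My worry with this approach is that replication still pushes every write to the replica, so the "querying" node would keep taking the same write I/O that's causing the problem now.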
I'd gladly accept any advice on how to achieve this. Thanks!