We want to index lots of data; however, after a certain period of time
it doesn't have to be as "hot" as data from the past week. What do you
experts think of the following approach?
1. Open index
2. Write lots of data into it
3. Index becomes less important after 7 days
4. Close index
5. (g)zip the index
6. Index remains gzipped
7. Request for search on index
8. Un(g)zip the index
9. Open the index
10. Perform the search
Is there anything I'm missing here that would cause problems? A quick test
on a single node worked perfectly. The default compression doesn't help us
enough.
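
The workflow described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not a tested recipe: the
function names (set_index_state, archive_index, restore_index), the paths and
the base URL are all hypothetical, and the on-disk location of an index
directory depends on the Elasticsearch version and configuration. The only
Elasticsearch calls used are the open/close index endpoints
(POST /<index>/_close and POST /<index>/_open). Whether a multi-node cluster
tolerates index files being moved away and back like this is exactly the open
question of this thread.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def set_index_state(base_url: str, index: str, action: str) -> None:
    """POST to the open/close index endpoint; action is '_open' or '_close'."""
    if action not in ("_open", "_close"):
        raise ValueError(f"unsupported action: {action}")
    req = urllib.request.Request(
        f"{base_url}/{index}/{action}", data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def archive_index(index_dir: Path, archive_dir: Path) -> Path:
    """Gzip a (closed) index directory into archive_dir, then delete the original."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{index_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(index_dir, arcname=index_dir.name)
    shutil.rmtree(index_dir)
    return archive_path


def restore_index(archive_path: Path, indices_dir: Path) -> Path:
    """Unpack an archived index under indices_dir and return the restored path."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(indices_dir)
    name = archive_path.name[: -len(".tar.gz")]
    return indices_dir / name


# Hypothetical usage (URL, index name and paths are placeholders):
#   set_index_state("http://localhost:9200", "logs-week-37", "_close")
#   archive = archive_index(Path("/data/es/indices/logs-week-37"), Path("/archive"))
#   ... later, when a search on the old index is requested ...
#   restore_index(archive, Path("/data/es/indices"))
#   set_index_state("http://localhost:9200", "logs-week-37", "_open")
```

The archive step deletes the original directory only after the tar file has
been written, so a failure while compressing leaves the index in place.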
Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.
On Sep 18, 2012, at 3:09 PM, Robin Verlangen robin@us2.nl wrote:
Thank you for the reference; however, I was already aware of those options.
A quick benchmark indicated that we could still gain a lot. I'll publish
the details here soon!
Thanks Robin -- seeing some real numbers is always refreshing! IMO the
Elasticsearch documentation sorely lacks some "ballpark figures" of
what to expect under common scenarios. Together with some general info
on how things are expected to scale (constant/linear/sublinear...),
that would already help newcomers a lot.
Btw any chance you might also be comparing query times in your setup?
Would the no/compression/tv affect that at all? (ignoring the offline
ZIP option, of course)
Best,
Radim
On Sep 19, 10:36 am, Robin Verlangen ro...@us2.nl wrote:
Robin Verlangen, Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl
We'll get into that later. Our application (CloudPelican) is going to
gather lots and lots of data from all kinds of different sources. We
already picked Elasticsearch over Solr, Solandra, raw Cassandra and
Lucene for our indexing process. The first important thing for us was to
determine how much storage overhead was involved. Query times are relevant,
but probably tunable with lots of parameters.
Once we have more, I'll post an update here, or you can just follow
my blog.
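
For anyone who wants to reproduce this kind of storage measurement, a minimal
sketch is shown below. How the benchmark in this thread was actually done is
not stated here; the helper names (dir_size, storage_overhead) are
hypothetical, and "overhead" is assumed to mean index size on disk divided by
the size of the raw input data.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def storage_overhead(raw_bytes: int, index_dir: Path) -> float:
    """Index size on disk divided by the size of the raw input data."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return dir_size(index_dir) / raw_bytes
```

A result above 1.0 means the index takes more space than the raw data; running
it once per configuration (for example with and without compression) gives
comparable figures.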