Thank you Jörg,
I'll start with the second question: thanks! My problem was that I didn't
know about the _shutdown option, so I was simply killing the process and
therefore forcing the system to recover the indices.
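For reference, this is roughly what I call now instead of killing the JVM (a minimal sketch in Python using requests, assuming a 1.x node on localhost:9200; the node shutdown API only exists in the versions we're testing, it was removed later):

    import requests

    # Ask the local node for a clean shutdown instead of killing the process.
    # On ES 1.x this is POST /_cluster/nodes/_local/_shutdown
    # (POST /_shutdown would stop the whole cluster).
    resp = requests.post("http://localhost:9200/_cluster/nodes/_local/_shutdown")
    resp.raise_for_status()
    print(resp.json())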
As far as the migration from Solr to Elasticsearch is concerned, I
basically want the indexed/analyzed but unstored field to be transferred
from Solr to ES, so I can perform a full-text search on it.
Are there tools that would let me copy the Lucene indexes over to
Elasticsearch and keep the same functionality?
To retrieve the actual document, I'll simply take the id from the search
results and fetch the document from storage. This is how the system was built
and how I have to test it: indexed but unstored fields are kept inside Solr,
which is queried for full-text searches, while the actual documents are kept
on a separate filesystem. The query results are then used to retrieve the
actual documents from that filesystem.
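In ES terms the flow would look roughly like this (just a sketch in Python; the index name, the content field, and the archive layout are placeholders for our real setup):

    import os
    import requests

    ES = "http://localhost:9200"
    INDEX = "webpages"              # placeholder index name
    ARCHIVE = "/mnt/archive"        # placeholder path to the document filesystem

    def search_ids(text, size=10):
        # Full-text query on the analyzed "content" field; we only need the ids,
        # since the content itself is not retrievable from the index anyway.
        body = {
            "query": {"match": {"content": text}},
            "_source": False,
            "size": size,
        }
        resp = requests.post(ES + "/" + INDEX + "/_search", json=body)
        resp.raise_for_status()
        return [hit["_id"] for hit in resp.json()["hits"]["hits"]]

    def load_document(doc_id):
        # The real page lives on the separate filesystem, keyed by id.
        with open(os.path.join(ARCHIVE, doc_id + ".html"), "rb") as f:
            return f.read()

    for doc_id in search_ids("some full-text query"):
        page = load_document(doc_id)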
If we decide to move to ES, we could change the approach, store everything
inside ES, and reindex our full archive.
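Something along these lines is what I have in mind for the reindex, as a sketch only (pre-2.0 bulk format with a _type; all names, batch size, and the archive layout are placeholders):

    import json
    import os
    import requests

    ES = "http://localhost:9200"
    INDEX = "webpages"              # placeholder
    ARCHIVE = "/mnt/archive"        # placeholder

    def bulk_index(batch):
        # Newline-delimited action/source pairs for the _bulk API.
        lines = []
        for doc_id, content in batch:
            lines.append(json.dumps({"index": {"_index": INDEX, "_type": "webpage", "_id": doc_id}}))
            lines.append(json.dumps({"content": content}))
        payload = "\n".join(lines) + "\n"
        resp = requests.post(ES + "/_bulk", data=payload.encode("utf-8"),
                             headers={"Content-Type": "application/x-ndjson"})
        resp.raise_for_status()

    batch = []
    for name in os.listdir(ARCHIVE):
        doc_id, _ = os.path.splitext(name)
        with open(os.path.join(ARCHIVE, name), encoding="utf-8", errors="replace") as f:
            batch.append((doc_id, f.read()))
        if len(batch) >= 500:
            bulk_index(batch)
            batch = []
    if batch:
        bulk_index(batch)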
Thanks for the sharding advice; I realize I cannot use sharding with the
current configuration. The current Solr system has just one collection
with one core and one instance.
We are comparing the performance of ES and multicore Solr on a
distributed system (not cloud, but simply several instances with the load
balanced by a custom algorithm, to have more control over where the data
goes), and after that we'll decide which way to go.
Thanks
On Tuesday, June 3, 2014 at 09:55:21 UTC-7, Jörg Prante wrote:
If you have indexed the data in Solr, you should consider a tool that
can traverse the Lucene index and reconstruct the documents. This is not a
straightforward process, as you already know, because analyzed fields look
different from the original input. The reconstruction may not recover the
original input, but it could be used as input for Elasticsearch once
transformed to JSON. It depends heavily on the Solr analyzers you used.
You know that an Elasticsearch index is sharded, so obviously you have to
reindex the documents in order to take advantage of ES sharding.
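For example, the index can be created with an explicit shard layout before reindexing (a sketch only; the index name and the counts are placeholders to be sized for your data):

    import requests

    # Shard and replica counts are fixed at index creation time,
    # so choose them before reindexing the archive.
    settings = {
        "settings": {
            "index": {
                "number_of_shards": 5,      # placeholder value
                "number_of_replicas": 1,    # placeholder value
            }
        }
    }
    resp = requests.put("http://localhost:9200/webpages", json=settings)
    resp.raise_for_status()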
What time intervals do you have in mind at ES startup? When
shutting down ES, you should use the _shutdown endpoint for a clean
shutdown. A clean shutdown writes checksums to disk for fast startup. When
starting with valid checksums, ES is available within a few seconds and
turns to state "green". Otherwise it performs indices recovery. If the
checksums are invalid, all shards have to be recovered first; how long that
takes depends on the shard sizes and on disk I/O speed, and an ES cluster
then usually starts within 30 seconds to 1 minute. It cannot do much better
after an unclean shutdown because of the index recovery. Recovery, like
indexing and search, depends on the overall power of your ES cluster. There
are tunables to increase recovery speed, at the cost of search/index
performance.
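As an illustration of both points, on the 1.x line you can raise the recovery throttling and have a client wait until the cluster is fully available (a sketch; the setting names are from that era and the values are placeholders, not recommendations):

    import requests

    ES = "http://localhost:9200"

    # Transiently raise recovery throughput; this competes with search/index I/O.
    settings = {
        "transient": {
            "indices.recovery.max_bytes_per_sec": "100mb",              # placeholder
            "cluster.routing.allocation.node_concurrent_recoveries": 4, # placeholder
        }
    }
    requests.put(ES + "/_cluster/settings", json=settings).raise_for_status()

    # Block until all shards are allocated (state "green") or the timeout expires.
    health = requests.get(ES + "/_cluster/health",
                          params={"wait_for_status": "green", "timeout": "60s"})
    print(health.json()["status"])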
Jörg
On 02.06.14 21:33, Diego Marchi wrote:
Hello all,
I'm testing the ES environment to see if a migration from Solr could
bring benefits to our system. We are considering a complete renovation of
our service, taking it from Java to Python, plus a lot of new enhancements.
Currently we use Solr for indexing purposes. We store webpages from
customers and index them with Solr. Within a Solr document we have a
dozen fields to keep track of the data; the data itself is indexed in
Solr in a *content* field which is set (in the schema.xml) to
indexed="true" stored="false". In fact, I can do a text search on it, but I
cannot retrieve the whole field (obviously..).
The actual content is saved on our server and it is a massive 22TB of
data. You'll understand we cannot reindex the whole thing just for testing
purposes. We're considering using a subset of it, but even that is time
consuming.
I was looking to see if there is any way to transfer the indexed but unstored
*content* field directly from Solr to Elasticsearch.
On another topic, when I shut down and restart the ES engine, I
noticed that the documents are not all available at once; they take
time to load.
Is that expected behavior, or is there a way (a configuration option..) to
have all the documents available right away? I'm thinking, for instance, if
I have to upgrade the engine, add some options, or for whatever reason
need to shut the engine down and start it again, do I need to wait for
all the documents to be loaded into the system?
With Solr I see all of them available immediately after the search
engine has been launched...
Thank you,
Diego