Hi everyone,
I'm new to the enterprise search world and am beginning a new project that
deals with listings and full-text search. The data store I will be using
is Cassandra for lightning-fast insertions, but I will definitely need help
on the searching/querying side.
The application will be both write and read/search intensive, but
reads/searches are projected to outnumber writes by ~600x, so the index will
be updated constantly while serving real-time searches. There will be faceted
searches on multiple attributes and also full-text searches on certain
attributes. Everything needs to be real-time: when a row is persisted
within Cassandra it should be immediately and consistently replicated,
indexed, and searchable in the search system. The entire stack will be
deployed on AWS, so a more cloud-oriented solution like ElasticSearch does
sound tempting.
Exploring prospective technologies, the main ones that popped out are
Solr/Solandra, ElasticSearch, and Sphinx. Going with a Lucene-based
solution leaves me with Solr/Solandra and ElasticSearch. I fully
understand that DataStax Enterprise now has enterprise search integrating
Solr and Cassandra within the same JVM, but I would like to stay with open
source and avoid being locked into a specific vendor. After some research,
ElasticSearch seems like the more scalable solution, being designed for the
cloud (distributed and elastic), and more performant in real-time
applications. Is there any reason I should look more into Solr/Solandra
(Solandra development is dead now that DataStax has taken on its developer)?
I'm sure DataStax chose to integrate with them for a reason...
- Am I looking at the right technology for my use case? Research shows me
  that ES may perform better than Solr in my use case, but the user base for
  Solr seems a lot more mature. ES does seem to have very active community
  support, though.
- What is the best way for me to get started with this project?
- How difficult will it be to integrate Cassandra with ES? I see that it
  has been done before, but how substantial will the engineering effort be?
  Should I be looking at DSE instead to avoid a formidable integration
  overhead? Is there any further documentation or engineering work that
  bridges this gap?
From this link I saw that there is some kind of integration between
Cassandra and ES, but looking at the GitHub repository, the last commit was
2 years ago. What kind of integration does this entail, and should I even be
looking here?
Thanks in advance.
Regards,
Alvin