Hello all,
I've been using Solr for quite some time now in the project I'm working on.
When I started this project, ES was still in its early days, so I chose
Solr, which was more advanced at the time.
Now I'm re-evaluating which technology would be most appropriate for me,
especially since I'm hitting a point in data growth where I'd like to - or
even must - shard / scale out my Lucene index.
I read through the ES docs to find out whether it offers the features my
applications require. I still have a few
questions:
- Faceting method and performance: I have quite a LOT of possible facet
values, i.e. more than 1.5M. But I already know which terms I want to get
the facet values of. I can give a list of terms to Solr for which it
returns the counts to me. Can I do this with ES? I would need a few hundred
facet counts in one request. ES has quite a few options for term faceting,
especially with script fields, and I'm not sure how
powerful they are. I saw you can explicitly EXCLUDE a list of terms; I'd
like to do the opposite. Can I do this? With Solr's fieldCache facet method
I'm able to get facet values in approx. 1-2 seconds in a non-sharded
environment with an index of 22M documents; I hope sharding will speed this
up enough to stay under one second of query time with a few hundred
specified facet terms. If I could ask ES for specific term counts, could I
expect an answer in 1-2 seconds if I did not use the scalability features
(just for comparison!)?
- Additional facet term information: I have a use case where I want to
sort the term facet values by a kind of TF/IDF measure; i.e. I need the
facet count as well as the (total) document frequency of the facet term.
How could I get this information? Without it taking too long, of course,
but it's okay if it takes a few seconds.
- Follow-up to the point above: If I had to use e.g. a plugin to get
the term document frequency values, how easy would it be to use this plugin
with ElasticSearch's scale-out capabilities? I have such a plugin for Solr
but it only works with a single instance / node. I would have to write code
for the distributed case when sharding. Is it the same with ES, or easier?
(I would hope that, since ES was distributed from the beginning, it
wouldn't be so hard.)
- PreAnalyzed field values: I have quite a few fields with pre-analyzed
values, i.e. I already know the exact sequence of terms together with their
position increments, start and end offsets, etc. For Solr there's the
PreAnalyzed field type; is there something similar in ES? Although I believe
a custom analyzer could do the trick, I haven't tried it yet.
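To make the "kind of TF/IDF measure" from the second point concrete, here is a minimal sketch of the ranking I have in mind. The function name, the exact formula (facet count times log of total documents over document frequency) and the sample numbers are illustrative assumptions only; nothing here is a Solr or ES API, and the two input maps would have to come from the search engine.

```python
import math


def rank_facet_terms(facet_counts, doc_freqs, total_docs):
    """Sort facet terms by facet_count * log(total_docs / doc_freq), highest first.

    facet_counts: term -> count within the current result set (hypothetical input)
    doc_freqs:    term -> document frequency over the whole index (hypothetical input)
    """
    scored = []
    for term, count in facet_counts.items():
        idf = math.log(total_docs / doc_freqs[term])
        scored.append((term, count * idf))
    # Highest score first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Made-up numbers: a frequent term with a high facet count versus a rare term
facet_counts = {"common": 100, "rare": 50}
doc_freqs = {"common": 11_000_000, "rare": 1_000}
ranked = rank_facet_terms(facet_counts, doc_freqs, total_docs=22_000_000)
```

With these numbers the rare term ranks above the common one despite its lower facet count, which is the behaviour I'm after.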
I hope I'm not bothering you too much with these questions. I'm trying to get
an overview of what Solr and ES can and can't do (easily) for me. Since
I'm currently on Solr, I don't want to switch without being properly
informed first.
Thank you!
Best regards,
Erik