We're approaching the first release of our product and we use Elasticsearch
as a key component in our system. But we still have some questions and
doubts, so I'd like to hear from the more experienced users and
Elasticsearch folks here.
We use Elasticsearch as a search tool but also as the storage for all
documents. This means that the front-end retrieves fields from ES just as if
it were a database. We've already disabled indexing (index: no) on the fields
that don't need to be searched (lists of ids etc.), but is this a good usage
of Elasticsearch, given that we expect to have ~1 billion documents
(~1.4 KB each) in a single index within our first 3 months?
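To make the setup concrete, here is a minimal sketch of the kind of mapping described above, written as a Python dict that could be sent as the mapping body. The type name `doc` and the field names `title` and `related_ids` are hypothetical, and the `"type": "string"` / `"index": "no"` syntax is the one used by Elasticsearch releases of this period (newer versions spell this differently).

```python
import json


def build_mapping(doc_type="doc"):
    """Build a mapping where display-only fields are kept but not indexed.

    All names here are illustrative assumptions, not the real schema.
    """
    return {
        doc_type: {
            "properties": {
                # Searchable text field, analyzed with the default analyzer.
                "title": {"type": "string"},
                # Display-only list of ids: still returned from _source,
                # but not searchable because indexing is switched off.
                "related_ids": {"type": "string", "index": "no"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

A field mapped this way costs no space in the inverted index, but it is still part of the stored document, so it does not reduce the size of what the front-end fetches.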
We will use Thrift to push documents in production because we've seen a
performance gain. Is there any downside to using Thrift over plain JSON?
Some of our queries use a regexp filter. As I understand it, this needs
to load the target field of every document to see if it matches, so is it
pretty costly for an index of 1 billion docs?
We're also benchmarking our ES setup, but your advice and experience would
be very much appreciated! Thanks in advance!
On Tuesday, December 24, 2013 8:47:54 AM UTC-5, Han JU wrote:
> Hi,
> We're approaching the first release of our product and we use
> Elasticsearch as a key component in our system. But we still have some
> questions and doubts, so I'd like to hear from the more experienced users
> and Elasticsearch folks here.
> We use Elasticsearch as a search tool but also as the storage for all
> documents. This means that the front-end retrieves fields from ES just as
> if it were a database. We've already disabled indexing (index: no) on the
> fields that don't need to be searched (lists of ids etc.), but is this a
> good usage of Elasticsearch, given that we expect to have ~1 billion
> documents (~1.4 KB each) in a single index within our first 3 months?
1.4 KB is pretty small, so that's fine. Often keeping it all in ES is
simpler: it doesn't require another hop to another server (e.g. a DB) to
retrieve display data, and there is one fewer moving piece, which makes
everything simpler. I'd keep your display data in ES and worry about
changing it later only if you have issues.
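As a rough sanity check on the numbers in the question, the raw document volume can be estimated with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it counts source bytes in decimal units and ignores index overhead, compression and replicas, and the shard count used in the example is a made-up assumption.

```python
def raw_size_tb(num_docs, avg_doc_kb):
    """Raw source volume in terabytes (decimal units: 1 KB = 1000 bytes)."""
    return num_docs * avg_doc_kb * 1000 / 1e12


def per_shard_gb(total_tb, num_shards):
    """Average raw volume per primary shard, in gigabytes."""
    return total_tb * 1000 / num_shards


if __name__ == "__main__":
    total = raw_size_tb(1_000_000_000, 1.4)  # figures from the question
    print("raw source volume (TB):", total)
    # 20 primary shards is a hypothetical value chosen for illustration.
    print("raw volume per shard (GB):", per_shard_gb(total, 20))
```

With the figures from the question this comes to about 1.4 TB of raw source, so the per-document size matters less than how that total is spread across shards and nodes.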
> We will use Thrift to push documents in production because we've seen a
> performance gain. Is there any downside to using Thrift over plain JSON?
> Some of our queries use a regexp filter. As I understand it, this needs
> to load the target field of every document to see if it matches, so is it
> pretty costly for an index of 1 billion docs?
Yes, regexps are not the fastest. What are you trying to do that requires
a regexp filter?
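For reference, this is roughly what the two request bodies look like in the `filtered` query syntax of this era, built here as Python dicts. The field name `sku` and the patterns are hypothetical. As far as I understand, the regexp is matched against the terms of the field rather than document by document, so the cost depends mostly on the number of distinct terms and on how much fixed prefix the pattern has; when the pattern is really just a fixed prefix, a `prefix` filter states that intent directly.

```python
def regexp_filter(field, pattern):
    """Match all documents, then filter on a regular expression over terms."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"regexp": {field: pattern}},
            }
        }
    }


def prefix_filter(field, prefix):
    """Same shape, but with a plain prefix filter instead of a regexp."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"prefix": {field: prefix}},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical field and values, for illustration only.
    print(regexp_filter("sku", "ab[0-9]+x"))
    print(prefix_filter("sku", "ab"))
```

Patterns that begin with a wildcard such as `.*` give the engine no prefix to narrow the term scan, so they are the ones worth replacing first.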
The reason we use the regexp filter is also simplicity. We have a
layer that translates front-end format queries into the Elasticsearch query
DSL. We definitely need to use something like a phrase query and a custom
analyzer.
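A minimal sketch of what that could look like, assuming a made-up front-end format: a dict with `field`, `text` and an optional `exact` flag. The analyzer name `frontend_text` and its lowercase-only filter chain are likewise placeholders, not our actual configuration; `match_phrase`, `match` and the `custom` analyzer type are standard parts of the query DSL and index settings.

```python
def index_settings():
    """Index settings declaring a custom analyzer (names are hypothetical)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "frontend_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        }
    }


def translate(frontend_query):
    """Translate a hypothetical front-end query dict into query DSL.

    Exact requests become a phrase query; everything else is a plain match.
    """
    field = frontend_query["field"]
    text = frontend_query["text"]
    if frontend_query.get("exact"):
        return {"query": {"match_phrase": {field: text}}}
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(translate({"field": "title", "text": "quick brown fox", "exact": True}))
    print(translate({"field": "title", "text": "fox"}))
```

The point of the design is that the work moves to index time: the analyzer splits text into terms once, and the phrase query then checks term positions instead of running a regular expression at query time.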