Several questions on ES in production environment


(Han JU) #1

Hi,

We're approaching the first release of our product, and we use ElasticSearch
as a key component of our system. We still have some questions and doubts,
so I'd like to hear from the more experienced users and ElasticSearch folks
here.

  1. We use ElasticSearch as a search tool but also as the storage for all
    documents. This means the front-end retrieves fields from ES just as if
    it were a database. We've already disabled indexing (index: no) on the
    fields that don't need to be searched (lists of ids etc.), but is this a
    good use of ElasticSearch, given that we expect to have ~1 billion
    documents (~1.4 KB each) in a single index within our first 3 months?

  2. We will use Thrift to push documents in production because we've seen a
    performance gain. Is there any downside to using Thrift over plain JSON?

  3. Some of our queries use a regexp filter. As I understand it, this needs
    to load the target field of every document to see if it matches, so is it
    pretty costly for an index of 1 billion docs?
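
For concreteness, here is a minimal sketch of what points 1 and 3 might look like as request bodies. The type name `doc`, the field names `title` and `related_ids`, and the regexp pattern are all made up for illustration, and the syntax assumes the string-field mapping and `filtered` query of the ES versions current at the time of this thread. Nothing is sent to a cluster; the bodies are only built and printed.

```python
import json

# Hypothetical mapping: "title" is searchable, while "related_ids" is not
# indexed (index: no) but can still be returned from the stored document.
mapping = {
    "doc": {
        "properties": {
            "title": {"type": "string"},
            "related_ids": {"type": "string", "index": "no"},
        }
    }
}

# Hypothetical search body that applies a regexp filter to "title".
regexp_search = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"regexp": {"title": "elast.*search"}},
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(regexp_search, indent=2))
```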

We're also benchmarking our ES setup, but your advice and experience would
be much appreciated! Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/563d3c8f-cb3c-4c8c-a13d-2f8fcf6c42ce%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Otis Gospodnetić) #2

Hi,

On Tuesday, December 24, 2013 8:47:54 AM UTC-5, Han JU wrote:

  1. We use ElasticSearch as a search tool but also as the storage for all
    documents. This means the front-end retrieves fields from ES just as if
    it were a database. We've already disabled indexing (index: no) on the
    fields that don't need to be searched (lists of ids etc.), but is this a
    good use of ElasticSearch, given that we expect to have ~1 billion
    documents (~1.4 KB each) in a single index within our first 3 months?

1.4 KB is pretty small, so that's fine. Keeping everything in ES is often
simpler: it doesn't require another hop to another server (e.g. a DB) to
retrieve display data, and there is one fewer moving piece, which makes
everything simpler. I'd keep your display data in ES and worry about
changing it later only if you run into issues.
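
To put the numbers from question 1 in perspective, here is a back-of-envelope estimate of the raw document volume. It assumes 1.4 KB means 1,400 bytes and counts only the source documents; index structures, compression, and any other overhead are not modelled, and the `replicas` parameter is a hypothetical knob added for illustration.

```python
DOCS = 1_000_000_000   # ~1 billion documents, from the original question
BYTES_PER_DOC = 1400   # ~1.4 KB each, taken here as 1,400 bytes


def raw_size_tb(docs, bytes_per_doc, replicas=0):
    """Raw document volume in decimal terabytes, counting replica copies."""
    total_bytes = docs * bytes_per_doc * (1 + replicas)
    return total_bytes / 1000**4


print(raw_size_tb(DOCS, BYTES_PER_DOC))              # primaries only
print(raw_size_tb(DOCS, BYTES_PER_DOC, replicas=1))  # with one replica copy
```

That works out to roughly 1.4 TB of raw documents before replicas, which is a total the cluster has to hold even though each individual document is small.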

  2. We will use Thrift to push documents in production because we've seen a
    performance gain. Is there any downside to using Thrift over plain JSON?

  3. Some of our queries use a regexp filter. As I understand it, this needs
    to load the target field of every document to see if it matches, so is it
    pretty costly for an index of 1 billion docs?

Yes, regexps are not the fastest. What are you trying to do that requires a
regexp filter?

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Han JU) #3

Thanks Otis.

The reason we use the regexp filter is also simplicity. We have a layer
that translates front-end queries into the ElasticSearch query DSL.
We definitely need to use something like phrase_query and a custom analyser.
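
A minimal sketch of that direction, assuming the translation layer emits a `match_phrase` query in place of a regexp filter and that the field is analysed with a custom analyser. The helper `to_phrase_query`, the analyser name `frontend_text`, and the field name `title` are hypothetical; the tokenizer and filter choice is only one plausible configuration, not what our system actually uses.

```python
import json

# Hypothetical index settings defining a custom analyser built from the
# standard tokenizer and a lowercase token filter.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "frontend_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def to_phrase_query(field, text):
    """Translate a front-end phrase search into a match_phrase query body."""
    return {"query": {"match_phrase": {field: text}}}


print(json.dumps(index_settings, indent=2))
print(json.dumps(to_phrase_query("title", "elastic search"), indent=2))
```

A phrase query is resolved against analysed terms in the index, so whether it can replace a given regexp depends on how the analyser tokenises the field.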


