Elasticsearch with lots of small writes

I had the great idea (and sometimes I regret it) of using Elasticsearch as
the primary (and only) database for a web project I am working on. It has
been great with the larger reads and writes (a few KB per document), but
there are also a lot of small metrics that I want to store, and I am not
sure how well Elasticsearch handles small data sets. Would it be better to
add another database for these small data sets, which are going to need
complex queries to pull out?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello,

There's an overhead (mostly caused by transport) in having many small
writes. But that's the case for any data store.

On the other hand, indexing in Elasticsearch is pretty async. It uses a
transaction log to make sure it doesn't commit for every operation:
http://www.elasticsearch.org/guide/reference/index-modules/translog.html

For indexing performance, it's important to know how soon you need the
indexed data to be available for search. By default, the index is refreshed
every second, but you can change refresh_interval in the index settings:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

Or in the Elasticsearch configuration:
http://www.elasticsearch.org/guide/reference/index-modules/
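As a rough illustration of the index-settings route, here is a minimal Python sketch that only builds the request and does not send it. The base URL, the index name "metrics", the "30s" value, and the helper name are assumptions made up for the example; the settings body follows the update-settings API as I understand it, so check it against the docs for your version.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index_name, interval):
    """Build (but do not send) a PUT request that changes refresh_interval.

    base_url, index_name and interval are illustrative values chosen by
    the caller; nothing here is specific to a particular cluster.
    """
    body = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local node and index name.
request = build_refresh_interval_request("http://localhost:9200", "metrics", "30s")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually apply it: urllib.request.urlopen(request)
```

A longer interval means new documents take longer to show up in searches, so pick a value the application can tolerate.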

But the biggest gain you'll probably get is from using the bulk API to
combine multiple small writes into the same chunk of transport:
http://www.elasticsearch.org/guide/reference/api/bulk.html
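To make the bulk format concrete, here is a small Python sketch that turns a list of small metric documents into a single bulk request body. The function name, the index name "metrics", and the sample documents are made up for illustration. The body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. Older Elasticsearch releases also expect a `_type` in the action line or in the URL, which this sketch leaves out.

```python
import json


def build_bulk_body(index_name, docs):
    """Return a newline-delimited bulk body that indexes every doc.

    Each document becomes two lines: an action line naming the target
    index, then the document source itself.
    """
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical metric documents.
metrics = [
    {"name": "page_views", "value": 3},
    {"name": "logins", "value": 1},
]
print(build_bulk_body("metrics", metrics), end="")
```

The resulting string would be sent as the body of one POST to the `_bulk` endpoint, so many small writes share a single round trip.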

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

Radu,

Thank you very much. I had read about those features but was not thinking
about them. I am using a PHP driver (well, I am currently writing a PHP
driver at the same time as I am working on my project), so I will need to
add bulk writes to it too. How much extra performance do you think one
could gain by changing the refresh to happen every 5 seconds instead of
every second? The way this application is built, I could wait almost an
entire minute before the index needs to refresh.

--
Enjoy,
Alexis Okuwa
WojonsTech
424.835.1223

Honestly, I'd just start throwing data at ES and worry about optimization
when you feel that the QPS is not keeping up with your desired rate. You'd
be surprised with how much ES can handle before touching any of the
settings.

The Refresh operation is actually pretty light, so you probably won't get a
huge gain there. The Bulk API would be the biggest performance gain.
Another source of performance gain is ensuring that your PHP client
uses persistent HTTP connections. Building and tearing down an HTTP
connection for every request is expensive and slow, for both your client
and for ES.
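For what it's worth, here is a self-contained Python sketch of what connection reuse looks like from the client side. It does not talk to Elasticsearch at all: it starts a throwaway local HTTP server (a stand-in invented for this example), sends several small POSTs over one `http.client.HTTPConnection`, and records the client port the server saw for each request. The path "/metrics/_doc" and the payloads are also made up. If the connection is really kept alive, every request arrives from the same client port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in server that answers every POST with an empty JSON object."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    seen_ports = []  # client port observed for each request

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        StubHandler.seen_ports.append(self.client_address[1])
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_over_one_connection(host, port, payloads):
    """Send every payload over a single, reused HTTP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    statuses = []
    try:
        for payload in payloads:
            conn.request("POST", "/metrics/_doc", body=payload,
                         headers={"Content-Type": "application/json"})
            response = conn.getresponse()
            response.read()  # drain the response before reusing the socket
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses


def demo():
    StubHandler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        statuses = send_over_one_connection(
            "127.0.0.1", server.server_address[1],
            [b'{"v": 1}', b'{"v": 2}', b'{"v": 3}'])
    finally:
        server.shutdown()
        server.server_close()
    return statuses, set(StubHandler.seen_ports)


if __name__ == "__main__":
    print(demo())
```

A client that opened a new connection per request would instead show a different port for each one, along with the extra TCP setup cost on both sides.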

-Zach

Okay, thank you. I was not sure how flexible and durable ES was; I am
coming from being a MongoDB DBA, and that's a whole other ball game. When
you say persistent connections, do you mean something like HTTP keep-alive?

I also wanted to know whether it's better to keep this small data set in
its own index or its own type.

On Sun, 2013-02-10 at 11:14 -0800, Wojons Tech wrote:

Also wanted to know if it's better if I keep this small dataset in
its own index or type

That would be a good idea. As you index more data, your segments will
merge, which can use quite a bit of IO, so it's better not to mix this
frequently changing data with data that changes less often.

Also, segments which consist entirely of deleted (or reindexed) documents
can just be dropped, without rewriting them.

clint
