Is there any way to bulk load huge data into ES without REST?

dancer · June 28, 2013, 4:56am

I want to bulk load a huge amount of data into ES, but the current plugin
uses REST, and I think that may not be very good for performance. Is there
any other way, such as writing the Lucene index directly with MapReduce (MR)?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · June 28, 2013, 5:33am

What about the Elasticsearch Java API?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

dancer · June 28, 2013, 6:05am

Thanks for your reply. Actually, the Java API's performance does match my need.

dadoonet · June 28, 2013, 6:31am

What is your need?

dancer · June 28, 2013, 7:07am

High performance, and suitable for bulk loading.
The elasticsearch-hadoop plugin is a good option, but I don't know whether it
is stable, since it uses REST.

dadoonet · June 28, 2013, 7:08am

What is high performance?

David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

dancer · June 28, 2013, 8:08am

Hi, thanks for your reply.
By high performance I mean that I need to synchronize the differences between
ES and another system incrementally, or repair the differences with a full
reload in some real-time cases. But the current approach uses REST, which
consumes a lot of resources and may lead to other problems.
So I am trying to find an approach similar to HBase bulk load, which I think
may give higher performance.

dadoonet · June 28, 2013, 8:20am

My question was more "what do you mean by high perf?" 1k doc/s? 10k? 100k? 1m?
What insertion rate do you have right now, and what is your goal?

dancer · June 28, 2013, 10:23am

concurrency 10k.

costin · June 28, 2013, 10:37am

Not sure what you mean by "REST use many resource" - have you done some benchmarks/measurements or are you simply guessing?
If you look at the way ES uses REST (and thus elasticsearch-hadoop), you'll notice that most of the payload is the actual
data that is being sent.
You can co-locate ES with your data, which means the network connection is local and thus even more efficient.

However, all of this only makes sense if/when you do some benchmarks and find out that these are the bottleneck in one way or another.
As David said, do you have some requirement that isn't met? Have you done some tests and seen a lack of performance?
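
To put a rough number on the payload point, here is a minimal sketch in Java. It builds a request body in the newline-delimited bulk format (one action-and-meta-data line followed by one source line per document) and reports what share of the bytes is document source. The index name, type name, IDs, and sample documents are made up for illustration, and nothing is sent to a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class BulkPayloadOverhead {
    // Builds a bulk body: one action line plus one source line per document.
    // The index and type names passed in are hypothetical.
    static String bulkBody(String index, String type, List<String> sources) {
        StringBuilder sb = new StringBuilder();
        int id = 0;
        for (String source : sources) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type)
              .append("\",\"_id\":\"").append(id++).append("\"}}\n");
            sb.append(source).append('\n');
        }
        return sb.toString();
    }

    // Fraction of the body (in UTF-8 bytes) that is document source.
    static double sourceFraction(String body, List<String> sources) {
        long sourceBytes = 0;
        for (String s : sources) {
            sourceBytes += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return (double) sourceBytes / body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList(
            "{\"user\":\"dancer\",\"message\":\"bulk load test\"}",
            "{\"user\":\"costin\",\"message\":\"measure before tuning\"}");
        String body = bulkBody("myindex", "mytype", sources);
        System.out.print(body);
        System.out.printf("source share of payload: %.2f%n",
            sourceFraction(body, sources));
    }
}
```

The share grows with document size, since the action line stays roughly constant per document. That is worth measuring with real documents before concluding anything about overhead.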

--
Costin

dancer · June 29, 2013, 9:39am

My requirement is to build the index quickly whenever the index is out of
sync with the actual data. That requirement cannot be met at present.

dadoonet · June 29, 2013, 9:59am

I'm sorry, but I don't understand the full picture here.
Could you describe a little what you are trying to do?
What is your use case? Where do your docs come from?
Can you imagine that as soon as your service layer persists a document somewhere, you send it to Elasticsearch at the same time?

That is what I have in mind here, but once again, as I don't understand what you are trying to do, it's really hard to give you advice.
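
The "persist and index at the same time" idea could look roughly like the following sketch. `DocumentStore`, `SearchIndexer`, and `DualWriteService` are hypothetical names introduced only for illustration; they are not part of any Elasticsearch API, and a real indexer would wrap a client call behind the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstractions for illustration; not part of any Elasticsearch API.
interface DocumentStore {
    void save(String id, String json);
}

interface SearchIndexer {
    void index(String id, String json);
}

class DualWriteService {
    private final DocumentStore store;
    private final SearchIndexer indexer;

    DualWriteService(DocumentStore store, SearchIndexer indexer) {
        this.store = store;
        this.indexer = indexer;
    }

    // Persist first, then index, so the index is never handed a document
    // that the primary store rejected with an exception.
    void persist(String id, String json) {
        store.save(id, json);
        indexer.index(id, json);
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        List<String> indexed = new ArrayList<>();
        DualWriteService service = new DualWriteService(
            (id, json) -> stored.add(id),
            (id, json) -> indexed.add(id));
        service.persist("1", "{\"title\":\"hello\"}");
        System.out.println("stored=" + stored + " indexed=" + indexed);
    }
}
```

With this pattern the index stays close to the primary data as writes happen, so a full rebuild is only needed when the two have drifted apart.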

brian_yoder · June 30, 2013, 12:11am

Ok, Dancer. Here's how you can bulk-load in Java faster and more reliably
than you can possibly imagine. Faster than any other database engine I've
ever used or seen. This text was written back when I was using ES 0.20.4
but it got even faster with version 0.90.0. The result: 90 million
documents in 2 hours and 41 minutes. (And again, note that it gets
measurably faster when using 0.90)

The following is a very skeletal form of the Java-based bulk request
builder. I originally based it on one of Shay's examples. Of course, I've
added much more extensive error checking and statistics tracking to my
production version. But this is enough to give you the basic idea. I never
use curl for bulk loading anymore; doing it in Java is vastly better, there
are no curl limitations to work around, and the statistics are so much
better and more useful during huge testing runs of nearly 100 million
documents.

// Create transport client: Settings should specify cluster.name at least
TransportClient client = new TransportClient(client_settings);

// Add at least one address
InetSocketTransportAddress server_address = new
InetSocketTransportAddress(hostName, port);
client.addTransportAddress(server_address);

// Create initial bulk request builder
BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.setRefresh(false);

boolean last = false;
for (;;)
{
    // Get next line: action-and-meta-data
    // Get next line: source

    if (EOF)
    {
        last = true;
    }
    else
    {
        // Call either prepareIndex (for create, index actions),
        // or prepareDelete (for delete actions) and set up the
        // resulting object as required

        // Add the properly set-up object to the bulk request builder
        bulkRequest.add( resulting-object );
    }

    // If our bulk limit is reached, or if at the end of the input and
    // some actions remain: Send the accumulated action requests to ES
    int actions = bulkRequest.numberOfActions();
    if ((actions >= 4096) || (last && actions != 0))
    {
        BulkResponse bulkResponse = bulkRequest.execute().actionGet();

        // Handle failures (have only seen them during testing)
        if (bulkResponse.hasFailures())
        {
            for (BulkItemResponse item : bulkResponse.items())
            {
                // Log errors; limiting them to about 128 or so
                // to keep from flooding logs if the entire bulk
                // input is bad for some reason (since I write the
                // converters, failures never happen!!!)
            }
        }

        // Create a new BulkRequestBuilder for the next iteration
        // (reassign the existing variable; do not redeclare it)
        bulkRequest = client.prepareBulk();
        bulkRequest.setRefresh(false);
    }

    // Leave the loop at end of input, even when nothing was left to send
    if (last)
        break;

    bulkRequest.setRefresh(true);
}
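
The flush condition in the loop above (send when the limit is reached, then send whatever remains at end of input) can be tried without a cluster. The following is only a minimal sketch under stated assumptions: `BatchingSketch` and `sendInBatches` are hypothetical names, and the "send" step just records the batch size where a real loader would execute the bulk request. The batch size is a parameter here because 4096 is simply the value used in the run described in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the bulk loader: it only records the size of each
// batch it would send, so the batching logic can run without Elasticsearch.
class BatchingSketch {
    /** Groups docs into batches of at most batchSize and returns the size of each batch "sent". */
    static List<Integer> sendInBatches(Iterable<String> docs, int batchSize) {
        List<Integer> sentBatchSizes = new ArrayList<Integer>();
        List<String> pending = new ArrayList<String>();
        for (String doc : docs) {
            pending.add(doc);
            if (pending.size() >= batchSize) {      // bulk limit reached
                sentBatchSizes.add(pending.size()); // a real loader would execute the bulk request here
                pending = new ArrayList<String>();  // start a fresh batch
            }
        }
        if (!pending.isEmpty()) {                   // end of input: flush the remainder
            sentBatchSizes.add(pending.size());
        }
        return sentBatchSizes;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(sendInBatches(docs, 2)); // prints [2, 2, 1]
    }
}
```

Note that an input whose length is an exact multiple of the batch size ends with nothing pending, so no empty batch is sent.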

The index is configured with a 1s refresh. But during the bulk load, the
refresh rate is temporarily changed to 120s (as per a recommendation).

Previously, I had configured 16 shards for this index. But when build times
started climbing to near 4 hours (due partly to the use of the asciifolding
token filter for all of the English language string fields), I started
looking at the shard count. And I noticed that the Elasticsearch Head
interface has seriously ugly alignment issues when it tries to display
shard IDs in double digits (10 through 15). That made me wonder whether
99.9% of Elasticsearch users specify fewer than 11 shards, so I tried an
experiment with 10 shards (IDs 0 through 9).

A serial conversion to JSON and bulk loading of 90 million records with all
index actions (some duplicates) and no delete actions now takes 2:41 (2
hours and 41 minutes). Awesome!

I had thought that Elasticsearch was slowing down as the previous runs
progressed, so I also added the ability to track the counts in each
15-minute window during the build (the size of the window is configurable,
of course!).

Starting: 4096 per bulk-load action
Running totals to be shown at every 15m interval
AT 2013-03-25T21:54:05.537Z :: WINDOW: Total=0 , create=0 , index=0 , delete=0 CURRENT: Total=0
AT 2013-03-25T22:09:05.557Z :: WINDOW: Total=14995457 , create=0 , index=14995457 , delete=0 CURRENT: Total=14995457
AT 2013-03-25T22:24:05.615Z :: WINDOW: Total=13635584 , create=0 , index=13635584 , delete=0 CURRENT: Total=28631041
AT 2013-03-25T22:39:05.792Z :: WINDOW: Total=13197312 , create=0 , index=13197312 , delete=0 CURRENT: Total=41828353
AT 2013-03-25T22:54:05.793Z :: WINDOW: Total=12587184 , create=0 , index=12587184 , delete=0 CURRENT: Total=54415537
AT 2013-03-25T23:09:06.677Z :: WINDOW: Total=6508368 , create=0 , index=6508368 , delete=0 CURRENT: Total=60923905
AT 2013-03-25T23:24:07.210Z :: WINDOW: Total=3436544 , create=0 , index=3436544 , delete=0 CURRENT: Total=64360449
AT 2013-03-25T23:39:07.288Z :: WINDOW: Total=3383296 , create=0 , index=3383296 , delete=0 CURRENT: Total=67743745
AT 2013-03-25T23:54:07.443Z :: WINDOW: Total=3407872 , create=0 , index=3407872 , delete=0 CURRENT: Total=71151617
AT 2013-03-26T00:09:08.337Z :: WINDOW: Total=3809280 , create=0 , index=3809280 , delete=0 CURRENT: Total=74960897
AT 2013-03-26T00:24:08.676Z :: WINDOW: Total=7581696 , create=0 , index=7581696 , delete=0 CURRENT: Total=82542593
AT 2013-03-26T00:35:10.585Z :: WINDOW: Total=7924767 , create=0 , index=7924767 , delete=0 CURRENT: Total=90467360

SUMMARY: { Total=90467360 , create=0 , index=90467360 , delete=0 }

Done: 90467360 documents in 9665.03312 seconds (02:41:05.033): 9360.274183933681 documents/second
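
The per-window tracking behind a log like the one above could look roughly like the sketch below. This is not the production code from this post: `WindowStats`, its methods, and the simplified line format (only the WINDOW and CURRENT totals, with no per-action breakdown or timestamp) are all assumptions made for illustration. Timestamps are passed in explicitly so the logic can be checked without waiting for real time to pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: counts actions per fixed-length time window and keeps
// a running total, in the spirit of the WINDOW / CURRENT figures in the log.
class WindowStats {
    private final long windowMillis;
    private long windowStart;
    private long windowCount = 0;
    private long total = 0;
    private final List<String> lines = new ArrayList<String>();

    WindowStats(long windowMillis, long startMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    /** Records actions completed at nowMillis, first closing any windows that have elapsed. */
    void record(long nowMillis, long actions) {
        while (nowMillis - windowStart >= windowMillis) {
            report();
            windowStart += windowMillis;
        }
        windowCount += actions;
        total += actions;
    }

    /** Emits one line for the current window and resets the window counter. */
    void report() {
        lines.add("WINDOW: Total=" + windowCount + " CURRENT: Total=" + total);
        windowCount = 0;
    }

    List<String> lines() { return lines; }

    long total() { return total; }

    public static void main(String[] args) {
        WindowStats stats = new WindowStats(1000, 0); // 1-second windows for the demo
        stats.record(100, 4096);
        stats.record(900, 4096);
        stats.record(1200, 4096);                     // crosses into the second window
        stats.report();                               // final partial window
        for (String line : stats.lines()) {
            System.out.println(line);
        }
    }
}
```

Calling `record` once after each bulk response, with the number of actions in that request, would be enough to drive it.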

Again, this is all done using one index so I don't need to route updates
based on the index and can just pump them through to ES. This may not be
the best strategy, but it is pushing ES in a direction that I never thought
a database could go when running just on my little old laptop with 8 GB
RAM, quad-core i7, and one relatively slow disk that is both reading the
input data and writing the ES database.

I'm currently using ES version 0.20.4 with Java 6 (yeah, I know, but that's
out of my control at the moment). However, it still works great! Up to 54M
documents, I was getting an index rate of about 14K documents per second;
for the full 90 million load it averages out to a respectable rate of just
over 9K documents per second. Ad-hoc query times seem to be better with only
10 shards than with the 16 I had been using. And query-by-id is still stellar.
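
The overall rate can be re-derived from the figures in the log. The sketch below is only an arithmetic check: `ThroughputCheck` and `docsPerSecond` are hypothetical names, and the per-window rates assume each window is exactly 900 seconds long, which the timestamps only approximately satisfy.

```java
// Hypothetical arithmetic check of the rates reported in the log.
class ThroughputCheck {
    /** Average documents per second over the given duration. */
    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    public static void main(String[] args) {
        // Totals taken from the "Done:" line of the log
        long totalDocs = 90467360L;
        double elapsedSeconds = 9665.03312; // 02:41:05.033
        System.out.printf("overall: %.2f docs/s%n", docsPerSecond(totalDocs, elapsedSeconds));

        // Busiest and quietest 15-minute windows from the log (assumed to be 900 s each)
        System.out.printf("busiest window: %.0f docs/s%n", docsPerSecond(14995457L, 900.0));
        System.out.printf("quietest window: %.0f docs/s%n", docsPerSecond(3383296L, 900.0));
    }
}
```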

Brian


dancer · July 6, 2013, 8:24am

Hi, thanks for your reply, but I don't know why "actions >= 4096"?

dancer · July 6, 2013, 8:28am

Hi, thanks for your reply.
I'm sorry I didn't express myself clearly. I just want to save the index data
to ES while my data is written to HBase.
But if HBase and ES get out of sync, I need to load all of my data into ES
again.

costin · July 6, 2013, 9:14am

For what it's worth, we're planning on adding integration with HBase in Elasticsearch-Hadoop; however, there's nothing
to share with the public at this point.
We hope to have something on GitHub soon.

Cheers,


--
Costin


dancer · July 6, 2013, 9:28am

There are many problems to solve, especially failures between the two systems.

