Timeout while indexing 1gb document

JohnB1 · May 14, 2013, 12:35pm

Hello,
I'm pretty new to elastic search and I was doing some load tests. One of
the tests is to write a 1GB document which fails with the following error
message:

14-May-2013 14:44:44 org.elasticsearch.plugins
INFO: [Aegis] loaded [], sites []
14-May-2013 15:28:34 org.elasticsearch.client.transport
INFO: [Aegis] failed to get node info for
[#transport#-1][inet[/172.16.65.22:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[][inet[/172.16.65.22:9300]][cluster/nodes/info] request_id [522] timed out
after [10846ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:342)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

org.elasticsearch.client.transport.NoNodeAvailableException: No node
available
at
org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
at
org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
at
org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:305)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

I'm using one elasticsearch node and the Java client and I've increased the
heap size, I've set the timeout parameter for prepareindexand also I've set
the http.max_content_length: 2048mb and client.transport.ping_timeout:
3600s in the elasticsearch.yaml configuration file.

Could you please tell what else can I do in order to insert that 1gb
document?
Thanks!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

simonw_2 · May 14, 2013, 1:54pm

John, can you elaborate why you want to index a 1GB document? I mean this
seems to be pretty nasty big, r u indexing some bio data? Maybe you should
think of breaking the data up a bit? What is the usecase of such a massive
doc.

simon
On Tuesday, May 14, 2013 2:35:30 PM UTC+2, Ioan Badarinza wrote:

Hello,
I'm pretty new to Elasticsearch and I was doing some load tests. One of
the tests is to write a 1GB document which fails with the following error
message:

14-May-2013 14:44:44 org.elasticsearch.plugins
INFO: [Aegis] loaded , sites
14-May-2013 15:28:34 org.elasticsearch.client.transport
INFO: [Aegis] failed to get node info for
[#transport#-1][inet[/172.16.65.22:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[inet[/172.16.65.22:9300]][cluster/nodes/info] request_id [522] timed out
after [10846ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:342)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

org.elasticsearch.client.transport.NoNodeAvailableException: No node
available
at
org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
at
org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
at
org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:305)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

I'm using one elasticsearch node and the Java client and I've increased
the heap size, I've set the timeout parameter for prepareindexand also I've
set the http.max_content_length: 2048mb and client.transport.ping_timeout:
3600s in the elasticsearch.yaml configuration file.

Could you please tell what else can I do in order to insert that 1gb
document?
Thanks!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JohnB1 · May 14, 2013, 1:59pm

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests is
about indexing 1gb document and it fails.
Thanks for replaying!

John

On Tue, May 14, 2013 at 4:54 PM, simonw
simon.willnauer@elasticsearch.comwrote:

John, can you elaborate why you want to index a 1GB document? I mean this
seems to be pretty nasty big, r u indexing some bio data? Maybe you should
think of breaking the data up a bit? What is the usecase of such a massive
doc.

simon

On Tuesday, May 14, 2013 2:35:30 PM UTC+2, Ioan Badarinza wrote:

Hello,
I'm pretty new to Elasticsearch and I was doing some load tests. One of
the tests is to write a 1GB document which fails with the following error
message:

14-May-2013 14:44:44 org.elasticsearch.plugins
INFO: [Aegis] loaded , sites
14-May-2013 15:28:34 org.elasticsearch.client.**transport
INFO: [Aegis] failed to get node info for [#transport#-1][inet[/172.16.**65.22:9300]],
disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[inet[/172.16.65.22:9300]][**cluster/nodes/info] request_id [522]
timed out after [10846ms]
at org.elasticsearch.transport.**TransportService$TimeoutHandler.run(
TransportService.java:342)
at java.util.concurrent.ThreadPoolExecutor$Worker.
runTask(ThreadPoolExecutor.**java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.**java:662)

org.elasticsearch.client.**transport.**NoNodeAvailableException: No node
available
at org.elasticsearch.client.**transport.TransportClientNodesService$
RetryListener.onFailure(**TransportClientNodesService.**java:246)
at org.elasticsearch.action.TransportActionNodeProxy$1.
handleException(**TransportActionNodeProxy.java:**84)
at org.elasticsearch.transport.TransportService$Adapter$2$1.
run(TransportService.java:305)
at java.util.concurrent.ThreadPoolExecutor$Worker.
runTask(ThreadPoolExecutor.**java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.**java:662)

I'm using one elasticsearch node and the Java client and I've increased
the heap size, I've set the timeout parameter for prepareindexand also I've
set the http.max_content_length: 2048mb and client.transport.ping_timeout:
3600s in the elasticsearch.yaml configuration file.

Could you please tell what else can I do in order to insert that 1gb
document?
Thanks!

John

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/ojsMX-mJha4/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Ioan Badarinza
Intermediate Developer

*3PILLAR *GLOBAL
MOBILE | +40 763 632 361
DIRECT | +40 364 402 269
ioan.badarinza@3pillarglobal.com

DISCIPLINED INNOVATION

            *CONFIDENTIALITY INFORMATION AND DISCLAIMER:*  This e-mail

message and/or attachments are privileged, confidential, and directed to
and for the use of the addressee only. if this message has been addressed
to you in error, please alert the sender immediately by reply e-mail and
then delete this message and any attachments. Further, any review,
distribution or other use of the contents of this message by anyone other
than the intended recipient is strictly prohibited

P* *Please consider the environment before printing

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

simonw_2 · May 14, 2013, 2:20pm

you would likely need to raise some timeouts and give ES enough memory to
hold this doc in memory but I'd doubt the usecase here and if ES is the
right tool to index such a massive document.

simon

On Tuesday, May 14, 2013 2:35:30 PM UTC+2, Ioan Badarinza wrote:

Hello,
I'm pretty new to Elasticsearch and I was doing some load tests. One of
the tests is to write a 1GB document which fails with the following error
message:

14-May-2013 14:44:44 org.elasticsearch.plugins
INFO: [Aegis] loaded , sites
14-May-2013 15:28:34 org.elasticsearch.client.transport
INFO: [Aegis] failed to get node info for
[#transport#-1][inet[/172.16.65.22:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[inet[/172.16.65.22:9300]][cluster/nodes/info] request_id [522] timed out
after [10846ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:342)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

org.elasticsearch.client.transport.NoNodeAvailableException: No node
available
at
org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
at
org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
at
org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:305)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

I'm using one elasticsearch node and the Java client and I've increased
the heap size, I've set the timeout parameter for prepareindexand also I've
set the http.max_content_length: 2048mb and client.transport.ping_timeout:
3600s in the elasticsearch.yaml configuration file.

Could you please tell what else can I do in order to insert that 1gb
document?
Thanks!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

brian_yoder · May 14, 2013, 4:04pm

Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks
about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way past
the envelop unless you really have some stellar hardware (and a stellar
budget to match).

If this is a valid use case for you (and not just a case of "let's see what
happens when I push the accelerator to the floor and enter Dead Man's
Curve), then you'll probably want to create a carefully organized file
system layout to store those 1 GB documents and only store a reference to
them (and perhaps some extracted metadata for queries if needed) in the
database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests is
about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JohnB1 · May 15, 2013, 12:37pm

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with load
test results for MySQL and try to figure out why should I use a noSQL db
instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12 mins
on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm just
trying to compare results and that is why I need to insert in ES a 1gb
document.

I'm still trying to get this done but I had no luck , so if you have any
suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:

Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks
about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way past
the envelop unless you really have some stellar hardware (and a stellar
budget to match).

If this is a valid use case for you (and not just a case of "let's see
what happens when I push the accelerator to the floor and enter Dead Man's
Curve), then you'll probably want to create a carefully organized file
system layout to store those 1 GB documents and only store a reference to
them (and perhaps some extracted metadata for queries if needed) in the
database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests is
about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · May 15, 2013, 1:11pm

Seriously? You want to store in a search engine a file that you will never use for search and you are expecting to get any conclusion with this test case?

My opinion is that you should try to run a test that push 1Gb to ES using bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

Le 15 mai 2013 à 14:37, John ioan.badarinza@3pillarglobal.com a écrit :

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with load test results for MySQL and try to figure out why should I use a noSQL db instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12 mins on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm just trying to compare results and that is why I need to insert in ES a 1gb document.

I'm still trying to get this done but I had no luck , so if you have any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:
Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way past the envelop unless you really have some stellar hardware (and a stellar budget to match).

If this is a valid use case for you (and not just a case of "let's see what happens when I push the accelerator to the floor and enter Dead Man's Curve), then you'll probably want to create a carefully organized file system layout to store those 1 GB documents and only store a reference to them (and perhaps some extracted metadata for queries if needed) in the database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:
Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests is about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · May 15, 2013, 2:53pm

John,

your case seems to be confusing because a document of 1GB will be
unusable in search as long as you handle it as an opaque entity for a
single field in a single document.

In MySQL, I think you issued a "load data infile" and the data is
tabular, with many columns.

These columns must be translated to JSON structures for ES indexing to
make sense.

If you can give the MySQL command you use to push the document and the
column names, I might help to give some adivse how to push the same
document to ES and to translate columns. One possibility is the JDBC river.

Jörg

Am 15.05.13 15:13, schrieb David Pilato:

Seriously? You want to store in a search engine a file that you will
never use for search and you are expecting to get any conclusion with
this test case?

My opinion is that you should try to run a test that push 1Gb to ES
using bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure
something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | /Technical Advocate/ | Elasticsearch.com
http://Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr
https://twitter.com/elasticsearchfr |@scrutmydocs
https://twitter.com/scrutmydocs

Le 15 mai 2013 à 14:37, John <ioan.badarinza@3pillarglobal.com
mailto:ioan.badarinza@3pillarglobal.com> a écrit :
Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES
with load test results for MySQL and try to figure out why should I
use a noSQL db instead of MySQL. In MySQL the 1gb data insert
finished in aprox. 12 mins on an Intel proc with 8GB RAM and in ES
I'm not able to do this. I'm just trying to compare results and that
is why I need to insert in ES a 1gb document.

I'm still trying to get this done but I had no luck , so if you
have any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:
Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even
thinks about stuffing 10KB CLOB's (let alone 2MB BLOB images) in
our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is
way past the envelop unless you really have some stellar hardware
(and a stellar budget to match).

If this is a valid use case for you (and not just a case of
"let's see what happens when I push the accelerator to the floor
and enter Dead Man's Curve), then you'll probably want to create
a carefully organized file system layout to store those 1 GB
documents and only store a reference to them (and perhaps some
extracted metadata for queries if needed) in the database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

    Hi Simon,
    I'm just doing some load tests for elasticsearch and one of
    the tests is about indexing 1gb document and it fails.
    Thanks for replaying!

    John
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com
mailto:elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JohnB1 · May 15, 2013, 3:05pm

Hi David,
What you are describing is another separate test in my list which worked
well.
Thanks!
John

On Wednesday, May 15, 2013 4:13:20 PM UTC+3, David Pilato wrote:

Seriously? You want to store in a search engine a file that you will never
use for search and you are expecting to get any conclusion with this test
case?

My opinion is that you should try to run a test that push 1Gb to ES using
bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure
something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr
| @scrutmydocs https://twitter.com/scrutmydocs

Le 15 mai 2013 à 14:37, John <ioan.ba...@3pillarglobal.com <javascript:>>
a écrit :

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with
load test results for MySQL and try to figure out why should I use a noSQL
db instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12
mins on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm
just trying to compare results and that is why I need to insert in ES a 1gb
document.

I'm still trying to get this done but I had no luck , so if you have
any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:

Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks
about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way past
the envelop unless you really have some stellar hardware (and a stellar
budget to match).

If this is a valid use case for you (and not just a case of "let's see
what happens when I push the accelerator to the floor and enter Dead Man's
Curve), then you'll probably want to create a carefully organized file
system layout to store those 1 GB documents and only store a reference to
them (and perhaps some extracted metadata for queries if needed) in the
database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests is
about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

brian_yoder · May 15, 2013, 3:05pm

Hi John!

You're actually not loading a 1 GB document into ES, because MySQL does not
allow a 1 GB row in any table.

http://dev.mysql.com/doc/refman/5.5/en/column-count-limit.html states:

Every table (regardless of storage engine) has a maximum row size of
65,535 bytes. Storage engines may place additional constraints on this
limit, reducing the effective maximum row size.

My own experience on a little ol' MacBook Pro (quad-core i7 CPU, but still
one set of spindles on a relatively slow disk) showed that Elasticsearch
could bulk-load 76 million documents in 2.3 hours with a bulk limit of 4096
per load. The resulting database takes about 14 GB of disk space (ES
0.90.0; it was about twice that size using ES 0.20.4).

The very same data was loaded into another department's MySQL database: On
a true server machine with a high-performance disk array, the load took 8
hours.

But if you are really comparing SQL vrs. NoSQL, there's a lot more to look
at than just Elasticsearch's superior load and query performance.

SQL solves a specific class of problems and it solves them well. If you
need SQL, use Oracle (lots of $$$ but it does work) or PostgreSQL (just as
free as MySQL but much more capable and robust).

If you want the kinds of loads and queries that NoSQL provides, you can
hardly do better than Elasticsearch, but even it's not the only "No SQL"
choice. But after writing my own "No SQL" database in the past, I knew what
I was looking for and Elasticsearch has nearly everything I was looking for
along with a whole lot more that I didn't know I could ask for. Awesome!

Regards,
Brian

On Wednesday, May 15, 2013 8:37:23 AM UTC-4, John wrote:

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with
load test results for MySQL and try to figure out why should I use a noSQL
db instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12
mins on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm
just trying to compare results and that is why I need to insert in ES a 1gb
document.

I'm still trying to get this done but I had no luck , so if you have
any suggestions please let me know.

Thanks!

John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JohnB1 · May 15, 2013, 3:07pm

Hello Jörg,
Here are my two (mysql + ES) dummy tests:

MySQL:
@Test
public void testWrite1GBMysql() {

    byte[] b = new byte[1073741724];
    for (int i = 0; i < 1073741724; i++) {
        b[i] = 'a';
    }
    assertEquals(b.length, 1073741724);

    long startTime = System.currentTimeMillis();

    try {
        PreparedStatement pst = conn.prepareStatement("insert into

test.load(id, image) values(2, ?)");
pst.setBytes(1, b);
pst.executeUpdate();
} catch (SQLException e) {
System.out.println(e.toString());
assertTrue(false);
}

    long duration = System.currentTimeMillis() - startTime;

    System.out.println("1GB bytes were written in " + duration + "

miliseconds");
assertTrue(true);
}

My ES test:

@Test
public void testWrite1GB() {

    byte[] b = new byte[1073741824];
    b[0] = '{';
    b[1] = '\"';
    b[2] = 's';
    b[3] = 'o';
    b[4] = 'm';
    b[5] = 'e';
    b[6] = 'K';
    b[7] = 'e';
    b[8] = 'y';
    b[9] = '\"';
    b[10] = ':';
    b[11] = '\"';
    for (int i = 12; i < 1073741822; i++) {
        b[i] = 'a';
    }
    b[1073741822] = '\"';
    b[1073741823] = '}';
    assertEquals(b.length, 1073741824);

    long startTime = System.currentTimeMillis();
    client.prepareIndex("myindex", "quizItem")
            .setSource(b)
            .setId("oneGBKey")
            .setTimeout(TimeValue.timeValueHours(3))
            .execute()
            .actionGet();

    long duration = System.currentTimeMillis() - startTime;

    System.out.println("1GB write finished in " + duration + "

miliseconds");
assertTrue(true);
}

Thanks!

John

On Wednesday, May 15, 2013 5:53:53 PM UTC+3, Jörg Prante wrote:

John,

your case seems to be confusing because a document of 1GB will be
unusable in search as long as you handle it as an opaque entity for a
single field in a single document.

In MySQL, I think you issued a "load data infile" and the data is
tabular, with many columns.

These columns must be translated to JSON structures for ES indexing to
make sense.

If you can give the MySQL command you use to push the document and the
column names, I might help to give some adivse how to push the same
document to ES and to translate columns. One possibility is the JDBC
river.

Jörg

Am 15.05.13 15:13, schrieb David Pilato:
Seriously? You want to store in a search engine a file that you will
never use for search and you are expecting to get any conclusion with
this test case?

My opinion is that you should try to run a test that push 1Gb to ES
using bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure
something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | /Technical Advocate/ | Elasticsearch.com
http://Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr
https://twitter.com/elasticsearchfr |@scrutmydocs
https://twitter.com/scrutmydocs

Le 15 mai 2013 à 14:37, John <ioan.ba...@3pillarglobal.com <javascript:>
<mailto:ioan.ba...@3pillarglobal.com <javascript:>>> a écrit :
Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES
with load test results for MySQL and try to figure out why should I
use a noSQL db instead of MySQL. In MySQL the 1gb data insert
finished in aprox. 12 mins on an Intel proc with 8GB RAM and in ES
I'm not able to do this. I'm just trying to compare results and that
is why I need to insert in ES a 1gb document.

I'm still trying to get this done but I had no luck , so if you
have any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:
Hi, John. 

Just for reference. Our Oracle DBA has fits when one of us even 
thinks about stuffing 10KB CLOB's (let alone 2MB BLOB images) in 
our Oracle DB. 

I think it's safe to say that pushing a 1GB document into ES is 
way past the envelop unless you really have some stellar hardware 
(and a stellar budget to match). 

If this is a valid use case for you (and not just a case of 
"let's see what happens when I push the accelerator to the floor 
and enter Dead Man's Curve), then you'll probably want to create 
a carefully organized file system layout to store those 1 GB 
documents and only store a reference to them (and perhaps some 
extracted metadata for queries if needed) in the database itself. 

Cheers! 

Brian 

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote: 

    Hi Simon, 
    I'm just doing some load tests for elasticsearch and one of 
    the tests is about indexing 1gb document and it fails. 
    Thanks for replaying! 

    John 
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com <javascript:>
<mailto:elasticsearch+unsubscribe@googlegroups.com <javascript:>>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

simonw_2 · May 15, 2013, 9:34pm

John,

I am not sure I understand what you are trying to do here and I dont' know
your usecase. Once thing that I can tell here is that loading a 1GB file in
ES is something ES was not made for and if you are looking for a blobstore
you might want to look at other solutions. There might be ways to make this
work even with ES but you won't like the implications memory wise and all
the tuning so my advice here is, don't bother to try. I'd rather spend my
time to tell us about your usecases where you are considering ES as a
replacement for mySQL you might get some pretty good advice here on the
list.

That said, from my perspective the benchmark is very artificial for ES (no
offence at all!!) and I hope you understand that most of us don't have time
to support you with this rather artificial test.

would love to know more about your usecase.

simon

On Wednesday, May 15, 2013 5:05:07 PM UTC+2, John wrote:

Hi David,
What you are describing is another separate test in my list which worked
well.
Thanks!
John

On Wednesday, May 15, 2013 4:13:20 PM UTC+3, David Pilato wrote:

Seriously? You want to store in a search engine a file that you will
never use for search and you are expecting to get any conclusion with this
test case?

My opinion is that you should try to run a test that push 1Gb to ES using
bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure
something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr
| @scrutmydocs https://twitter.com/scrutmydocs

Le 15 mai 2013 à 14:37, John ioan.ba...@3pillarglobal.com a écrit :

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with
load test results for MySQL and try to figure out why should I use a noSQL
db instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12
mins on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm
just trying to compare results and that is why I need to insert in ES a 1gb
document.

I'm still trying to get this done but I had no luck , so if you have
any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:

Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks
about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way past
the envelop unless you really have some stellar hardware (and a stellar
budget to match).

If this is a valid use case for you (and not just a case of "let's see
what happens when I push the accelerator to the floor and enter Dead Man's
Curve), then you'll probably want to create a carefully organized file
system layout to store those 1 GB documents and only store a reference to
them (and perhaps some extracted metadata for queries if needed) in the
database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests
is about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JohnB1 · May 16, 2013, 6:41am

Hi guys,

Thanks very much for your fast responses.
I want to start implementing a minimum viable product with a noSQL
database, and since there are a lot of them, I wanted to test at least some
of them before starting the work. Until now I've tested Elasticsearch,
MongoDB and CouchDB.
Thanks again!

And to close this thread with a conclusion, from all the responses here I
think we can say that ES was not designed to store such big documents.

John

On Thursday, May 16, 2013 12:34:51 AM UTC+3, simonw wrote:

John,

I am not sure I understand what you are trying to do here and I dont' know
your usecase. Once thing that I can tell here is that loading a 1GB file in
ES is something ES was not made for and if you are looking for a blobstore
you might want to look at other solutions. There might be ways to make this
work even with ES but you won't like the implications memory wise and all
the tuning so my advice here is, don't bother to try. I'd rather spend my
time to tell us about your usecases where you are considering ES as a
replacement for mySQL you might get some pretty good advice here on the
list.

That said, from my perspective the benchmark is very artificial for ES (no
offence at all!!) and I hope you understand that most of us don't have time
to support you with this rather artificial test.

would love to know more about your usecase.

simon

On Wednesday, May 15, 2013 5:05:07 PM UTC+2, John wrote:

Hi David,
What you are describing is another separate test in my list which worked
well.
Thanks!
John

On Wednesday, May 15, 2013 4:13:20 PM UTC+3, David Pilato wrote:

Seriously? You want to store in a search engine a file that you will
never use for search and you are expecting to get any conclusion with this
test case?

My opinion is that you should try to run a test that push 1Gb to ES
using bulk with 1kb per document.
So 1000 bulk iteration with 1000 docs each time will help to measure
something significant.

Or I completely misunderstood what test you are trying to run...

My 2 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr
| @scrutmydocs https://twitter.com/scrutmydocs

Le 15 mai 2013 à 14:37, John ioan.ba...@3pillarglobal.com a écrit :

Hi guys,
Thanks for your answers!

But, what I'm trying to do is to compare load test results for ES with
load test results for MySQL and try to figure out why should I use a noSQL
db instead of MySQL. In MySQL the 1gb data insert finished in aprox. 12
mins on an Intel proc with 8GB RAM and in ES I'm not able to do this. I'm
just trying to compare results and that is why I need to insert in ES a 1gb
document.

I'm still trying to get this done but I had no luck , so if you have
any suggestions please let me know.

Thanks!

John

On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:

Hi, John.

Just for reference. Our Oracle DBA has fits when one of us even thinks
about stuffing 10KB CLOB's (let alone 2MB BLOB images) in our Oracle DB.

I think it's safe to say that pushing a 1GB document into ES is way
past the envelop unless you really have some stellar hardware (and a
stellar budget to match).

If this is a valid use case for you (and not just a case of "let's see
what happens when I push the accelerator to the floor and enter Dead Man's
Curve), then you'll probably want to create a carefully organized file
system layout to store those 1 GB documents and only store a reference to
them (and perhaps some extracted metadata for queries if needed) in the
database itself.

Cheers!

Brian

On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:

Hi Simon,
I'm just doing some load tests for elasticsearch and one of the tests
is about indexing 1gb document and it fails.
Thanks for replaying!

John

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

brian_yoder · May 16, 2013, 3:32pm

By "we can say", you must mean you and your tapeworm. No one else is
included in your conclusion.

By "document" you really mean "input data stream". In strict terms, an
Elasticsearch "document" is a MySQL "row". You will never succeed in
loading a 1 GB row into MySQL. But from your posts, I am guessing that
MySQL has a tool that slurps one huge 1 GB input stream into the multiple
rows it represents and loads them optimally. OK, Elasticsearch doesn't come
with such a tool, but it comes with wonderful APIs that let you dream up
and implement all manner of input streams. There are many third-party tools
for pulling in data from many sources (rivers, they call them), and I wrote
my own converters with proper bulk-load coding to push bulk data into
Elasticsearch.

I can easily and successfully load a 3.1 GB "document" into Elasticsearch.
Even on my laptop with decent CPU power but low end disk performance, I can
load this 3.1 GB monster in just under 3 hours. The MacBook fans sound like
a (quiet) jet engine, but the system is
still surprisingly responsive during its efforts. And there are no memory
issues, exceptions thrown, or any other issues at all. And note that this
exact same 3.1 GB input "document" was loaded into MySQL in 8 hours on a
production server with a proper disk array; Elasticsearch did the same job
on my laptop and single slow disk in less than half the time.

And that 3.1 GB document is a gzip'd CSV file. Of course, I needed my Java
skills to take the gunzip'd output (using gunzip -c to decompress to stdout
but not on disk. Yay!), then convert that (probably about 7 or 8 GB by now)
uncompressed CSV stream into the desired JSON stream, and the use the
excellent examples as a model for my bulk loader that properly loaded that
huge document into Elasticsearch.

And it gets better:
http://thedatachef.blogspot.com/2011/01/bulk-indexing-with-elasticsearch-and.htmltells us that:

At Infochimps we recently indexed over 2.5 billion documents for a total
of 4TB total indexed size. This would not have been possible without
Elasticsearch and the Hadoop bulk loader we wrote, wonderdog. I'll go into
the technical details in a later post but for now here's how you can get
started with Elasticsearch and Hadoop.
*
*
[thedatachef] @About Trust, We plan on a blog post with just those
technical details soon. For now, the 4TB was after indexing and the data
was raw text. We used 16 m2.xlarge ec2 nodes for the elasticsearch cluster
and 5 m1.large hadoop nodes. Took 2-5 minutes per input GB.
*
*
[thedatachef] @Michael Yes. Indexing speed varied from 2 minutes per input
GB (at best) to 5 minutes per input GB (at worse). That is all given the
setup explained in the previous comment.
*
*
[jasonInKorea] I have done same thing that you did, and I checked
wonderful speed. But I didn't use hadoop storage.

And so we can all conclude that Elasticsearch will easily, smoothly, and
gracefully load and process and query documents that are many, many times
larger than a relatively tiny 1 GB document!

Regards,
Brian

On Thursday, May 16, 2013 2:41:44 AM UTC-4, John wrote:

Hi guys,

Thanks very much for your fast responses.
I want to start implementing a minimum viable product with a noSQL
database, and since there are a lot of them, I wanted to test at least some
of them before starting the work. Until now I've tested Elasticsearch,
MongoDB and CouchDB.
Thanks again!

And to close this thread with a conclusion, from all the responses here I
think we can say that ES was not designed to store such big documents.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Steve_4 · May 16, 2013, 4:12pm

I just ended up in this thread because of a similar problem. We have huge
log files that should be parsed and indexed by certain data. (I know it can
be done in various ways but it's simpler to process and to keep them as
huge as possible - don't ask it's just business details let's say FX like).
I think John's statement is correct in the scenario that he provided and
proved with code. Now Brian if you can be more specific about what has been
done with Hadoop what by Elastic Search by those guys where you have copy
pasted it would be also nice. Also what was the hardware that they were
running on, what version of operating system they were running, what
internet connection did they have would be useful for a profesional rather
then some text that states "i'm right, you're wrong because someone told me
so". From my perspective if I run john's code i see just that ES can't deal
with a record of 1G. I also understand that there are workarounds for this

Thanks,
S.

On Thursday, May 16, 2013 6:32:17 PM UTC+3, InquiringMind wrote:

By "we can say", you must mean you and your tapeworm. No one else is
included in your conclusion.

By "document" you really mean "input data stream". In strict terms, an
Elasticsearch "document" is a MySQL "row". You will never succeed in
loading a 1 GB row into MySQL. But from your posts, I am guessing that
MySQL has a tool that slurps one huge 1 GB input stream into the multiple
rows it represents and loads them optimally. OK, Elasticsearch doesn't come
with such a tool, but it comes with wonderful APIs that let you dream up
and implement all manner of input streams. There are many third-party tools
for pulling in data from many sources (rivers, they call them), and I wrote
my own converters with proper bulk-load coding to push bulk data into
Elasticsearch.

I can easily and successfully load a 3.1 GB "document" into Elasticsearch.
Even on my laptop with decent CPU power but low end disk performance, I can
load this 3.1 GB monster in just under 3 hours. The MacBook fans sound like
a (quiet) jet engine, but the system is
still surprisingly responsive during its efforts. And there are no memory
issues, exceptions thrown, or any other issues at all. And note that this
exact same 3.1 GB input "document" was loaded into MySQL in 8 hours on a
production server with a proper disk array; Elasticsearch did the same job
on my laptop and single slow disk in less than half the time.

And that 3.1 GB document is a gzip'd CSV file. Of course, I needed my Java
skills to take the gunzip'd output (using gunzip -c to decompress to stdout
but not on disk. Yay!), then convert that (probably about 7 or 8 GB by now)
uncompressed CSV stream into the desired JSON stream, and the use the
excellent examples as a model for my bulk loader that properly loaded that
huge document into Elasticsearch.

And it gets better:
http://thedatachef.blogspot.com/2011/01/bulk-indexing-with-elasticsearch-and.htmltells us that:

At Infochimps we recently indexed over 2.5 billion documents for a total
of 4TB total indexed size. This would not have been possible without
Elasticsearch and the Hadoop bulk loader we wrote, wonderdog. I'll go into
the technical details in a later post but for now here's how you can get
started with Elasticsearch and Hadoop.
*
*
[thedatachef] @About Trust, We plan on a blog post with just those
technical details soon. For now, the 4TB was after indexing and the data
was raw text. We used 16 m2.xlarge ec2 nodes for the elasticsearch cluster
and 5 m1.large hadoop nodes. Took 2-5 minutes per input GB.
*
*
[thedatachef] @Michael Yes. Indexing speed varied from 2 minutes per
input GB (at best) to 5 minutes per input GB (at worse). That is all given
the setup explained in the previous comment.
*
*
[jasonInKorea] I have done same thing that you did, and I checked
wonderful speed. But I didn't use hadoop storage.

And so we can all conclude that Elasticsearch will easily, smoothly,
and gracefully load and process and query documents that are many, many
times larger than a relatively tiny 1 GB document!

Regards,
Brian

On Thursday, May 16, 2013 2:41:44 AM UTC-4, John wrote:

Hi guys,

Thanks very much for your fast responses.
I want to start implementing a minimum viable product with a noSQL
database, and since there are a lot of them, I wanted to test at least some
of them before starting the work. Until now I've tested Elasticsearch,
MongoDB and CouchDB.
Thanks again!

And to close this thread with a conclusion, from all the responses here I
think we can say that ES was not designed to store such big documents.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

costin · May 16, 2013, 5:32pm

Hi guys,

On the Hadoop angle, I want to mention the ES-Hadoop project [1]. The
readme should be clear enough but basically it gives native ES integration
for both reading and writing to/from ES in Hadoop, Cascading, Pig, Hive
(and it does fully map on top of the Map/Reduce model meaning your searches
are executed in parallel).

Cheers,

[1] https://github.com/elasticsearch/elasticsearch-hadoop

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · May 16, 2013, 6:04pm

You can read about Infochimp's Wonderdog (and of course reuse it) at

It's far more than just loading large tabular data or json files into
ES. With Apache Pig and Hadoop you also aim at parallel data processing
tasks, which is usually outside the scope of ES. With Wonderdog
interface, you can just push those data chunks from Hadoop into ES - it
is not river pull-style but parallel push style.

Jörg

Am 16.05.13 18:12, schrieb Steve:

I just ended up in this thread because of a similar problem. We have
huge log files that should be parsed and indexed by certain data. (I
know it can be done in various ways but it's simpler to process and to
keep them as huge as possible - don't ask it's just business details
let's say FX like). I think John's statement is correct in the
scenario that he provided and proved with code. Now Brian if you can
be more specific about what has been done with Hadoop what by Elastic
Search by those guys where you have copy pasted it would be also nice.
Also what was the hardware that they were running on, what version of
operating system they were running, what internet connection did they
have would be useful for a profesional rather then some text that
states "i'm right, you're wrong because someone told me so". From my
perspective if I run john's code i see just that ES can't deal with a
record of 1G. I also understand that there are workarounds for this

Thanks,
S.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · May 16, 2013, 6:44pm

John,

thanks for the interesting case. Now I understand you want to store
blobs, like images. I think it is worth to think about ES support of
such large blobs for various reasons.

Right now the ES client methods are somehow limited, under the hood the
data is converted to Jackson-JSON-friendly data structures. Binary data
is thought to set with the XContentBuilder by using field(name, BytesRef
byes) API method. Without tweaking ES for 1 GB transfer you will run
into several challenges before success (I need some spare time to find
out the settings).

I also thought once about a blob store in ES with all the goodies of
replicated data and so on, but the way ES was set on top of Lucene 3 the
system was limited in creating a "store-only" large blob store. With
Lucene 4 it is possible to design an additional ES index format that
could theoretically absorb an arbitrary long Java InputStream, or, other
way round, there could be a scenario where an ES client write to a Java
OutputStream continuously, and the result is a blob, managed by a
special Lucene 4 store codec (which is not there yet).

How do you think you will make use of such blobs, would it be enough to
deliver the stream back to requesting clients?

I'm afraid ES queries on blobs will not make much sense, and the REST
API too. Or should the stream get processed on the fly while ingesting?
New entity stream analyzers/filters could be handy to capture subtitles
on video streams, or image features could get extracted to metadata, or
music notations from melodies, and moved into a special ES metadata
index by configuration, and this index could be queried.

Jörg

Am 15.05.13 17:07, schrieb John:

Hello Jörg,
Here are my two (mysql + ES) dummy tests:

MySQL:
@Test
public void testWrite1GBMysql() {

    byte[] b = new byte[1073741724];
    for (int i = 0; i < 1073741724; i++) {
        b[i] = 'a';
    }
    assertEquals(b.length, 1073741724);

    long startTime = System.currentTimeMillis();

    try {
        PreparedStatement pst = conn.prepareStatement("insert into

test.load(id, image) values(2, ?)");
pst.setBytes(1, b);
pst.executeUpdate();
} catch (SQLException e) {
System.out.println(e.toString());
assertTrue(false);
}

    long duration = System.currentTimeMillis() - startTime;

    System.out.println("1GB bytes were written in " + duration + "

miliseconds");
assertTrue(true);
}

My ES test:

@Test
public void testWrite1GB() {

    byte[] b = new byte[1073741824];
    b[0] = '{';
    b[1] = '\"';
    b[2] = 's';
    b[3] = 'o';
    b[4] = 'm';
    b[5] = 'e';
    b[6] = 'K';
    b[7] = 'e';
    b[8] = 'y';
    b[9] = '\"';
    b[10] = ':';
    b[11] = '\"';
    for (int i = 12; i < 1073741822; i++) {
        b[i] = 'a';
    }
    b[1073741822] = '\"';
    b[1073741823] = '}';
    assertEquals(b.length, 1073741824);

    long startTime = System.currentTimeMillis();
    client.prepareIndex("myindex", "quizItem")
            .setSource(b)
            .setId("oneGBKey")
            .setTimeout(TimeValue.timeValueHours(3))
            .execute()
            .actionGet();

    long duration = System.currentTimeMillis() - startTime;

    System.out.println("1GB write finished in " + duration + "

miliseconds");
assertTrue(true);
}

Thanks!

John

On Wednesday, May 15, 2013 5:53:53 PM UTC+3, Jörg Prante wrote:

John,

your case seems to be confusing because a document of 1GB will be
unusable in search as long as you handle it as an opaque entity for a
single field in a single document.

In MySQL, I think you issued a "load data infile" and the data is
tabular, with many columns.

These columns must be translated to JSON structures for ES
indexing to
make sense.

If you can give the MySQL command you use to push the document and
the
column names, I might help to give some adivse how to push the same
document to ES and to translate columns. One possibility is the
JDBC river.

Jörg

Am 15.05.13 15:13, schrieb David Pilato:
> Seriously? You want to store in a search engine a file that you
will
> never use for search and you are expecting to get any conclusion
with
> this test case?
>
> My opinion is that you should try to run a test that push 1Gb to ES
> using bulk with 1kb per document.
> So 1000 bulk iteration with 1000 docs each time will help to
measure
> something significant.
>
> Or I completely misunderstood what test you are trying to run...
>
> My 2 cents.
>
> --
> *David Pilato* | /Technical Advocate/ | *Elasticsearch.com
> <http://Elasticsearch.com>*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
> <https://twitter.com/elasticsearchfr
<https://twitter.com/elasticsearchfr>> |@scrutmydocs
> <https://twitter.com/scrutmydocs <https://twitter.com/scrutmydocs>>
>
>
>
> Le 15 mai 2013 à 14:37, John <ioan.ba...@3pillarglobal.com
> <mailto:ioan.ba...@3pillarglobal.com>> a écrit :
>
>> Hi guys,
>> Thanks for your answers!
>>
>> But, what I'm trying to do is to compare load test results for ES
>> with load test results for MySQL and try to figure out why
should I
>> use a noSQL db instead of MySQL. In MySQL the 1gb data insert
>> finished in aprox. 12 mins on an Intel proc with 8GB RAM and in ES
>> I'm not able to do this. I'm just trying to compare results and
that
>> is why I need to insert in ES a 1gb document.
>>
>> I'm still trying to get this done but I had no luck :( , so if you
>> have any suggestions please let me know.
>>
>> Thanks!
>>
>> John
>>
>>
>>
>>
>>
>> On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:
>>
>>     Hi, John.
>>
>>     Just for reference. Our Oracle DBA has fits when one of us
even
>>     thinks about stuffing 10KB CLOB's (let alone 2MB BLOB
images) in
>>     our Oracle DB.
>>
>>     I think it's safe to say that pushing a 1GB document into
ES is
>>     way past the envelop unless you really have some stellar
hardware
>>     (and a stellar budget to match).
>>
>>     If this is a valid use case for you (and not just a case of
>>     "let's see what happens when I push the accelerator to the
floor
>>     and enter Dead Man's Curve), then you'll probably want to
create
>>     a carefully organized file system layout to store those 1 GB
>>     documents and only store a reference to them (and perhaps some
>>     extracted metadata for queries if needed) in the database
itself.
>>
>>     Cheers!
>>
>>     Brian
>>
>>     On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:
>>
>>         Hi Simon,
>>         I'm just doing some load tests for elasticsearch and
one of
>>         the tests is about indexing 1gb document and it fails.
>>         Thanks for replaying!
>>
>>         John
>>
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to elasticsearc...@googlegroups.com
>> <mailto:elasticsearch+unsubscribe@googlegroups.com>.
>> For more options, visit
https://groups.google.com/groups/opt_out
<https://groups.google.com/groups/opt_out>.
>>
>>
>

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
Is anyway to bulk huge data to ES without rest Elasticsearch	17	535	July 6, 2017
Looking for advice on bulk loading Elasticsearch	6	941	July 6, 2017
Adding millions of documents, performance decay Elasticsearch	6	671	July 6, 2017
Slow inserts on bigger documents Elasticsearch	7	451	July 6, 2017
Improving Bulk Indexing Elasticsearch	12	4572	July 6, 2017

Timeout while indexing 1gb document

Related topics