John,
thanks for the interesting case. Now I understand you want to store
blobs, like images. I think it is worth thinking about ES support for
such large blobs, for various reasons.
Right now the ES client methods are somewhat limited; under the hood
the data is converted to Jackson-JSON-friendly data structures. Binary
data is meant to be set with the XContentBuilder, using the
field(String name, BytesRef bytes) API method. Without tweaking ES, a
1 GB transfer will run into several challenges before it succeeds (I
need some spare time to find out the relevant settings).
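For smaller blobs, a minimal sketch of that path (the index, type, and
field names here are made up, and "client" is the usual Java client):

XContentBuilder builder = XContentFactory.jsonBuilder()
        .startObject()
        .field("image", imageBytes) // byte[] is serialized as base64 in the JSON source
        .endObject();
client.prepareIndex("blobstore", "blob", "image-1")
        .setSource(builder)
        .execute()
        .actionGet();

Mapping such a field as type "binary" keeps it out of the inverted
index; it only travels along in the _source.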
I also thought once about a blob store in ES, with all the goodies of
replicated data and so on, but the way ES was set on top of Lucene 3,
the system was limited in creating a "store-only" large blob store.
With Lucene 4 it is possible to design an additional ES index format
that could theoretically absorb an arbitrarily long Java InputStream,
or, the other way round, there could be a scenario where an ES client
writes to a Java OutputStream continuously, and the result is a blob,
managed by a special Lucene 4 store codec (which is not there yet).
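Such a codec is hypothetical, but the Lucene 4 store primitives it
would build on already exist. Just to illustrate the idea (directory
and file names are invented), copying a stream into a Lucene Directory
could look like:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

void storeBlob(InputStream in) throws IOException {
    Directory dir = FSDirectory.open(new File("/data/blobstore"));
    IndexOutput out = dir.createOutput("oneGBKey.bin", IOContext.DEFAULT);
    byte[] buf = new byte[8192];
    int len;
    while ((len = in.read(buf)) != -1) { // stream the blob in chunks
        out.writeBytes(buf, 0, len);
    }
    out.close();
    dir.close();
}

A real codec would additionally have to tie such files into segment
and replica management, which is exactly the part that is not there yet.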
How do you think you will make use of such blobs? Would it be enough
to deliver the stream back to requesting clients? I'm afraid ES
queries on blobs will not make much sense, and neither will the REST
API. Or should the stream get processed on the fly while ingesting?
New entity stream analyzers/filters could be handy to capture
subtitles in video streams, extract image features to metadata, or
derive music notation from melodies, and move these into a special ES
metadata index by configuration; this index could then be queried.
Jörg
On 15.05.13 17:07, John wrote:
Hello Jörg,
Here are my two (MySQL + ES) dummy tests:
MySQL:
@Test
public void testWrite1GBMysql() {
    byte[] b = new byte[1073741824]; // 1 GiB
    for (int i = 0; i < b.length; i++) {
        b[i] = 'a';
    }
    assertEquals(1073741824, b.length);
    long startTime = System.currentTimeMillis();
    try {
        PreparedStatement pst = conn.prepareStatement(
                "insert into test.load(id, image) values(2, ?)");
        pst.setBytes(1, b);
        pst.executeUpdate();
    } catch (SQLException e) {
        fail(e.toString());
    }
    long duration = System.currentTimeMillis() - startTime;
    System.out.println("1GB bytes were written in " + duration + " milliseconds");
}
My ES test:
@Test
public void testWrite1GB() {
    byte[] b = new byte[1073741824]; // 1 GiB
    // build a single huge JSON document: {"someKey":"aaa...a"}
    byte[] head = "{\"someKey\":\"".getBytes();
    System.arraycopy(head, 0, b, 0, head.length);
    for (int i = head.length; i < b.length - 2; i++) {
        b[i] = 'a';
    }
    b[b.length - 2] = '"';
    b[b.length - 1] = '}';
    assertEquals(1073741824, b.length);
    long startTime = System.currentTimeMillis();
    client.prepareIndex("myindex", "quizItem")
            .setSource(b)
            .setId("oneGBKey")
            .setTimeout(TimeValue.timeValueHours(3))
            .execute()
            .actionGet();
    long duration = System.currentTimeMillis() - startTime;
    System.out.println("1GB write finished in " + duration + " milliseconds");
}
Thanks!
John
On Wednesday, May 15, 2013 5:53:53 PM UTC+3, Jörg Prante wrote:
John,
your case seems to be confusing, because a document of 1 GB will be
unusable in search as long as you handle it as an opaque entity in a
single field of a single document.
In MySQL, I think you issued a "load data infile", and the data is
tabular, with many columns.
These columns must be translated to JSON structures for ES indexing to
make sense.
If you can give the MySQL command you use to push the document, and
the column names, I might be able to give some advice on how to push
the same document to ES and how to translate the columns. One
possibility is the JDBC river.
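The river is set up by indexing a _meta document into the _river
index. A sketch only; the JDBC settings here are example values, and
the configuration keys may differ by plugin version:

client.prepareIndex("_river", "my_jdbc_river", "_meta")
        .setSource(XContentFactory.jsonBuilder()
                .startObject()
                .field("type", "jdbc")
                .startObject("jdbc")
                .field("driver", "com.mysql.jdbc.Driver")
                .field("url", "jdbc:mysql://localhost:3306/test")
                .field("user", "")
                .field("password", "")
                .field("sql", "select * from test.load")
                .endObject()
                .endObject())
        .execute().actionGet();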
Jörg
On 15.05.13 15:13, David Pilato wrote:
> Seriously? You want to store in a search engine a file that you will
> never use for search, and you are expecting to get any conclusion
> from this test case?
>
> My opinion is that you should try to run a test that pushes 1 GB to
> ES using bulk, with 1 KB per document.
> So 1000 bulk iterations with 1000 docs each time will help to measure
> something significant.
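>
> A rough sketch of such a loop with the Java client (the index/type
> names and the kilobyteOfData variable are made up):
>
> for (int i = 0; i < 1000; i++) {
>     BulkRequestBuilder bulk = client.prepareBulk();
>     for (int j = 0; j < 1000; j++) {
>         bulk.add(client.prepareIndex("myindex", "doc")
>                 .setSource("{\"payload\":\"" + kilobyteOfData + "\"}"));
>     }
>     BulkResponse response = bulk.execute().actionGet();
>     if (response.hasFailures()) { // report failed items per round
>         System.out.println(response.buildFailureMessage());
>     }
> }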
>
> Or I completely misunderstood what test you are trying to run...
>
> My 2 cents.
>
> --
> *David Pilato* | /Technical Advocate/ | *Elasticsearch.com <http://Elasticsearch.com>*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>
>
>
>
> On 15 May 2013 at 14:37, John <ioan.ba...@3pillarglobal.com> wrote:
>
>> Hi guys,
>> Thanks for your answers!
>>
>> But what I'm trying to do is to compare load test results for ES
>> with load test results for MySQL, and try to figure out why I should
>> use a NoSQL DB instead of MySQL. In MySQL the 1 GB data insert
>> finished in approx. 12 minutes on an Intel proc with 8 GB RAM, and
>> in ES I'm not able to do this. I'm just trying to compare results,
>> and that is why I need to insert a 1 GB document into ES.
>>
>> I'm still trying to get this done, but I have had no luck :( so if
>> you have any suggestions, please let me know.
>>
>> Thanks!
>>
>> John
>>
>> On Tuesday, May 14, 2013 7:04:51 PM UTC+3, InquiringMind wrote:
>>
>> Hi, John.
>>
>> Just for reference: our Oracle DBA has fits when one of us even
>> thinks about stuffing 10 KB CLOBs (let alone 2 MB BLOB images) into
>> our Oracle DB.
>>
>> I think it's safe to say that pushing a 1 GB document into ES is way
>> past the envelope unless you really have some stellar hardware (and
>> a stellar budget to match).
>>
>> If this is a valid use case for you (and not just a case of "let's
>> see what happens when I push the accelerator to the floor and enter
>> Dead Man's Curve"), then you'll probably want to create a carefully
>> organized file system layout to store those 1 GB documents, and only
>> store a reference to them (and perhaps some extracted metadata for
>> queries, if needed) in the database itself.
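>>
>> As a sketch (the index, type, and field names here are hypothetical),
>> the ES document would then carry only the pointer plus metadata:
>>
>> client.prepareIndex("files", "blob", sha1)
>>         .setSource(XContentFactory.jsonBuilder()
>>                 .startObject()
>>                 .field("path", "/blobstore/" + sha1 + ".bin") // where the 1 GB file lives
>>                 .field("length", blobFile.length())
>>                 .field("sha1", sha1)
>>                 .endObject())
>>         .execute().actionGet();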
>>
>> Cheers!
>>
>> Brian
>>
>> On Tuesday, May 14, 2013 9:59:01 AM UTC-4, John wrote:
>>
>> Hi Simon,
>> I'm just doing some load tests for elasticsearch, and one of the
>> tests is about indexing a 1 GB document, and it fails.
>> Thanks for replying!
>>
>> John
>>
>>
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.