Excellent! You rock.
You can track progress through the GitHub issue tracker [1] - I've cleaned it up so you can see how issues are
assigned per milestone. In short, the plan is to wrap up the current release (working on the docs now) and push it out
as soon as possible - probably by early next week.
The second milestone should follow shortly after, probably in about 1.5 months. I'm talking about the actual release date -
the features will obviously be available in master (and through the nightly builds) long before.
For your use case, you might be interested in issue #69 [2] - feel free to add context to it, or open another issue
for Hive if you'd like; the more info there is, the better we can define it.
Thanks,
[1] Issues · elastic/elasticsearch-hadoop · GitHub
[2] Index aware document writes · Issue #69 · elastic/elasticsearch-hadoop · GitHub
On 14/08/2013 2:26 PM, Val Vakar wrote:
Yes, you're right - I would like #3. It's awesome that you guys are
working on that.
For my project it would be very useful to know:
- What might your solution look like (client-side / server-side /
etc)?
- When do you expect to release it?
Thanks a lot!
-Val
On Wednesday, August 14, 2013 7:09:00 AM UTC-4, Costin Leau wrote:
Awareness of the id is something that we're planning on addressing
after releasing the current version of
elasticsearch-hadoop.
This means that there are three main strategies for sending a
document to ES:
1. create/put-if-absent (applies also when there's no id)
2. index (op-type=index): create the document if missing, _replace_ it if present
3. upsert: create the document if missing, _update_ it if present
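
As a rough illustration of the three strategies above, here is how each could map onto a pair of lines in an Elasticsearch bulk request. This is only a sketch in Python, not elasticsearch-hadoop code: the function name `bulk_lines` and the strategy labels are made up for the example, and the index and type are assumed to be given in the bulk URL.

```python
import json


def bulk_lines(strategy, doc_id, doc):
    """Return the two NDJSON lines (action, body) for one document.

    `strategy` is a hypothetical label: "create", "index" or "upsert".
    `doc_id` may be None for create/index, in which case ES picks an id.
    """
    meta = {} if doc_id is None else {"_id": doc_id}

    if strategy == "create":
        # put-if-absent: the item fails if a document with this id exists
        action, body = {"create": meta}, doc
    elif strategy == "index":
        # create if missing, replace the whole document if present
        action, body = {"index": meta}, doc
    elif strategy == "upsert":
        if doc_id is None:
            raise ValueError("upsert needs an id to find the document")
        # create if missing, merge the partial document if present
        action, body = {"update": meta}, {"doc": doc, "doc_as_upsert": True}
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    return [json.dumps(action), json.dumps(body)]


if __name__ == "__main__":
    for name in ("create", "index", "upsert"):
        print("\n".join(bulk_lines(name, "iPhone", {"device": "iPhone"})))
```

Only the third case needs the id unconditionally, which is why id awareness has to come first.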
If my understanding is correct, you're looking for option 3 - to update/merge the document (upsert) rather than
replace it.
Let me know if I've missed or misunderstood anything.
Thanks,
P.S. we're also planning on adding parent/child support - which is
also based on id awareness.
On 14/08/2013 12:13 AM, Val Vakar wrote:
> Hey Costin,
>
> You're right that Hive doesn't have the concept of an id. ES handles such cases elegantly through the _id path mapping
> so the client doesn't need to understand ids; I would like that same exact concept for bulk loading through Hive.
>
> My goal is to run different processing streams (concurrently or days/weeks apart) on Hadoop - and possibly elsewhere -
> that compute various parts of the same document at their own pace. Whenever something is computed, it's upserted into
> ES. It's hard to describe my exact business case, but as a very contrived example let's say we're tracking sales data
> for electronic devices: there's a data stream from brick-and-mortar stores, another stream from web sales, etc. - we
> would like to accumulate all that under each device's own document in ES, say
> { "device": "iPhone", "sales_data": { "January": { "brick_and_mortar": 305, "website": 298 },
> "February": { "brick_and_mortar": 225, "website": 168 }, "March": ... } }.
> Basically, it's in the upsert sweet spot.
>
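
To make the accumulation described above concrete, here is a small Python sketch of what a partial-document upsert does to the stored document: nested objects are merged recursively and plain values are overwritten. The in-memory `store` dict merely stands in for an ES index, and the names `merge` and `upsert` are invented for this example.

```python
def merge(existing, partial):
    """Recursively merge `partial` into a copy of `existing`.

    Nested dicts are merged key by key; any other value in `partial`
    replaces the old one.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def upsert(store, doc_id, partial):
    """Create the document if missing, otherwise merge into it."""
    store[doc_id] = merge(store.get(doc_id, {}), partial)


if __name__ == "__main__":
    store = {}  # stand-in for the index, keyed by document id
    # Stream 1: brick-and-mortar sales
    upsert(store, "iPhone", {"device": "iPhone",
                             "sales_data": {"January": {"brick_and_mortar": 305}}})
    # Stream 2: web sales, possibly arriving days later
    upsert(store, "iPhone", {"sales_data": {"January": {"website": 298}}})
    print(store["iPhone"])
```

With a plain index operation the second write would have replaced the first, dropping the brick-and-mortar figure.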
> What I'm thinking of is - say we define the table using
>
> TBLPROPERTIES(
>   'es.host' = 'myhost',
>   'es.resource' = 'myindex/mytype',
>   'es.insert.strategy' = 'upsert',
>   'es.id.path' = 'page_id'
> )
>
> where es.insert.strategy = [index]/upsert and es.id.path is the path to the id property (much like the server-side _id
> path mapping).
>
> This would make it possible to construct the REST request with doc_as_upsert semantics - but since we need that id
> mapping, I realize this feels pretty awkward to do in ESSerDe and elsewhere.
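
A minimal sketch of the idea above, in Python rather than in the actual Hive/SerDe code: look up the id in each row using the configured path, then emit a bulk update with doc_as_upsert. The `es.id.path` setting is the proposal from this message, not an existing option; `resolve_id` and `bulk_upsert_body` are hypothetical helpers, and treating the path as dot-separated is an assumption.

```python
import json


def resolve_id(row, id_path):
    """Follow a dot-separated path (assumed syntax) into a row dict."""
    value = row
    for key in id_path.split("."):
        value = value[key]  # raises KeyError if the path is missing
    return str(value)


def bulk_upsert_body(rows, id_path):
    """Build an NDJSON bulk body that upserts each row under its own id."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"update": {"_id": resolve_id(row, id_path)}}))
        lines.append(json.dumps({"doc": row, "doc_as_upsert": True}))
    # The bulk API expects a trailing newline after the last line
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    rows = [{"page_id": 42, "title": "first"}, {"page_id": 43, "title": "second"}]
    print(bulk_upsert_body(rows, "page_id"), end="")
```

The id has to be extracted while the row is still a structured object, which is the awkward part: once the document has been serialized to bytes, the path can no longer be resolved cheaply.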
>
> What do you think?
>
> Thanks,
> -Val
>
> On Tuesday, August 13, 2013 1:07:55 PM UTC-4, Val Vakar wrote:
>
> Hello Experts,
>
> I'd really like to leverage bulk upserts from Hive. I could implement that in elasticsearch-hadoop, but I need a
> good way to specify the id (see the linked Google Groups thread). That's totally doable, but it's
> also awkward, since the last point where we have the deserialized object is in ESSerDe.serialize().
>
> Has there been any thought on this on your side?
>
> Thanks!
> -Val
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> elasticsearc...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
--
Costin
--
Costin