Excellent! You rock.
You can track progress through the GitHub tracker [1] - I've cleaned it up so you can see how issues are
assigned per milestone. In short, the plan is to wrap up the current release (working on the docs now) and
push it out as soon as possible - probably by early next week.
The second milestone should follow shortly after, probably in about 1.5 months. I'm talking about the actual
release date - the features will obviously be available in master (and through the nightly builds) long before.
For your use-case, you might be interested in issue #69 [2] - feel free to add context to it, or open another
issue for Hive if you'd like; the more info there is, the better we can define it.
Thanks,
[1] Issues · elastic/elasticsearch-hadoop · GitHub
[2] Index aware document writes · Issue #69 · elastic/elasticsearch-hadoop · GitHub
On 14/08/2013 2:26 PM, Val Vakar wrote:
Yes, you're right - I would like #3. It's awesome that you guys are
working on that.
For my project it would be very useful to know:
- What might your solution look like (client-side / server-side / etc)?
- When do you expect to release it?
 
Thanks a lot!
-Val
On Wednesday, August 14, 2013 7:09:00 AM UTC-4, Costin Leau wrote:
Awareness of the id is something that we're planning on addressing after releasing the current version of
elasticsearch-hadoop.
Broadly, there are three main strategies for sending a document to ES:
1. create/put-if-absent (also applies when there's no id)
2. index (op-type=index): create the document if missing, _replace_ it if present
3. upsert: create the document if missing, _update_ it if present
If my understanding is correct, you're looking for option 3 (upsert) - to update/merge the document rather
than replace it.
Let me know if I've missed/misunderstood anything.
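
The three strategies above map, roughly, onto action lines of the Elasticsearch _bulk API. The following is a minimal Python sketch of that mapping, not elasticsearch-hadoop code: `bulk_lines` is a hypothetical helper, and it assumes the index and type are supplied by the request URL (e.g. myindex/mytype/_bulk) so that only the id appears in the action line.

```python
import json


def bulk_lines(strategy, doc_id, doc):
    """Return the two NDJSON lines (action, body) that describe one document.

    `strategy` is one of "create", "index" or "upsert" (names chosen for
    this sketch to mirror the three options in the message above).
    """
    if strategy == "create":
        # Put-if-absent: the operation is rejected if the id already exists.
        action = {"create": {"_id": doc_id}}
        body = doc
    elif strategy == "index":
        # Create if missing, replace the whole document if present.
        action = {"index": {"_id": doc_id}}
        body = doc
    elif strategy == "upsert":
        # Create if missing, otherwise merge the partial document in.
        action = {"update": {"_id": doc_id}}
        body = {"doc": doc, "doc_as_upsert": True}
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return [json.dumps(action), json.dumps(body)]


if __name__ == "__main__":
    for name in ("create", "index", "upsert"):
        print("\n".join(bulk_lines(name, "iPhone", {"device": "iPhone"})))
```

Only the upsert variant wraps the document in a `doc` field; the other two send the document itself as the body.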
Thanks, 
P.S. we're also planning on adding parent/child support - which is 
also based on id awareness.
On 14/08/2013 12:13 AM, Val Vakar wrote: 
> Hey Costin, 
> 
> You're right that Hive doesn't have the concept of an id. ES handles such cases elegantly through the _id path mapping
> so the client doesn't need to understand ids; I would like that exact same concept for bulk loading through Hive.
> 
> My goal is to run different processing streams (concurrently or days/weeks apart) on Hadoop - and possibly elsewhere -
> that compute various parts of the same document at their own pace. Whenever something is computed, it's upserted into
> ES. It's hard to describe my exact business case, but as a very contrived example let's say we're tracking sales data
> for electronic devices: there's a data stream from brick-and-mortar stores, another stream from web sales, etc. - we
> would like to accumulate all that under each device's own document in ES (say { "device": "iPhone", "sales_data": {
> "January": { "brick_and_mortar": 305, "website": 298 }, "February": { "brick_and_mortar": 225, "website": 168 },
> "March": ... } }). Basically, it's in the upsert sweet spot.
> 
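
To make the "accumulate under one document" idea concrete, here is a small local simulation in Python of the merge behaviour a partial-document update is expected to have: nested objects are merged key by key, while other values are replaced. This is a sketch under that assumption and does not talk to Elasticsearch; `merge` and the two sample stream documents are hypothetical names used only for illustration.

```python
def merge(existing, partial):
    """Return a new dict with `partial` recursively merged into `existing`.

    Nested dicts are merged key by key; any other value in `partial`
    replaces the value stored under the same key.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Two streams compute different parts of the same device document.
store_stream = {"device": "iPhone",
                "sales_data": {"January": {"brick_and_mortar": 305}}}
web_stream = {"device": "iPhone",
              "sales_data": {"January": {"website": 298}}}

# Upserting into an empty store creates the document; the second upsert merges.
doc = merge(merge({}, store_stream), web_stream)

if __name__ == "__main__":
    print(doc)
```

With plain index semantics the second write would replace the first, so the brick-and-mortar figure would be lost; the merge keeps both.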
> What I'm thinking of is - say we define the table using
>
> TBLPROPERTIES(
>            'es.host' = 'myhost',
>            'es.resource' = 'myindex/mytype',
>            'es.insert.strategy' = 'upsert',
>            'es.id.path' = 'page_id'
> )
>
> where es.insert.strategy = [index]/upsert and es.id.path is the path to the id property (much like the server-side _id
> path mapping).
> 
> This would make it possible to construct the REST request with doc_as_upsert semantics - but since we need that id
> mapping, I realize this feels pretty awkward to do in ESSerde and elsewhere.
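
As a rough illustration of the proposal above, the following Python sketch pulls the id out of each row using a configured path and emits a bulk body with doc_as_upsert semantics. It is only a sketch: es.id.path is the property proposed in this message rather than an existing setting, the dot-separated path syntax is an assumption, and `resolve_path` and `upsert_bulk_body` are hypothetical helpers.

```python
import json


def resolve_path(doc, path):
    """Follow a dot-separated path (e.g. "page.id") into a nested dict."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def upsert_bulk_body(rows, id_path):
    """Build an NDJSON bulk body that upserts each row under its own id."""
    lines = []
    for row in rows:
        doc_id = str(resolve_path(row, id_path))
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": row, "doc_as_upsert": True}))
    # The bulk body is newline-delimited and ends with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [{"page_id": 7, "title": "a"}, {"page_id": 8, "title": "b"}]
    print(upsert_bulk_body(rows, "page_id"), end="")
```

Resolving the id at the point where the row is still a structured object is what makes this awkward in a serializer that only hands over bytes afterwards.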
> 
> What do you think? 
> 
> Thanks, 
> -Val 
>
> On Tuesday, August 13, 2013 1:07:55 PM UTC-4, Val Vakar wrote: 
> 
>     Hello Experts, 
> 
>     I'd really like to leverage bulk upserts from Hive. I could implement that in elasticsearch-hadoop, but I need a
>     good way to specify the id (see [link to Google Groups thread]). That's totally doable, but it's
>     also awkward since the last time we have the deserialized object is in ESSerDe.serialize().
> 
>     Has there been any thought on this on your side? 
> 
>     Thanks! 
>     -Val 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> elasticsearc...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 
-- 
Costin 