Another "indexing _ids in 0.16" question set


(Alex Piggott) #1

I didn't see this discussed in any of the previous threads on this
topic, apologies if I've missed it.

I have a large index with two types, "parent" and "child", where
"child" is a child type of "parent" (i.e. I use "has_child"
queries). There are many children per parent.

I set the "id" to be of the format "<parent_id>", for reasons
discussed below. I don't know if this was sensible, but I now have a
large index of such objects, so switching to a different scheme at
this point would be difficult (though I suppose possible).

In 0.15 I deleted a parent's children, given only the parent's "_id",
using a prefix query that relied on the above format. This had the
advantage of not needing to know anything else about the parent (e.g.
the number of children that it has).
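
As a minimal sketch of the kind of prefix query described above (the class and method names are hypothetical, and the JSON is built by hand rather than with gson so that the example stays self-contained), the query body might be assembled like this:

```java
final class IdPrefixQuery {
    // Escapes backslashes and double quotes only; enough for simple ids,
    // not a general-purpose JSON encoder.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a prefix query on _id. This only matches if _id is indexed,
    // which is why it stops working for documents added under 0.16 defaults.
    static String prefixOnId(String idPrefix) {
        return "{\"prefix\":{\"_id\":\"" + jsonEscape(idPrefix) + "\"}}";
    }

    public static void main(String[] args) {
        // "abc123" is a made-up parent id used as the prefix.
        System.out.println(prefixOnId("abc123"));
    }
}
```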

In 0.16 this is no longer possible. I can't easily use the ids query
because I don't always know how many children the parent has.

So four questions:

0] Is there a standard way of deleting all children of a parent, given
only that parent's _id, and I've just missed it and it will magically
solve all my problems?!

(If so, ignore all these other questions! Though [1] may be a bug?)

1] Interestingly when I first upgraded to 0.16.2 (from the last 0.15
branch) all documents still had their _ids indexed (eg a prefix query
on _id would work). I thought the default was to lose the indexing
immediately?

When I added a new document, its _id field was not indexed (ie as
expected).

2] I hoped that setting the mapping for the child type to include _id:
{ index: not_analyzed } at the top level would update the mapping
dynamically, but this did not happen. When I created a new type (for
an existing index) with the same mapping, this did start indexing
_ids. Is this expected? (ie that the _id mapping only gets applied to
new indexes and doesn't get merged)

3] I don't want to set the "index.mapping._id.indexed"="true" at the
node level, because I have lots of other objects whose _ids I don't
need to index. Can I set this for just a single index/type pair? Note
that I set up all my indexes and types programmatically from Java
(using gson), ie the mapping information is not encoded in any
configuration files.
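
For reference, the per-type mapping mentioned in question 2 could be built programmatically along these lines. This is only a sketch: the class and method names are hypothetical, the JSON is hand-assembled instead of going through gson, and (as noted above) the mapping appeared to take effect only when the type was newly created.

```java
final class IdMapping {
    // Returns a mapping body for one type with _id indexed as not_analyzed,
    // i.e. the "_id: { index: not_analyzed }" setting at the top level of the type.
    // typeName is assumed to be a plain identifier that needs no JSON escaping.
    static String idIndexedMapping(String typeName) {
        return "{\"" + typeName + "\":{\"_id\":{\"index\":\"not_analyzed\"}}}";
    }

    public static void main(String[] args) {
        // "child" is the type name used in this thread.
        System.out.println(idIndexedMapping("child"));
    }
}
```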

Thanks in advance to anyone for any thoughts!

Alex


(Shay Banon) #2

When you index a child document, a field called _parent is added to it with the parent UID. The _parent field is in the form parent_type#parent_id (text form), so you can delete based on that. If this does not cover the rest of your questions, ask again; I might have missed something.
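
A minimal sketch of that approach, assuming a term query on _parent is what gets sent as the body of a delete-by-query request (the class and method names are hypothetical, and the exact request shape should be checked against the version in use):

```java
final class ParentDelete {
    // The _parent value has the form parent_type#parent_id.
    static String parentUid(String parentType, String parentId) {
        return parentType + "#" + parentId;
    }

    // Escapes backslashes and double quotes only; not a full JSON encoder.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Term query matching every child whose _parent points at the given parent.
    static String childrenOfParentQuery(String parentType, String parentId) {
        String uid = parentUid(parentType, parentId);
        return "{\"term\":{\"_parent\":\"" + jsonEscape(uid) + "\"}}";
    }

    public static void main(String[] args) {
        // "parent" is the type name from this thread; "42" is a made-up id.
        System.out.println(childrenOfParentQuery("parent", "42"));
    }
}
```

Only the parent's type and _id are needed, so the number of children never has to be known in advance.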

(In reply to Alex at Ikanow, Saturday, June 4, 2011 at 1:45 AM.)


(Alex Piggott) #3

Thanks - just tried that and (of course) it solves my problem.

The only other noteworthy thing was that when I migrated from
0.15.2(?) to 0.16.2, all the existing documents still had "_id"
indexed - is that the intended functionality? I'm mainly interested
out of curiosity.

(In reply to Shay Banon shay.ba...@elasticsearch.com, Jun 3, 7:19 pm.)


(Shay Banon) #4

The _id field is not removed from existing documents; it is simply no longer indexed for new documents (unless you set the relevant backward-compatibility flag).

(In reply to Alex at Ikanow, Saturday, June 4, 2011 at 6:01 AM.)

