I often need to get the _ids matching a query so I can retrieve the
appropriate children (still waiting on that has_parent query). I've
found that faceting on _id is faster than iterating/scanning through the
query results and extracting the _id. The new changes to _id seem to
make this difficult. Am I missing an alternative way to do what seems
to be a common task?
You get the _id for each document back in the search result, or am I missing something else?
Note, you can always enable indexing of the _id as well. But let's see if you really need to.
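For reference, enabling indexing of the _id is normally done in the type mapping. The following is only a sketch: it assumes the `"index": "not_analyzed"` option for the `_id` mapping field as described in older Elasticsearch documentation, and the type name `parent_doc` is hypothetical. Check the option against the version you are running.

```python
import json


def id_indexing_mapping(type_name):
    """Build a mapping body that asks for the _id field to be indexed.

    Assumption: the "_id": {"index": "not_analyzed"} option from older
    Elasticsearch releases; it may not apply to other versions.
    """
    return {type_name: {"_id": {"index": "not_analyzed"}}}


# "parent_doc" is a hypothetical type name used only for illustration.
mapping_body = id_indexing_mapping("parent_doc")
print(json.dumps(mapping_body, indent=2))
```

The resulting JSON would be sent as the body of a put-mapping request for that type.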
On Thursday, April 28, 2011 at 11:32 PM, merrellb wrote:
I often need to get the _ids matching a query so I can retrieve the
appropriate children (still waiting on that has_parent query). I've
found that faceting on _id is faster than iterating/scanning through the
query results and extracting the _id. The new changes to _id seem to
make this difficult. Am I missing an alternative way to do what seems
to be a common task?
Before SCAN I got in the habit of faceting _id whenever I needed the
_ids from a query, it was much faster than iterating through the query
results (I often have several hundred thousand records in my
results). Using scan to extract the ids still seems a bit slower than
faceting (I imagine because there is more overhead per item
returned). However in the long run I realize I need to be careful
with faceting because of the potential for excessive memory usage.
I suppose my question is, what is the best workflow for what (in my
case at least) is a common operation (i.e. performing queries on
parents and joining them to their children)? My current process is:
1) Iterate through a scan query on the parent, making sure to set
fields =
2) Extract the id field from the results
3) For each _id, perform a term query on the _parent field to
retrieve the proper child.
a) Is this the best approach using currently available methods?
b) Any more thought on having a simple "join" or "has_parent" method?
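The process described above can be sketched roughly as follows. This is not a definitive implementation: it works on plain dictionaries shaped like a search response (`hits.hits[]._id`) rather than a live cluster, the sample pages are made up, and it assumes the `_parent` term value is the bare parent id, which may differ between versions. The `fields` setting from the first step is left out because its value is not given in the post.

```python
from typing import Any, Dict, Iterable, List


def extract_ids(pages: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the _id of every hit across the pages returned by a scan."""
    ids = []
    for page in pages:
        for hit in page["hits"]["hits"]:
            ids.append(hit["_id"])
    return ids


def child_query(parent_id: str) -> Dict[str, Any]:
    """Build a term query on _parent for one parent id.

    Assumption: the _parent field is matched by the bare parent id.
    """
    return {"query": {"term": {"_parent": parent_id}}}


# Hypothetical scan pages standing in for real responses.
pages = [
    {"hits": {"hits": [{"_id": "p1"}, {"_id": "p2"}]}},
    {"hits": {"hits": [{"_id": "p3"}]}},
]

parent_ids = extract_ids(pages)
child_queries = [child_query(pid) for pid in parent_ids]
print(parent_ids)
print(child_queries[0])
```

Note that this builds one child query per parent id, so with several hundred thousand parents the number of requests grows accordingly.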
I am still not sure how, based on what you describe, faceting on the id helps. What search query do you execute when you do the faceting?
When you say join, you mean that each document will also include the child documents, right? This is possible; it just needs to be slated for implementation.
On Friday, April 29, 2011 at 12:21 AM, merrellb wrote:
Before SCAN I got in the habit of faceting _id whenever I needed the
_ids from a query, it was much faster than iterating through the query
results (I often have several hundred thousand records in my
results). Using scan to extract the ids still seems a bit slower than
faceting (I imagine because there is more overhead per item
returned). However in the long run I realize I need to be careful
with faceting because of the potential for excessive memory usage.
I suppose my question is, what is the best workflow for what (in my
case at least) is a common operation (i.e. performing queries on
parents and joining them to their children)? My current process is:
1) Iterate through a scan query on the parent, making sure to set
fields =
2) Extract the id field from the results
3) For each _id, perform a term query on the _parent field to
retrieve the proper child.
a) Is this the best approach using currently available methods?
b) Any more thought on having a simple "join" or "has_parent" method?