Term and Has_Child Query Optimization

I'm running large queries (20 terms and 20 has_child queries) and am looking
for ways to improve the response time, which is currently 8 minutes on 4
million docs. A pure term query takes just a few seconds. At a high level, the
has_child queries are for collections that users create. Since collections
change, they are stored as child documents. The query is meant to capture
things the user "likes", in the form of terms and other users' collections, so
I can't require any one item, and I want to rank highly the documents that
have a lot of liked terms and collections. The question is: are there faster
alternatives to the method I've chosen? I've included an example.

Numbers

Documents: 4 million
Collection items: 18 million
Cluster: two AWS m3.xlarge instances with ten shards

Small Example

Mapping

curl -XPUT 'http://localhost:9200/collection-test?pretty=true' -d '{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    },
    "mappings" : {
        "document": {
            "properties": {
                "bodyText": { "type": "string" }
            }
        },
        "collection_item": {
            "_parent": { "type": "document" },
            "_all" : { "enabled" : false },
            "properties": {
                "collection_id": { "type": "integer", "index": "not_analyzed" }
            }
        }
    }
}'

Documents

curl -XPUT 'http://localhost:9200/collection-test/document/1' -d '{
    "bodyText" : "Creativity is intelligence having fun - Albert Einstein"
}'

curl -XPUT 'http://localhost:9200/collection-test/document/2' -d '{
    "bodyText" : "Anything one man can imagine, other men can make real. - Jules Verne"
}'

curl -XPUT 'http://localhost:9200/collection-test/document/3' -d '{
    "bodyText" : "Man will become better when you show him what he is like. - Anton Chekhov"
}'

Collections

curl -XPOST 'localhost:9200/collection-test/collection_item/1?parent=1' -d '{
    "collection_id" : "1"
}'

curl -XPOST 'localhost:9200/collection-test/collection_item/2?parent=1' -d '{
    "collection_id" : "2"
}'

curl -XPOST 'localhost:9200/collection-test/collection_item/4?parent=2' -d '{
    "collection_id" : "2"
}'

Multiple Term and Multiple Collection Query

curl -XPOST 'localhost:9200/collection-test/document/_search?pretty=true' -d '{
    "query" : {
        "bool" : {
            "should" : [
                { "term" : { "bodyText" : { "value" : "anything", "boost" : 1.0 } } },
                { "term" : { "bodyText" : { "value" : "man", "boost" : 1.0 } } },
                {
                    "has_child" : {
                        "type" : "collection_item",
                        "boost" : "1.0",
                        "query" : {
                            "term" : { "collection_id" : "1" }
                        }
                    }
                },
                {
                    "has_child" : {
                        "type" : "collection_item",
                        "boost" : "1.0",
                        "query" : {
                            "term" : { "collection_id" : "2" }
                        }
                    }
                }
            ],
            "minimum_number_should_match" : 1
        }
    }
}'

Delete Index

curl -XDELETE 'http://localhost:9200/collection-test/'

Large Query Example

curl -XPOST 'localhost:9200/collection-test/document/_search?pretty=true' -d '{
    "fields" : ["_id", "title", "summary"],
    "query" : {
        "bool" : {
            "should" : [
                { "query_string" : { "default_field" : "bodyText", "query" : "\"harry potter\"^1.0" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"j.k. rowling\"^0.4083824" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"final movie\"^0.40137964" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"fantasy series\"^0.3629825" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"box office records\"^0.35038263" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"breaking dawn\"^0.11963159" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"final installment\"^0.11438772" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"film series\"^0.35038263" } },
                { "term" : { "bodyText" : { "value" : "potter", "boost" : 0.805837 } } },
                { "term" : { "bodyText" : { "value" : "deathly", "boost" : 0.46554363 } } },
                { "term" : { "bodyText" : { "value" : "hallows", "boost" : 0.46430007 } } },
                { "term" : { "bodyText" : { "value" : "rowling", "boost" : 0.3994508 } } },
                { "term" : { "bodyText" : { "value" : "j.k.", "boost" : 0.39741242 } } },
                { "term" : { "bodyText" : { "value" : "pottermore", "boost" : 0.36284378 } } },
                { "term" : { "bodyText" : { "value" : "dumbledore", "boost" : 0.36096284 } } },
                { "term" : { "bodyText" : { "value" : "muggles", "boost" : 0.3579579 } } },
                { "term" : { "bodyText" : { "value" : "harry", "boost" : 0.17482029 } } },
                { "term" : { "bodyText" : { "value" : "grint", "boost" : 0.12138573 } } },
                { "term" : { "bodyText" : { "value" : "hogwarts", "boost" : 0.119226046 } } },
                { "term" : { "bodyText" : { "value" : "blackly", "boost" : 0.11385573 } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "445" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "529" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "93" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "480" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "341" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "99" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "563" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "34" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "347" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "355" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "571" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "95" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "96" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "108" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "435" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "474" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "550" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "326" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "514" } } } },
                { "has_child" : { "type" : "collection_item", "boost" : "1.0", "query" : { "term" : { "collection_id" : "490" } } } }
            ],
            "minimum_number_should_match" : 1
        }
    }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Forgot to ask a question. Would one solution be to put the collection item
data in a list field on the document itself, in the same type, and pay the
penalty of doing adds and deletes to update it, but get the speedup from doing
everything as a term query? It's more likely that users will add new stuff
to collections, so old stuff won't change as often, and new docs could be
separated into their own index.
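
To make the idea concrete, here is a rough sketch in Python of what the denormalized layout and its query body could look like. The field name "collection_ids" and the helper build_denormalized_query are made up for illustration; they are not part of the mapping above, and nothing here has been run against a cluster.

```python
import json

def build_denormalized_query(term_boosts, collection_boosts):
    """Build a bool/should body in which collections are plain term clauses."""
    should = [
        {"term": {"bodyText": {"value": term, "boost": boost}}}
        for term, boost in term_boosts
    ]
    # "collection_ids" is a hypothetical multi-valued field on the document.
    should += [
        {"term": {"collection_ids": {"value": cid, "boost": boost}}}
        for cid, boost in collection_boosts
    ]
    return {"query": {"bool": {"should": should,
                               "minimum_number_should_match": 1}}}

# Document 2 from the small example, with its collection membership inlined.
denormalized_doc = {
    "bodyText": "Anything one man can imagine, other men can make real. - Jules Verne",
    "collection_ids": [2],
}

body = build_denormalized_query(
    term_boosts=[("anything", 1.0), ("man", 1.0)],
    collection_boosts=[(1, 1.0), (2, 1.0)],
)
print(json.dumps(body, indent=2))
```

The trade-off is that adding a document to a collection means reindexing that whole document with an updated list.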

In reply to David Hagar's original post of Tuesday, 26 February 2013 17:14:28 UTC-6.

Hi David

Your "large" example is performing a lot of queries, in an inefficient
manner. Use filters whenever possible - filters can be cached, while
queries cannot.

For these term queries, you could rewrite them as a custom_filters_score
query, so that they contribute to scoring, but perform more efficiently,
because they are cached filters:

        {
            "term" : { "bodyText" : { "value" : "potter", "boost" : 0.805837 } }
        },
        {
            "term" : { "bodyText" : { "value" : "deathly", "boost" : 0.46554363 } }
        },

eg:

custom_filters_score: {
    query: { .... your full text queries ... },
    score_mode: "multiply",
    filters: [{
        boost: 0.805837,
        filter: { term: { bodyText: "potter" }}
    },{
        ... etc ...
    }]
}
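
As a rough illustration of the rewrite sketched above, the following Python helper assembles a custom_filters_score body from a list of (term, boost) pairs. The helper name build_custom_filters_score is hypothetical, and the choice of "multiply" simply mirrors the snippet above; treat it as a sketch of the request shape, not a tested query.

```python
import json

def build_custom_filters_score(full_text_query, term_boosts, score_mode="multiply"):
    """Wrap a scoring query with one boosted term filter per (term, boost) pair."""
    return {
        "custom_filters_score": {
            "query": full_text_query,
            "score_mode": score_mode,
            "filters": [
                {"boost": boost, "filter": {"term": {"bodyText": term}}}
                for term, boost in term_boosts
            ],
        }
    }

# Only the phrase queries stay as scoring queries.
full_text = {"bool": {"should": [
    {"query_string": {"default_field": "bodyText", "query": '"harry potter"^1.0'}},
    {"query_string": {"default_field": "bodyText", "query": '"j.k. rowling"^0.4083824'}},
]}}

cfs = build_custom_filters_score(
    full_text, [("potter", 0.805837), ("deathly", 0.46554363)]
)
print(json.dumps({"query": cfs}, indent=2))
```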

Similarly, use has_child filters instead of queries, and wrap all the
clauses into a single has_child clause:

{ filtered: {
   query: { custom_filters_score: {... query from above ... }},
   filter: {
       has_child: {
           type: "collection_item",
           filter: {
               terms: { collection_id: [550,490,....]}
           }
       }
   }
}}
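
For what it's worth, collapsing the existing per-collection has_child clauses into the single filter shown above can be done mechanically. A minimal Python sketch follows; the helper names collapse_has_child and has_child_clause are invented for this example, and the input shape is assumed to match the clauses in the large query.

```python
def collapse_has_child(clauses, child_type="collection_item"):
    """Merge per-collection has_child query clauses into one has_child filter."""
    ids = set()
    for clause in clauses:
        has_child = clause["has_child"]
        if has_child["type"] != child_type:
            raise ValueError("unexpected child type: %r" % has_child["type"])
        # The original clauses carry the id as a string, e.g. "445".
        ids.add(int(has_child["query"]["term"]["collection_id"]))
    return {"has_child": {"type": child_type,
                          "filter": {"terms": {"collection_id": sorted(ids)}}}}

def has_child_clause(collection_id):
    """Rebuild one clause in the shape used by the large query."""
    return {"has_child": {"type": "collection_item", "boost": "1.0",
                          "query": {"term": {"collection_id": collection_id}}}}

original = [has_child_clause(cid) for cid in ("445", "529", "93", "445")]
merged = collapse_has_child(original)
print(merged)
```

Note that the merged filter only answers yes/no; the per-collection boosts of the original clauses are dropped.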

I'm not sure of your intention with has_child. Do you want to check
whether it has children in any of these collections (ie yes/no) or do
you want the document to score higher the more collections it has?

The former is handled by my filter above. The latter you could rewrite
as a custom_filters_score query which is passed to a has_child query:

{ has_child: {
     type: "collection_item",
     score_type: "sum",
     query: {
        custom_filters_score: {
            query: { ... the query above ... },
            score_mode: "total",
            filters: [
              { filter: {term: { collection_id: 550}}, boost: 1},
              { filter: {term: { collection_id: 490}}, boost: 1},
              etc
            ]
        }
     }
 }}

Also, it's curious that you're doing 'term' queries on the bodyText,
because term queries look for exact terms, but your field is analyzed.
So for instance, this clause will never match:

"term" : { "bodyText" : { "value" : "j.k.", "boost" : 0.39741242 }}

The text "J.K." would be indexed as the terms ["j","k"], so there is no
"j.k." term to be found.

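The mismatch between an unanalyzed term and analyzed tokens can be shown with a toy model in Python. The function toy_analyze below is an assumption standing in for whatever analyzer the field really uses (it just lowercases and splits on non-alphanumeric characters); it is not Elasticsearch's analyzer, and a custom tokenizer may behave differently.

```python
import re

def toy_analyze(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_matches(term, field_text):
    """A term query compares the raw term against the indexed tokens."""
    return term in set(toy_analyze(field_text))

indexed = "J.K. Rowling wrote the Harry Potter series"
print(toy_analyze(indexed))
print(term_matches("j.k.", indexed))     # the term is looked up verbatim
print(term_matches("rowling", indexed))
```

Under this toy analyzer the lookup for "j.k." fails while "rowling" succeeds, because only the latter exists as an indexed token.
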
Also, the more queries you do, the more work Elasticsearch has to do,
and the longer searches will take.

hth

clint


The goal is to return the documents that have the most words and that are
in the most collections without requiring either. Each of the words and
collections also has a relevance weight driven by the user. I think of it
as a search vector of weighted terms and collection tags where the scoring
finds the docs that are most similar doing the appropriate weighting for
frequency of terms and collection sizes. If I were solving this without an
index I would just compute the cosine distance between the search vector
and each doc vector where both have weighted and normalized terms and tags.

My ideal doc score is similar to a dot product:

doc score = search_term_boost_1 * lucene_weighted_freq_of_term_1_in_doc
          + search_term_boost_2 * lucene_weighted_freq_of_term_2_in_doc
          + ...
          + balanceFactor * ( search_collection_boost_1 * (collection_1_has_doc / collection_size_1)
                            + search_collection_boost_2 * (collection_2_has_doc / collection_size_2)
                            + ... )

where collection_x_has_doc is 1 if doc is in collection and 0 if not.
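
Spelled out as code, the score above would look roughly like this in Python. The function doc_score and all the numbers in the example are made up for illustration; term_weights stands in for lucene_weighted_freq_of_term_N_in_doc, which in practice would come from Lucene and not from a dict.

```python
def doc_score(term_boosts, term_weights, collection_boosts,
              memberships, collection_sizes, balance_factor=1.0):
    """Dot-product style score over weighted terms and collection tags."""
    # Sum of search_term_boost_i * weighted frequency of term i in the doc.
    term_part = sum(
        boost * term_weights.get(term, 0.0)
        for term, boost in term_boosts.items()
    )
    # Sum of search_collection_boost_j * (has_doc_j / collection_size_j).
    collection_part = sum(
        boost * (1.0 if cid in memberships else 0.0) / collection_sizes[cid]
        for cid, boost in collection_boosts.items()
    )
    return term_part + balance_factor * collection_part

score = doc_score(
    term_boosts={"potter": 0.8, "harry": 0.2},
    term_weights={"potter": 2.0},          # "harry" does not occur in this doc
    collection_boosts={445: 1.0, 529: 1.0},
    memberships={445},                     # the doc is only in collection 445
    collection_sizes={445: 4, 529: 10},
    balance_factor=2.0,
)
print(score)
```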

Doesn't custom_filters_score require all results to pass at least one of the
filters? I need a result to be able to match on words alone if that's all
there is. Also, in your examples of switching to filters, the filter restricts
the result of a query. In my case I don't have a root query to be restricted,
just a bunch of doc attributes (terms and collection memberships) that are
representative of what I'm looking for but are not required.

Thanks for spotting the "j.k." tokenization issue. My simplification of our
schema left out our custom tokenizer, which sees it as a single token.

I've tried this single has_child with a sum score_type and the performance
is the same. Here is the example.

curl -XPOST 'localhost:9200/neuron/document/_search?pretty=true' -d '{
    "fields" : ["_id", "title", "summary"],
    "query" : {
        "bool" : {
            "should" : [
                { "query_string" : { "default_field" : "bodyText", "query" : "\"harry potter\"^1.0" } },
                { "query_string" : { "default_field" : "bodyText", "query" : "\"j.k. rowling\"^0.4083824" } },
                {
                    "has_child" : {
                        "type" : "collection_item",
                        "boost" : "1.0",
                        "score_type" : "sum",
                        "query" : {
                            "bool" : {
                                "should" : [
                                    { "term" : { "collection_id" : { "value" : "93", "boost" : 1.0 } } },
                                    { "term" : { "collection_id" : { "value" : "480", "boost" : 1.0 } } },
                                    ...
                                    { "term" : { "collection_id" : { "value" : "529", "boost" : 1.0 } } }
                                ],
                                "minimum_number_should_match" : 1
                            }
                        }
                    }
                }
            ],
            "minimum_number_should_match" : 1
        }
    }
}'

This is my third question here and you've answered all three, and it's much
appreciated. :)

In reply to Clinton Gormley's message of Wednesday, 27 February 2013 04:37:36 UTC-6.

Hiya

In the use of custom_filters_score, doesn't it require all results to
pass at least one of the filters? I need to have a result just match
words if that's all there is.

No it doesn't. The custom_filters_score will tweak the score IF a filter
matches, but no match is required.

Also in your examples of switching to filters, the filter restricts
the result of a query. In my case I don't have a root query to be
restricted, just a bunch of doc attributes (terms and found in
collections) that are a representative of what I'm looking for but are
not required.

Well, your root query is the bool query containing the full text queries
(eg "harry potter" etc).

So the idea is to use queries just for the full text part, and filters
for everything else. A filter either matches or it doesn't, so you can
use that to apply a boost or not. It won't take TF/IDF into account at
all, which I think (not having understood the dot product bit :) ) would
serve your purposes.

I've tried this single has_child with a sum score_type and the
performance is the same. Here is the example.

OK - wasn't sure if using multiple has_child's was having a big impact,
but it looks like it is just the number of queries. The more you can
use filters the better (assuming your filters will be reused in
subsequent searches -- filters are slightly faster anyway, but their
major contribution to performance is through caching).

This is my third question here and you've answered all three and it's
much appreciated. :)

Good questions are always welcome :)

clint


My problem may seem complicated, but it's actually simple. I want to query
with lots of terms and collection tags/ids on the document that behave,
scoring-wise, like plain words, without requiring any of them. It's
complicated only because my collections change.

What do you think of moving the child collection data into a document field
and paying the price of updating/reindexing the whole doc whenever a doc is
added to a collection? Running lots of plain term searches works well. Have
you seen numbers like "X updates per hour on an index of size Y reduces
performance by Z percent"?

Also, from what I'm reading, custom_filters_score overrides the query score
for what it's filtering. If I have words A, B, and C and collections 1, 2,
and 3, and there is only one doc in common between B and 2, and no other
words or collections share a doc, that one doc should score the highest, but
all the docs matching A, B, C, 1, 2, or 3 should be in the results.

In reply to Clinton Gormley's message of Friday, 1 March 2013 06:35:51 UTC-6.

What do you think of moving the child index collection data into a
document field and pay the price of updating/reindexing the whole doc
whenever a doc is added to a collection?

That's certainly an option, and it would improve performance. The amount
of performance improvement I wouldn't know.

Running lots of plain term searches works well. Have you seen
numbers like X updates per hour on a index of size Y reduces
performance Z percent?

Elasticsearch handles continual updating of data very well. But again,
exact numbers are hard to pin down. It'd be a case of try it and see.

Also custom_filters_score, from what I'm reading, overrides the query
score for what it's filtering.

No, it's COMBINED with the query score. The filters are used to tweak
the _score returned by the query. How the filters affect the score can
be controlled with the score_mode parameter.
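
A toy model of that combination, in Python, may help. It assumes (this is an assumption based on the documented score modes, not something verified against the Elasticsearch source) that the boosts of the matching filters are merged according to score_mode and the result is multiplied into the query score; it ignores options such as max_boost. The function combined_score is purely illustrative.

```python
from functools import reduce
from operator import mul

def combined_score(query_score, matched_boosts, score_mode="first"):
    """Toy model: merge the matching filters' boosts, then scale the query score."""
    if not matched_boosts:
        # No filter matched: the query score passes through unchanged.
        return query_score
    combine = {
        "first": lambda boosts: boosts[0],
        "min": min,
        "max": max,
        "total": sum,
        "avg": lambda boosts: sum(boosts) / len(boosts),
        "multiply": lambda boosts: reduce(mul, boosts, 1.0),
    }
    if score_mode not in combine:
        raise ValueError("unknown score_mode: %r" % score_mode)
    return query_score * combine[score_mode](matched_boosts)

print(combined_score(2.0, [], "multiply"))
print(combined_score(2.0, [0.5, 3.0], "multiply"))
print(combined_score(2.0, [0.5, 3.0], "total"))
```

In this model a document that matches no filter still appears in the results with its plain query score, which is the point being made above.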

clint
