"index": "not_analyzed" yet term filter still not working


(Kyle Lahnakoski) #1

I am using Elasticsearch 1.7.1

I am interested in finding a task by task.id. The task.id property is not_analyzed, so this query should work, but it does not.

{"query":{"filtered":{"filter":{"or":[
    {"term":{"task.id":"ER70OtGBQla6YOW5qeivnw"}}
]}}},"from":0,"size":10}

results in

{
    "took":644,
    "timed_out":false,
    "_shards":{"total":30,"successful":30,"failed":0},
    "hits":{"total":0,"max_score":null,"hits":[]}
}
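For anyone reproducing this, the request body above can be built programmatically. This is a minimal sketch in Python, not the code actually used against the cluster: the helper name `term_filter_query` is hypothetical, and the single-clause `or` wrapper is dropped on the assumption that it does not change which documents match.

```python
import json


def term_filter_query(field, value, size=10):
    """Build an Elasticsearch 1.x filtered query with a single term filter.

    Hypothetical helper for illustration; mirrors the query in the post,
    minus the one-element "or" wrapper.
    """
    return {
        "query": {"filtered": {"filter": {"term": {field: value}}}},
        "from": 0,
        "size": size,
    }


body = term_filter_query("task.id", "ER70OtGBQla6YOW5qeivnw")
print(json.dumps(body, indent=4))
```

If the simplified body also returns zero hits, the `or` filter can be ruled out as the cause.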

I have tried deleting the document and re-adding it. I forced a flush and a refresh multiple times.

What can be causing this? My index has about 100 million documents (30 shards), and some non-trivial number of documents cannot be found using task.id. Is it the nature of the values?

Thank you!


Here is some background

This query works

{"query":{"filtered":{"filter":{"or":[
    {"term":{"_id":"tc.480019:48001141.30"}},
    {"term":{"task.id":"ER70OtGBQla6YOW5qeivnw"}}
]}}},"from":0,"size":10}

and returns the following document (I chopped most of the properties in this example):

{
    "took": 516,
    "timed_out": false,
    "_shards": {"total": 30, "successful": 30, "failed": 0},
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [{
            "_index": "task20170101_000000",
            "_type": "task",
            "_id": "tc.480019:48001141.30",
            "_score": 1,
            "_source": {
                "task": {
                    "signing": {"cert": [null]},
                    "features": ["taskclusterProxy", "relengAPIProxy", "chainOfTrust"],
                    "maxRunTime": 36000,
                    "image": {
                        "path": "public/image.tar.zst",
                        "type": "task-image",
                        "taskId": "ZqrJzMZ8QOOEoltAG7Y4JA"
                    },
                    "deadline": 1484417664.023,
                    "id": "ER70OtGBQla6YOW5qeivnw",
                    "group": {"id": "aEtr0ac7RA-qeANl9Sa63A"},
<CHOP>
                    "requires": "all-completed"
                },
<CHOP>
        }]
    }
}

Here is the schema (also chopped due to posting limits):

{
    "state": "open",
    "settings": {"index": {
        "refresh_interval": "600s",
        "number_of_shards": "30",
        "creation_date": "1483361238741",
        "recovery": {"initial_shards": "1"},
        "number_of_replicas": "2",
        "version": {"created": "1070199"},
        "uuid": "QxOeRX_gRYK74oOTh9IIdA"
    }},
    "mappings": {"task": {
        "_source": {"compress": true},
        "_id": {"index": "not_analyzed", "store": true},
        "dynamic_templates": [
            {"default_ids": {
                "mapping": {"index": "not_analyzed", "type": "string", "doc_values": true},
                "match": "id"
            }},
            {"default_strings": {
                "mapping": {"index": "not_analyzed", "type": "string", "doc_values": true},
                "match_mapping_type": "string",
                "match": "*"
            }},
            {"default_doubles": {
                "mapping": {"index": "not_analyzed", "type": "double", "doc_values": true},
                "match_mapping_type": "double",
                "match": "*"
            }},
            {"default_longs": {
                "match_pattern": "regex",
                "path_match": ".*",
                "mapping": {"index": "not_analyzed", "type": "long", "doc_values": true},
                "match_mapping_type": "long|integer"
            }}
        ],
        "_all": {"enabled": false},
        "properties": {
            "result": {"properties": {"missing_subtests": {"type": "boolean"}}},
            "resource_usage": {
                "dynamic": "true",
                "properties": {
                    "summary": {"dynamic": "true", "type": "nested"},
                    "samples": {"dynamic": "true", "type": "nested"}
                }
            },
            "task": {
                "dynamic": "true",
                "properties": {
                    "parent": {
                        "dynamic": "true",
                        "properties": {
                            "artifacts_url": {"index": "not_analyzed", "type": "string", "doc_values": true},
                            "id": {"index": "not_analyzed", "type": "string", "doc_values": true}
                        }
                    },
                    "expires": {"type": "double", "doc_values": true},
<CHOP>
                    "features": {"index": "not_analyzed", "type": "string", "doc_values": true},
                    "routes": {"index": "not_analyzed", "type": "string", "doc_values": true},
                    "id": {"index": "not_analyzed", "type": "string", "doc_values": true},
                    "state": {"index": "not_analyzed", "type": "string", "doc_values": true},
                    "deadline": {"type": "double", "doc_values": true},
<CHOP>
                    "requires": {"index": "not_analyzed", "type": "string", "doc_values": true}
                }
            },
<CHOP>
        }
    }},
    "aliases": ["task"]
}

(Kyle Lahnakoski) #2

It turns out the problem is with the search URL. If I use

http://activedata.allizom.org:9200/task/_search

I get the problem above. If I include the type, then the document gets returned:

http://activedata.allizom.org:9200/task/task/_search
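The comparison can be scripted so the same body is sent to both endpoints. This is a rough sketch using only the Python standard library; `search_url`, `total_hits`, and `compare` are hypothetical names, the host is the one from this post, and the network call needs access to that cluster, so nothing is sent unless `compare()` is called explicitly.

```python
import json
import urllib.request

BASE = "http://activedata.allizom.org:9200"  # host taken from the post

QUERY = {
    "query": {"filtered": {"filter": {"term": {"task.id": "ER70OtGBQla6YOW5qeivnw"}}}},
    "from": 0,
    "size": 10,
}


def search_url(index, doc_type=None):
    """Return the _search URL for an index (or alias), optionally scoped to a type."""
    parts = [BASE, index]
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


def total_hits(url, body, timeout=30):
    """POST the query body to url and return hits.total (requires network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["hits"]["total"]


def compare():
    """Print the hit count with and without the type in the URL."""
    for url in (search_url("task"), search_url("task", "task")):
        print(url, total_hits(url, QUERY))
```

Calling `compare()` should print 0 for the first URL and 1 for the second if the behaviour described here is reproduced.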

This has never been a problem before, and it is not a problem for all documents. I wonder if the alias name, type name, and top-level property name all being the same is a problem?
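One possible explanation, not confirmed anywhere in this thread: Elasticsearch 1.x accepted field names prefixed with a type name (`type.field`), a feature that was removed in 2.0 because it was ambiguous with object paths (`object.field`). With a type and a top-level property both named `task`, the name `task.id` has two plausible readings. The sketch below only enumerates those readings; `candidate_readings` is a hypothetical helper and does not reproduce Elasticsearch's actual name-resolution logic.

```python
def candidate_readings(field, type_names):
    """List the ways a dotted field name could be read if a type prefix is allowed.

    Illustration only: the first reading treats the whole name as an object
    path; the second treats the leading segment as a type name when it
    matches a known type.
    """
    readings = [{"type": None, "path": field}]
    prefix, separator, rest = field.partition(".")
    if separator and prefix in type_names:
        readings.append({"type": prefix, "path": rest})
    return readings


readings = candidate_readings("task.id", {"task"})
for reading in readings:
    print(reading)
```

Under this assumption, `task.id` could mean either the `id` property inside the `task` object or a top-level `id` field on the `task` type, which might explain why adding the type to the URL changes the result.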

