Elasticsearch query stops working with a large amount of data


(Ilya Zayats) #1

The problem: I have two indices with identical settings and mappings.

The first index contains only one document.
The second index contains the same document plus 16M others.

When I run the query against the first index, it returns the document, but when I run the same query against the second index, I get nothing back.

Index settings:

{
  "tasks_test": {
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "tag_analyzer": {
              "filter": [
                "lowercase",
                "tag_filter"
              ],
              "tokenizer": "whitespace",
              "type": "custom"
            }
          },
          "filter": {
            "tag_filter": {
              "type": "word_delimiter",
              "type_table": "# => ALPHA"
            }
          }
        },
        "creation_date": "1444127141035",
        "number_of_replicas": "2",
        "number_of_shards": "5",
        "uuid": "wTe6WVtLRTq0XwmaLb7BLg",
        "version": {
          "created": "1050199"
        }
      }
    }
  }
}

Mappings:

{
  "tasks_test": {
    "mappings": {
      "Task": {
        "dynamic": "false",
        "properties": {
          "format": "dateOptionalTime",
          "include_in_all": false,
          "type": "date"
        },
        "is_private": {
          "type": "boolean"
        },
        "last_timestamp": {
          "type": "integer"
        },
        "name": {
          "analyzer": "tag_analyzer",
          "type": "string"
        },
        "project_id": {
          "include_in_all": false,
          "type": "integer"
        },
        "user_id": {
          "include_in_all": false,
          "type": "integer"
        }
      }
    }
  }
}

The document:

{
  "_index": "tasks_test",
  "_type": "Task",
  "_id": "1",
  "_source": {
    "is_private": false,
    "name": "135548- test with number",
    "project_id": 2,
    "user_id": 1
  }
}

The query:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            [
              {
                "match": {
                  "_all": {
                    "query": "135548",
                    "type": "phrase_prefix"
                  }
                }
              }
            ]
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "is_private": false
              }
            },
            {
              "terms": {
                "project_id": [
                  2
                ]
              }
            },
            {
              "terms": {
                "user_id": [
                  1
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Also, some findings:

If I replace _all with name, everything works.
If I replace the phrase_prefix match type (match_phrase_prefix) with phrase (match_phrase), it works too.
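
To make the first finding concrete, here is a sketch of the only clause that changes, assuming the rest of the request (the filtered wrapper and the filter part) stays exactly as in the query above:

```json
{
  "match": {
    "name": {
      "query": "135548",
      "type": "phrase_prefix"
    }
  }
}
```

The second finding is the mirror image: the field stays _all and only "type" changes from "phrase_prefix" to "phrase".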

ES version: 1.5.1

So, the question is: how can I make the query work on the second index without the workarounds mentioned above?

