Elasticsearch 5.0.0: _search?q=date returns non-matching dates as well


(glagidse) #1

I'm currently reading this:
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-analysis.html

GET /_search?q=date:2014/09/15

I figured out that I should replace the hyphens with slashes, but I still get 11 results instead of just one.


I inserted the data with this bulk request:

{ "create": { "_index": "us", "_type": "user", "_id": "1" }}
{ "email" : "john@smith.com", "name" : "John Smith", "username" : "@john" }
{ "create": { "_index": "gb", "_type": "user", "_id": "2" }}
{ "email" : "mary@jones.com", "name" : "Mary Jones", "username" : "@mary" }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "3" }}
{ "date" : "2014-09-13", "name" : "Mary Jones", "tweet" : "Elasticsearch means full text search has never been so easy", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "4" }}
{ "date" : "2014-09-14", "name" : "John Smith", "tweet" : "@mary it is not just text, it does everything", "user_id" : 1 }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "5" }}
{ "date" : "2014-09-15", "name" : "Mary Jones", "tweet" : "However did I manage before Elasticsearch?", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "6" }}
{ "date" : "2014-09-16", "name" : "John Smith",  "tweet" : "The Elasticsearch API is really easy to use", "user_id" : 1 }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "7" }}
{ "date" : "2014-09-17", "name" : "Mary Jones", "tweet" : "The Query DSL is really powerful and flexible", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "8" }}
{ "date" : "2014-09-18", "name" : "John Smith", "user_id" : 1 }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "9" }}
{ "date" : "2014-09-19", "name" : "Mary Jones", "tweet" : "Geo-location aggregations are really cool", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "10" }}
{ "date" : "2014-09-20", "name" : "John Smith", "tweet" : "Elasticsearch surely is one of the hottest new NoSQL products", "user_id" : 1 }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "11" }}
{ "date" : "2014-09-21", "name" : "Mary Jones", "tweet" : "Elasticsearch is built for the cloud, easy to scale", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "12" }}
{ "date" : "2014-09-22", "name" : "John Smith", "tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.", "user_id" : 1 }
{ "create": { "_index": "gb", "_type": "tweet", "_id": "13" }}
{ "date" : "2014-09-23", "name" : "Mary Jones", "tweet" : "So yes, I am an Elasticsearch fanboy", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "14" }}
{ "date" : "2014-09-24", "name" : "John Smith", "tweet" : "How many more cheesy tweets do I have to write?", "user_id" : 1 }
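The bulk body above is newline-delimited JSON: each action line (`create` here) is followed by the document source on the next line, and the whole body has to end with a newline. A minimal Python sketch that builds such a body from (action, source) pairs; `build_bulk_body` is a hypothetical helper, and actually sending the body (for example to `POST /_bulk`) is left out.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, source) pairs into a newline-delimited bulk body.

    Every action and every source is written as one JSON object on its own
    line, and the body ends with a trailing newline.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# The first two tweets from the bulk request above.
actions = [
    ({"create": {"_index": "gb", "_type": "tweet", "_id": "3"}},
     {"date": "2014-09-13", "name": "Mary Jones",
      "tweet": "Elasticsearch means full text search has never been so easy",
      "user_id": 2}),
    ({"create": {"_index": "us", "_type": "tweet", "_id": "4"}},
     {"date": "2014-09-14", "name": "John Smith",
      "tweet": "@mary it is not just text, it does everything",
      "user_id": 1}),
]

body = build_bulk_body(actions)
print(body)
```

Building the body with `json.dumps` rather than by hand avoids quoting mistakes inside the tweet text.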

If I run GET on /gb,us/_mapping I get this:

{
    "us": {
        "mappings": {
            "tweet": {
                "properties": {
                    "date": { "type": "date" },
                    "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                    "tweet": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                    "user_id": { "type": "long" }
                }
            },
            "user": {
                "properties": {
                    "email": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                    "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                    "username": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
                }
            }
        }
    }
}
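In this mapping `date` is of type `date` with no explicit `format`, so (assuming the Elasticsearch 5.x default) it uses `strict_date_optional_time||epoch_millis`, which accepts `2014-09-15` but not `2014/09/15`. A small Python sketch for reading the effective format out of a parsed `_mapping` response; `date_format` and the `DEFAULT_DATE_FORMAT` constant are hypothetical names, and the default value is an assumption about 5.x, not something shown in this thread.

```python
# Assumed Elasticsearch 5.x default for a date field without a "format" key.
DEFAULT_DATE_FORMAT = "strict_date_optional_time||epoch_millis"

def date_format(mapping, index, doc_type, field):
    """Return the date format used by `field`, given a parsed GET /<index>/_mapping response."""
    field_mapping = mapping[index]["mappings"][doc_type]["properties"][field]
    if field_mapping.get("type") != "date":
        raise ValueError(f"{field} is mapped as {field_mapping.get('type')}, not date")
    # Fall back to the (assumed) default when no explicit format is mapped.
    return field_mapping.get("format", DEFAULT_DATE_FORMAT)

# Trimmed copy of the mapping above.
mapping = {"us": {"mappings": {"tweet": {"properties": {
    "date": {"type": "date"},
    "user_id": {"type": "long"},
}}}}}

print(date_format(mapping, "us", "tweet", "date"))
# strict_date_optional_time||epoch_millis
```

Comparing this value across indices shows quickly whether the same date string can be parsed everywhere a search is sent.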

If I run GET /gb,us/_search I get this:

{"took":1,"timed_out":false,"_shards":{"total":10,"successful":10,"failed":0},"hits":{"total":13,"max_score":1.0,"hits":[{"_index":"gb","_type":"tweet","_id":"5","_score":1.0,"_source":{ "date" : "2014-09-15", "name" : "Mary Jones", "tweet" : "However did I manage before Elasticsearch?", "user_id" : 2 }},{"_index":"gb","_type":"tweet","_id":"9","_score":1.0,"_source":{ "date" : "2014-09-19", "name" : "Mary Jones", "tweet" : "Geo-location aggregations are really cool", "user_id" : 2 }},{"_index":"us","_type":"tweet","_id":"8","_score":1.0,"_source": ... removed because max characters reached

If I run GET /gb,us/_search?q=date:2014/09/15, I get this, although I expected only 1 result:

{"took":16,"timed_out":false,"_shards":{"total":10,"successful":10,"failed":0},"hits":{"total":11,"max_score":1.6099695,"hits":[{"_index":"gb","_type":"tweet","_id":"5","_score":1.6099695,"_source":{ "date" : "2014-09-15", "name" : "Mary Jones", "tweet" : "However did I manage before Elasticsearch?", "user_id" : 2 }},{"_index":"gb","_type":"tweet","_id":"9","_score":1.0,"_source":{ "date" : "2014-09-19", "name" : "Mary Jones", "tweet" : "Geo-location aggregations are really cool", "user_id" : 2 }},{"_index":"us","_type":"tweet","_id":"8","_score":1.0,"_source":{ "date" : "2014-09-18", "name" : "John Smith", "user_id" : 1 }},{"_index":"us","_type":"tweet","_id":"10","_score":1.0," ,  .... removed rest because max characters reached
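One likely explanation: `q=` uses query_string syntax, in which `/` is a reserved character that delimits a regular expression. So `date:2014/09/15` is probably not read as one date but as three clauses, `date:2014`, the regex `/09/` and the term `15`, where the last two search the default field (`_all` in 5.x) and are combined with the default `or` operator. Since every tweet's date contains `09`, most tweets can match. This is a reading of the parser behavior, not something confirmed in the thread. A minimal Python sketch that backslash-escapes reserved characters; `escape_query_string` is a hypothetical helper and the character set is a subset of the reserved list in the query_string documentation (`<` and `>` are left out because they cannot be escaped).

```python
# Subset of query_string reserved characters; "<" and ">" are omitted
# because query_string does not allow escaping them.
RESERVED = set('+-=!(){}[]^"~*?:\\/&|')

def escape_query_string(value):
    """Backslash-escape reserved characters so the value is parsed as a single term."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in value)

query = "date:" + escape_query_string("2014/09/15")
print(query)  # date:2014\/09\/15
```

Escaping only prevents the regex split; the resulting term still has to match the field's date format, which for `gb` and `us` is the hyphenated form, so `date:2014-09-15` remains the simpler query there.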

(David Pilato) #2

Please provide a full example with mapping and so on.


(glagidse) #3

Okay, I have updated the question. Here is the complete result of GET /gb,us/_search?q=date:2014/09/15:

{"took":16,"timed_out":false,"_shards":{"total":10,"successful":10,"failed":0},"hits":{"total":11,"max_score":1.6099695,"hits":[{"_index":"gb","_type":"tweet","_id":"5","_score":1.6099695,"_source":{ "date" : "2014-09-15", "name" : "Mary Jones", "tweet" : "However did I manage before Elasticsearch?", "user_id" : 2 }},{"_index":"gb","_type":"tweet","_id":"9","_score":1.0,"_source":{ "date" : "2014-09-19", "name" : "Mary Jones", "tweet" : "Geo-location aggregations are really cool", "user_id" : 2 }},{"_index":"us","_type":"tweet","_id":"8","_score":1.0,"_source":{ "date" : "2014-09-18", "name" : "John Smith", "user_id" : 1 }},{"_index":"us","_type":"tweet","_id":"10","_score":1.0,"_source":{ "date" : "2014-09-20", "name" : "John Smith", "tweet" : "Elasticsearch surely is one of the hottest new NoSQL products", "user_id" : 1 }},{"_index":"us","_type":"tweet","_id":"12","_score":1.0,"_source":{ "date" : "2014-09-22", "name" : "John Smith", "tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.", "user_id" : 1 }},{"_index":"us","_type":"tweet","_id":"4","_score":1.0,"_source":{ "date" : "2014-09-14", "name" : "John Smith", "tweet" : "@mary it is not just text, it does everything", "user_id" : 1 }},{"_index":"us","_type":"tweet","_id":"6","_score":1.0,"_source":{ "date" : "2014-09-16", "name" : "John Smith",  "tweet" : "The Elasticsearch API is really easy to use", "user_id" : 1 }},{"_index":"gb","_type":"tweet","_id":"7","_score":1.0,"_source":{ "date" : "2014-09-17", "name" : "Mary Jones", "tweet" : "The Query DSL is really powerful and flexible", "user_id" : 2 }},{"_index":"gb","_type":"tweet","_id":"13","_score":1.0,"_source":{ "date" : "2014-09-23", "name" : "Mary Jones", "tweet" : "So yes, I am an Elasticsearch fanboy", "user_id" : 2 }},{"_index":"gb","_type":"tweet","_id":"3","_score":1.0,"_source":{ "date" : "2014-09-13", "name" : "Mary Jones", "tweet" : "Elasticsearch means full text search has never been so easy", "user_id" : 2 }}]}}
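In this response only the 2014-09-15 tweet has a higher score (1.6099695); every other hit scores 1.0, which suggests those documents matched through some clause other than the date itself. Note also that `hits.total` is 11 while only 10 hits are listed, because a search returns the top 10 hits unless `size` is set. A quick Python sketch, using a hypothetical `hit_dates` helper on a trimmed copy of the parsed response, that lists which returned dates differ from the one being searched for:

```python
def hit_dates(response):
    """Return (index, _id, date) for every hit in a parsed search response."""
    return [
        (hit["_index"], hit["_id"], hit["_source"].get("date"))
        for hit in response["hits"]["hits"]
    ]

# Trimmed copy of the response above (first two hits only).
response = {"hits": {"total": 11, "hits": [
    {"_index": "gb", "_type": "tweet", "_id": "5", "_score": 1.6099695,
     "_source": {"date": "2014-09-15", "name": "Mary Jones", "user_id": 2}},
    {"_index": "gb", "_type": "tweet", "_id": "9", "_score": 1.0,
     "_source": {"date": "2014-09-19", "name": "Mary Jones", "user_id": 2}},
]}}

wanted = "2014-09-15"
unexpected = [h for h in hit_dates(response) if h[2] != wanted]
print(unexpected)  # [('gb', '9', '2014-09-19')]
```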

It could not fit in the question because of the character limit.


(glagidse) #4

Ah, figured it out. At first I used GET /_search?q=date:2014-09-15, just like the tutorial says, but I got an error:

{
    "took": 47,
    "timed_out": false,
    "_shards": {
        "total": 20,
        "successful": 15,
        "failed": 5,
        "failures": [
            {
                "shard": 0,
                "index": "website",
                "node": "mQrh0NSTQCaiMZPgjUJwrQ",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to create query: {\n  \"query_string\" : {\n    \"query\" : \"date:2014-09-15\",\n    \"fields\" : [ ],\n    \"use_dis_max\" : true,\n    \"tie_breaker\" : 0.0,\n    \"default_operator\" : \"or\",\n    \"auto_generate_phrase_queries\" : false,\n    \"max_determined_states\" : 10000,\n    \"lowercase_expanded_terms\" : true,\n    \"enable_position_increment\" : true,\n    \"fuzziness\" : \"AUTO\",\n    \"fuzzy_prefix_length\" : 0,\n    \"fuzzy_max_expansions\" : 50,\n    \"phrase_slop\" : 0,\n    \"analyze_wildcard\" : false,\n    \"locale\" : \"und\",\n    \"escape\" : false,\n    \"boost\" : 1.0\n  }\n}",
                    "index_uuid": "17MbYpE8RVSOggNG-ILxgA",
                    "index": "website",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "failed to parse date field [2014-09-15] with format [yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis]",
                        "caused_by": {
                            "type": "illegal_argument_exception",
                            "reason": "Parse failure at index [4] of [2014-09-15]"
                        }
                    }
                }
            }
        ]
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "gb",
                "_type": "tweet",
                "_id": "5",
                "_score": 1,
                "_source": {
                    "date": "2014-09-15",
                    "name": "Mary Jones",
                    "tweet": "However did I manage before Elasticsearch?",
                    "user_id": 2
                }
            }
        ]
    }
}
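This response is only partial: `_shards.failed` is 5 out of 20, and the failure shown comes from the `website` index, whose `date` field expects `yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis` and therefore cannot parse `2014-09-15`. The matching tweet from `gb` is still returned next to the error. A small Python sketch for spotting this kind of partial result in a parsed response; `is_partial` and `failure_summary` are hypothetical helpers, not part of any client library.

```python
def is_partial(response):
    """True if some shards failed, meaning the hits may be incomplete."""
    return response.get("_shards", {}).get("failed", 0) > 0

def failure_summary(response):
    """Return (index, shard, chain of error types) for each listed shard failure."""
    summary = []
    for failure in response.get("_shards", {}).get("failures", []):
        chain = []
        reason = failure.get("reason")
        # Follow the nested "caused_by" objects down to the root cause.
        while reason:
            chain.append(reason.get("type"))
            reason = reason.get("caused_by")
        summary.append((failure.get("index"), failure.get("shard"), chain))
    return summary

# Trimmed copy of the response above.
response = {
    "_shards": {"total": 20, "successful": 15, "failed": 5, "failures": [
        {"shard": 0, "index": "website", "reason": {
            "type": "query_shard_exception",
            "caused_by": {
                "type": "parse_exception",
                "reason": "failed to parse date field [2014-09-15] with format "
                          "[yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis]",
                "caused_by": {"type": "illegal_argument_exception"},
            },
        }},
    ]},
    "hits": {"total": 1, "hits": []},
}

if is_partial(response):
    for index, shard, chain in failure_summary(response):
        print(index, shard, " -> ".join(chain))
```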

Then I figured I had to replace the hyphens with slashes, but that gives incorrect results. If I search only the gb and us indices instead, it works:
GET /gb,us/_search?q=date:2014-09-15

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 10,
        "successful": 10,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "gb",
                "_type": "tweet",
                "_id": "5",
                "_score": 1,
                "_source": {
                    "date": "2014-09-15",
                    "name": "Mary Jones",
                    "tweet": "However did I manage before Elasticsearch?",
                    "user_id": 2
                }
            }
        ]
    }
}
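The working request differs from the first attempt only in naming the `gb` and `us` indices, so the `website` index (whose date format expects slashes) is never queried. A Python sketch with two hypothetical helpers: one builds that URI-search path with the `q` value percent-encoded, and one builds a Query DSL body as an alternative that sidesteps query_string parsing. The DSL variant assumes a `term` query on a `date` field matches documents stored with that exact day, which holds here because the tweets' dates have no time part.

```python
import json
from urllib.parse import urlencode

def uri_search_path(indices, q):
    """Build a URI-search path restricted to the given indices, with q percent-encoded."""
    return "/" + ",".join(indices) + "/_search?" + urlencode({"q": q})

def date_term_body(field, day):
    """Query DSL body for a term query on a date field (no query_string syntax involved)."""
    return {"query": {"term": {field: day}}}

print(uri_search_path(["gb", "us"], "date:2014-09-15"))
# /gb,us/_search?q=date%3A2014-09-15
print(json.dumps(date_term_body("date", "2014-09-15")))
```

The body would be sent with GET /gb,us/_search; listing the indices explicitly still matters, because other indices may map `date` with a different format.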
