Elasticsearch - return records having a field falling in the given range


(Prakritidev Verma) #1

A subset of my data looks like:

{
  "6220":{
    "abstract":"We investigate the two-dimensional $\\mathcal{N}=(2,2)$ supersymmetric\nYang-Mills (SYM) theory on the discretized curved space (polyhedra). We first\nrevisit that the number of supersymmetries of the continuum $\\mathcal{N}=(2,2)$\nSYM theory on any curved manifold can be enhanced at least to two by\nintroducing an appropriate $U(1)$ gauge background associated with the\n$U(1)_{V}$ symmetry. We then show that the generalized Sugino model on the\ndiscretized curved space, which was proposed in our previous work, can be\nidentified to the discretization of this SUSY enhanced theory, where one of the\nsupersymmetries remains and the other is broken but restored in the continuum\nlimit. We find that the $U(1)_{A}$ anomaly exists also in the discretized\ntheory as a result of an unbalance of the number of the fermions proportional\nto the Euler characteristics of the polyhedra. We then study this model by\nusing the numerical Monte-Carlo simulation. We propose a novel phase-quench\nmethod called \"anomaly-phase-quenched approximation\" with respect to the\n$U(1)_A$ anomaly. We numerically show that the Ward-Takahashi (WT) identity\nassociated with the remaining supersymmetry is realized by adopting this\napproximation. We figure out the relation between the sign (phase) problem and\npseudo-zero-modes of the Dirac operator. We also show that the divergent\nbehavior of the scalar one-point function gets milder as the genus of the\nbackground increases. These are the first numerical observations for the\nsupersymmetric lattice model on the curved space with generic topologies.",
    "arxiv_id":"1607.01260",
    "authors":[
      "Kamata Syo",
      "Matsuura So",
      "Misumi Tatsuhiro",
      "Ohta Kazutoshi"
    ],
    "categories":[
      "hep-th",
      "hep-lat"
    ],
    "created":"2016-07-05 00:00:00",
    "doi":"10.1093\/ptep\/ptw153",
    "primary_category":"physics",
    "title":"Anomaly and Sign problem in $\\mathcal{N}=(2,2)$ SYM on Polyhedra :\n  Numerical Analysis",
    "updated":1473724800000
  },
  "407":{
    "abstract":"In this paper, we use the methods of subriemannian geometry to study the dual\nfoliation of the singular Riemannian foliation induced by isometric Lie group\nactions on a complete Riemannian manifold M. We show that under some\nconditions, the dual foliation has only one leaf.",
    "arxiv_id":"1408.0060",
    "authors":[
      "Shi Yi"
    ],
    "categories":[
      "math.DG"
    ],
    "created":"2014-07-31T00:00:00",
    "doi":null,
    "primary_category":"math",
    "title":"The dual foliation of some singular Riemannian foliations",
    "updated":1483574400000
  }
}

I need to find all records whose created value falls within a given range.

I tried:

GET _search
{
    "query": {
        "range" : {
            "created" : {
                "gte": "2012-01-01",
                "lte": "2018-01-01",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

I got:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 13,
    "successful" : 13,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

I expected a couple of hits for that query. I have verified that there are nearly 500 documents in my ES index.

What am I doing wrong here?


(David Pilato) #2

What is your mapping?


(Waqas) #3

What mapping are you talking about? @dadoonet


(David Pilato) #4

@waqashamid https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html


(Prakritidev Verma) #5

Hi David,

This is the mapping:

PUT arxivdb 
{
  "mappings": {
    "_doc": { 
      "properties": { 
        "arxiv_id":    { "type": "text"  },
        "title" :  { "type": "text"  },
        "abstract":     { "type": "text"  }, 
        "authors":      { "type": "text" },  
        "categories": {"type": "text"},
        "created":  {
          "type":   "date", 
          "format": "yyyy-MM-dd'T'HH:mm:ss"
        },
        "doi": {"type":"text"},
        "primary_category": {"type": "text"},
        "updated": {"type":"date"}
      }
    }
  }
}

I see that the format I was passing is wrong, so I changed the query to:

GET _search
{
    "query": {
        "range" : {
            "created" : {
                "gte": "2012-01-01T00:00:00",
                "lte": "2018-01-01T00:00:00",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

But I still got nothing:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

(David Pilato) #6

Could you try:

GET _search
{
    "query": {
        "range" : {
            "created" : {
                "gte": "2012-01-01",
                "lte": "2018-01-01"
            }
        }
    }
}

If it does not work, could you share a document that should normally match?

I'm surprised that your mapping is:

        "created":  {
          "type":   "date", 
          "format": "yyyy-MM-dd'T'HH:mm:ss"
        },

While a document has:

    "created":"2014-07-31 00:00:00",

(Prakritidev Verma) #7

This is the document that I have in my database

 "3": {
                "arxiv_id": "1502.05880",
                "abstract": """
    This paper describes a flexible architecture for implementing a new fast
    computation of the discrete Fourier and Hartley transforms, which is based on a
    matrix Laurent series. The device calculates the transforms based on a single
    bit selection operator. The hardware structure and synthesis are presented,
    which handled a 16-point fast transform in 65 nsec, with a Xilinx SPARTAN 3E
    device.
    """,
                "authors": [
                  "de Oliveira R. C.",
                  "de Oliveira H. M.",
                  "de Souza R. M. Campello",
                  "Santos E. J. P."
                ],
                "categories": [
                  "cs.NA",
                  "cs.DM",
                  "eess.SP"
                ],
                "created": "2014-07-31 00:00:00",
                "doi": "10.1109/SPL.2010.5483017",
                "primary_category": "eess",
                "title": "A Flexible Implementation of a Matrix Laurent Series-Based 16-Point Fast\n  Fourier and Hartley Transforms",
                "updated": " "
              }

I noticed that I had different date formats, so I have changed the data now.

If I run this:

GET _search
{
    "query": {
        "range" : {
            "created" : {
                "gte": "2012-01-01T00:00:00",
                "lte": "2018-01-01T00:00:00"
            }
        }
    }
}

Based on the range query, I should get one document, because its "created": "2014-07-31 00:00:00" is between 2012 and 2018, but I get this in the console:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

When I run this:

GET /_all/_search?q='Fourier'

I get three documents:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.29152924,
    "hits": [
      {
        "_index": "arxivdb",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.29152924,
        "_source": {
          "3": {
            "arxiv_id": "1502.05880",
            "abstract": """
This paper describes a flexible architecture for implementing a new fast
computation of the discrete Fourier and Hartley transforms, which is based on a
matrix Laurent series. The device calculates the transforms based on a single
bit selection operator. The hardware structure and synthesis are presented,
which handled a 16-point fast transform in 65 nsec, with a Xilinx SPARTAN 3E
device.
""",
            "authors": [
              "de Oliveira R. C.",
              "de Oliveira H. M.",
              "de Souza R. M. Campello",
              "Santos E. J. P."
            ],
            "categories": [
              "cs.NA",
              "cs.DM",
              "eess.SP"
            ],
            "created": "2014-07-31 00:00:00",
            "doi": "10.1109/SPL.2010.5483017",
            "primary_category": "eess",
            "title": "A Flexible Implementation of a Matrix Laurent Series-Based 16-Point Fast\n  Fourier and Hartley Transforms",
            "updated": " "
          }
        }
      },
      {
        "_index": "arxivdb",
        "_type": "_doc",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "5": {
            "arxiv_id": "1503.02577",
            "abstract": """
This paper introduces the theory and hardware implementation of two new
algorithms for computing a single component of the discrete Fourier transform.
In terms of multiplicative complexity, both algorithms are more efficient, in
general, than the well known Goertzel Algorithm.
""",
            "authors": [
              "Silva G. Jerônimo da",
              "de Souza R. M. Campello",
              "de Oliveira H. M."
            ],
            "categories": [
              "cs.DM",
              "cs.DS",
              "eess.SP",
              "stat.ME"
            ],
            "created": "2016-10-06 00:00:00",
            "doi": null,
            "primary_category": "eess",
            "title": "New Algorithms for Computing a Single Component of the Discrete Fourier\n  Transform",
            "updated": " "
          }
        }
      },
      {
        "_index": "arxivdb",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "4": {
            "arxiv_id": "1503.02577",
            "abstract": """
This paper introduces the theory and hardware implementation of two new
algorithms for computing a single component of the discrete Fourier transform.
In terms of multiplicative complexity, both algorithms are more efficient, in
general, than the well known Goertzel Algorithm.
""",
            "authors": [
              "Silva G. Jerônimo da",
              "de Souza R. M. Campello",
              "de Oliveira H. M."
            ],
            "categories": [
              "cs.DM",
              "cs.DS",
              "eess.SP",
              "stat.ME"
            ],
            "created": 1425859200000,
            "doi": null,
            "primary_category": "eess",
            "title": "New Algorithms for Computing a Single Component of the Discrete Fourier\n  Transform",
            "updated": " "
          }
        }
      }
    ]
  }
}

I am not able to figure out what I am doing wrong.


(David Pilato) #8

I tried your example and I can't index it.

DELETE arxivdb
PUT arxivdb
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss"
      }
    }
  }
}
PUT arxivdb/_doc/1
{
  "created": "2016-10-06 00:00:00"
}

As expected, it gives:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [created] of type [date] in document with id '1'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [created] of type [date] in document with id '1'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [2016-10-06 00:00:00] with format [yyyy-MM-dd'T'HH:mm:ss]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Text '2016-10-06 00:00:00' could not be parsed at index 10"
      }
    }
  },
  "status": 400
}

Note that I tested on 7.0.0-beta1 but the behavior is the same on 6.6.

But it sounds like you are searching on all indices instead of only arxivdb. That might explain the behavior. I can't tell for sure as I don't see the full response, so that's only a guess.

So try to search with:

GET /arxivdb/_search
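
On the date format point: the mapper_parsing_exception above comes down to the space-separated value not matching the mapped pattern. Here is a minimal Python sketch of that check, assuming strptime's %Y-%m-%dT%H:%M:%S is a reasonable stand-in for the mapping's yyyy-MM-dd'T'HH:mm:ss pattern. Elasticsearch uses its own date parser, so this only approximates its behavior, and matches_mapped_format is a hypothetical helper name, not part of any client library.

```python
from datetime import datetime

# Assumption: Python stand-in for the mapping's "yyyy-MM-dd'T'HH:mm:ss" pattern.
# Elasticsearch's own parser is stricter than strptime in some respects.
MAPPED_FORMAT = "%Y-%m-%dT%H:%M:%S"


def matches_mapped_format(value):
    """Return True if value is a string that parses with MAPPED_FORMAT."""
    if not isinstance(value, str):
        # Numbers such as epoch milliseconds are not covered by this pattern.
        return False
    try:
        datetime.strptime(value, MAPPED_FORMAT)
    except ValueError:
        return False
    return True


# The three shapes of "created" that appear in the documents shared in this thread.
samples = ["2014-07-31T00:00:00", "2014-07-31 00:00:00", 1425859200000]
for sample in samples:
    print(repr(sample), matches_mapped_format(sample))
```

Only the first sample, the one with the T separator, passes the check. The space-separated string and the epoch value would need to be converted, or the mapping would need to accept additional formats.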

(Prakritidev Verma) #9

This error occurs because you are not using the right date format.

Instead of

PUT arxivdb/_doc/1
{
  "created": "2016-10-06 00:00:00"
}

You have to do this:

PUT arxivdb/_doc/1
{
  "created": "2016-10-06T00:00:00"
}

You can use my Kibana; it's an open temporary server, so you can see the database.


(David Pilato) #10

I know what I have to do. I'm telling you that this is inconsistent with what you pasted.
With the mapping you said you applied, you can not index one of the sample documents you shared with us.

You can use my Kibana; it's an open temporary server, so you can see the database.

I'm removing your URL from the answer. It's super dangerous to do such a thing and to leave your Kibana instance open like this without any protection.

BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is one way to have access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, and what is coming next :)


(David Pilato) #11

Here is your mapping:

{
  "arxivdb": {
    "mappings": {
      "_doc": {
        "properties": {
          "0": {
            "properties": {
              // Skipped for clarity
              "created": {
                "type": "long"
              },
              // Skipped for clarity
            }
          },
          "1": {
            "properties": {
              // Skipped for clarity
              "created": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              // Skipped for clarity
            }
          },
          "3": {
            "properties": {
              // Skipped for clarity
              "created": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              // Skipped for clarity
            }
          },
          "4": {
            "properties": {
              // Skipped for clarity
              "created": {
                "type": "long"
              },
              // Skipped for clarity
            }
          },
          "5": {
            "properties": {
              // Skipped for clarity
              "created": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              // Skipped for clarity
            }
          },
          // Skipped for clarity
          "created": {
            "type": "date",
            "format": "yyyy-MM-dd'T'HH:mm:ss"
          },
          // Skipped for clarity
        }
      }
    }
  }
}

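It's definitely a mess: there are multiple created fields, at different levels and with different types. Judging by the documents you shared, each one was indexed with its ID as a wrapping key (for example "3": { ... }), so the dates actually live in fields such as 3.created (text) and 4.created (long), while the top-level created date field that your range query targets holds no values.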

              "created": {
                "type": "long"
              },
              "created": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
          "created": {
            "type": "date",
            "format": "yyyy-MM-dd'T'HH:mm:ss"
          },

You probably need to start again from scratch.
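
As a rough sketch of what the client-side cleanup could look like before reindexing, assuming the source data is a dict keyed by document ID as in the samples in this thread: the Python below unwraps each record so its fields sit at the top level, and rewrites the created and updated values into the yyyy-MM-dd'T'HH:mm:ss shape. The names normalize_date and flatten_records are hypothetical, and treating numeric values as UTC epoch milliseconds is an assumption based on the sample values.

```python
from datetime import datetime, timezone

TARGET_FORMAT = "%Y-%m-%dT%H:%M:%S"  # stand-in for yyyy-MM-dd'T'HH:mm:ss
KNOWN_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def normalize_date(value):
    """Convert the date variants seen in the thread into one string format."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumption: numeric values are epoch milliseconds in UTC.
        moment = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
        return moment.strftime(TARGET_FORMAT)
    if isinstance(value, str):
        for fmt in KNOWN_FORMATS:
            try:
                return datetime.strptime(value.strip(), fmt).strftime(TARGET_FORMAT)
            except ValueError:
                continue
    return None  # blank or unrecognized values become null


def flatten_records(records):
    """Yield (doc_id, doc) pairs from a dict shaped like {"3": {...}, "4": {...}}."""
    for doc_id, fields in records.items():
        doc = dict(fields)  # copy so the input is left untouched
        for name in ("created", "updated"):
            if name in doc:
                doc[name] = normalize_date(doc[name])
        yield doc_id, doc


records = {
    "3": {"arxiv_id": "1502.05880", "created": "2014-07-31 00:00:00", "updated": " "},
    "4": {"arxiv_id": "1503.02577", "created": 1425859200000, "updated": " "},
}
for doc_id, doc in flatten_records(records):
    print(doc_id, doc["created"], doc["updated"])
```

Each yielded doc would then be sent as the request body for its own ID (for example PUT arxivdb/_doc/3), rather than nested under a "3" key, so that a single top-level created field receives every date.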

If you have further questions, please provide a full recreation script as described in About the Elasticsearch category. It will help us better understand exactly what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

