Querying text that starts with a slash in Elasticsearch 6.4.3


(Chia Wen Wu) #1

Hi all,
Here is what my data looks like in Elasticsearch (a sample search response):

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 5.7112346,
    "hits" : [
      {
        "_index" : "psql-widget-raw-log-2019.03",
        "_type" : "fluentd",
        "_id" : "33590",
        "_score" : 5.7112346,
        "_source" : {
          "id" : 33590,
          "domain_info_id" : 1,
          "path_string" : "/test.html?p1=aaa&p2=bbb",
          "user_group_id" : "gid01",
          "user_id" : "uid-01",
          "ctime" : "2019-03-19 02:29:15.026200+0000",
          "@timestamp" : "2019-03-19T10:29:15.000000000+08:00"
        }
      }
    ]
  }
}

I want to get the documents whose path_string starts with "/test.html".

I've tried prefix and wildcard queries, as well as escaping the slash, but none of them seem to work.

I can only get the data with a match query.

Any help would be greatly appreciated.


(David Pilato) #2

What is the mapping?

BTW, I'm not sure what your use case is, but maybe you'd like to look at the path tokenizer.


(Chia Wen Wu) #3

Thanks, @dadoonet

Here is my mapping

PUT /_template/psql-widget-raw-log-format
{
  "index_patterns": ["psql-widget-raw-log-*"],
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "fluentd": {
      "properties": {
        "id": {
          "type": "integer"
        },
        "domain_info_id": {
          "type": "integer"
        }
      }
    }
  }
}

The use case, expressed in SQL, is:

select count(*) from psql-widget-raw-log where path_string like '/test.html%'

I still couldn't query by prefix, wildcard, or regexp, even after changing the value from '/test.html' to '%2Ftest.html' (or '%2Ftest.html', '\%2Ftest.html').

The same queries work on other fields such as 'user_group_id' or 'user_id', so I think I'm querying the right way.

Thanks for your help.


(David Pilato) #4

I'd use a path tokenizer. See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html
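
One way to see what that tokenizer would do with a given value is the `_analyze` API. The following is a minimal sketch of a request body for `POST /_analyze`; the sample text is copied from the document shown earlier in the thread, and nothing here has been run against the poster's cluster.

```json
{
  "tokenizer": "path_hierarchy",
  "text": "/test.html?p1=aaa&p2=bbb"
}
```

Because `path_hierarchy` splits on `/` by default, it is worth checking in the response whether the query-string part ends up in the same token as `/test.html`.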


(Chia Wen Wu) #5

So I should change my mapping like this

PUT /_template/psql-widget-raw-log-format
{
  "index_patterns": ["psql-widget-raw-log-*"],
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "fluentd": {
      "properties": {
        "id": {
          "type": "integer"
        },
        "domain_info_id": {
          "type": "integer"
        },
        "path_string": {
          "type": "string",
          "analyzer": {"tokenizer": "path_hierarchy"}
        }
      }
    }
  }
}

and then use a query like this

curl -X GET "localhost:9200/psql-widget-raw-log-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "bool": {
            "must": [
                {"prefix": {"path_string": "/test.html"}} 
            ],
            "filter":[ 
                {"range" : 
                    {"@timestamp" : 
                        {"gte" : "2019-03-19T10:00:00","lte" : "2019-03-20T00:59:59",
                        "time_zone": "+08:00"}
                    }
                },
                {"match": {"domain_info_id": "1"}}
            ]
        }
    }
}'

to get all documents whose path_string starts with "/test.html", e.g. "/test.html%3Fa%3Dthanku", "/test.html", etc.

Is this correct?


(David Pilato) #6

Not exactly. You need to define an analyzer first. Then use the analyzer in your mapping.
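
As a rough sketch of those two steps (not a tested configuration), the template could define a custom analyzer under `settings.analysis` and then reference it by name from the `path_string` mapping. The analyzer name `path_analyzer` is made up for illustration, and `text` is used as the field type because `string` is no longer accepted in 6.x. This would be the body of `PUT /_template/psql-widget-raw-log-format`:

```json
{
  "index_patterns": ["psql-widget-raw-log-*"],
  "settings": {
    "number_of_shards": 2,
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "type": "custom",
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "fluentd": {
      "properties": {
        "id": { "type": "integer" },
        "domain_info_id": { "type": "integer" },
        "path_string": {
          "type": "text",
          "analyzer": "path_analyzer"
        }
      }
    }
  }
}
```

A template only applies to indices created after it is stored, so existing indices would need to be reindexed to pick up the new mapping.
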
Then you can try just with a term or match query instead of a prefix query.
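
A possible shape for such a query, assuming `path_string` is indexed with a `path_hierarchy`-based analyzer, is sketched below as the body of `GET /psql-widget-raw-log-*/_search`. The `domain_info_id` filter is carried over from the query earlier in the thread; the rest is illustrative only.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "path_string": "/test.html" } },
        { "term": { "domain_info_id": 1 } }
      ]
    }
  }
}
```

Since `path_hierarchy` only splits on its delimiter (`/` by default), this should match values equal to `/test.html` or continuing with another `/` segment. Whether a value like `/test.html?p1=aaa&p2=bbb` also matches should be verified, because its query string contains no delimiter. If a literal starts-with match is the goal, a `prefix` query against an unanalyzed `keyword` field is another option to test.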

If you need further help, please provide a full recreation script as described in "About the Elasticsearch category". It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

