How to search for part of a URL

I want to search for part of a URL. I am using the sample web logs data set in Elasticsearch.

If I analyze the field:

GET /_analyze
{
"analyzer" : "standard",
"text" : ["http://nytimes.com/success/kevin-Kregel"]
}

I get:

{
"tokens": [
{
"token": "http",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "nytimes.com",
"start_offset": 7,
"end_offset": 18,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "success",
"start_offset": 19,
"end_offset": 26,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "kevin",
"start_offset": 27,
"end_offset": 32,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "kregel",
"start_offset": 33,
"end_offset": 39,
"type": "<ALPHANUM>",
"position": 4
}
]
}

Now, what if I want to search for 'kregel' or 'nytimes'? How do I do this? Please help!

Hi @searchwithme

Have you tried the match query? Take a look at its documentation.

GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "nytimes"
      }
    }
  }
}

This is what I get:
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 8,
"successful": 8,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}

What is the mapping for that field in your index?

One option is to use the pattern tokenizer.

PUT idx_test
{
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern"
        }
      }
    }
  }
}

     
POST idx_test/_doc
{
 "url":"http://nytimes.com/success/kevin-Kregel"
}


GET idx_test/_search
{
  "query": {
    "match": {
      "url": "nytimes"
    }
  }
}
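
To see why this mapping lets a search for nytimes match, here is a rough Python sketch of what the pattern tokenizer does with its default pattern, \W+ (split on every run of non-word characters). The function name pattern_tokenize is made up for illustration, and Python's regex engine stands in for the Java one that Elasticsearch uses, so treat it as an approximation rather than the real implementation.

```python
import re


def pattern_tokenize(text, pattern=r"\W+"):
    """Split text on every match of pattern and drop empty pieces.

    Hypothetical helper: it only imitates the pattern tokenizer's
    default behaviour (split on runs of non-word characters).
    """
    return [piece for piece in re.split(pattern, text) if piece]


url = "http://nytimes.com/success/kevin-Kregel"
tokens = pattern_tokenize(url)

print(tokens)
# ['http', 'nytimes', 'com', 'success', 'kevin', 'Kregel']

print("nytimes" in tokens)  # True: 'nytimes' is now its own token
print("kregel" in tokens)   # False: the token keeps its capital K
```

Note that my_analyzer above defines only a tokenizer, so nothing lowercases the tokens. If case-insensitive matching is wanted, adding "filter": ["lowercase"] to my_analyzer should take care of it.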

This is the mapping:

{
"sample_data_logs": {
"mappings": {
"properties": {
"@timestamp": {
"type": "alias",
"path": "timestamp"
},
"agent": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"bytes": {
"type": "long"
},
"clientip": {
"type": "ip"
},
"event": {
"properties": {
"dataset": {
"type": "keyword"
}
}
},
"extension": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"geo": {
"properties": {
"coordinates": {
"type": "geo_point"
},
"dest": {
"type": "keyword"
},
"src": {
"type": "keyword"
},
"srcdest": {
"type": "keyword"
}
}
},
"host": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"index": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ip": {
"type": "ip"
},
"machine": {
"properties": {
"os": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ram": {
"type": "long"
}
}
},
"memory": {
"type": "double"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"phpmemory": {
"type": "long"
},
"referer": {
"type": "keyword"
},
"request": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"response": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"tags": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "date"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"utc_time": {
"type": "date"
}
}
}
}
}

Wow, this worked! You are amazing. Thank you!


If you look at the output from the _analyze API you can see that the standard analyzer creates a token nytimes.com and not nytimes plus com, which is why you do not find anything when searching for just nytimes. If you instead searched for kregel you should find a match.
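
As a simplified illustration of that point, the sketch below treats a single-word match as an exact lookup in the set of indexed tokens. The token list is copied from the _analyze output earlier in this thread; the function term_matches is hypothetical and ignores scoring and query-side analysis.

```python
# Tokens copied from the _analyze output earlier in this thread
# (standard analyzer applied to the example URL).
indexed_tokens = {"http", "nytimes.com", "success", "kevin", "kregel"}


def term_matches(query_term, tokens):
    """Simplified model: a one-word match query hits only when the
    query term is exactly equal to one of the indexed tokens."""
    return query_term in tokens


print(term_matches("nytimes", indexed_tokens))      # False: only 'nytimes.com' exists
print(term_matches("nytimes.com", indexed_tokens))  # True
print(term_matches("kregel", indexed_tokens))       # True
```

This only holds when the query targets the field that actually contains the URL. The match request posted earlier in the thread queried message, so it was compared against that field's tokens instead.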

No, searching for kregel didn't work either.
