Hi,
I am trying to find a way to search our ES cluster for a substring contained within a document's string field (the substring may contain a space, a colon, a hyphen, etc.).
I think it is best to demonstrate with an example, so I put a scenario below that shows what I'm trying to accomplish, specifically with a space, as I assume that once I have a solution for that I will be able to apply it to strings with a colon, hyphen, etc.
So for example let's say I have 2 documents:
{"_id": 1, "_source": {"email.subject": "one two three four"}}
{"_id": 2, "_source": {"email.subject": "two one three four"}}
And I would like to search for the substring "wo thre", such that it matches only the first document (i.e. by the regex ".*wo thre.*"). The second should not match, of course.
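To make the intended semantics concrete, here is a minimal sketch in plain bash, with no Elasticsearch involved. The `docs` array and the `id|subject` layout are just illustrative stand-ins for the two documents above; the only point is which document a plain substring test selects.

```shell
#!/usr/bin/env bash
# Plain-shell illustration of the intended semantics (no Elasticsearch involved).
# The ids and subjects are the two sample documents from this post;
# the "id|subject" layout is only an illustrative assumption.
docs=(
  "1|one two three four"
  "2|two one three four"
)

needle="wo thre"   # substring to find, including the space
matched_ids=""

for doc in "${docs[@]}"; do
  id="${doc%%|*}"        # text before the first "|"
  subject="${doc#*|}"    # text after the first "|"
  # Substring test: true when $needle occurs anywhere in $subject.
  if [[ "$subject" == *"$needle"* ]]; then
    matched_ids+="${matched_ids:+ }$id"
  fi
done

echo "matched ids: $matched_ids"   # prints "matched ids: 1"
```

Only document 1 is selected, which is the result I would like to get back from the cluster.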
Please help me understand:
- Am I doing something wrong here?
- How could I accomplish such substring search?
- How is it best to implement the doc structure (using some analyzers or something?) such that it allows this substring search?
Thanks!
Here's the sample scenario I mentioned earlier, which demonstrates what I'm failing to achieve:
### Insert first doc
$ curl -X PUT "https://es-host/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"email.subject": "one two three four"
}
'
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
### Insert second doc
$ curl -X PUT "https://es-host/customer/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
"email.subject": "two one three four"
}
'
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
### Search with regex with space: ".*wo thre.*" - but no hits
$ curl -X POST "https://es-host/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"regexp": {
"email.subject": {
"value": ".*wo thre.*"
}
}
}
}'
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
### Search with regex without space: "thre.*" - 2 hits
$ curl -X POST "https://es-host/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"regexp": {
"email.subject": {
"value": "thre.*"
}
}
}
}'
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"email.subject" : "two one three four"
}
},
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"email.subject" : "one two three four"
}
}
]
}
}
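One possible reading of the two results above, which I have not confirmed against the cluster's actual mapping: if the field is analyzed into separate whitespace-delimited terms and the regexp has to match a single term in full, then a pattern containing a space can never match. The sketch below is a rough local model of that assumption in bash, not Elasticsearch itself; `count_hits` is a hypothetical helper written only for this illustration.

```shell
#!/usr/bin/env bash
# Rough local model, NOT Elasticsearch itself. Assumption: the field is split
# into whitespace-separated terms and the pattern must match one whole term.
subjects=("one two three four" "two one three four")

# count_hits PATTERN: number of subjects with at least one term
# that matches PATTERN in full (hypothetical helper for this sketch).
count_hits() {
  local pattern="$1" hits=0 subject term
  local regex="^(${pattern})\$"       # anchor so the whole term must match
  for subject in "${subjects[@]}"; do
    for term in $subject; do          # unquoted on purpose: split on whitespace
      if [[ "$term" =~ $regex ]]; then
        hits=$((hits + 1))
        break                         # count each document once
      fi
    done
  done
  echo "$hits"
}

hits_with_space=$(count_hits '.*wo thre.*')
hits_without_space=$(count_hits 'thre.*')
echo "with space: $hits_with_space, without space: $hits_without_space"
# prints "with space: 0, without space: 2"
```

Under that assumption the model reproduces both results: 0 hits for ".*wo thre.*" and 2 hits for "thre.*". If this is what is happening, the substring search would presumably need to run against the unanalyzed string rather than the individual terms.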