I want to search for emails that contain smith in the middle of the address and end with .org or .com.
What is the most suitable approach so that the query is very fast?
I want the query to return all the emails that match the condition in the query.
For example:
GET index/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"email": {
"value": "*smith*"
}
}
},
{
"wildcard": {
"email": {
"value": "*.org"
}
}
}
]
}
}
}
I want this query to return all the emails that contain the word smith in the middle and end with .org.
It currently returns every matching value, whether smith is in the middle, at the beginning or at the end. I do not want that; I want exactly the position that I specify in the query.
For example, I want only the matches in the middle, not similar values at the beginning or the end.
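Since "middle" can mean several things, it may help to pin the requirement down as an explicit rule before choosing a mapping. The Python sketch below encodes one possible interpretation, which is an assumption and not confirmed in this thread: "smith" must occur inside the local part (before the `@`) with at least one character before and after it, and the domain must end in `.org` or `.com`. The function names are hypothetical.

```python
def contains_in_middle(text: str, word: str) -> bool:
    """True if `word` occurs in `text` with at least one character before and after it."""
    start = text.find(word, 1)  # search from index 1, so a match at the very beginning is skipped
    while start != -1:
        if start + len(word) < len(text):  # at least one character follows this match
            return True
        start = text.find(word, start + 1)  # try the next occurrence
    return False


def matches_requirement(email: str, word: str = "smith", suffixes=(".org", ".com")) -> bool:
    # Assumed interpretation: the word is checked in the local part, the suffix in the domain
    local, sep, domain = email.rpartition("@")
    if not sep:
        return False  # no "@", so not an email address
    return contains_in_middle(local, word) and domain.endswith(suffixes)


if __name__ == "__main__":
    for e in ["john.smithson@example.org", "mary.smith@sakilacustomer.org", "smith.john@example.com"]:
        print(e, matches_requirement(e))
```

Written this way, every sample address either satisfies the rule or does not, which makes it much easier to say which documents should and should not be returned. If the local part were stored in its own keyword field, the same rule could be written as the wildcard pattern `?*smith*?` on that field, although that is still a wildcard query.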
Either you continue to use wildcards, and the query will remain slow,
or you change the query and/or the mapping to find a faster way to get your results.
I proposed something which is much faster, but you answered that it does not address your use case.
So I asked you to reproduce your use case so we can help you.
You answered that you don't have any problem with the query, but that it is just slow.
I'm not sure how to help if you don't want to be helped.
Create the mapping David presented above and index the document he used as an example. You should then be able to query it as follows without requiring any wildcard queries, which should speed it up considerably.
Yes, you are right, but this returns any similar data, and the use case does not allow that; the query must return only exact matches.
I do not understand what you mean by that. You need to provide explicit examples of what does and what does not work.
Please show an example with a query and sample data, what is returned that you do not want, and a sample document that is not returned even though you expect it to be.
If you want to search on specific parts of the string and not treat it all as a list of tokens, you can break out components into separate fields, e.g. like this:
POST /test/_doc
{
"email": "mary.smith@sakilacustomer.org",
"domain": "sakilacustomer.org",
"domain_suffix": "org"
}
Here you can search on the whole email address but also on the domain or just the suffix without using wildcards, which should be fast and scale well.
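If the source data only contains the full address, the extra fields can be derived before indexing. A minimal Python sketch, assuming every value contains an `@`; the `local_part` field is a hypothetical addition to the example document above, and the returned dict is what would be sent as the document body.

```python
import json


def split_email(email: str) -> dict:
    """Derive separately searchable fields from a full email address."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return {
        "email": email,
        "local_part": local,                         # hypothetical extra field
        "domain": domain,
        "domain_suffix": domain.rsplit(".", 1)[-1],  # text after the last dot
    }


if __name__ == "__main__":
    print(json.dumps(split_email("mary.smith@sakilacustomer.org"), indent=2))
```

With `domain_suffix` mapped as `keyword`, a `terms` query on `domain_suffix` for `org` and `com` can replace the `*org` wildcard clause.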
If you do not want to break out parts into separate fields in the source document you may also be able to achieve this through multi fields using different analysers.
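A possible shape for the multi-field variant, written as a Python dict that can be serialized to the JSON body of a create-index request. The analyzer name `email_parts` and the sub-field name `parts` are hypothetical. The built-in `pattern` analyzer splits on non-word characters (`\W+`) and lowercases by default, so `mary.smith@sakilacustomer.org` would be indexed in `email.parts` roughly as `mary`, `smith`, `sakilacustomer`, `org`; the `approximate_tokens` helper only mimics that locally for ASCII input.

```python
import json
import re


def build_index_body() -> dict:
    """Create-index body: the full address as a keyword, plus a tokenized sub-field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; "pattern" is a built-in analyzer type
                    "email_parts": {"type": "pattern", "pattern": "\\W+", "lowercase": True}
                }
            }
        },
        "mappings": {
            "properties": {
                "email": {
                    "type": "keyword",  # exact, whole-address matching
                    "fields": {
                        # Same source value indexed a second time as separate tokens
                        "parts": {"type": "text", "analyzer": "email_parts"}
                    },
                }
            }
        },
    }


def approximate_tokens(value: str) -> list:
    # Rough local approximation of the pattern analyzer output (ASCII input assumed)
    return [t.lower() for t in re.split(r"\W+", value) if t]


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(approximate_tokens("mary.smith@sakilacustomer.org"))
```

Note that a `match` query on `email.parts` finds "smith" regardless of where it appears, so on its own this does not give the positional control asked for here; it is mainly useful alongside the separate fields.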
In order to achieve what you want I believe you will need to change your mappings and get away from the slow wildcard queries. I would recommend setting up a small test index with the mappings David suggested and load this with a small subset of your data. You can then test and iterate on mappings and queries until you find something that works for you.
We have told you what to do, but you do not seem to listen. I do not have time for this type of pointless back-and-forth without any progress, so I wish you good luck.
If you insist on using wildcard queries, changing your mappings to include the wildcard field type would make this faster, even though it likely will be slower than the other approaches described.
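For the wildcard route, the change is in the mapping: `email` becomes a `wildcard` field, and the query can stay a `wildcard` query. Below is a sketch of both request bodies as Python dicts; the helper names are hypothetical. The allowed endings are expressed as two `should` clauses with `minimum_should_match` set explicitly, because a `bool` query that also has `filter` clauses would otherwise default it to 0 and the endings would no longer be required.

```python
import json


def wildcard_mapping() -> dict:
    # An existing field's type cannot be changed in place: create a new index and reindex
    return {"mappings": {"properties": {"email": {"type": "wildcard"}}}}


def wildcard_query(word: str, suffixes=(".org", ".com")) -> dict:
    """Address must contain `word` and end with one of `suffixes`."""
    return {
        "query": {
            "bool": {
                # filter context: a yes/no match, no scoring needed
                "filter": [{"wildcard": {"email": {"value": f"*{word}*"}}}],
                # one clause per allowed ending; at least one must match
                "should": [{"wildcard": {"email": {"value": f"*{s}"}}} for s in suffixes],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(wildcard_mapping(), indent=2))
    print(json.dumps(wildcard_query("smith"), indent=2))
```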
And this is the output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
Note that it does not return anything, and if you delete the *, it returns values where the word appears anywhere in the email. I do not want that: I want the word to match only at the end of the email, and if it is not at the end of the email, the document should not be returned.
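To make the effect of the asterisks concrete, here is a rough Python approximation of how a wildcard pattern is matched against a single keyword value. It is not Elasticsearch code, and it assumes `email` is mapped as `keyword` (so the whole address is one term); escaping and case-insensitive matching are ignored. Without any `*`, a pattern only matches a value that is exactly equal to it, `*.org` only matches at the end, and `*smith*` matches anywhere, including inside the domain.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Approximate Lucene wildcard syntax as a regular expression.

    '*' matches zero or more characters and '?' matches exactly one;
    every other character (including '.') is matched literally.
    Backslash escaping is not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern: str, term: str) -> bool:
    # fullmatch: the pattern must cover the entire term, as with a keyword field
    return wildcard_to_regex(pattern).fullmatch(term) is not None


def all_match(term: str, *patterns: str) -> bool:
    # Like a bool query with every wildcard clause under "must"
    return all(wildcard_matches(p, term) for p in patterns)


if __name__ == "__main__":
    for e in ["mary.smith@sakilacustomer.org", "smith.john@example.com", "anna.lee@smithfamily.org"]:
        print(e, wildcard_matches("smith", e), all_match(e, "*smith*", "*.org"))
```

Running it shows that `anna.lee@smithfamily.org` satisfies both `*smith*` and `*.org`, which is the kind of unwanted hit described above; restricting where "smith" may appear needs either a more specific pattern or a separate field.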