I use Translate API to get the native elastic search queries from SQL queries.
At first try, we use the RLIKE query to match a partial word in a sentence.
POST /_xpack/sql/translate
{
"query":
"SELECT * FROM c1s_may_20
WHERE ((clause_number.keyword RLIKE '.*emisión.*' AND level = 'CLAUSE'))
AND data_source IN ('CFR', 'PANAMA', 'FINRA')
ORDER BY clause_number.keyword DESC
LIMIT 10"
}
The translated native query for the above-translated API is as follows:
GET c1s_may_20/_search
{
"size" : 10,
"query" : {
"bool" : {
"must" : [
{
"bool" : {
"must" : [
{
"query_string" : {
"query" : "/.*emisión.*/",
"fields" : [
"clause_number.keyword^1.0"
],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
{
"term" : {
"level.keyword" : {
"value" : "CLAUSE",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
{
"terms" : {
"data_source.keyword" : [
"CFR",
"PANAMA",
"FINRA"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"amendment",
"amendment_number",
"appendix",
"appendix_name",
"appendix_number",
"chapter_name",
"chapter_number",
"clause_number",
"considerations",
"data_source",
....(continues)
],
"excludes" : [\*]
},
"docvalue_fields" : [
{
"field" : "appendix_id",
"format" : "use_field_mapping"
},
{
"field" : "chapter_id",
"format" : "use_field_mapping"
},
{
"field" : "clause_id",
"format" : "use_field_mapping"
},
{
"field" : "division_id",
"format" : "use_field_mapping"
},
...(continues)
],
"sort" : [
{
"clause_number.keyword" : {
"order" : "desc",
"missing" : "_first",
"unmapped_type" : "keyword"
}
}
]
}
Here, it fails in case insensitive accented search.
So, we decided to take on another idea using the multi-match query with phrase_prefix.
The translate API is as follows.
POST /_xpack/sql/translate
{
"query": "
SELECT * FROM c1s_may_20
WHERE (level = 'CLAUSE' AND MATCH('clause_number', 'emisión','type=phrase_prefix'))"
}
It produces the following native elastic search query:
GET c1s_may_20/_search
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [
{
"term" : {
"level.keyword" : {
"value" : "CLAUSE",
"boost" : 1.0
}
}
},
{
"multi_match" : {
"query" : "emisión",
"fields" : [
"clause_number^1.0"
],
"type" : "phrase_prefix",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"amendment",
"amendment_number",
"appendix",
"appendix_name",
"appendix_number",
"chapter_name",
"chapter_number",
"clause_number",
"considerations",
"data_source",
...(continues)
],
"excludes" : [ ]
},
"docvalue_fields" : [
{
"field" : "appendix_id",
"format" : "use_field_mapping"
},
{
"field" : "chapter_id",
"format" : "use_field_mapping"
},
{
"field" : "clause_id",
"format" : "use_field_mapping"
},
{
"field" : "division_id",
"format" : "use_field_mapping"
},
...(continues)
],
"sort" : [
{
"_doc" : {
"order" : "asc"
}
}
]
}
The above query works for case insensitive accented search. But, there is another scenario.
For example,
A document of _id='0'
has a field "clause_number":"El numeral 7, literal d del artículo 2 del Acuerdo No. 5-2011, queda así:"
Another document of _id='1'
has a field "clause_number" : "El artículo 11 del Acuerdo No.005-2011 queda así:"
While searching the above query with the keyword "5-2011"
, it should return both documents, but it fails to return document of _id='1'
as it contains "005-2011"
and returns the only document of "_id":'0'
as we used phrase_prefix
here.
Is there any alternative way to solve this problem which satisfies both the scenarios?