I had a similar requirement recently, and what worked for me was indexing one of the datasets "as is" in Elasticsearch and enriching the other one with the elasticsearch filter in Logstash. That filter lets you look up the other dataset with queries (with field substitution, even) and copy whatever fields you need from the matching document into your event. I was matching Japanese prefectures, municipalities and addresses in Japanese/Chinese characters and it all worked flawlessly.
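For the "as is" indexing part, any loading method works. As a rough sketch only (the CSV input, file path and column names below are placeholders, not my actual setup), the lookup index could be populated with a small pipeline like this:

input {
  file {
    path => "/path/to/zipcodes.csv"    # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    # placeholder column names, matching the fields used in the lookup below
    columns => ["prefecture", "municipality", "address_complement",
                "cod_municipality", "cod_area", "latitude", "longitude"]
  }
}
output {
  elasticsearch {
    hosts => "elasticserver:9200"
    index => "logstash-zipcodes"
  }
}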
Below is a piece of my configuration and the query used to do the matching:
elasticsearch {
  hosts => "elasticserver:9200"
  index => "logstash-zipcodes"
  query_template => "../elastic_queries/es_query.json"
  fields => [ ["cod_municipality","cod_municipality"],
              ["cod_area","cod_area"],
              ["address_complement","address_complement"],
              ["geoip","geoip"],
              ["latitude","latitude"],
              ["longitude","longitude"] ]
}
Above, each pair in fields gives the name of the field being copied from the looked-up index and the destination field on your new event; the destination doesn't have to match the source, so for example ["latitude","zipcode_latitude"] would copy latitude into a field named zipcode_latitude.
es_query.json
{
  "query": {
    "bool": {
      "must": [
        { "term" : { "prefecture.keyword" : "%{prefecture_jp}" }},
        { "term" : { "municipality.keyword" : "%{municipality_jp}" }}
      ],
      "should": [
        { "wildcard" : { "address_complement.keyword": "%{address_jp}%{wildcard}" }}
      ]
    }
  },
  "from": 0,
  "size": 1,
  "_source": [
    "cod_municipality",
    "cod_area",
    "address_complement",
    "geoip",
    "latitude",
    "longitude"
  ]
}
The %{wildcard} field contains only a '*', because I couldn't make the query work with the asterisk written directly into the template.
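For reference, that field is just added earlier in the pipeline with a plain mutate filter, something along these lines (placed anywhere before the elasticsearch lookup):

mutate {
  # holds the literal '*' that gets substituted into the query template
  add_field => { "wildcard" => "*" }
}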
The query brings back only the first document: it must match on prefecture and municipality, and if one of those documents also looks like the address_complement, the wildcard clause pushes it to the top. If not, we stay with whatever matched the first two terms.
You need to add to _source the fields you want returned from the dataset being looked up.
Documentation:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html