Words vs GUIDs: which is faster?

I'm facing a design choice and need a bit of help picking the most efficient option.

I'm currently indexing data tagged with a string of comma-separated GUIDs, for example:
\"e659959d-392f-44c5-83a5-fb959cdbaccc\",\"ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4\",\"c48e0bf4-7a1b-4ffd-893c-12e46e664f7f\",\"0074af4d-eb56-4e57-89b6-07c39c63c9c4\"

This is exactly how the string looks, and it sits in a single field (it is a serialized list stored in the MySQL database, ingested through a Logstash SQL query).
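To see why the two options differ, it helps to look at what actually gets indexed. The post does not show the mapping, so this sketch assumes tag_ids is a text field using Elasticsearch's default standard analyzer. For plain ASCII input like this, that analyzer roughly splits on anything that is not a letter or digit and lowercases the result; the regex below is an approximation, not the real Unicode-aware tokenizer.

```python
import re

# Approximation only (assumption: tag_ids uses the default "standard" analyzer).
# On plain ASCII text it behaves roughly like "split on every character that is
# not a letter or digit, then lowercase". Not a full UAX #29 implementation.
TOKEN_RE = re.compile(r"[0-9A-Za-z]+")


def approx_standard_tokens(text: str) -> list[str]:
    """Return the tokens an analyzed text field would roughly index for `text`."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


tags = ('"e659959d-392f-44c5-83a5-fb959cdbaccc","ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4",'
        '"c48e0bf4-7a1b-4ffd-893c-12e46e664f7f","0074af4d-eb56-4e57-89b6-07c39c63c9c4"')

tokens = approx_standard_tokens(tags)
print(len(tokens))   # 20: each GUID is split on its hyphens into 5 fragments
print(tokens[:5])    # ['e659959d', '392f', '44c5', '83a5', 'fb959cdbaccc']
```

Under that assumption each GUID becomes five tokens, while a word tag such as english stays a single token.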

I can modify the Logstash query to translate the GUIDs into English words, for example:
english, cute, irl
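If the GUIDs are translated at ingestion time, the translation itself is a plain lookup. This sketch uses a hypothetical GUID_TO_WORD table (which GUID maps to which word is invented here; the post does not say) and skips GUIDs that have no entry. In the actual pipeline the lookup would presumably happen in the Logstash SQL query against MySQL rather than in Python.

```python
import re

# Hypothetical mapping, for illustration only: the real GUID -> word pairs
# are not given in the post.
GUID_TO_WORD = {
    "e659959d-392f-44c5-83a5-fb959cdbaccc": "english",
    "ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4": "cute",
    "c48e0bf4-7a1b-4ffd-893c-12e46e664f7f": "irl",
}

# Canonical lowercase 8-4-4-4-12 hex GUID layout.
GUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def guids_to_words(tag_string: str) -> str:
    """Replace each known GUID in the serialized tag list with its word.

    GUIDs without an entry are dropped; whether to drop or keep them is a
    design decision the original post does not settle.
    """
    words = [GUID_TO_WORD[g] for g in GUID_RE.findall(tag_string) if g in GUID_TO_WORD]
    return ", ".join(words)


tags = ('"e659959d-392f-44c5-83a5-fb959cdbaccc","ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4",'
        '"c48e0bf4-7a1b-4ffd-893c-12e46e664f7f","0074af4d-eb56-4e57-89b6-07c39c63c9c4"')
print(guids_to_words(tags))  # english, cute, irl  (the fourth GUID has no entry)
```

One thing to watch with this approach: a tag name containing spaces or hyphens would itself be split into several tokens by an analyzed field, so only single-word tags keep the one-token-per-tag property.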

Which of the two options above would give the best search performance when running the following query:
GET myindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "tag_ids": { "query": "ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4", "operator": "AND" } } }
            ]
          }
        }
      ]
    }
  }
}

or, of course:

GET myindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "tag_ids": { "query": "english", "operator": "AND" } } }
            ]
          }
        }
      ]
    }
  }
}
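A match query with "operator": "AND" analyzes the query string with the field's analyzer and only matches documents that contain every resulting token. The toy inverted index below mimics that for both queries, again assuming tag_ids is a standard-analyzed text field. It is a sketch of the idea, not of Lucene's internals: Lucene walks sorted posting lists instead of building Python sets, but the number of terms that must be looked up and combined is the same.

```python
import re
from collections import defaultdict

# Rough stand-in for the default "standard" analyzer on ASCII input
# (assumption: tag_ids is an analyzed text field, not a keyword field).
TOKEN_RE = re.compile(r"[0-9A-Za-z]+")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index: token -> set of doc ids (stand-in for Lucene postings)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def match_and(index: dict[str, set[int]], query: str) -> tuple[set[int], int]:
    """Mimic a match query with operator AND: every query token must be present.

    Returns (matching doc ids, number of posting lists consulted).
    """
    postings = [index.get(tok, set()) for tok in tokenize(query)]
    matches = set.intersection(*postings) if postings else set()
    return matches, len(postings)


guid_index = build_index({
    1: '"e659959d-392f-44c5-83a5-fb959cdbaccc","ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4"',
    2: '"c48e0bf4-7a1b-4ffd-893c-12e46e664f7f"',
})
word_index = build_index({1: "english, cute", 2: "irl"})

print(match_and(guid_index, "ab2975e3-b9ca-4b1a-a93e-fb61a5d5c3a4"))  # ({1}, 5)
print(match_and(word_index, "english"))                               # ({1}, 1)
```

Note that match with AND only checks that all tokens are present somewhere in the field, not their order or adjacency; requiring the fragments in sequence would need a match_phrase query.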

Ingestion complexity is a minor concern, whereas search performance is critical. Thanks for your input.

The best approach here is to test each and see what gets you the results you want.

Not really. Ingesting the data takes about two weeks, so testing both variants would take about four weeks before reaching a conclusion. If someone knows the logic behind the fuzzy search, how words and GUIDs are indexed, and how that affects performance, that would be valuable not only to me but to anyone trying to design a performant ES index.
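On how GUIDs are indexed: the GUIDs in the post look like random version-4 UUIDs (the third group always starts with 4, the fourth with 8, 9, a or b). If tag_ids is standard-analyzed (an assumption), the short 4-character fragments have far fewer possible values than whole GUIDs, so the same fragment token is shared by many documents. This sketch generates seeded random v4 UUIDs with Python's uuid module and counts distinct values per fragment position. It is not an Elasticsearch benchmark; it only shows why some posting lists grow long in the GUID variant, and how much that costs in practice depends on how Lucene evaluates the conjunction, so it would still need measuring.

```python
import random
import uuid


def fragment_stats(n_guids: int, seed: int = 0) -> list[int]:
    """Count distinct values at each of the 5 hyphen-separated GUID positions.

    Version-4 UUIDs are built from a seeded RNG so the run is reproducible.
    """
    rng = random.Random(seed)
    distinct: list[set[str]] = [set() for _ in range(5)]
    for _ in range(n_guids):
        # version=4 forces the version nibble to 4 and sets the RFC 4122 variant bits.
        guid = uuid.UUID(int=rng.getrandbits(128), version=4)
        for pos, fragment in enumerate(str(guid).split("-")):
            distinct[pos].add(fragment)
    return [len(values) for values in distinct]


n = 100_000
stats = fragment_stats(n)
print(n, stats)  # distinct fragment values per position, for n GUIDs
```

With 100,000 GUIDs the second group can take at most 65,536 values and the third at most 4,096, so those fragments must repeat across documents, while the 12-character last group is almost always unique.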
