I have a one-to-many data set where each document, keyed by a unique identifier, can have between 1 and 5 attributes, plus additional attributes describing each of those attributes. My initial feeling is to just flatten it and run match queries against the 5 columns.
Some sample data:
id, g1, c1, g2, c2, g3, c3
1234, text file, 0.2, csv file, 0.8, tsv file, 0.1
Here 'g' stands for guess and 'c' stands for confidence. I want to be able to query for rows that have at least an 80% chance of being a csv file. This is the query I have so far:
{
  "query": {
    "bool": {
      "must": [
        { "range": { "c1": { "gte": 0.8 } } },
        { "range": { "c2": { "gte": 0.8 } } },
        { "range": { "c3": { "gte": 0.8 } } }
      ],
      "should": [
        { "match": { "g1": "csv file" } },
        { "match": { "g2": "csv file" } },
        { "match": { "g3": "csv file" } }
      ]
    }
  }
}
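To make the intent concrete, here is a rough sketch in plain Python (nothing Elasticsearch-specific) of the check I am after: a row should match when some guess equals the label and the confidence paired with that same guess meets the threshold. The helper name `has_confident_guess` and the `max_slots` parameter are made up for illustration; the field names follow the flattened sample row above.

```python
# Hypothetical helper: describes the intended semantics only,
# not what the query above actually does.
def has_confident_guess(row, label, threshold, max_slots=5):
    """Return True if any guess slot equals `label` and the
    confidence paired with that slot is >= `threshold`."""
    for i in range(1, max_slots + 1):
        guess = row.get(f"g{i}")
        conf = row.get(f"c{i}")
        if guess is None or conf is None:
            continue  # a document may have fewer than max_slots guesses
        if guess == label and conf >= threshold:
            return True
    return False


# The sample row from above, as a flat dict.
row = {
    "id": 1234,
    "g1": "text file", "c1": 0.2,
    "g2": "csv file", "c2": 0.8,
    "g3": "tsv file", "c3": 0.1,
}

print(has_confident_guess(row, "csv file", 0.8))
print(has_confident_guess(row, "text file", 0.8))
```

For the sample row, the first call should be true (g2 is "csv file" with c2 = 0.8) and the second false (g1 is "text file" but c1 is only 0.2).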
Is there something I can do better here? Is this too naive?
