Hi searching folks,
I am new to Elasticsearch.
I have the following need: index tabular data (CSV) as whole documents, i.e. 1 CSV dataset = 1 document in ES.
As my datasets are quite big, I am considering indexing a pre-computed synthesis of the data that we already have, roughly:
- the list of columns,
- for each column: its values (strings) and the frequency of each value in the column.
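To make the synthesis above concrete, here is a minimal sketch of how such a summary could be computed, assuming Python and its standard csv module. The function name `summarize_csv` and the field names `columns`, `values`, `value` and `frequency` are only illustrative, not our actual format, and nothing here is specific to ES.

```python
import csv
import io
from collections import Counter


def summarize_csv(text):
    """Build a per-column summary of a CSV dataset given as a string.

    Hypothetical helper: the output layout is an assumption made for
    illustration, not an Elasticsearch requirement.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    counts = {name: Counter() for name in columns}
    for row in reader:
        for name in columns:
            value = row.get(name)
            if value:  # skip empty or missing cells
                counts[name][value] += 1
    return {
        "columns": columns,
        "values": {
            # most frequent values first
            name: [
                {"value": v, "frequency": n}
                for v, n in counts[name].most_common()
            ]
            for name in columns
        },
    }


sample = "city,country\nParis,France\nLyon,France\nParis,France\n"
summary = summarize_csv(sample)
```

The resulting dictionary is what I would serialize to JSON and send as the single document for that dataset.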
1st question:
I guess I need to tweak the TF/IDF computation in ES. I need to tell ES that the frequency of a term in the document is not what ES itself counts: it should be weighted by the pre-computed frequency I provide.
What is the right way to do this?
2nd question:
In the search results, I want to know which columns the matched terms belong to (for highlighting).
How can I achieve this?
3rd question:
Do you see any caveats in this way of indexing data that I should pay attention to? Is there anything else I should customize at indexing or search time?
Thanks!
Mathieu