Hi!
I'm an Elasticsearch newbie approaching the following project: I have
many millions (100+ million and counting) of tables in JSON format like this:
{ "title" : "dogs species",
  "col_names" : [ "name", "description", "country_of_origin" ],
  "rows" : [ [ "Boxer", "good dog", "Germany" ],
             [ "Irish Setter", "great dog", "Irland" ] ]
}
{ "title" : "Classmates",
  "col_names" : [ "name", "class", "age", "avg_grade" ],
  "rows" : [ [ "Alice", "A", "14", "85" ],
             [ "Bob", "B", "15", "91" ] ]
}
{ "title" : "Misc stuff",
  "col_names" : [ "foo" ],
  "rows" : [ [ "Setter is important" ],
             [ "Irland is green" ] ]
}
I.e. tons of completely unrelated structured data. My goal is to make it
searchable.
My search requirements are:
- Just search for text inside the cells over all tables. I.e. searching
for "Boxer Irland" should find the first and last table above.
- Matching within a single row should have a higher score. I.e. searching
for "Irland Setter" should give the first table the higher score in the results.
- It's also important to somehow preserve the data structure of each
table, so it can be fetched and converted back to structured JSON.
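To make the requirements concrete, here is a rough Python sketch of one layout I could imagine; it is only a guess at an approach, not something I have tested against a cluster. It assumes each table becomes one document, each row becomes an object under "rows" (which would have to be mapped as type "nested"), and the field names "cells", "text" and "all_text" as well as the helper names are made up by me for illustration.

```python
import json


def table_to_doc(table):
    """Turn one table into a single search document (hypothetical layout)."""
    rows = [
        # "cells" keeps the original values, "text" is the row joined for searching.
        {"cells": list(row), "text": " ".join(row)}
        for row in table["rows"]
    ]
    return {
        "title": table["title"],
        "col_names": list(table["col_names"]),
        "rows": rows,
        # All cell text of the table in one field, for the "anywhere" match.
        "all_text": " ".join(r["text"] for r in rows),
    }


def doc_to_table(doc):
    """Rebuild the original structured table from the search document."""
    return {
        "title": doc["title"],
        "col_names": doc["col_names"],
        "rows": [r["cells"] for r in doc["rows"]],
    }


def build_query(text):
    """Query body: match anywhere in the table, plus a second clause that
    only matches (and adds score) when a single row contains all terms."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"all_text": text}},
                    {
                        "nested": {
                            "path": "rows",
                            "query": {
                                "match": {
                                    "rows.text": {"query": text, "operator": "and"}
                                }
                            },
                            "score_mode": "max",
                        }
                    },
                ]
            }
        }
    }


dogs = {
    "title": "dogs species",
    "col_names": ["name", "description", "country_of_origin"],
    "rows": [
        ["Boxer", "good dog", "Germany"],
        ["Irish Setter", "great dog", "Irland"],
    ],
}

doc = table_to_doc(dogs)
print(json.dumps(build_query("Irland Setter"), indent=2))
```

The idea is that the "all_text" clause covers the first requirement, the nested clause is meant to give a row-level match extra weight, and doc_to_table() covers the round trip back to the original JSON. Whether this scales to 100+ million tables is exactly what I don't know.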
Any ideas on how to approach this problem?
Thank you all very much in advance.
Zaar