Hi David,
We are receiving text documents that contain only timestamps and text. The files are speech transcripts, so each line holds a time range (time of day, no date) followed by space-separated text.
So, I am splitting my data into time and message fields.
For example, if line 1 of my document is:
00:00:00 - 00:01:00 This is first line
it is split into two fields, i.e.
time: 00:00:00 - 00:01:00
message: This is first line
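
A minimal sketch of this split, assuming Python and a regular expression. The names `LINE_RE` and `split_line` are hypothetical and only for illustration; the pattern assumes every data line starts with "HH:MM:SS - HH:MM:SS".

```python
import re

# Assumed line layout: "HH:MM:SS - HH:MM:SS" followed by the spoken text.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\s*-\s*\d{2}:\d{2}:\d{2})\s+(.*)$")


def split_line(line):
    """Return {'time': ..., 'message': ...}, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {"time": match.group(1), "message": match.group(2)}


print(split_line("00:00:00 - 00:01:00 This is first line"))
# {'time': '00:00:00 - 00:01:00', 'message': 'This is first line'}
```

Lines that do not begin with a time range (such as a "Keyword:" line) return None, so they can be handled separately.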
The data in my documents looks exactly like this:
Keyword: cricket, football
00:00:00 - 00:01:00 This is first line
00:01:01 - 00:02:30 This is second line
00:02:30 - 00:03:45 This is third line
00:03:45 - 00:05:00 This is fourth line
Keyword: tennis
00:05:00 - 00:06:55 This is fifth line
00:06:55 - 00:07:45 This is sixth line
...
So each paragraph has one or more keywords (here cricket and football, covering the time range 00:00:00 - 00:05:00), and a search on a keyword (e.g. cricket) should return the entire paragraph.
Also, I am not sure how the keywords should be stored. Will they go in a separate table? How do I define the relation?
We need to search the text by keyword and return the data within the time range.
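
To make the requirement concrete, here is a rough Python sketch of the behaviour described above. It is not tied to any particular search engine or database; it simply assumes each "Keyword:" line starts a new paragraph and that the paragraph's time range runs from the first start time to the last end time. The names `parse_paragraphs`, `search` and `SAMPLE` are hypothetical.

```python
import re

# Assumed data-line layout: start time, dash, end time, then the text.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s*-\s*(\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_paragraphs(text):
    """Group timestamped lines under the most recent 'Keyword:' line."""
    paragraphs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.lower().startswith("keyword:"):
            # Keywords are comma separated after the first colon.
            names = line.split(":", 1)[1].split(",")
            keywords = [k.strip().lower() for k in names if k.strip()]
            current = {"keywords": keywords, "start": None, "end": None, "lines": []}
            paragraphs.append(current)
            continue
        match = LINE_RE.match(line)
        if match and current is not None:
            start, end, message = match.groups()
            if current["start"] is None:
                current["start"] = start  # first line sets the paragraph start
            current["end"] = end  # last line seen sets the paragraph end
            current["lines"].append(message)
    return paragraphs


def search(paragraphs, keyword):
    """Return every paragraph tagged with the given keyword."""
    wanted = keyword.strip().lower()
    return [p for p in paragraphs if wanted in p["keywords"]]


SAMPLE = """Keyword: cricket, football
00:00:00 - 00:01:00 This is first line
00:01:01 - 00:02:30 This is second line
00:02:30 - 00:03:45 This is third line
00:03:45 - 00:05:00 This is fourth line
Keyword: tennis
00:05:00 - 00:06:55 This is fifth line
00:06:55 - 00:07:45 This is sixth line
"""

for hit in search(parse_paragraphs(SAMPLE), "cricket"):
    print(hit["start"], "-", hit["end"], hit["lines"])
```

In this sketch the keywords are kept as a list on each paragraph record rather than in a separate table; whether that maps to a multi-valued field or to a related table depends on the storage system, which is the part I am unsure about.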
Thanks!!