I'm trying to find the most efficient way to index transcripts of long conversations, for example a 2-3 hour podcast with multiple participants, or a transcript of a large Zoom meeting with many speakers.
On average, each transcript has about 5-10 speakers and runs for about 2 hours, and I have many of these podcasts to index, so there is a lot of text. How should I index it into Elasticsearch so that I can run simple full-text searches, either for a specific phrase within one podcast's transcript or across all transcripts from all podcasts?
The format of the text is something along the lines of:
<time> <participant name>: <participant transcript>
I can convert this into any other format in code if needed.
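Here is a minimal Python sketch of that conversion. It assumes timestamps look like `HH:MM:SS` or `MM:SS` and that speaker names contain no colon. Each line becomes one small document tagged with a hypothetical `podcast_id`, so a search could either filter to a single podcast or run across all of them. The field names and the `parse_transcript` helper are made up for illustration. They are not part of any Elasticsearch API.

```python
import re
from typing import Iterable, Iterator

# Assumed line shape: "<time> <participant name>: <participant transcript>",
# with <time> as HH:MM:SS or MM:SS and no colon inside the speaker name.
LINE_RE = re.compile(
    r"^(?P<time>\d+(?::\d{2}){1,2})\s+(?P<speaker>[^:]+):\s*(?P<text>.*)$"
)


def time_to_seconds(stamp: str) -> int:
    """Convert 'HH:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_transcript(podcast_id: str, lines: Iterable[str]) -> Iterator[dict]:
    """Yield one document per utterance, tagged with its (hypothetical) podcast id."""
    for line_no, raw in enumerate(lines, start=1):
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or malformed lines
        yield {
            "podcast_id": podcast_id,
            "line_no": line_no,
            "time": match["time"],
            "offset_seconds": time_to_seconds(match["time"]),
            "speaker": match["speaker"].strip(),
            "text": match["text"],
        }
```

With the official Python client, each yielded dict could be wrapped as an action like `{"_index": "transcripts", "_source": doc}` and passed to `elasticsearch.helpers.bulk`. The index name `transcripts` is likewise only a placeholder.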
Any suggestions on the best approach for this use case?
Thanks!