Not sure if you have tried/heard of Whisper. It automatically transcribes audio. I use it for meetings/lectures that don’t come with closed captioning, and it supports audio/video files and a number of languages. I had tried a few other solutions with mixed results (e.g. Google is slow, and many services limit file length/size). IBM is supposed to have the best free/low-cost cloud model, but they would never approve my accounts. In the end, running Whisper locally in an Anaconda/Python environment was the best cheap option for me.
Not OP, but I’ve been looking for something to help me with meetings and disorganized notes. How well would you say it works? Does it only transcribe, or will it also help organize notes (create categories, cluster analysis, tags, action items, whatever)?
Only transcription. It outputs to a few formats that amount to plain text with or without timecodes, including SRT subtitles. It transcribes really well. One thing to note: with more technical discussions I sometimes get better results using the smaller models. My best theory is that with the smaller models, technical words are less likely to be assumed to be an accent/variation of a more common word.
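For anyone wanting to try the local route, here’s a rough sketch in Python. It assumes the open-source `openai-whisper` package (`pip install openai-whisper`, plus ffmpeg on your PATH). `transcribe_to_srt` loads a model and runs it on a file. The helpers `format_timestamp` and `segments_to_srt` are just my own hypothetical names for turning Whisper’s timed segments into SRT subtitle text. It defaults to the `"small"` model to match the note above about smaller models sometimes doing better on technical talk, but that’s a judgment call, not a rule.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments shaped like Whisper's result["segments"]:
    dicts with "start"/"end" (seconds) and "text"."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()  # Whisper segment text usually has a leading space
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT entries are separated by a blank line
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "small") -> str:
    """Transcribe a local audio/video file and return SRT text.
    Assumes the openai-whisper package and ffmpeg are installed."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)  # e.g. "tiny", "base", "small", "medium"
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

The command-line tool that ships with the package can also write subtitles directly, e.g. `whisper lecture.mp3 --model small --output_format srt` (the file name is just a placeholder).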
Thanks for posting, I’ll check it out :)