AI Meeting Notes POC | makarima.dev

This project explores a narrow but useful workflow: start with a recorded meeting video and end with notes that are easier to scan than the raw conversation. The CLI extracts audio, runs transcription, identifies speaker turns, and writes a summary plus intermediate artifacts to an output folder.

The pipeline is intentionally explicit. ffmpeg converts the source file into a mono 16 kHz WAV, whisper-1 handles transcription, pyannote diarization estimates speaker segments, and a small overlap-based merge step gives each transcript segment the speaker label whose diarization segments overlap it the most.
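A minimal Python sketch of the two mechanical steps above follows: building the ffmpeg call and merging transcript segments with diarization turns. It assumes ffmpeg is on the PATH and that the diarization output has already been flattened into (start, end, speaker) tuples, for example from pyannote's itertracks(yield_label=True). The Segment class, the function names, and the "UNKNOWN" fallback label are hypothetical and not taken from the project. The merge sums overlap per speaker label and picks the largest total, which is one reasonable reading of "overlap-based".

```python
from dataclasses import dataclass
from typing import Optional


def ffmpeg_extract_cmd(src: str, dst_wav: str) -> list[str]:
    """Build an ffmpeg argument list that drops video and writes mono 16 kHz PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the output file without prompting
        "-i", src,             # source meeting video
        "-vn",                 # ignore the video stream
        "-ac", "1",            # downmix to a single channel (mono)
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        dst_wav,
    ]


# Hypothetical container for one transcript segment (not the project's actual type).
@dataclass
class Segment:
    start: float                   # seconds from the start of the recording
    end: float                     # seconds from the start of the recording
    text: str = ""
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: list[Segment],
    turns: list[tuple[float, float, str]],
    default: str = "UNKNOWN",
) -> list[Segment]:
    """Label each transcript segment with the speaker whose turns overlap it most in total.

    `turns` holds (start, end, speaker_label) tuples, e.g. collected from a
    pyannote annotation via itertracks(yield_label=True).
    """
    merged = []
    for seg in transcript:
        totals: dict[str, float] = {}
        for t_start, t_end, label in turns:
            ov = overlap(seg.start, seg.end, t_start, t_end)
            if ov > 0:
                totals[label] = totals.get(label, 0.0) + ov
        # Segments with no overlapping turn get the fallback label.
        best = max(totals, key=totals.get) if totals else default
        merged.append(Segment(seg.start, seg.end, seg.text, best))
    return merged
```

The command list can be passed straight to subprocess.run(..., check=True). Summing overlap per label, rather than taking the single longest turn, keeps a segment that spans several short turns from one speaker attributed to that speaker even when another speaker briefly interjects.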

Code flow

Pipeline from source video to structured notes and action items.

I kept the implementation close to the processing stages so that each piece is easy to inspect or swap out. Each step writes structured output: the raw transcript segments, the diarization cache, the merged speaker segments, a readable speaker transcript, and the final summary as Markdown.
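The artifact layout can be sketched the same way. The file names below (transcript_segments.json, diarization_cache.json, and so on) are hypothetical stand-ins for whatever the CLI actually writes, and segments are assumed to be plain dicts with start, end, text, and speaker keys. The readable transcript collapses consecutive segments from the same speaker into one timestamped line.

```python
import json
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS (fractional seconds are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def speaker_transcript(segments: list[dict]) -> str:
    """Collapse consecutive same-speaker segments into one timestamped line each."""
    blocks = []  # each block: [speaker, start_seconds, [texts]]
    for seg in segments:
        if blocks and blocks[-1][0] == seg["speaker"]:
            blocks[-1][2].append(seg["text"].strip())
        else:
            blocks.append([seg["speaker"], seg["start"], [seg["text"].strip()]])
    return "".join(
        f"[{format_timestamp(start)}] {speaker}: {' '.join(texts)}\n"
        for speaker, start, texts in blocks
    )


def write_artifacts(out_dir, raw_segments, diarization_turns, merged_segments, summary_md: str) -> Path:
    """Write every intermediate artifact plus the summary into one output folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def dump(name: str, data) -> None:
        (out / name).write_text(json.dumps(data, indent=2), encoding="utf-8")

    # Hypothetical file names; the real CLI may use different ones.
    dump("transcript_segments.json", raw_segments)
    dump("diarization_cache.json", diarization_turns)
    dump("merged_segments.json", merged_segments)
    (out / "speaker_transcript.txt").write_text(speaker_transcript(merged_segments), encoding="utf-8")
    (out / "summary.md").write_text(summary_md, encoding="utf-8")
    return out
```

A separate diarization file can double as a cache: a rerun could skip the slow diarization pass when that file already exists.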

It is still a POC rather than a polished product, but it validates the shape of the workflow and exposes the failure points that matter. The result is a practical baseline for turning long recordings into notes with decisions and action items, without manually replaying the whole meeting.