VoiceNotes
Completed
Speech-to-notes conversion using Whisper and LLM processing for structured output.
Date: 2024-03
Duration: 3 weeks
Team: solo
Difficulty: medium
Project Story
VoiceNotes captures spoken ideas and converts them into structured Markdown notes. It is focused on turning rough thoughts into usable documentation.

The pipeline uses Whisper for transcription and an LLM for post-processing into concise summaries, action items, and clean Markdown.
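The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the open-source `openai-whisper` package for transcription, and the prompt wording, the function names (`transcribe`, `build_prompt`, `voice_to_notes`), and the `llm` callable (any function that takes a prompt string and returns text) are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt; the original project's prompt is not documented.
PROMPT_TEMPLATE = (
    "Rewrite the transcript below as Markdown notes with three sections: "
    "'## Summary', '## Action Items', and '## Notes'.\n\n"
    "Transcript:\n{transcript}"
)


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file with the openai-whisper package."""
    import whisper  # imported lazily so the other helpers work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def build_prompt(transcript: str) -> str:
    """Wrap the raw transcript in the post-processing instructions."""
    return PROMPT_TEMPLATE.format(transcript=transcript)


def voice_to_notes(
    audio_path: str,
    llm: Callable[[str], str],
    transcriber: Callable[[str], str] = transcribe,
) -> str:
    """Stage 1: speech to text. Stage 2: text to structured Markdown."""
    transcript = transcriber(audio_path)
    return llm(build_prompt(transcript)).strip()
```

Passing the LLM and the transcriber in as plain callables keeps the sketch independent of any particular provider and makes each stage easy to swap or stub out.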
Technical Details
Tech Stack
Python, OpenAI Whisper, LLM Processing, Markdown, Audio Processing
Key Features
Speech-to-markdown conversion
Automatic summarization
Action item extraction
Timestamped notes
Batch processing support
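One plausible way to produce timestamped notes is to use the per-segment timings that Whisper returns alongside the full text: the result of `model.transcribe(...)` in the `openai-whisper` package includes a `segments` list whose entries carry `start`, `end`, and `text`. The sketch below assumes that shape; the helper names and the bullet format are illustrative choices, not the project's documented output.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segments_to_markdown(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments into a timestamped Markdown bullet list."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:  # skip segments that contain only whitespace
            lines.append(f"- **[{format_timestamp(seg['start'])}]** {text}")
    return "\n".join(lines)
```

For example, a segment starting at 75.4 seconds with the text "Ship it" would be rendered as a bullet prefixed with `[01:15]`.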
Challenges Faced
Audio quality variance
Multi-speaker handling
Long transcript context limits
Latency for larger files
Key Learnings
Audio quality has a first-order impact on results
Post-processing quality determines practical usefulness
Context chunking strategy matters for long sessions
Feedback loops improve output accuracy
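The chunking point above can be illustrated with a simple overlap-and-merge approach. This is a sketch under stated assumptions, not the strategy the project actually used: it splits on words rather than model tokens, the `max_words` and `overlap` defaults are arbitrary, and `llm` is a hypothetical callable that takes a prompt string and returns text.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 800, overlap: int = 50) -> List[str]:
    """Split text into word-based chunks that share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize_long(
    text: str,
    llm: Callable[[str], str],
    max_words: int = 800,
    overlap: int = 50,
) -> str:
    """Summarize each chunk, then merge the partial summaries in one pass."""
    chunks = chunk_words(text, max_words, overlap)
    if not chunks:
        return ""
    partials = [llm("Summarize this part of a transcript:\n" + c) for c in chunks]
    if len(partials) == 1:
        return partials[0]
    return llm("Merge these partial summaries into one:\n" + "\n\n".join(partials))
```

The overlap gives each chunk a little context from the previous one, so a sentence cut at a boundary is still seen whole at least once. Splitting on sentence or speaker boundaries instead of raw word counts would be a natural refinement.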