Update: I think this is already outdated and I’m following up in Audio-UIs.
AI and transcription technology now reliably turn voice into text, enabling a form of computing that enhances deliberate, analytical thinking. As you may know, the human brain operates in two modes: System 1 handles fast, intuitive, and emotional responses, while System 2 manages slower, analytical, and deliberate thinking. Understanding this distinction is crucial to seeing how new audio interfaces are transforming computing.
Voice Enables Thought-Speed Computing
Keyboards work, but they create friction that can interrupt analytical thought. Voice input removes much of that friction - we can now stay analytically engaged while communicating with computers.
Once speech has been transcribed, language models can take that nebulous text, convert it into concrete actions, and give the user near-instant feedback.
Detecting System 1 Hijacking
Doomscrolling is a classic example of System 1 being hijacked. It is particularly common on visual media platforms like TikTok or Facebook, where triggering content can seize control of our brain’s attention systems.
When encountering such content, we often have an internal voice that says “ick” or “I don’t like this.” However, by that point, System 1 has already taken control, and our fingers mechanically continue scrolling despite our better judgment.
Now imagine a computer that actively listens to these verbal cues of discomfort. When you vocalize that something isn’t right, your whole mind can engage in a dialogue with the computer. At the operating system level, it could offer you the choice to gently wind down and transition to something else, helping you regain control for your own wellbeing.
This creates computing that helps maintain analytical thinking rather than interrupting it.
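As a rough illustration of the listening step, here is a minimal sketch that scans a transcript for discomfort cues and offers a wind-down prompt. It assumes speech has already been transcribed to text. The cue list, the `Suggestion` type, and the function names are all hypothetical, and simple phrase matching stands in for whatever richer signal a real system would need.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases, chosen for illustration only.
DISCOMFORT_CUES = ("ick", "i don't like this", "ugh", "this isn't right")


@dataclass
class Suggestion:
    cue: str     # the phrase that triggered the suggestion
    prompt: str  # what the computer would say back to the user


def find_discomfort_cue(transcript, cues=DISCOMFORT_CUES):
    """Return the first cue in `cues` that occurs in the transcript, else None."""
    # Normalize case and curly apostrophes so "I don’t" matches "i don't".
    text = transcript.lower().replace("’", "'")
    for cue in cues:
        # Word boundaries keep "ick" from matching inside "click" or "quick".
        if re.search(rf"\b{re.escape(cue)}\b", text):
            return cue
    return None


def offer_wind_down(transcript):
    """Offer (never force) a transition when a discomfort cue is heard."""
    cue = find_discomfort_cue(transcript)
    if cue is None:
        return None
    return Suggestion(
        cue=cue,
        prompt="Sounds like this isn't sitting well. Want to wind down?",
    )
```

The function returns a suggestion rather than taking action, which matches the idea above: the computer offers a choice, and the decision stays with the user.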
Balanced Input/Output
Audio input doesn’t replace visual interfaces. Instead, it augments them, and most importantly, it lets both systems of the human brain talk to the computer together.
- Voice input captures sustained analytical thoughts without mechanical interruption
- Audio output can provide context without breaking focus
- Visual output can speak to the more emotional System 1
- Visual overlays can also be generated from System 2’s output, captured through voice input
Computing with Deeper Thought
The rise of reliable speech interfaces marks a shift from computers that merely execute tasks to ones that actively support and sustain our analytical thinking processes. As AI continues to evolve, we’re moving toward computing that doesn’t just make things easier - it helps us think better.