AI Wearable Device, 2023
Project Ring began as an exploration into wearable, multimodal, and always-available AI interactions. Somewhat unintentionally, it evolved into a project on high-bandwidth human-machine collaboration. Ultimately, 100% of the code for this project was written by GPT-4.
In Project Ring, a user can trigger voice input to an AI using a ring-mounted joystick and converse via bone-conduction headphones. The user can also take photos using a camera on the ring. These photos are converted to textual descriptions of the environment and relayed to the AI, which can reference the contents of the photos in its responses.
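The relay step above can be sketched in a few lines. This is an illustrative sketch, not the project's actual code: the function and message shapes are assumptions, showing how a photo's textual description might be folded into the chat context so the AI can reference it.

```python
def build_messages(photo_description, user_transcript):
    """Assemble a chat request that includes the latest photo's
    textual description alongside the user's spoken input.
    (Illustrative structure, not the project's real prompt.)"""
    return [
        {"role": "system",
         "content": "You are a voice assistant worn on the user's hand."},
        {"role": "system",
         "content": f"The user's most recent photo shows: {photo_description}"},
        {"role": "user", "content": user_transcript},
    ]

messages = build_messages("a crowded farmers market stall",
                          "what am I looking at?")
```

Because the photo arrives as text rather than pixels, any text-only chat model can "see" the environment without multimodal input support.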
Together, these pieces are intended to demonstrate low-friction interactions which blend physical & digital information between humans & AI.
The system consisted of three components:
A ring with a camera & joystick, connected to a Raspberry Pi Zero. The ring allows the user to take photos & trigger voice commands using low-friction button inputs
A phone app paired to bone conduction headphones. The app records the user’s voice when the ring joystick is triggered, and plays synthesized voice responses from the AI
A cloud app which handles AI processing. It includes OpenAI Whisper (voice-to-text), Replicate (image-to-text), OpenAI ChatGPT (text-to-text), and ElevenLabs (text-to-voice)
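The cloud app's job is to chain those four stages: voice→text (Whisper), image→text (Replicate), text→text (ChatGPT), and text→voice (ElevenLabs). A minimal sketch of that orchestration, with stub functions standing in for the real API calls (all names here are illustrative assumptions, not the project's actual code):

```python
def run_pipeline(audio, image, transcribe, describe_image, chat, synthesize):
    """Chain the four cloud stages. In the real system the callables
    would wrap Whisper, Replicate, ChatGPT, and ElevenLabs respectively;
    here they are injected so the data flow is visible."""
    transcript = transcribe(audio)        # what the user said
    scene = describe_image(image)         # what the ring camera saw
    prompt = f"The user said: {transcript}\nTheir camera sees: {scene}"
    reply = chat(prompt)                  # AI response, grounded in the photo
    return synthesize(reply)              # audio bytes for the headphones

# Stub stages stand in for the real API calls.
reply_audio = run_pipeline(
    b"<voice bytes>", b"<jpeg bytes>",
    transcribe=lambda a: "what is this plant?",
    describe_image=lambda i: "a potted fern on a windowsill",
    chat=lambda p: "That looks like a fern.",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Injecting the stages as callables also makes each hop easy to swap or test in isolation, which matters when every stage is a separate third-party API.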
Early on, I decided to try having GPT-4 write all of the code for this project: ~750 lines across a Raspberry Pi Python script, a cloud application, an HTML webpage, and an Android app. Having completed the project, I can say that it is possible, but not easy, to create software prototypes entirely using GPT-4. The main skills required were:
Knowing how to decompose the software architecture into small changes that could be completed by GPT-4,
Reading code well enough to copy & paste into the correct sections, and
Navigating error message interfaces
Several shortcomings were apparent in this experience. GPT-4 frequently lost context and needed to have prior code shared again. Sometimes GPT-4 would hallucinate and go off on tangents, which required reminding it of the problem we were presently trying to solve. The code was not stable, performant, or production-ready, although GPT-4 typically pointed this out proactively. And lastly, GPT-4 vastly outperformed GPT-3.5, which made OpenAI's limit of 25 GPT-4 messages every 3 hours a real constraint.
At the same time, the arc of progress has never been clearer. A better interface would've greatly improved GPT-4's coding ability, even without further advances in foundation models. AI may be capable of automating a large majority of coding tasks in a relatively short time period, and this will likely extend to most knowledge work. We will soon share the world with a new class of autonomous entities which will live in the cloud, on phones, and within wearables that dot our bodies. Whether oracles, companions, or cyborgs - the future will be plural.
For relevant prior work, see Project Oco