Research Portfolio — AI for Accessibility

Beyond Visual Defaults.

Creativity is a fundamental expression of human agency, yet our digital tools remain strictly visual. I build AI systems that decouple creativity from sight—transforming accessibility from a compliance checkbox into a blueprint for Multimodal, Verifiable, and Customizable Human-AI collaboration.

When we solve for non-visual authoring, we solve the interface for the future of AI agents.

The Bigger Picture

Accessibility Research as the Blueprint for Agent Design

Accessibility is not an edge case—it is a stress test for trustworthy Human-AI interaction. My research posits that "blindness"—the inability to visually verify complex system outputs—is no longer just a disability context; it is the default state for all humans interacting with AI Agents.

1 The Safety & Trust Angle

The Risk Amplifier

Blind users are the "canaries in the coal mine" for AI: if an agent hallucinates, they have no independent visual channel for catching the error. Sighted users might spot a visual mistake at a glance; blind users cannot.

The Value: Solving "execution verification" for blind users is the only way to build truly high-trust Agents for any user in high-stakes scenarios.

2 The Generalization Angle

Universal "Blindness" under Load

As agents begin executing thousands of lines of code changes, visual verification fails for everyone. No human can audit every pixel or line of code in real-time. Sighted users are becoming "cognitively blind" to the scale of AI output.

The Value: The non-visual, semantic interaction paradigms I build for accessibility are the solution to the information overload facing general consumers.

3 The AI-Native Angle

Agents are "Digital-Natives" (and Digitally Blind)

If we treat AI Agents as first-class citizens, we realize they perceive the world like screen readers do: via DOM trees, APIs, and structured representations, not pixels.

The Value: My work in translating visual layouts into new semantic representations for accessibility is essentially designing the optimal native language for Agents. Empowering blind users to control systems directly maps to helping Agents better understand and co-create with humans.

Below, I present three case studies that put this vision into practice—each tackling a unique facet of making visual creativity accessible through AI: restoring spatial perception, enabling output verification, and giving users authorship over automation.

01
Spatial Understanding · Touch & Audio · CHI 2023 · ASSETS 2024

Breaking the 1D Barrier

The Problem

Screen readers flatten the rich, 2D world of artboards and charts into a linear, 1D stream of text. Blind users lose spatial context, making layout and data analysis cognitively exhausting.

Spatial perception is often treated as purely visual, but it is fundamentally geometric and relational. My work decouples spatial reasoning from sight by reintroducing dimensionality through multimodal interaction. By combining touch, haptics, and spatial audio, we allow users to "physically" explore digital artifacts—restoring the agency to perceive layout and density that screen readers strip away.

A11yBoard: Decoupling Command and Perception

A11yBoard is a system that makes digital artboards—like presentation slides—accessible to blind users. Blind creators are often power users of keyboards, but keyboards lack spatial feedback. A11yBoard introduces a "split-sensory" architecture: users keep the precision of the keyboard for command input (on a laptop) while gaining a new "perceptual window" via a paired touch device, enabling risk-free spatial exploration of slide layouts.

A blind user utilizing A11yBoard with a split setup: a laptop for keyboard commands and a smartphone for tactile spatial exploration.
The A11yBoard Split Setup: Laptop for editing commands, mobile device for spatial perception.
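
As a rough illustration of this split-sensory architecture, the sketch below imagines the laptop (the editing host) and the touch device (the perceptual window) exchanging a small set of messages: the laptop pushes layout and selection state, and the phone sends exploration queries that never modify the document. The types, fields, and handler are assumptions for illustration, not A11yBoard's actual protocol.

```typescript
// Hypothetical messages between the laptop (editing host) and the paired
// touch device (perceptual window). Names and fields are illustrative.

type ArtboardObject = {
  id: string;
  kind: "textbox" | "image" | "shape";
  // Normalized slide coordinates (0 to 1) so both devices agree on layout.
  x: number;
  y: number;
  width: number;
  height: number;
  label: string; // spoken description, e.g. "Title: Quarterly Results"
};

type HostToTouchMessage =
  | { type: "layout-sync"; objects: ArtboardObject[] } // full slide state
  | { type: "selection-changed"; objectId: string };   // keyboard selection moved

type TouchToHostMessage =
  | { type: "explore"; x: number; y: number }          // finger position query
  | { type: "select"; objectId: string };              // confirmed a target

// The touch device answers an "explore" query from its cached layout, so
// spatial scanning never mutates the document on the laptop (risk-free).
function objectUnderFinger(
  layout: ArtboardObject[],
  x: number,
  y: number
): ArtboardObject | undefined {
  return layout.find(
    (o) => x >= o.x && x <= o.x + o.width && y >= o.y && y <= o.y + o.height
  );
}
```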

To support this, we developed a gesture vocabulary that translates visual scanning into tactile queries. The "Split-Tap" allows a user to keep one finger on an object (maintaining spatial reference) while tapping with another to query its properties—separating navigation from interrogation.

Diagram of A11yBoard gestures including Single-finger Exploration, Split-tap Selection, Two-finger Dwell, Two-finger Flick, Double Tap, Three-finger Swipe, Quadruple Tap, and Triple Tap.
Gesture vocabulary for non-visual spatial scanning.
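
To make the Split-Tap concrete, here is a minimal detection sketch assuming a simple touch model with timing thresholds: one finger dwells on an object as the spatial anchor while a second finger taps briefly to query it. The thresholds and event shapes are illustrative assumptions, not A11yBoard's implementation.

```typescript
// Split-tap detection sketch: an anchored finger plus a brief second tap.

interface TouchPoint {
  id: number;
  x: number;
  y: number;
  downAt: number; // timestamp in ms when the finger touched down
  upAt?: number;  // set when the finger lifts
}

const TAP_MAX_DURATION_MS = 250;  // second finger must be a quick tap
const ANCHOR_MIN_DWELL_MS = 300;  // first finger must already be resting

function isSplitTap(anchor: TouchPoint, tap: TouchPoint): boolean {
  if (tap.upAt === undefined) return false; // tap finger has not lifted yet
  const tapDuration = tap.upAt - tap.downAt;
  const anchorDwell = tap.downAt - anchor.downAt;
  return (
    anchor.upAt === undefined &&          // anchor is still on the surface
    anchorDwell >= ANCHOR_MIN_DWELL_MS && // anchor was placed first
    tapDuration <= TAP_MAX_DURATION_MS    // second touch is a brief tap
  );
}

// On a split-tap, the app announces the anchored object's properties
// (e.g. "Rectangle, 4 by 2 inches, blue fill") without moving the selection.
```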

ChartA11y: Feeling the Shape of Data

While A11yBoard handles discrete objects, data visualizations present a different challenge: density. A screen reader can read a data point, but it cannot convey a "trend." ChartA11y is a smartphone app that makes charts and graphs accessible through touch, haptics, and sonification. It offers three complementary ways to explore a chart: semantic navigation, sonification, and direct touch mapping.

Semantic Navigation provides structured access to chart components. Users traverse the chart's hierarchy—axes, legends, series—through a gesture set designed for building a mental model before diving into details.

ChartA11y gesture vocabulary including panning, double tap, swipe, and rotor interactions.
Semantic Navigation Gestures: Structured access to chart hierarchy.
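
As an illustration of semantic navigation, the sketch below models a chart as a small hierarchy and flattens it into the reading order a swipe gesture could traverse. The node roles and labels are assumptions, not ChartA11y's internal model.

```typescript
// Illustrative chart hierarchy and a rotor-style traversal order.

interface ChartNode {
  role: "chart" | "axis" | "legend" | "series" | "point";
  label: string;        // what the screen reader announces
  children: ChartNode[];
}

// Depth-first flattening gives a predictable reading order for swipe
// next/previous gestures, mirroring how screen-reader rotors move.
function readingOrder(root: ChartNode): ChartNode[] {
  const out: ChartNode[] = [];
  const visit = (node: ChartNode) => {
    out.push(node);
    node.children.forEach(visit);
  };
  visit(root);
  return out;
}

const chart: ChartNode = {
  role: "chart",
  label: "Monthly revenue, line chart",
  children: [
    { role: "axis", label: "X axis: month, Jan to Dec", children: [] },
    { role: "axis", label: "Y axis: revenue, 0 to 50k USD", children: [] },
    { role: "series", label: "Series: 2024 revenue, 12 points", children: [] },
  ],
};

const order = readingOrder(chart);
// A "swipe right" gesture would announce order[i + 1].label;
// "swipe left" would announce order[i - 1].label.
```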

Sonification turns data analysis into a multisensory experience. We map pitch to value and timbre to density, enabling users to perceive trends through audio alone.

Visualization of auditory feedback mapping pitch to Y-values in line charts and pitch/duration to density in scatter plots.
Sonification Design: Mapping data density to audio timbre and duration.
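
For the value-to-pitch mapping, a minimal sketch using the standard Web Audio API is shown below; the frequency range and tone length are illustrative choices rather than ChartA11y's actual tuning.

```typescript
// Sonification sketch: a data value is mapped to pitch, so higher values
// produce higher tones.

const audioCtx = new AudioContext();

const MIN_FREQ_HZ = 220; // pitch of the lowest data value
const MAX_FREQ_HZ = 880; // pitch of the highest data value

function playValueTone(value: number, min: number, max: number): void {
  // Normalize the value into [0, 1], then interpolate into the pitch range.
  const t = (value - min) / (max - min);
  const freq = MIN_FREQ_HZ + t * (MAX_FREQ_HZ - MIN_FREQ_HZ);

  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "sine";
  osc.frequency.setValueAtTime(freq, audioCtx.currentTime);
  gain.gain.setValueAtTime(0.2, audioCtx.currentTime); // keep the tone quiet
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + 0.15); // short beep per data point
}

// Sweeping across a line chart plays a sequence of tones whose rising or
// falling pitch conveys the trend without reading any numbers aloud.
```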

Direct Touch Mapping turns the screen into a tactile canvas. As users drag their fingers across a scatter plot, they receive continuous haptic and auditory feedback based on data density—identifying clusters, outliers, and gaps instantly.

ChartA11y Direct Touch Mapping features showing sonification via touch, pinch to zoom, split-tap for info, and swiping to switch series.
Direct Touch Mapping: Users "scan" density with their fingers, using pinch-to-zoom to manage information.
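
Below is a sketch of how local density under the finger could drive that continuous feedback; the touch radius, scaling, and use of the Vibration API are assumptions for illustration, not ChartA11y's implementation.

```typescript
// Density-driven feedback sketch: count the plotted points near the finger
// and turn that count into a feedback intensity.

interface ScreenPoint {
  x: number; // pixel coordinates of a plotted data point
  y: number;
}

const TOUCH_RADIUS_PX = 24; // roughly a fingertip

function localDensity(points: ScreenPoint[], fx: number, fy: number): number {
  return points.filter(
    (p) => Math.hypot(p.x - fx, p.y - fy) <= TOUCH_RADIUS_PX
  ).length;
}

function feedbackForTouch(points: ScreenPoint[], fx: number, fy: number): void {
  const density = localDensity(points, fx, fy);
  if (density === 0) return; // silence over empty regions marks gaps

  // Denser clusters produce longer vibration pulses; the Vibration API is
  // only available on some mobile browsers.
  const pulseMs = Math.min(10 * density, 100);
  if ("vibrate" in navigator) navigator.vibrate(pulseMs);
}
```
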
Key Contribution: Multimodal systems that restore spatial reasoning to assistive technology, enabling blind users to perceive 2D layouts and dense data trends that screen readers fundamentally cannot convey.

02
3D Modeling · Generative AI · ASSETS 2025

Trust in the Black Box

The Problem

Generative AI can create complex 3D models instantly, but for blind creators, the output is a black box. How can they trust that the AI respected their intent if they can't see the mesh?

If a blind user prompts an AI for a "helicopter," they might get a blob or a masterpiece. Without sight, they cannot verify the result. A11yShape is a system that enables blind users to create and verify 3D parametric models with AI assistance. It solves the verification problem through Cross-Representation Interaction: instead of showing only the visual output, we synchronize the Code (the source of truth), the Semantic Hierarchy (the structure), and the AI Description (the explanation).

A11yShape UI showing the synchronization between the Code Editor, Semantic Tree, and AI Assistant panel.
Cross-Representation Highlighting: Selecting a component in the semantic tree (1) highlights its code (4) and generates a focused AI description (3).

This triangulation allows for verification without vision. Users inspect the underlying logic rather than visual output. If the AI says "added a propeller," the user can verify that the code block exists, is connected to the right parent node, and has parameters that make sense—transforming a "slot machine" interaction into a rigorous, iterative engineering process.
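
A minimal sketch of the kind of linking that makes this triangulation possible: each component carries a pointer into the parametric source and a cached description, so selecting it in any panel can resolve the other two. The field names are illustrative assumptions, not A11yShape's actual data model.

```typescript
// Cross-representation linking sketch: semantic node, code span, description.

interface CodeSpan {
  startLine: number;
  endLine: number;
}

interface ModelComponent {
  id: string;
  name: string;            // e.g. "propeller"
  parentId: string | null; // position in the semantic hierarchy
  span: CodeSpan;          // where this component lives in the source
  aiDescription: string;   // focused natural-language explanation
}

// Selecting a component in the semantic tree resolves the other two views:
// the code slice to highlight and the description to announce.
function resolveSelection(
  components: ModelComponent[],
  sourceLines: string[],
  selectedId: string
): { code: string; description: string } | undefined {
  const c = components.find((m) => m.id === selectedId);
  if (!c) return undefined;
  const code = sourceLines.slice(c.span.startLine - 1, c.span.endLine).join("\n");
  return { code, description: c.aiDescription };
}

// Verification then becomes a structural check: does a "propeller" component
// exist, is its parentId the rotor assembly, and do its parameters look sane?
```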

User journey comparison showing failure of all-at-once generation vs success of iterative, component-based construction.
From Hallucination to Engineering: While "all-at-once" generation fails (1), iterative verification allows blind users to construct complex parametric models (2-10).

Key Contribution: A verification paradigm for AI-generated artifacts where users inspect synchronized semantic representations rather than visual output—making trust possible without sight.

03
User-Defined Routines · AI Agents · Under Review

Authoring the Automation

The Problem

Navigating information-dense interfaces with a screen reader is repetitively exhausting. AI agents promise automation but often act as "black boxes"—taking control away from the user and creating safety risks if they hallucinate.

ScreenRoutine is a system that lets blind users define their own automation routines in natural language, then translates those routines into structured, verifiable programs. Instead of asking a black-box agent to "do it for me," users author Routines—e.g., "Find the cheapest cable"—which the system compiles into semantic blocks: Triggers, Filters, and Actions. This restores agency by putting the user in control of the automation logic.

ScreenRoutine Workflow: (A) User describes intent in natural language. (B) System translates this into a structured, verifiable routine. (C) The routine executes via standard screen reader navigation. (D) User can refine logic via natural language.
From Intent to Execution: Users speak a goal (A), which transforms into a verifiable routine (B) that drives the screen reader (C).

The Intermediate Representation is the key to trust. Before execution, the user can audit the logic: "Did it interpret 'cheapest' as sorting by price?" If the logic is flawed, they can refine it naturally (e.g., "Actually, sort by length"). By sitting between the user and the application, ScreenRoutine empowers blind users to be the architects of their own automation—combining the flexibility of LLMs with the reliability of deterministic execution.
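
To make this concrete, the sketch below shows one plausible shape for such an intermediate representation, using the "cheapest cable" routine as an example; the schema and field names are assumptions, not ScreenRoutine's actual format.

```typescript
// Intermediate representation sketch: a routine compiled from natural
// language into a trigger, filters, and an action the user can audit.

interface Routine {
  goal: string; // the user's original utterance, kept for auditing
  trigger: { type: "page-matches"; urlPattern: string };
  filters: Array<
    | { type: "sort"; field: string; order: "asc" | "desc" }
    | { type: "limit"; count: number }
  >;
  action: { type: "announce"; template: string };
}

// "Find the cheapest cable" might compile into:
const cheapestCable: Routine = {
  goal: "Find the cheapest cable",
  trigger: { type: "page-matches", urlPattern: "*/search?q=cable*" },
  filters: [
    { type: "sort", field: "price", order: "asc" },
    { type: "limit", count: 1 },
  ],
  action: { type: "announce", template: "Cheapest result: {title}, {price}" },
};

// A refinement like "Actually, sort by length" edits the structured filter
// rather than re-prompting a black box, so the change is easy to verify:
cheapestCable.filters[0] = { type: "sort", field: "length", order: "asc" };
```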

Key Contribution: A system that shifts the accessibility paradigm from rigid tool use to user-authored AI routines, restoring personal agency in automated workflows.