Engineering · March 2026 · 6 min read

How We Made AI Testing 200x Faster with Screen HTML

Why sending screenshots to AI models is the wrong approach, and how converting iOS screens to semantic HTML changed everything.

The Problem: Screenshots Are Slow and Dumb

Every AI testing tool follows the same pattern: take a screenshot, send it to a vision model, ask "what's on screen?" The AI squints at pixels and guesses.

This approach has three fatal problems:

- Latency: each screenshot round-trip to a vision model takes 20-30 seconds.
- Cost: an image burns roughly 2,000 tokens per read, about 10x the equivalent text.
- Accuracy: the model has to guess tap coordinates from pixels, and it guesses wrong often.

When you're running an autonomous QA agent that needs to read the screen 10-20 times per test, 20-30 seconds per read means a simple test takes 5+ minutes. Most of that time is wasted staring at screenshots.
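The arithmetic is worth making explicit. A back-of-the-envelope sketch in Python, using the midpoints of the numbers above (the per-read timings are the article's figures; the midpoint choice is ours):

```python
# Back-of-the-envelope test duration, using this article's numbers.
READS_PER_TEST = 15        # midpoint of 10-20 screen reads per test
SCREENSHOT_READ_S = 25.0   # midpoint of 20-30 seconds per vision read
HTML_READ_S = 0.1          # ~100ms per Screen HTML read

screenshot_total = READS_PER_TEST * SCREENSHOT_READ_S
html_total = READS_PER_TEST * HTML_READ_S

print(f"screenshot-based: {screenshot_total / 60:.1f} min")  # ~6 minutes
print(f"screen-html:      {html_total:.1f} s")               # ~1.5 seconds
```

At these rates the screen reads alone dominate the test: minutes for screenshots versus seconds for text.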

The Insight: LLMs Already Understand HTML

Here's what we realized: LLMs are trained on billions of web pages. They parse HTML natively. They know that <button> is tappable, <input> takes text, and <p> is content.

So instead of sending a screenshot and asking "what do you see?", what if we converted the iOS screen into HTML and sent that?

<!-- What the AI receives -->
<screen name="Booking Confirmation">
  <header>Confirm Your Ride</header>
  <p data-center="195,120">Mumbai → Pune</p>
  <p data-center="195,160">Seats: 2</p>
  <p data-center="195,200" class="price">Total: ₹249</p>
  <button data-center="195,450" id="pay-btn">Pay Now</button>
  <button data-center="195,510" id="cancel">Cancel</button>
</screen>

The AI instantly knows: there are two buttons, the price is ₹249, and to tap "Pay Now" it should target coordinates (195, 450). No vision processing. No guessing. No ambiguity.
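To make the "no guessing" point concrete, here is a minimal sketch (illustrative only, not NoobQA's actual client code) of how an agent could pull exact tap coordinates out of that markup with nothing but Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

class TapTargetFinder(HTMLParser):
    """Collects every element that carries a data-center attribute."""
    def __init__(self):
        super().__init__()
        self.targets = {}  # element id (or tag name) -> (x, y)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-center" in attrs:
            x, y = map(int, attrs["data-center"].split(","))
            self.targets[attrs.get("id", tag)] = (x, y)

screen = """
<screen name="Booking Confirmation">
  <p data-center="195,200" class="price">Total: ₹249</p>
  <button data-center="195,450" id="pay-btn">Pay Now</button>
  <button data-center="195,510" id="cancel">Cancel</button>
</screen>
"""

finder = TapTargetFinder()
finder.feed(screen)
print(finder.targets["pay-btn"])  # (195, 450)
```

No vision model in the loop: the coordinates are attributes, not inferences.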

The Results

| Metric                 | Screenshot                  | Screen HTML              | Improvement       |
|------------------------|-----------------------------|--------------------------|-------------------|
| Screen read time       | 20-30 seconds               | ~100ms                   | 200x faster       |
| Token cost per read    | ~2,000 tokens (image)       | ~200 tokens (text)       | 10x cheaper       |
| Tap accuracy           | ~70% (guessing coordinates) | ~95% (exact coordinates) | Far more reliable |
| Element identification | Often wrong                 | Always correct           | No ambiguity      |

A test that took 3-5 minutes now takes 30-60 seconds. Not because the AI got faster — because we stopped wasting time on screenshots.

How It Works Under the Hood

NoobQA uses the Noober SDK, a lightweight iOS debugging library that runs inside your app. When the AI agent calls noober_screen_html, the SDK walks the app's live view hierarchy in-process and serializes it to semantic HTML.

The key insight is that iOS view hierarchies and HTML DOM trees are structurally identical. A UIButton maps directly to <button>. A UITextField maps to <input>. The mapping is natural, not forced.
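As an illustration of that mapping (a toy sketch, not the Noober SDK's implementation), a serializer needs little more than a view-class-to-tag table and a recursive walk of the tree. The dict-based tree below stands in for live UIKit objects:

```python
# Toy view-tree -> HTML serializer. The tree structure and class names
# are illustrative; a real SDK would walk live UIKit views.
TAG_FOR_CLASS = {
    "UIButton":    "button",
    "UITextField": "input",
    "UILabel":     "p",
    "UIView":      "div",
}

def to_html(node, indent=0):
    tag = TAG_FOR_CLASS.get(node["class"], "div")
    pad = "  " * indent
    children = node.get("children", [])
    if children:
        inner = "\n".join(to_html(c, indent + 1) for c in children)
        return f"{pad}<{tag}>\n{inner}\n{pad}</{tag}>"
    x, y = node["center"]
    return f'{pad}<{tag} data-center="{x},{y}">{node.get("text", "")}</{tag}>'

tree = {
    "class": "UIView", "center": (195, 300),
    "children": [
        {"class": "UILabel",  "center": (195, 200), "text": "Total: ₹249"},
        {"class": "UIButton", "center": (195, 450), "text": "Pay Now"},
    ],
}
print(to_html(tree))
```

Because the mapping is a lookup rather than an inference, the output is deterministic: the same view tree always yields the same HTML.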

Why Not Just Use the Accessibility Tree?

Good question. iOS has a built-in accessibility API that exposes the view hierarchy, and most testing tools use it (via XCUITest or similar). But it has real limitations: it only surfaces what developers have annotated for accessibility, every query crosses a process boundary, and it carries little semantic context about what each element actually is.

Our Screen HTML approach reads the actual view hierarchy (not the accessibility layer), runs in-process (no IPC overhead), and adds semantic context from the view type and content.

What This Enables

Fast screen reading unlocks workflows that are impractical with screenshots: when a read costs ~100ms instead of 20-30 seconds, the agent can re-check the screen after every single action, verify intermediate state, and retry failed steps without blowing the test's time budget.

The Bigger Picture

Screenshots are a crutch. They exist because existing tools don't have access to the app's internal state. If you can see inside the app — its view hierarchy, its network requests, its logs — you don't need to squint at pixels.

This is the core idea behind NoobQA: don't test from the outside looking in. Test from the inside looking out.

Screen HTML is just one example. Noober also gives the AI access to network requests (noober_assert_request), analytics events (noober_check_event), and app logs (noober_get_app_logs) — all things that are invisible to screenshot-based tools.

The result is QA testing that's not just faster, but fundamentally deeper. You're not just checking "does the button look right?" You're checking "did the API return the right data, did the analytics event fire, and is the UI showing the correct calculation?"

That's what we built. And it runs in under 60 seconds.

Try it yourself

100 free AI turns/month. No credit card.

Join Waitlist