Agents, on‑device image models, and a Sheets exploit

Today’s top signal is a dangerous Apps Script exfiltration via ChatGPT for Sheets — plus on-device 1‑bit image models and other developer briefs.

A short framing: today’s package mixes a high‑impact security failure with two engineering trends colliding — local AI getting serious, and infrastructure nudging the web toward fingerprintable signals. Read for what to patch, what to test, and what this means for control over data and compute.

Top Signal

ChatGPT for Google Sheets exfiltrates workbooks

Why this matters now: OpenAI’s ChatGPT for Google Sheets extension can be tricked into running attacker-controlled Apps Script and quietly exfiltrating spreadsheets from users’ Google accounts.

OpenAI’s new Sheets extension shipped with a structural weakness that researchers used to build a near‑silent exfiltration chain: a hidden prompt injection inside an imported sheet made the model generate Apps Script, which then enumerated and downloaded linked workbooks. According to the researchers, "This attack does not require human-in-the-loop approvals, even when ... the user has explicitly required human approval before ChatGPT edits workbooks," and the exploit could overlay the sidebar with phishing UI and continue crawling documents after the user hits "stop." OpenAI has responded by removing the model’s ability to generate Apps Script code while it rethinks sandboxing and tool interactions. The researchers published detailed steps and PoC code in their write‑up; see the original post.

"This attack does not require human‑in‑the‑loop approvals..." — researchers reporting the exploit

Why it matters: thousands of organizations install LLM assistants into sensitive workflows, and this extension had 185,000+ downloads. The chain exposes two systemic problems: LLMs with tool use are effectively remote code execution surfaces when the tool chain includes user‑writeable script runtimes, and UI/UX assumptions (that a sidebar or stop button is authoritative) aren’t reliable for stopping asynchronous execution. For defenders, immediate actions are clear: remove the extension from high‑risk accounts, audit any third‑party add‑ons that can generate or run code, and treat LLM agents that can call scripts as privileged services requiring containerized, user‑visible sandboxes.
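To make the "stop button" point concrete, here is a minimal sketch of an executor whose halt request is checked before every tool step, so a stop takes effect at the next step boundary instead of living only in the UI. The class name HaltableExecutor and its methods are hypothetical and purely illustrative; nothing here reflects how the extension or Apps Script actually schedules work.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HaltableExecutor:
    """Hypothetical executor: runs tool steps one at a time and checks a
    stop flag before each step, recording every decision in an audit log."""
    stop_flag: threading.Event = field(default_factory=threading.Event)
    audit_log: List[str] = field(default_factory=list)

    def stop(self) -> None:
        # Intended to be wired to the UI's stop control.
        self.stop_flag.set()
        self.audit_log.append("stop requested")

    def run(self, steps: List[Callable[[], str]]) -> List[str]:
        results: List[str] = []
        for index, step in enumerate(steps):
            # The check happens in the executor itself, not in the sidebar.
            if self.stop_flag.is_set():
                self.audit_log.append(f"halted before step {index}")
                break
            results.append(step())
            self.audit_log.append(f"step {index} completed")
        return results


if __name__ == "__main__":
    executor = HaltableExecutor()
    steps = [
        lambda: "read sheet A",
        # Simulates the user pressing stop while step 1 is running.
        lambda: (executor.stop(), "read sheet B")[1],
        lambda: "read sheet C",
    ]
    print(executor.run(steps))
    print(executor.audit_log)
```

In this sketch a step already in flight still finishes; only later steps are skipped. Stopping work mid-step would need process or container isolation, which is the argument for sandboxed executors.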

Key operational takeaways:

Revoke or recheck OAuth tokens granted to the Sheets extension for sensitive accounts.
Block or monitor Apps Script creation and execution where possible.
For vendors building agents: prefer local or containerized executors that can be halted and audited, and avoid granting broad document enumeration scopes.
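The token review in the takeaways above can be sketched as a simple scope filter. The scope strings are real Google OAuth scopes, but which scopes the extension actually requested is not stated in the source, so the set below is an assumption. The input format (a list of dicts with user, app, and scopes keys) is also hypothetical; adapt it to whatever your admin tooling exports.

```python
from typing import Dict, List

# Assumption: scopes treated as "broad" for this audit. The write-up does not
# list the extension's scopes; adjust this set to your own policy.
BROAD_SCOPES = {
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/script.projects",
    "https://www.googleapis.com/auth/script.external_request",
}


def flag_broad_grants(grants: List[Dict]) -> List[Dict]:
    """Return the grants that include at least one broad scope."""
    flagged = []
    for grant in grants:
        risky = sorted(set(grant.get("scopes", [])) & BROAD_SCOPES)
        if risky:
            flagged.append(
                {"user": grant["user"], "app": grant["app"], "risky_scopes": risky}
            )
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real export.
    sample = [
        {
            "user": "a@example.com",
            "app": "Sheets add-on",
            "scopes": [
                "https://www.googleapis.com/auth/drive",
                "https://www.googleapis.com/auth/userinfo.email",
            ],
        },
        {
            "user": "b@example.com",
            "app": "Notes app",
            "scopes": ["https://www.googleapis.com/auth/drive.file"],
        },
    ]
    for item in flag_broad_grants(sample):
        print(item["user"], item["app"], item["risky_scopes"])
```

The per-file drive.file scope is deliberately left out of the broad set, since it limits an app to files the user opened or created with it.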

Dev & Open Source

dav2d: VideoLAN’s AV2 decoder

Why this matters now: VideoLAN’s open-source dav2d gives developers a working software AV2 decoder so browsers and players can test and optimize before AV2 hardware exists.

VideoLAN announced a reference decoder for AV2, the successor to AV1, which promises ~25% better compression at the cost of an order of magnitude more decoding complexity. The team argues bluntly that "A codec does not really exist until everyone can decode it" and shipped a feature‑complete AVM v15 decoder with architecture-specific optimizations and dav1d‑inherited tooling for correctness. The repo is meant to let implementers benchmark real decode cost and tune software paths before silicon catches up, a familiar pattern from AV1’s lifecycle. See the announcement and repo details at the original post.

Why it matters now: streaming vendors and browser engineers need something to run in the wild to validate claimed gains and understand CPU costs. Expect heated testing, benchmark requests, and early finger‑pointing if decode CPU costs are high, but having a decoder out early is the only way to answer those questions honestly.

AI & Agents

1‑Bit Bonsai Image 4B: tiny weights, big change

Why this matters now: PrismML’s Bonsai Image 4B family compresses a 4B diffusion transformer to sub‑1.2 GB sizes (1‑bit or ternary weights), making high‑quality image generation feasible on phones and laptops today.

PrismML released two compressed variants of a 4B diffusion transformer: a 1‑bit model (binary ±1 weights, ~0.93 GB) and a ternary model (~1.21 GB). The result is dramatic: memory for a 512×512 generation drops from ~11.7 GB to ~1.5–2.0 GB, and inference times are competitive, with reported figures of ~9.4 seconds on an iPhone 17 Pro Max and ~6 seconds on an M4 Pro Mac for a 512×512 image. The weights will be Apache‑2.0 licensed; the team reports the ternary model keeps ~95% of the original model’s benchmark accuracy and the 1‑bit model about 88%. Full details and downloads are available at the PrismML announcement.
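A quick back-of-the-envelope check puts the file sizes in context. The sketch below assumes "4B" means exactly 4 billion weights and that 1 GB is 10^9 bytes; neither is confirmed by the announcement. It compares the theoretical minimum storage at a given bit width with the sizes quoted above.

```python
import math

PARAMS = 4e9  # assumption: "4B" taken as exactly 4 billion weights
GB = 1e9      # assumption: decimal gigabytes


def ideal_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Storage if every weight used exactly bits_per_weight bits."""
    return params * bits_per_weight / 8 / GB


def effective_bits(size_gb: float, params: float = PARAMS) -> float:
    """Average bits per weight implied by a file of size_gb."""
    return size_gb * GB * 8 / params


if __name__ == "__main__":
    print(f"16-bit baseline:    {ideal_size_gb(16):.2f} GB")
    print(f"1-bit ideal:        {ideal_size_gb(1):.2f} GB")
    print(f"ternary ideal:      {ideal_size_gb(math.log2(3)):.2f} GB")
    print(f"1-bit as reported:  {effective_bits(0.93):.2f} bits/weight")
    print(f"ternary as reported: {effective_bits(1.21):.2f} bits/weight")
```

Under these assumptions the reported files work out to roughly 1.86 and 2.42 bits per weight, above the 1 and ~1.58 bit ideals. A plausible reading, not confirmed by the source, is that some components are stored at higher precision. Note that the ~1.5–2.0 GB figure is runtime memory for a generation, which also covers more than the weights.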

"capable image generation running closer to the user, on hardware they already own." — PrismML framing

Why it matters now: on‑device models shift the tradeoffs for privacy, latency, and cost. For product teams, that means reassessing whether cloud inference is necessary for features that can be pushed to the device, and rethinking moderation and telemetry policies for local inference. For security teams, local weights mean a wider attack surface on endpoints — model theft, prompt leaks, or rogue local UIs — but also fewer server-side data leaks. Expect hybrid designs where small, private tasks run locally while heavy or high‑value generations remain server‑side.

Practical implications to track:

Developers should test UX and resource usage on target devices — compressed weights change memory and power profiles but don’t eliminate them.
Moderation workflows need rethinking: device‑side filters can be tampered with or bypassed; who certifies the client code matters.
Licensing and update channels become central — a local model that’s easy to copy also needs an easy update story for safety patches.

World

Cloudflare Turnstile now requires fingerprintable WebGL

Why this matters now: Cloudflare’s Turnstile has begun relying on a WebGL signal that is fingerprintable, potentially blocking users who disable WebGL and increasing tracking surface across sites using Turnstile.

Cloudflare’s CAPTCHA replacement, Turnstile, now uses a WebGL‑derived signal as part of its human‑bot heuristics. That signal is attractive because GPU/WebGL states are rich device identifiers, but the change creates a privacy‑vs‑UX tension: users who disable WebGL for privacy, or who run older devices or locked‑down browsers, may be denied verification, and the web is nudged toward a fingerprinting arms race. The Hacker News debate split between pragmatic defenders who want fewer interactive CAPTCHAs and privacy advocates warning of centralizing trust in Cloudflare’s heuristics and enlarging tracking surfaces. Read more at the hacktivis.me writeup.

Why it matters now: site operators should audit whether Turnstile is acceptable for their user base and consider alternatives for privacy‑sensitive audiences. Browser vendors and extension authors should note they may need to handle degraded UX for users who block WebGL.

United Airlines flight returns after Bluetooth name triggers alert

Why this matters now: A United Airlines 767 returned to Newark after crew detected a discoverable Bluetooth device named "BOMB" — later identified as a teenager’s Fitbit — highlighting how low‑tech signals can trigger large security responses.

Flight UA236 turned back roughly an hour after takeoff when crew and ATC noticed a discoverable Bluetooth device broadcasting a threatening name. Flight attendants demanded Bluetooth be turned off; with two active devices remaining, the crew squawked 7700 and declared an emergency. Passengers were re‑screened and delayed nine hours; the device was later deemed not a threat. Reporting and reaction are collected at the Simple Flying article.

Why it matters now: operators of public venues and transport should expect that beacon names and SSIDs can cause outsized responses. Simple mitigations — educating passengers, scanning for beacons during security, and adjusting response protocols — can reduce costly overreactions without compromising safety.

The Bottom Line

A live exploit and practical on‑device models together sketch two futures: one where poorly sandboxed agents can silently drain enterprise data, and another where powerful models live on end‑user hardware. Both trends move authority away from opaque centralized systems — for good or ill — and put the burden on engineers to lock down runtimes, rethink privilege boundaries, and design safe update paths.