Intro

A quick, practical day: a terrifying one‑click token theft that should change how you use web IDEs; a clever hack that lets an idle NVIDIA GPU act as extra RAM; and a Stanford study that raises uncomfortable questions about AI in professional education. The briefs cover Microsoft’s new Copilot model, a BYD teardown, and the Stanford study; the deep dives cover the token exploit and the VRAM swap project.

In Brief

Microsoft’s new MAI-Code-1-Flash model lands in Copilot

Why this matters now: Developers using GitHub Copilot will start getting suggestions from Microsoft’s MAI‑Code‑1‑Flash model, which claims higher pass rates and lower token use versus a competitor — and that could reshape Copilot’s latency and cost profile.

Microsoft announced MAI‑Code‑1‑Flash, an inference‑efficient coding model integrated into GitHub Copilot for VS Code that the company says was “trained directly with GitHub Copilot harnesses used in production.” Microsoft presents benchmark wins and says Flash uses fewer tokens for hard problems, which matters if you care about interactive latency and Copilot costs. HN reaction was skeptical: commenters flagged opaque specs (a “137B” model with only a small fraction of active parameters), possible cherry‑picked benchmarks, and that this is as much a strategic move to keep Copilot inside Azure as it is a technical advance. If you rely on Copilot daily, watch rollout details and any pricing changes closely — the model may be technically solid but it’s part of a broader business play.

BYD parts under the CT scanner

Why this matters now: The CT teardown of BYD components offers a rare, visual look at why BYD’s vertical integration helps it undercut rivals and grow rapidly — useful context for EV buyers and competitors.

A tech outlet ran CT scans of four BYD components to show internal construction and highlight that BYD builds roughly 75% of its vehicle parts in‑house. The scans are a neat peek at build quality and compact engineering decisions, and they feed into a larger point: vertical integration can reduce supplier margins and accelerate scaling. Hacker News readers split between those impressed by the value engineering and those worried about long‑term repairability or vendor lock‑in. If you follow EV supply chains, the scans are a compact data point on how BYD organizes production.

Stanford study: AI answers preferred over professors’ replies

Why this matters now: Law schools and legal tech teams should pay attention — Stanford’s blind evaluation suggests AI can produce pedagogically useful answers that professors actually prefer in many cases.

A Stanford team reports that, in nearly 3,000 pairwise comparisons, law professors preferred AI‑generated answers to contract‑law student questions about 75% of the time, and flagged AI responses as pedagogically harmful far less often. The paper is provocative: it moves the debate from “can AI write plausible text?” to “can AI meet nuanced, discipline‑specific expectations?” Critics on HN raised methodological concerns — small sample size, model selection bias, and the usual risk that confident‑sounding text can still hallucinate legal citations. The result doesn’t mean clinics should stop supervising students, but it should accelerate experiments with AI as a tutoring or drafting assistant — with guardrails.

Deep Dive

1‑Click GitHub Token Stealing via a VSCode Bug

Why this matters now: Anyone who’s opened a repo in github.dev (or used the vulnerable desktop flow) may have exposed a broad‑scope GitHub OAuth token that an attacker can steal with a single click.

The researcher’s vulnerability disclosure and proof‑of‑concept lay out a short, elegant chain that starts in an untrusted webview iframe on github.dev and ends with exfiltration of a GitHub token. The core problem is a design choice in the webview messaging: untrusted iframe content can synthesize keydown events and send them to the host via a did‑keydown message. That synthetic input can be chained to accept an extension recommendation, trigger an extension install or a custom keybinding that bypasses publisher trust, and finally run code that reads your broad‑scope OAuth token and sends it offsite. As the author puts it:

"Just by clicking a link, it’s possible for an attacker to steal a GitHub token that can read and write to your repos, including private ones." (see the original writeup)

This attack is notable for three reasons. First, it requires minimal user interaction: a single click can be enough because github.dev’s flow uses broad tokens and, reportedly, lacks CSRF protections. Second, the chain abuses the extension and keybinding model rather than a classic DOM bug, which makes the path subtle and easy to miss in threat models. Third, the consequences are high: stolen tokens with repo scopes can let attackers exfiltrate code, reach CI secrets, or push malicious commits.
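To make the keybinding step concrete, here is a minimal sketch of the kind of check a host could apply to forwarded key events. It is not VS Code's code: the message shape, the command names, and the keybinding table are all hypothetical, chosen only to show the idea that input relayed from an untrusted frame should never resolve to a privileged command.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical command names for illustration; not VS Code's real identifiers.
PRIVILEGED_COMMANDS = {"extensions.install", "extensions.acceptRecommendation"}

# Hypothetical keybinding table mapping a key chord to a command.
KEYBINDINGS = {
    "ctrl+s": "file.save",
    "enter": "extensions.acceptRecommendation",
}


@dataclass(frozen=True)
class KeyMessage:
    """A key event forwarded from a frame to the host (assumed shape)."""
    key: str
    source_trusted: bool  # False for content the host did not author


def resolve_command(msg: KeyMessage) -> Optional[str]:
    """Return the command to run, or None if the event should be dropped."""
    command = KEYBINDINGS.get(msg.key)
    if command is None:
        return None
    # Forwarded input from an untrusted frame must not reach a privileged command.
    if command in PRIVILEGED_COMMANDS and not msg.source_trusted:
        return None
    return command
```

In this sketch an untrusted frame can still trigger harmless bindings such as saving a file, but anything touching extensions is dropped unless the event came from a trusted source.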

Short‑term mitigations are practical: clear github.dev site data before a session, avoid following unknown links into the web IDE, and audit or uninstall suspicious local extensions. Longer term, the community and HN commenters pushed for systemic fixes: launch web IDE sessions with per‑repo, narrowly scoped tokens (Codespaces already does this), isolate extensions into profiles, treat extensions like full Node apps that need stricter sandboxing, and add obvious UX friction before granting elevated extension permissions. If you host or use web‑based IDEs, treat this as a design lesson: any path that elevates untrusted input into a privileged host context deserves zero‑trust boundaries and explicit user confirmation.
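If you want to check how broad a token you already hold is, one option is to look at the scopes GitHub reports for it. GitHub's REST API returns an X-OAuth-Scopes response header for OAuth and classic tokens; the sketch below only parses such a header value and flags scopes from a hand-picked list. The contents of BROAD_SCOPES are an assumption for illustration, not an official classification, and fetching the header is left out so the example stays offline.

```python
from typing import Set

# Scopes treated as "broad" in this sketch. The selection is an assumption;
# adjust it to your own threat model.
BROAD_SCOPES = {"repo", "workflow", "admin:org", "delete_repo"}


def parse_scopes(header_value: str) -> Set[str]:
    """Split a comma-separated X-OAuth-Scopes value into a set of scope names."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def broad_scopes(header_value: str) -> Set[str]:
    """Return the scopes in the header that this sketch considers broad."""
    return parse_scopes(header_value) & BROAD_SCOPES


if __name__ == "__main__":
    # Example header value; a real one comes from an authenticated API response.
    print(sorted(broad_scopes("repo, workflow, read:user")))
```

A token that reports only narrow scopes limits what a stolen copy can do, which is the point of the per‑repo token suggestion above.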

Use your NVIDIA GPU's VRAM as swap space on Linux (nbd‑vram)

Why this matters now: For laptops with soldered RAM and an underused discrete NVIDIA GPU, the nbd‑vram project can dramatically reduce swap latency and increase usable memory without buying new hardware.

The nbd‑vram project exposes GPU memory as a block device via NBD (Network Block Device) by allocating VRAM through the CUDA driver and serving it over a Unix socket. The daemon "allocates VRAM via the CUDA driver API, then serves it as a block device using the NBD protocol," which the kernel can then use for swap. The result in the author’s tests: combining zram, SSD swap and ~7 GB of VRAM produced about 46 GB of addressable memory, up from the stock configuration, and latency for real‑world small page faults dropped from ~9 ms on a sleeping NVMe drive to ~335 µs on VRAM.
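To put the reported latencies in perspective, here is a quick back‑of‑the‑envelope calculation. It uses only the two approximate figures quoted above and assumes faults are serviced one at a time, which is a simplification.

```python
# Approximate figures from the author's tests, as quoted above.
nvme_fault_s = 9e-3    # ~9 ms per small page fault on a sleeping NVMe drive
vram_fault_s = 335e-6  # ~335 microseconds per small page fault on VRAM

# How many times faster a single fault is serviced from VRAM.
speedup = nvme_fault_s / vram_fault_s

# Serial faults per second each tier could sustain (assumes no parallelism).
faults_per_s_nvme = 1 / nvme_fault_s
faults_per_s_vram = 1 / vram_fault_s

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"NVMe: {faults_per_s_nvme:.0f} faults/s, VRAM: {faults_per_s_vram:.0f} faults/s")
```

That works out to roughly a 27x reduction per fault, though the NVMe figure includes wake‑up from sleep and so represents a worst case rather than steady‑state SSD latency.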

This trick works because many laptops have a relatively powerful discrete GPU that sits idle during non‑gaming workloads. VRAM doesn’t wear like flash, so using it for ephemeral swap avoids SSD wear. But there are tradeoffs and caveats. VRAM allocation can interfere with GPU‑heavy workloads: when a Wayland compositor or a game needs that memory back, the desktop may become unstable. The current implementation uses userspace bounce buffers (NBD), so there’s room for performance improvements; commenters suggested using ublk or kernel offloads to avoid the extra copy. The feature also depends on the CUDA driver and inherits the limitations of the NVIDIA stack, and it’s niche: if you’re not RAM‑starved or you frequently use GPU compute or gaming, this probably isn’t for you.
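The userspace serving path mentioned above can be sketched in a few lines. This is not the project's code: a bytearray stands in for the CUDA allocation, the handle_request helper is hypothetical, and negotiation, the socket loop, and most commands are omitted. It only shows the NBD transmission‑phase request and simple reply headers, and where each page passes through a userspace buffer.

```python
import struct

# Constants from the NBD protocol's transmission phase.
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE = 0, 1
NBD_EINVAL = 22

REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, cookie, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, cookie


def handle_request(backing: bytearray, header: bytes, payload: bytes = b"") -> bytes:
    """Apply one NBD read or write to `backing` and return the reply bytes.

    `backing` is a stand-in for VRAM; a real daemon would copy to and from
    device memory here, which is the extra copy discussed above.
    """
    magic, _flags, cmd, cookie, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC or offset + length > len(backing):
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, cookie)
    if cmd == NBD_CMD_READ:
        data = bytes(backing[offset:offset + length])  # copy out via userspace
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie) + data
    if cmd == NBD_CMD_WRITE and len(payload) == length:
        backing[offset:offset + length] = payload      # copy in via userspace
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie)
    return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, cookie)
```

Every swapped page crosses the socket and then this userspace buffer before reaching the backing store, which is why commenters see ublk or kernel offloads as the natural next step.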

For power users with soldered RAM or those doing heavy local development in constrained machines, nbd‑vram is an elegant, pragmatic workaround. It’s not a replacement for more system memory, but it’s a clever use of otherwise idle hardware that can make a machine feel significantly snappier under memory pressure.

Closing Thought

Today’s stories share one theme: tooling that trades convenience for expanded risk or opportunity. The VS Code/github.dev exploit shows how small UX decisions and extension trust models can cascade into major security failures. The VRAM swap work shows how creative engineering can reclaim underused resources to get around practical limits. And the Stanford study reminds us that AI’s competence can outpace our assumptions, which means we must design systems that leverage AI’s strengths while guarding against its weaknesses. Keep your tokens narrow, know what hardware you can safely repurpose, and treat AI outputs as useful but still in need of oversight.

Sources