#gpu

"Like catastrophic brain damage"

"The move comes in response to an attack a team of academic researchers demonstrated against Nvidia’s RTX #A6000, a widely used #GPU for high-performance computing that’s available from many cloud services. A vulnerability the researchers discovered opens the GPU to Rowhammer, a class of attack that exploits physical weakness in DRAM chip modules that store data."

arstechnica.com/security/2025/
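For readers new to the class: below is a minimal sketch in C of the classic CPU-side Rowhammer access pattern, for illustration only. It repeatedly reads two "aggressor" addresses that map to different rows of the same DRAM bank, flushing them from cache so every read reaches DRAM. The pointers here are placeholders, a real attack needs control over physical address placement, and the GPU attack in the article targets GDDR through different mechanisms.

#include <emmintrin.h>  /* _mm_clflush */
#include <stdint.h>

/* Hammer two DRAM rows with alternating, uncached reads. */
void hammer(volatile uint8_t *row_a, volatile uint8_t *row_b, long iters)
{
    for (long i = 0; i < iters; i++) {
        (void)*row_a;                      /* activate row A */
        (void)*row_b;                      /* activate row B */
        _mm_clflush((const void *)row_a);  /* evict so the next read hits DRAM */
        _mm_clflush((const void *)row_b);
    }
}

Bit flips, if any, then show up in "victim" rows physically adjacent to the hammered ones.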

New #ZLUDA 5 Preview Released For #CUDA On Non-NVIDIA #GPU
For now, the ability to run unmodified CUDA apps on non-#NVIDIA GPUs is focused on #AMD #Radeon RX 5000 series and newer GPUs, i.e. AMD Radeon GPUs supported by #ROCm. Besides CUDA code samples, Geekbench has been one of the early testing targets.
phoronix.com/news/ZLUDA-5-prev

www.phoronix.com · New ZLUDA 5 Preview Released For CUDA On Non-NVIDIA GPUs: ZLUDA version 5-preview.43 was released today as this open-source CUDA implementation for use on non-NVIDIA GPUs, with one of the current focuses being on enabling CUDA on AMD Radeon GPUs with ROCm.

#Nvidia's newest top-tier #AI #supercomputers deployed for the first time — #GraceBlackwellUltra Superchip systems deployed at #CoreWeave
#Dell's and CoreWeave's initial rollout involves Dell Integrated Racks equipped with 72 Nvidia #BlackwellUltra #GPUs, 36 Arm-based 72-core #Grace CPUs, and 36 BlueField DPUs per rack. Each #GB300 #NVL72 rack delivers 1.1 ExaFLOPS of dense FP4 inference and 0.36 ExaFLOPS of FP8 training performance, 50% higher than a GB200 NVL72.
tomshardware.com/tech-industry

Tom's Hardware · Nvidia's newest top-tier AI supercomputers deployed for the first time — Grace Blackwell Ultra Superchip systems deployed at CoreWeave · By Anton Shilov

Found an updated NFB (New Feature Branch) of the #nvidia #GPU driver sets: 575.64.03.

Filed PR for #FreeBSD #ports as Bug 287984
bugs.freebsd.org/bugzilla/show
and opened corresponding review D51144
reviews.freebsd.org/D51144

The patch there is for the "-devel" variants of the ports, like x11/nvidia-driver-devel.

Running on stable/14, amd64 with no new issues for me as of last night.
But as the GPU I have on hand is old (a notebook Quadro P1000), I cannot confirm by myself whether the new version solves any of the known problems on GPUs that have a GSP.


#ZLUDA Making Progress In 2025 On Bringing #CUDA To Non-NVIDIA #GPUs
ZLUDA, the #opensource effort that started half a decade ago as a drop-in CUDA implementation for #Intel GPUs, was then funded for several years by #AMD as a CUDA implementation for #Radeon GPUs atop #ROCm, open-sourced, and then reverted. Since last year it has continued pushing along a new path: the current take on ZLUDA is a multi-vendor CUDA implementation for non-NVIDIA GPUs, for #AI workloads & more.
phoronix.com/news/ZLUDA-Q2-202

www.phoronix.com · ZLUDA Making Progress In 2025 On Bringing CUDA To Non-NVIDIA GPUs
Can you program GPUs, and do you want to become a HERO? The #linuxphone community needs your help.

We are trying to record video and have most pieces working, but one piece is missing: fast enough debayering. That means about 23 MB/sec on the #librem5.

Debayering is not hard; camera images have their subpixels split across two lines, which needs to be corrected. They also use a different color representation, but that's fixable with some table lookups and two matrix multiplies.

The Librem 5 has a Vivante GPU, 4 in-order CPU cores, and 3 GB of RAM. My feeling is that it should be fast enough for this. If the task is for some reason impossible, that would be good to know, too.

The image data looks like this:

RGRGRG...
xBxBxB...
.........
.........

The task is to turn that into the usual rgbrgb... format, where rgb = RGB * color matrix, with table lookups for better quality. I can fix that up once I get an example.

I'm looking for example code (#pinephone would work, too), or reasons it cannot be done... and boosts if you have friends who can program GPUs. #gpu #opensource
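To make the task concrete, here is a minimal CPU reference sketch in C, assuming the "x" subpixels above are green (an RGGB layout), 8-bit samples, and a Q8 fixed-point color matrix; the names and the fixed-point choice are mine, and an actual solution would run this per pixel on the Vivante GPU (e.g. as a GLES shader) rather than on the CPU.

#include <stddef.h>
#include <stdint.h>

static uint8_t clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* RGGB Bayer -> RGB. Each 2x2 block (R G / G B) becomes one RGB pixel,
 * so output is half resolution in each dimension. ccm is a row-major
 * 3x3 color-correction matrix in Q8 fixed point (256 == 1.0). */
void debayer_rggb(const uint8_t *bayer, size_t w, size_t h,
                  const int ccm[9], uint8_t *rgb)
{
    for (size_t y = 0; y + 1 < h; y += 2) {
        const uint8_t *row0 = bayer + y * w;     /* R G R G ... */
        const uint8_t *row1 = row0 + w;          /* G B G B ... */
        for (size_t x = 0; x + 1 < w; x += 2) {
            int r = row0[x];
            int g = (row0[x + 1] + row1[x]) / 2; /* average the two greens */
            int b = row1[x + 1];
            uint8_t *out = rgb + ((y / 2) * (w / 2) + x / 2) * 3;
            out[0] = clamp8((ccm[0] * r + ccm[1] * g + ccm[2] * b) >> 8);
            out[1] = clamp8((ccm[3] * r + ccm[4] * g + ccm[5] * b) >> 8);
            out[2] = clamp8((ccm[6] * r + ccm[7] * g + ccm[8] * b) >> 8);
        }
    }
}

Collapsing each 2x2 block to one pixel halves the resolution; a full-resolution debayer would interpolate the missing subpixels instead, and the table lookups mentioned above would replace the plain matrix multiply.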

People continue to think about #AI in terms of #2010s computing, which is part of the reason everyone gets it wrong whether they're #antiAI or #tech bros.

Look, we had 8GB of #ram as the standard for a decade. The standard was set in 2014, and in 2015 #AlphaGo beat a human at #Go.

Why? Because #hardware lags #software. In #economic terms: supply follows demand, but demand cannot create its own supply.

It takes 3 years for a new chip to go through the #technological readiness levels and be released.

It takes 5 years for a new #chip architecture. E.g. the #Zen architecture was conceived in 2012, and released in 2017.

It takes 10 years for a new type of technology, like a #GPU.

Now, AlphaGo needed a lot of RAM, so why did RAM stagnate for a decade after doubling every two years before that?

In 2007 the #iPhone was released. #Computers were all becoming smaller, #energy #efficiency was becoming paramount, and everything was moving to the #cloud.

In 2017, most people used their computer for a few applications and a web browser. But also in 2017, companies were starting to build #technology for AI, as it was becoming increasingly important.

Five years after that, we're in the #pandemic lockdowns: people are buying more powerful computers, we have #LLMs, and companies are beginning to jack up the cost of cloud services.

#Apple releases chips with large amounts of unified #memory, #ChatGPT starts to break the internet, and in 2025 GPU growth continues to outpace CPU growth and Apple's unified memory finally has a competitor.

The era of cloud computing and surfing the #web is dead.

The hype of multi-trillion parameter #LLMs making #AGI is a fantasy. There isn't enough power to do that, there aren't enough chips, it's already too expensive.

What _is_ coming is AI tech performing well and running locally without the cloud. AI Tech is _not_ just chatbots and #aiart. It's going to change what you can do with your #computer.

#NVIDIA #TensorCore Evolution: From Volta To Blackwell (Amdahl's Law, Strong Scaling, Asynchronous Execution, Blackwell, Hopper, Ampere, Turing, Volta, TMA)
They introduce the core features of major #datacenter #GPUs, first explaining important first principles of performance engineering, then trace the evolution of Nvidia's Tensor Core architectures and programming model, highlighting the motivations behind that evolution. The end goal is to provide a resource for understanding Nvidia's GPU architecture.
semianalysis.com/2025/06/23/nv

SemiAnalysis · NVIDIA Tensor Core Evolution: From Volta To Blackwell · In our AI Scaling Laws article from late last year, we discussed how multiple stacks of AI scaling laws have continued to drive the AI industry forward, enabling greater than Moore's Law grow…
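(A quick gloss on one of those first principles, mine rather than the article's: Amdahl's Law bounds strong scaling. If a fraction p of a workload parallelizes across s units, speedup(s) = 1 / ((1 - p) + p/s), so even with p = 0.95 the speedup can never exceed 1/(1 - p) = 20x, no matter how many Tensor Cores you add.)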

Review D50697 for the -devel versions of the #nvidia #GPU #driver sets (latest NFB, 575.64) on #FreeBSD has now landed on the main branch of the #ports tree as commit c7cde11f842b33bb36b85b91400bec795430c421.

From now on, whichever is newer of the NFB (New Feature Branch) and the Production Branch is tracked by the -devel ports. The Beta Branch is not planned to be tracked; it just acts as a trigger to start preparing for an upcoming NFB or Production Branch.

Note that as I'm not an nvidia insider, I cannot start investigating until nvidia releases new driver sets.

#AMD's Instinct #MI355X accelerator will consume 1,400 watts. #CDNA4 challenges #BlackwellUltra.
AMD's #MI350X-series #GPUs are based on the #CDNA 4 architecture, which introduces support for FP4 and FP6 precision formats alongside FP8 and FP16.
Both SKUs will come with 288GB of HBM3E memory offering up to 8 TB/s of bandwidth, but the MI350X will top out at 18.45 PFLOPS of FP4/FP6 performance, whereas the MI355X is said to push that maximum to 20.1 PFLOPS.
tomshardware.com/pc-components

Tom's Hardware · AMD's Instinct MI355X accelerator will consume 1,400 watts · By Anton Shilov

Opened a review for #FreeBSD Bug 287268 on Phabricator as Differential Revision D50697, which adds the latest New Feature Branch (NFB) of the #nvidia #GPU drivers as x11/nvidia-driver-devel, x11/linux-nvidia-libs-devel and graphics/nvidia-drm[|-510|-515|-61|-66]-kmod-devel.
Once accepted and landed, it will track the latest version of whichever is newer, the NFB or the Production Branch of the drivers. The Beta Branch of the drivers won't be tracked.
reviews.freebsd.org/D50697

reviews.freebsd.org · D50697 [NEW PORT] x11/nvidia-driver-devel, x11/linux-nvidia-libs-devel, graphics/nvidia-drm[,510,515,61,66]-kmod-devel: Add new port

The #ComputerFrontiers25 conference, co-organised by our colleagues Josef Weidendorfer and Amir Raoofy, starts today in Cagliari, Sardinia. It's all about new #computer technologies and system architectures.

Josef and Amir will present a tool for the systematic comparison of different #GPU programming models, developed at the Technische Universität München and the LRZ. You can read more about it in the slides attached to this post.

Find out more about #CF25:
computingfrontiers.org/2025/pr

Any #Linux #kernel, #graphics or #GPU people out there?

I'm trying to understand the relationship between the #amdgpu driver shipped with the kernel and the "amdgpu-dkms" driver that comes with #ROCm.

Specifically, with a recent enough kernel, do we really need to install the ROCm version of the driver? Does the ROCm version contain stuff the general driver does not? Or is the ROCm stack (esp. libhsa) tightly tied to a very specific version of the driver?

Continued thread

An #Nvidia spokesperson said the company was evaluating its "limited" options. "Until we settle on a new product design & receive approval from the #US govt, we are effectively foreclosed from #China's $50 billion #DataCenter market."

China remains a huge #market for Nvidia, accounting for 13% of its sales in the past financial year. It's the 3rd time Nvidia has had to tailor a #GPU for the world's 2nd-largest #economy after restrictions from the US, which is keen to stymie Chinese #tech development.

Continued thread

The #GPU, or graphics processing unit, will be part of #Nvidia's latest generation of #Blackwell-architecture #AI processors & is expected to be priced between $6,500 & $8,000, well below the $10,000-$12,000 the #H20 sold for, according to two of the sources.

The lower price reflects its weaker specifications & simpler manufacturing requirements.

It will be based on Nvidia's RTX Pro 6000D, a server-class graphics processor, & will use conventional GDDR7 memory instead of more advanced high-bandwidth memory.

Examining #RDNA4's out-of-order memory accesses in detail, and investigating them with testing
#RDNA 4's memory subsystem enhancements are exciting and improve performance across a variety of workloads; #AMD specifically calls out benefits in #raytracing. RDNA 4's scheme for handling memory dependencies isn't fundamentally different from that of #GCN many years ago, yet it makes the most significant change to AMD's #GPU memory subsystem since RDNA launched in 2019.
chipsandcheese.com/p/rdna-4s-o

Chips and Cheese · RDNA 4's "Out-of-Order" Memory Accesses · By Chester Lam