#benchmarks


You know how sometimes a little hobby side-project can get a bit out of hand? An unexpected performance regression on speed.python.org that only showed up on GCC 5 (and 7) led me to set up more rigorous tracking of Python performance when using different compilers. I'm still backfilling data but I think it's pretty awesome to see how much, and how consistently, free-threaded Python performance has improved since 3.13:

github.com/Yhg1s/python-benchm

GitHub · Yhg1s/python-benchmarking-public: Curated results from personal bench_runner benchmarks
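For anyone curious what this kind of tracking involves, here is a minimal sketch of building free-threaded CPython with two different compilers and comparing pyperformance results. This is not the author's actual bench_runner setup; the compiler binaries, paths, and output file names are assumptions for illustration.

```python
"""Rough sketch: build CPython with two compilers and compare pyperformance
results. NOT the author's bench_runner configuration; compiler names, paths,
and file names are illustrative assumptions."""
import os
import subprocess
from pathlib import Path

CPYTHON_SRC = Path("cpython")                        # assumed checkout of python/cpython
COMPILERS = {"gcc": "gcc-12", "clang": "clang-18"}   # assumed compiler binaries


def build(cc_name: str, cc_bin: str) -> Path:
    """Configure and build a free-threaded CPython with the given compiler."""
    build_dir = Path(f"build-{cc_name}")
    build_dir.mkdir(exist_ok=True)
    env = {**os.environ, "CC": cc_bin}
    subprocess.run(
        [str(CPYTHON_SRC.resolve() / "configure"),
         "--disable-gil",              # free-threaded build (3.13+)
         "--enable-optimizations"],
        cwd=build_dir, env=env, check=True)
    subprocess.run(["make", "-j8"], cwd=build_dir, env=env, check=True)
    return build_dir / "python"


def bench(python: Path, out: str) -> None:
    """Run the pyperformance suite against a specific interpreter."""
    subprocess.run(["pyperformance", "run", "--python", str(python), "-o", out],
                   check=True)


if __name__ == "__main__":
    results = []
    for name, cc in COMPILERS.items():
        interpreter = build(name, cc)
        out_file = f"{name}.json"
        bench(interpreter, out_file)
        results.append(out_file)
    # Compare the two runs; pyperf prints a per-benchmark table of deltas.
    subprocess.run(["python", "-m", "pyperf", "compare_to", *results, "--table"],
                   check=True)
```

Repeating a run like this per compiler and per CPython revision is roughly what the backfilled history in the linked repository tracks.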

"Bluntly, the Y-axis simply doesn’t make much sense. And needless to say, if the Y-axis doesn’t make sense, you can’t meaningfully use the graph to make predictions. Computers can answer some questions reliably now, for example, and some not, and the graph tells us nothing about which is which or when any specific question will be solved. Or consider songwriting; Dylan wrote some in an afternoon; Leonard Cohen took half a decade on and off to write Hallelujah. Should we average the two figures? Should we sample Dylan songs more heavily because he wrote more of them? Where should songwriting go on the figure? The whole thing strikes us as absurd.

Finally, the only thing METR looked at was “software tasks”. Software might be very different from other domains, in which case the graph (even if it did make sense) might not apply. In the technical paper, the authors actually get this right: they carefully discuss the possibility that the tasks used for testing might not be representative of real-world software engineering tasks. They certainly don't claim that the findings of the paper apply to tasks in general. But the social media posts make that unwarranted leap.

That giant leap seems especially unwarranted given that there has likely been a lot of recent data augmentation directed towards software benchmarks in particular (where this is feasible). In other domains where direct, verifiable augmentation is less feasible, results might be quite different. (Witness the failed letter ‘r’ labeling task depicted above.) Unfortunately, literally none of the tweets we saw even considered the possibility that a problematic graph specific to software tasks might not generalize to literally all other aspects of cognition.

We can only shake our heads."

garymarcus.substack.com/p/the-

Marcus on AI · The latest AI scaling graph - and why it hardly makes sense · By Gary Marcus

The Redmi 14C is here — and it's bringing serious value to the budget segment.
We've broken down the full specs and benchmark results to see how it stacks up against the competition.

Unisoc T610 processor

6.71" HD+ display

Massive 5000mAh battery

Geekbench + AnTuTu results inside

Check out the full breakdown, performance insights, and more:
radargit.com/2025/04/12/xiaomi

Radargit · Xiaomi Redmi 14C: specifications and benchmarks. Explore the Xiaomi Redmi 14C: budget-friendly smartphone with a 6.88-inch 120Hz display, 50MP AI camera, and 5160mAh battery.

Meta is facing flak for using an experimental Llama 4 Maverick model to inflate benchmark scores. This prompted an apology and a policy shift, now favoring the original version, which lags behind OpenAI's GPT-4o, Anthropic's Claude 3.5, and Google's Gemini 1.5. Meta explained that the experimental version was optimized for dialogue and excelled in LM Arena, though the reliability of that benchmark is debated. Meta clarifies that it tests various AI models and is releasing the open-source version of Llama 4. #AI #Meta #Llama4 #Benchmarks

Core Ultra 9 285: Performance Tests in Benchmarks and Full Specs

Intel's latest Core Ultra 9 285 is here — and we've put it through its paces. From raw benchmark scores to full hardware specs, find out how it stacks up against the competition.

Read the full breakdown:
radargit.com/2025/04/11/core-u

Is this the new king of high-performance CPUs? Let’s talk.

Radargit · Core Ultra 9 285: performance tests in benchmarks and full specs. Intel® Core™ Ultra 9 Processor 285: we tested this 24-core powerhouse in games, AI tasks, and 4K rendering. Full specs, benchmarks, …

Hello clever Fediverse, I'm currently diving into a completely absurd #Rabbithole, and thanks to #LlmStudio and #Ollama my M1 MacBook has rediscovered its fan… Locally, #LLM tops out at 8-12B parameters for me right now (32 GB RAM). Are there any #Benchmarks out there that would please talk me out of believing this gets drastically better with an M4 and >48 GB RAM? Or would something entirely different be smarter? Or a different hobby? It has to be mobile (reachable), because my lifestyle is too unsettled for a desktop. Recommendations welcome in the comments.
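One way to answer this with numbers instead of gut feeling is to measure local token throughput on the current machine and compare it with figures reported for other hardware. A minimal sketch against Ollama's local HTTP API (default port 11434); the model tag and prompt below are placeholders, not recommendations.

```python
"""Rough throughput check for a local Ollama model. Assumes the Ollama server
is running on its default port and the model named below has been pulled;
model and prompt are placeholders."""
import json
import urllib.request

MODEL = "llama3.1:8b"   # placeholder: any locally pulled model tag
PROMPT = "Explain the Collatz conjecture in three sentences."

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": MODEL, "prompt": PROMPT, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)

# Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
tokens = result["eval_count"]
seconds = result["eval_duration"] / 1e9
print(f"{MODEL}: {tokens} tokens in {seconds:.1f}s ({tokens / seconds:.1f} tok/s)")
```

Running the same script on an M1/32 GB and on a borrowed M4/48 GB (or comparing against published numbers for the same model and quantization) gives a rough, directly comparable tokens-per-second figure.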

Mashable: A new AI test is outwitting OpenAI, Google models, among others. “The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context […]

https://rbfirehose.com/2025/03/29/mashable-a-new-ai-test-is-outwitting-openai-google-models-among-others/