Benchmarking LLM social skills with an elimination game

@mariejulien In my opinion, you haven't found your PMF (Pouët/Market Fit) yet.
Unveiling the Truth: Document AI Benchmarking and Performance Insights
In a landscape saturated with claims of accuracy, a recent benchmark study sheds light on the realities of document AI performance. By evaluating different AI pipelines using the CUAD dataset, the fin...
https://news.lavx.hu/article/unveiling-the-truth-document-ai-benchmarking-and-performance-insights
C++Now 2025 SESSION ANNOUNCEMENT: Explore microbenchmark with beman.inplace_vector by River Wu
https://schedule.cppnow.org/session/2025/explore-microbenchmark-with-beman-inplace_vector/
Register now at https://cppnow.org/registration/
This video compares Ollama vs. LM Studio (GGUF), showing that their performance is quite similar; LM Studio's tok/sec output is used for consistent benchmarking.
What’s even more impressive? The Mac Studio M3 Ultra pulls under 200W during inference with the Q4 671B R1 model. That’s quite amazing for such performance!
A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading
UK based #HPC benchmarking role at Microsoft
Requires real experience with hands on HPC #benchmarking - porting, compiling, tuning, performance analysis etc. of scientific codes on HPC systems
Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment
Benchmarking Made Easy: A Deep Dive into Go and Python Performance Testing
Benchmarking is crucial for software performance, and both Go and Python offer powerful tools for developers. This article explores how to effectively implement benchmarking in both languages, highlig...
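The article's full recipe is cut off above, but on the Python side a minimal micro-benchmark along these lines can be built on the standard-library `timeit` module (the function being measured here is a made-up example, not from the article):

```python
import timeit

# Hypothetical workload to benchmark: sum of the first n squares.
def sum_squares(n=1000):
    return sum(i * i for i in range(n))

# timeit calls the function `number` times and returns total elapsed
# seconds, which smooths out per-call timer resolution and noise.
runs = 1000
total = timeit.timeit(sum_squares, number=runs)
print(f"{total / runs * 1e6:.2f} microseconds per call")
```

Go's analogue is the `testing` package's `Benchmark*` functions run via `go test -bench`, which similarly repeat the workload until timings stabilize.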
Olga Pearce from LLNL giving a talk on #benchmarking for #HPC at #MW25NZ
Proposing a specification for running HPC benchmarks - benchpark - to help automation, reuse, reproducibility, tracking, etc.
The rabbit-hole investigation of Nautilus' very slow cold-disk-cache folder-loading performance continued this weekend.
Latest findings here: https://gitlab.gnome.org/GNOME/nautilus/-/issues/3374#note_2345406
Surely someone's looked into this: if I wanted to store millions or billions of files on a filesystem, I wouldn't put them all in one single subdirectory/folder. I'd split them up into nested folders, so each folder held, say, 100 or 1000 or n files or folders. What's the optimum n for filesystems, for performance or space?
I've idly pondered how to experimentally gather some crude statistics, but it feels like I'm just forgetting to search some obvious keywords.
#BillionFileFS #linux #filesystems #optimization #benchmarking
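One way to start gathering those crude statistics: a small Python sketch (hypothetical, not from the post) that builds a flat layout and a two-level nested layout in temporary directories and times `os.stat()` lookups in each. Note this only measures warm-cache lookup cost; a real experiment for the cold-cache case would need to drop the page cache between runs, and the sweet spot will vary by filesystem (e.g. ext4's htree directory indexing changes the picture for large directories).

```python
import os
import tempfile
import time

def make_flat(root, count):
    # All files directly in one directory.
    for i in range(count):
        open(os.path.join(root, f"f{i}"), "w").close()

def make_nested(root, count, fanout):
    # Two-level tree: files split into subdirectories of `fanout` entries each.
    for i in range(count):
        sub = os.path.join(root, f"d{i // fanout}")
        os.makedirs(sub, exist_ok=True)
        open(os.path.join(sub, f"f{i}"), "w").close()

def time_stat_flat(root, count):
    start = time.perf_counter()
    for i in range(count):
        os.stat(os.path.join(root, f"f{i}"))
    return time.perf_counter() - start

def time_stat_nested(root, count, fanout):
    start = time.perf_counter()
    for i in range(count):
        os.stat(os.path.join(root, f"d{i // fanout}", f"f{i}"))
    return time.perf_counter() - start

if __name__ == "__main__":
    count = 10_000  # scale toward millions for a meaningful experiment
    fanout = 100
    with tempfile.TemporaryDirectory() as flat, \
         tempfile.TemporaryDirectory() as nested:
        make_flat(flat, count)
        make_nested(nested, count, fanout)
        print("flat  :", time_stat_flat(flat, count), "s")
        print("nested:", time_stat_nested(nested, count, fanout), "s")
```

Sweeping `fanout` over 10, 100, 1000, ... and plotting lookup time per file would give a first crude answer for one filesystem and one cache state.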
Thesis: Modernization and Optimization of MPI Codes
Our benchmarking tool got a new release, ReBench 1.3
Important changes:
- better support for environment variables
- more predictable handling of build commands
- support for machine-specific settings
- tool to reduce measurement noise is more robust
Join the conversation and optimize your projects!
#VisualStudio #Benchmarking #PerformanceOptimization
This thread was auto-generated from the original post, which can be found here: https://devblogs.microsoft.com/visualstudio/benchmarking-with-visual-studio-profiler/.
New blogpost!
Benchmarking - an appropriate method for evaluating research units? Thed van Leeuwen and Frank van Vree explore possibilities and caveats, particularly in the context of the Dutch Strategy Evaluation Protocol (SEP).
You can read the bi-lingual post here:
ENG https://www.leidenmadtrics.nl/articles/benchmarking-in-research-evaluations-we-can-do-without-it
NL https://www.leidenmadtrics.nl/articles/benchmarking-bij-onderzoeksevaluaties-we-kunnen-zonder
#benchmarking #ResearchEvaluation
Evaluating LLMs: Moving Beyond Intuition in AI Development
As AI models proliferate, developers grapple with how to evaluate the effectiveness of large language models (LLMs) like GPT-4o. This article delves into the challenges of benchmarking LLMs and offers...
https://news.lavx.hu/article/evaluating-llms-moving-beyond-intuition-in-ai-development
ZDNet: ‘Humanity’s Last Exam’ benchmark is stumping top AI models – can you do any better?. “On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity’s Last Exam (HLE), a new academic benchmark aiming to ‘test the limits of AI knowledge at the frontiers of human expertise,’ Scale AI said in a release. The test consists of 3,000 text and multi-modal questions on more than […]
5 of these methods can leverage multithreaded (MT) #BLAS, with a sweet spot of ~6 threads for the 40% of the time spent in MT regions. The E5-2697 has 36/72 (physical/logical) cores, so in the average case 0.4 × 3 × 6 cores + 2 (serial methods) tie up ~9.2 cores, ~13% of the 72 logical cores. So far the back-of-envelope calculation — i.e. if I run 5 of the 2100 design points in parallel, I will stay within 15% of resource use — is holding rather well! #benchmarking #hpc #rstats
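That estimate can be reproduced in a few lines; the numbers and the breakdown into MT and serial contributions are taken directly from the post's own expression:

```python
# Back-of-envelope core-utilization estimate from the post.
mt_fraction = 0.4    # fraction of runtime spent in MT BLAS regions
mt_threads = 6       # sweet-spot thread count per MT region
mt_factor = 3        # concurrency factor used in the post's expression
serial_cores = 2     # cores tied up by the serial methods
logical_cores = 72   # E5-2697: 36 physical / 72 logical cores

cores_used = mt_fraction * mt_factor * mt_threads + serial_cores
share = cores_used / logical_cores
print(f"{cores_used:.1f} cores ~ {share:.0%} of {logical_cores} logical cores")
# prints: 9.2 cores ~ 13% of 72 logical cores
```

That 13% average sits comfortably under the 15% resource budget the post targets.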