#reinforcementlearning


[AGI discussion, DeepMind] Welcome to the Era of Experience
storage.googleapis.com/deepmin
old.reddit.com/r/MachineLearni

* we are on the threshold of a new era in AI that promises an unprecedented level of ability
* a new generation of agents will acquire superhuman capabilities, learning predominantly from experience
* this paradigm shift, accompanied by algorithmic advancements in RL, will unlock new supra-human capabilities

#Google #DeepMind #AI

📄 Our latest article, "MELGYM: A dynamic control interface for MELCOR simulations", has been published in the journal SoftwareX.

🔗 sciencedirect.com/science/arti

We present MELGYM, a Python interface that enables interactive control of simulations with MELCOR, a code widely used for safety analysis in nuclear facilities such as IFMIF-DONES.

Can reinforcement learning for LLMs scale beyond math and coding tasks? Probably.

arxiv.org/abs/2503.23829

arXiv.org: Expanding RL with Verifiable Rewards Across Diverse Domains

Reinforcement learning (RL) with verifiable rewards (RLVR) has shown promising results in mathematical reasoning and coding tasks where well-structured reference answers are available. However, its applicability to broader domains remains underexplored. In this work, we study the extension of RLVR to more diverse domains such as medicine, chemistry, psychology, and economics. We observe high agreement in binary judgments across different large language models (LLMs) when objective reference answers exist, which challenges the necessity of large-scale annotation for training domain-specific reward models. To address the limitations of binary rewards when handling unstructured reference answers, we further incorporate model-based soft scoring into RLVR to improve its flexibility. Our experiments show that a distilled generative reward model can serve as an effective cross-domain verifier, providing reliable reward signals for RL without requiring domain-specific annotations. By fine-tuning a base 7B model using various RL algorithms against our reward model, we obtain policies that outperform state-of-the-art open-source aligned LLMs such as Qwen2.5-72B-Instruct and DeepSeek-R1-Distill-Qwen-32B by a large margin, across domains in free-form answer settings. This also strengthens RLVR's robustness and scalability, highlighting its potential for real-world applications with noisy or weak labels.
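
To make the abstract's contrast between binary rewards and soft scoring concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, and `token_overlap_judge` is only a toy stand-in for the model-based (generative) reward model the abstract describes, assumed here to return a score in [0, 1].

```python
from typing import Callable


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def binary_reward(response: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 on an exact (normalized) match, else 0.0."""
    return 1.0 if _normalize(response) == _normalize(reference) else 0.0


def soft_reward(response: str, reference: str,
                judge: Callable[[str, str], float]) -> float:
    """Soft reward: ask a judge for a graded score and clamp it into [0, 1].

    In the setting the abstract describes, the judge would be a learned
    reward model; here it is any callable (an assumption for illustration).
    """
    score = judge(response, reference)
    return max(0.0, min(1.0, score))


def token_overlap_judge(response: str, reference: str) -> float:
    """Toy judge (hypothetical): fraction of reference tokens found in the response."""
    ref_tokens = set(_normalize(reference).split())
    resp_tokens = set(_normalize(response).split())
    if not ref_tokens:
        return 0.0
    return len(ref_tokens & resp_tokens) / len(ref_tokens)


if __name__ == "__main__":
    reference = "aspirin inhibits cox enzymes"
    response = "inhibits cox enzymes"
    print(binary_reward(response, reference))                     # exact match fails
    print(soft_reward(response, reference, token_overlap_judge))  # partial credit
```

With an unstructured reference answer, the binary reward gives no credit to a partially correct response, while the soft reward can grade it. That limitation is what the abstract says soft scoring is meant to address.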

@lianna Well, for most #AIs and #robots in fiction, I think their inputs are mostly or fully sensory-based, and they learn in real time through #ReinforcementLearning-esque techniques. AIs like LLMs are frozen in place (they never update and are just replaced over time), and they do not have any meaningful interaction with the real world, nor anything like reflection.

I'd think that robots like #Sophia from a few years ago would be closer to the former than the latter, but #AIBros love conflating the two.

Happy birthday to Cognitive Design for Artificial Minds (lnkd.in/gZtzwDn3), which was released 4 years ago!

Since then, its ideas have been presented and discussed widely in the research fields of AI, Cognitive Science and Robotics, and nowadays both the possibilities and the limitations of #LLMs, #GenerativeAI and #ReinforcementLearning (already envisioned and discussed in the book) have become a common topic of research interest in the AI community and beyond.
Similarly, the evaluation of current AI systems in human-like and human-level terms has become a critical theme, related to the problem of anthropomorphic interpretation of AI output (see e.g. lnkd.in/dVi9Qf_k).
Book reviews have been published in ACM Computing Reviews (2021): lnkd.in/dWQpJdkV and in Argumenta (2023): lnkd.in/derH3VKN

I have been invited to present the content of the book at over 20 official scientific events, at international conferences and Ph.D. schools in the US, China, Japan, Finland, Germany, Sweden, France, Brazil, Poland, Austria and, of course, Italy.

One piece of news I am happy to share is that Routledge/Taylor & Francis contacted me a few weeks ago about a second edition! Stay tuned!

The #book is available in many webstores:
- Routledge: lnkd.in/dPrC26p
- Taylor & Francis: lnkd.in/dprVF2w
- Amazon: lnkd.in/dC8rEzPi

@academicchatter @cognition
#AI #minimalcognitivegrid #CognitiveAI #cognitivescience #cognitivesystems