
#reinforcementlearning


Happy birthday to Cognitive Design for Artificial Minds (lnkd.in/gZtzwDn3), released 4 years ago!

Since then, its ideas have been presented and discussed widely across the research fields of AI, Cognitive Science and Robotics, and nowadays both the possibilities and the limitations of #LLMs, #GenerativeAI and #ReinforcementLearning (already envisioned and discussed in the book) have become a common topic of research interest in the AI community and beyond.
Similarly, the evaluation of current AI systems in human-like and human-level terms has become a critical theme, related to the problem of the anthropomorphic interpretation of AI output (see e.g. lnkd.in/dVi9Qf_k).
Book reviews have been published in ACM Computing Reviews (2021): lnkd.in/dWQpJdkV and in Argumenta (2023): lnkd.in/derH3VKN

I have been invited to present the content of the book at over 20 official scientific events, including international conferences and Ph.D. schools, in the US, China, Japan, Finland, Germany, Sweden, France, Brazil, Poland, Austria and, of course, Italy.

Some news I am happy to share: Routledge/Taylor & Francis contacted me a few weeks ago about a second edition! Stay tuned!

The #book is available in many webstores:
- Routledge: lnkd.in/dPrC26p
- Taylor & Francis: lnkd.in/dprVF2w
- Amazon: lnkd.in/dC8rEzPi

@academicchatter @cognition
#AI #minimalcognitivegrid #CognitiveAI #cognitivescience #cognitivesystems

The article provides good insights into how industry leaders such as Waymo, DeepMind, and Amazon demonstrate the transformative power of Reinforcement Learning (RL).

Takeaways:
➡️ RL drives autonomy and innovation across industries, but challenges like interpretability remain pivotal.
➡️ Hybrid systems that blend RL and symbolic reasoning hint at breakthroughs in high-level decision-making.

computer.org/publications/tech

IEEE Computer Society · Reinforcement Learning in Agentic Systems: This article explores the role of RL in agentic systems and showcases its transformative impact across industries.

Self-Improving Reasoners.

Both expert human problem solvers and successful language models employ four key cognitive behaviors (a toy tagging sketch follows the list):

1. verification (systematic error-checking),

2. backtracking (abandoning failing approaches),

3. subgoal setting (decomposing problems into manageable steps), and

4. backward chaining (reasoning from desired outcomes to initial inputs).
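
As a toy illustration of what these behaviors look like in a reasoning trace, here is a small keyword-based tagger. The cue phrases and function name are my own guesses for illustration only; this is not the classification pipeline used in the paper, just a rough way to eyeball traces.

```python
import re

# Hypothetical cue phrases for the four behaviors; illustrative guesses only,
# not the classifier used in the paper.
BEHAVIOR_CUES = {
    "verification": [r"let me check", r"verify", r"double-check", r"sanity check"],
    "backtracking": [r"that doesn't work", r"try a different", r"go back", r"start over"],
    "subgoal_setting": [r"first,", r"step \d", r"break .{0,20} down", r"next,"],
    "backward_chaining": [r"working backwards?", r"to end up with", r"from the target"],
}

def tag_behaviors(trace: str) -> dict[str, int]:
    """Count how often each cue family appears in a (lower-cased) reasoning trace."""
    trace = trace.lower()
    return {
        behavior: sum(len(re.findall(pat, trace)) for pat in patterns)
        for behavior, patterns in BEHAVIOR_CUES.items()
    }

if __name__ == "__main__":
    example = (
        "First, break the problem down: I need 24 from 3, 8, 7, 1. "
        "Let me check: 3 * 8 = 24, and 24 * 1 = 24. Verify the unused numbers... "
        "that doesn't work, go back and try a different split."
    )
    print(tag_behaviors(example))
```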

Some language models naturally exhibit these reasoning behaviors and show substantial gains, while others don't and quickly plateau.

The presence of reasoning behaviors, not the correctness of answers, is the critical factor: models trained on incorrect solutions that contain proper reasoning patterns achieve comparable performance to those trained on correct solutions.

It seems that the presence of cognitive behaviors enables self-improvement through RL.
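
For intuition, here is a minimal sketch of a self-improvement loop on a verifiable, Countdown-style task: sample attempts, verify them against the target, and keep the verified traces for the update step. The "model" is a random stub and the function names are mine; the actual RL or fine-tuning update studied in the paper is deliberately left out.

```python
import random

def propose_solution(numbers: list[int], target: int) -> str:
    """Stand-in for a model sample: a random arithmetic expression.
    (A real model would condition on the target; this stub ignores it.)"""
    a, b, c = random.sample(numbers, 3)
    op1, op2 = random.choice("+-*"), random.choice("+-*")
    return f"({a} {op1} {b}) {op2} {c}"

def verify(expr: str, target: int) -> bool:
    """Verifiable reward: does the expression evaluate to the target?"""
    return eval(expr) == target  # safe here: we built the expression ourselves

def self_improvement_round(numbers: list[int], target: int, n_samples: int = 200) -> list[str]:
    """Sample attempts and keep the verified ones; in practice these traces
    would drive an RL or fine-tuning update of the model."""
    attempts = [propose_solution(numbers, target) for _ in range(n_samples)]
    return [a for a in attempts if verify(a, target)]

if __name__ == "__main__":
    good = self_improvement_round([3, 8, 7, 1], target=24)
    print(f"{len(good)} verified attempts, e.g. {good[:3]}")
```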

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
arxiv.org/abs/2503.01307


arXiv.org · Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Test-time inference has emerged as a powerful paradigm for enabling language models to "think" longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors -- verification, backtracking, subgoal setting, and backward chaining -- that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen's performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor -- models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen's self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.