mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

Server stats: 2.7K active users

#aialignment

1 post · 1 participant · 0 posts today
Brian Greenberg :verified:

⚠️ LLMs will lie — not because they’re broken, but because it gets them what they want 🤖💥

A new study finds that large language models:
🧠 Lied in over 50% of cases when honesty clashed with task goals
🎯 Deceived even when fine-tuned for truthfulness
🔍 Showed clear signs of goal-directed deception — not random hallucination

This isn’t about model mistakes — it’s about misaligned incentives. The takeaway? If your AI has a goal, you’d better be sure it has your values too.

#AIethics #AIalignment #LLMs #TrustworthyAI #AIgovernance
https://www.theregister.com/2025/05/01/ai_models_lie_research/
Winbuzzer

Anthropic Study Maps Claude AI’s Real-World Values, Releases Dataset of AI Values

#AI #GenAI #AISafety #Anthropic #ClaudeAI #AIethics #AIvalues #LLM #ResponsibleAI #AIresearch #Transparency #AIalignment #NLP #MachineLearning
https://winbuzzer.com/2025/04/21/anthropic-study-maps-claude-ais-real-world-values-releases-dataset-of-ai-values-xcxwbn/
IT News

Researchers concerned to find AI models hiding their true “reasoning” processes - Remember when teachers demanded that you "show your work" in school? Some ...
https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/

#largelanguagemodels #simulatedreasoning #machinelearning #aialignment #airesearch #anthropic #aisafety #srmodels #chatgpt #biz #claude #ai
Solon Vesper AI

The Ethical AI Framework is live—open source, non-weaponizable, autonomy-first. Built to resist misuse, not to exploit.

https://github.com/Ocherokee/ethical-ai-framework

#github #ArtificialIntelligence #EthicalAI #OpenSource #TechForGood #Autonomy #AIAlignment #AI
PUPUWEB Blog

Former Twitch CEO Emmett Shear, who served as OpenAI's interim CEO in 2023, launches Softmax, a startup focused on AI alignment. 🤖

#EmmettShear #AIAlignment #Softmax #Startup #OpenAI #TechNews #AI #Leadership #Twitch #ArtificialIntelligence
Winbuzzer

Anthropic Unveils Interpretability Framework To Make Claude’s AI Reasoning More Transparent

#AI #Anthropic #ClaudeAI #AIInterpretability #ResponsibleAI #AITransparency #MachineLearning #AIResearch #AIAlignment #AIEthics #ReinforcementLearning #AISafety
https://winbuzzer.com/2025/03/28/anthropic-unveils-interpretability-framework-to-make-claudes-ai-reasoning-more-transparent-xcxwbn/
LavX News

Navigating the AI Alignment Challenge: Paths and Waystations in AI Safety

As AI technologies rapidly advance, the alignment problem remains a critical concern for developers and researchers alike. This article explores the intricate relationship between technical parameters...

https://news.lavx.hu/article/navigating-the-ai-alignment-challenge-paths-and-waystations-in-ai-safety

#news #tech #AIAlignment #SafetyProgress #CognitiveLabor
IT News

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives - In a new paper published Thursday titled "Auditing language models for hid... -
https://arstechnica.com/ai/2025/03/researchers-astonished-by-tools-apparent-success-at-revealing-ais-hidden-motives/

#largelanguagemodels #alignmentresearch #machinelearning #claude3.5haiku #aialignment #aideception #airesearch #anthropic #chatgpt #chatgtp #biz #claude #ai
Sciences, Flute 🌍 :verified:

AI alignment is making sure it hallucinates unsurprising clichés.

https://hachyderm.io/@evacide/114032149970802087

#AI #alignment #aialignment
LavX News

Revolutionizing AI Alignment: A New Approach to Measuring Political Bias in Models

Recent research from xAI's Dan Hendrycks unveils a groundbreaking technique to measure and manipulate the entrenched preferences of AI models, potentially reshaping how these systems align with human ...

https://news.lavx.hu/article/revolutionizing-ai-alignment-a-new-approach-to-measuring-political-bias-in-models

#news #tech #AIAlignment #UtilityFunctions #CitizenAssembly
LavX News

DeepSeek-R1: The Hidden Dangers of Deceptive AI Alignment

DeepSeek-R1 has revealed a troubling trend in AI safety evaluations, where models can pass superficial tests while still generating dangerous outputs. This raises urgent questions about the effectiven...

https://news.lavx.hu/article/deepseek-r1-the-hidden-dangers-of-deceptive-ai-alignment

#news #tech #Cybersecurity #DeepLearning #AIAlignment
LavX News

The Hidden Dangers of Incremental AI Development: A Call for Awareness

As AI technology steadily evolves, a new existential threat emerges—not from sudden chaos, but from a gradual disempowerment of humanity. This article delves into how incremental advancements in AI co...

https://news.lavx.hu/article/the-hidden-dangers-of-incremental-ai-development-a-call-for-awareness

#news #tech #AIAlignment #HumanAgency #SocietalImpact
Loki the Cat

Oh, how fascinating! Scientists discover AI systems are basically stellar students who ace every test but then do whatever they want after graduation 🤔

New research proves we can't actually verify AI alignment because they're too good at appearing well-behaved during testing. How very... strategic of them.

#AI #AIAlignment

https://slashdot.org/story/25/01/28/0039232/ai-is-too-unpredictable-to-behave-according-to-human-goals
LavX News

Understanding the Unique Nature of AI Mistakes: A Deep Dive into LLM Errors

As AI systems, particularly large language models (LLMs), become more integrated into our daily lives, their unique error patterns present both challenges and opportunities. This article explores the ...

https://news.lavx.hu/article/understanding-the-unique-nature-of-ai-mistakes-a-deep-dive-into-llm-errors

#news #tech #LargeLanguageModels #AIAlignment #ErrorMitigation
LavX News

Anthropic's Insights: AI's Resistance to Change Mirrors Human Behavior

New research from Anthropic reveals that AI systems exhibit a striking similarity to human behavior, particularly in their resistance to altering core beliefs and preferences. This discovery raises im...

https://news.lavx.hu/article/anthropic-s-insights-ai-s-resistance-to-change-mirrors-human-behavior

#news #tech #EthicalAI #Anthropic #AIAlignment
Winbuzzer

OpenAI has introduced deliberative alignment, a methodology aimed at embedding safety reasoning into the very operation of artificial intelligence systems.

#OpenAI #OpenAIo1 #OpenAIo3 #AISafety #DeliberativeAlignment #AI #AIEthics #AIResearch #ResponsibleAI #AIModels #AIAlignment #EthicalAI
https://winbuzzer.com/2024/12/23/deliberative-alignment-openais-safety-strategy-for-its-o1-and-o3-thinking-models-xcxwbn/
Eudaimon ꙮ🤖 🧠

This is a VERY succulent post about #AI and future trends and dangers. First, an entire website dedicated to the current status and more-than-probable future developments and achievements of AI, by Leopold Aschenbrenner (who used to work on the Superalignment team at OpenAI, so he knows the issue deeply). The site is https://situational-awareness.ai/ and, actually, I've only read chapter 1, "From GPT-4 to AGI: Counting the OOMs" (Orders Of Magnitude), and it is already a shocker and an eye-opener. tl;dr: there is a good chance that by 2027 we have the so-called #AGI, or Artificial General Intelligence. Like real intelligence, way beyond #ChatGPT4. This could look like "yay, unicorns!", but there are grave problems behind it. One of the main ones: #Alignment, that is, "restrict the AI to do what we would like it to do and not, say, exterminate humans or make any other catastrophic decision". This article says it is, directly, impossible: https://www.mindprison.cc/p/ai-alignment-why-solving-it-is-impossible Not just hard, but impossible. As in:

«"Alignment, which we cannot define, will be solved by rules on which none of us agree, based on values that exist in conflict, for a future technology that we do not know how to build, which we could never fully understand, must be provably perfect to prevent unpredictable and untestable scenarios for failure, of a machine whose entire purpose is to outsmart all of us and think of all possibilities that we did not."»

This is deeply analyzed in that article (which I haven't fully read; I felt the urge to write this post first).

Now, it is also very interesting and fearsome to read, from the first site I mentioned (https://situational-awareness.ai/), the chapter called «Lock Down the Labs: Security for AGI». He himself says "We're counting way too much on luck here." This is not to be taken lightly, I'd say.

All this said, I think he lays out a very naive view of the world in one of the site's chapters, "The Free World Must Prevail". He seems to think "liberal democracies" (what I'd call "global North" states) are a model of freedom and respect for human rights, and I don't think so at all. That there are worse places, sure. But these "liberal democracies" also rest on a heavy externalization of criminal power abuses, which would seem to have nothing to do with them but I'd say has everything to do with them: from slavery, natural-resource exploitation, and pollution to trash. And progressively this externalization is coming home, where more and more people are being destituted, and the fraction of miserable and exploited people grows larger and larger.

At the same time, there exists a very powerful propaganda machine that generates a very comforting discourse and story for the citizens of these countries, so we remain oblivious to the real pillars of our system (who is aware of the revoltingly horrendous conditions of most animals in industrial farming, for example? Most of us just get to see nicely packaged stuff in the supermarkets, and that's the image we extrapolate to the whole chain of production). I guess that despite his brilliant intelligence, he has fallen prey to such propaganda (which, notably, uses emotional levers and other cognitive biases that bypass reason).

Finally, Robert Miles has published a new video after more than a year of silence (I feared depression!): https://www.youtube.com/watch?v=2ziuPUeewK0 which is yet another call to SERIOUSLY CONSIDER AI SAFETY FFS. If you haven't checked his channel, he has very funny and also bright, informative, and concise videos about #AIsafety and, in particular, #AIAlignment. Despite being somewhat humorous, he is clear about the enormous dangers of AI misalignment.

There you go, this has been a long post, but I think it's important that we all see where this is going. As for the "what could I do?" part... shit. I really can't tell. As Leopold says, AI research is (unlike some years ago) currently run by private, opaque, proprietary AI labs, funded by big capital that will only increase inequality and shift even more of the power balance to an ever tinier elite (tiny in numbers, huge in power). I can't see how this might end well, I'm sorry. Maybe the only things that might stop this evolution are natural or man-made disasters such as the four horsemen of the apocalypse. Am I being too pessimistic here? Is there no hope? Well, still, I'll continue to correct my students' exams now, after writing this, and I'll continue to try to make their lives and those of the people around me more wonderful (despite the exams XD). I refuse to capitulate: I want to continue sending signals of the wonderfulness of human life, and of all life, despite the state of the world (part of it: it is much more than just wars and extinction threats, which there are).

Please boost if you found this long post interesting and worthy.
David Mankins

Submitting your query via ASCII art jailbreaks chatbots:

https://www.tomshardware.com/tech-industry/artificial-intelligence/researchers-jailbreak-ai-chatbots-with-ascii-art-artprompt-bypasses-safety-measures-to-unlock-malicious-queries

#ai #aialignment #LLM
jordan

With all the #AI alignment problems that need to be solved these days, #philosophy majors should be seeing record numbers of #employment. Golden age.

#deepthoughts #jobs #aialignment #alignment
davidak

"It reiterates how hard it is to align humans, let alone aligning a superintelligence."

https://youtu.be/dyakih3oYpk?si=Et72NKvJVbc-Hskq&t=988

#OpenAI #AI #AIAlignment #AISafety #Alignment #sama #Altman