Teixi<p><a href="https://mastodon.social/tags/ACMPrize" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ACMPrize</span></a><br><a href="https://mastodon.social/tags/2024ACMPrize" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>2024ACMPrize</span></a><br><a href="https://mastodon.social/tags/ACMTuringAward" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ACMTuringAward</span></a></p><p><a href="https://mastodon.social/tags/AndrewBarto" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AndrewBarto</span></a><br><a href="https://mastodon.social/tags/RichardSutton" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RichardSutton</span></a> </p><p>» <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ReinforcementLearning</span></a><br>An Introduction<br>1998<br>standard reference...cited over 75,000 times<br>...<br>prominent example of <a href="https://mastodon.social/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a><br><a href="https://mastodon.social/tags/AlphaGo" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AlphaGo</span></a> victory<br>over best human <a href="https://mastodon.social/tags/Go" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Go</span></a> players<br>2016 2017<br>...<br>recently has been the development of the chatbot <a href="https://mastodon.social/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ChatGPT</span></a><br>...<br>large language model <a href="https://mastodon.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> trained in two phases ...employs a technique
called<br>reinforcement learning from human feedback <a href="https://mastodon.social/tags/RLHF" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RLHF</span></a> «</p><p>aka cheap labor unnamed in papers</p><p><a href="https://awards.acm.org/about/2024-turing" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">awards.acm.org/about/2024-turi</span><span class="invisible">ng</span></a></p><p>2/2</p>