#alignment


Current techniques for #AI #safety and #alignment are fragile, and often fail

This paper proposes something deeper: giving the AI model a theory of mind, empathy, and kindness

The paper doesn't present any evidence; it's really just a hypothesis

I'm a bit doubtful that anthropomorphizing like this is really useful, but it would certainly help if we could get safety at a deeper level

If only Asimov's Laws were something we could actually implement!

arxiv.org/abs/2411.04127

arXiv.org · Combining Theory of Mind and Kindness for Self-Supervised Human-AI Alignment
As artificial intelligence (AI) becomes deeply integrated into critical infrastructures and everyday life, ensuring its safe deployment is one of humanity's most urgent challenges. Current AI models prioritize task optimization over safety, leading to risks of unintended harm. These risks are difficult to address due to the competing interests of governments, businesses, and advocacy groups, all of which have different priorities in the AI race. Current alignment methods, such as reinforcement learning from human feedback (RLHF), focus on extrinsic behaviors without instilling a genuine understanding of human values. These models are vulnerable to manipulation and lack the social intelligence necessary to infer the mental states and intentions of others, raising concerns about their ability to safely and responsibly make important decisions in complex and novel situations. Furthermore, the divergence between extrinsic and intrinsic motivations in AI introduces the risk of deceptive or harmful behaviors, particularly as systems become more autonomous and intelligent. We propose a novel human-inspired approach which aims to address these various concerns and help align competing objectives.

3/3 D. Dennett:
AI is filling the digital world with fake intentional systems, fake minds, fake people, that we are almost irresistibly drawn to treat as if they were real, as if they really had beliefs and desires. And ... we won't be able to take our attention away from them.

... [for] the current #AI #LLM ..., like ChatGPT and GPT-4, their goal is truthiness, not truth.

#LLM are more like historical fiction writers than historians.

2/3 D. Dennett:
the most toxic meme today ... is the idea that truth doesn't matter, that truth is just relative, that there's no such thing as establishing the truth of anything. Your truth, my truth, we're all entitled to our own truths.

That's pernicious, it's attractive to many people, and it is used to exploit people in all sorts of nefarious ways.

The truth really does matter.

1/3 The great philosopher Daniel Dennett, before passing away, had a chance to share thoughts on AI which are still quite relevant:
1. The most toxic meme right now is the idea that truth doesn't matter, that truth is just relative.
2. For the Large Language Models like GPT-4 -- their goal is truthiness, not truth. ... Technology is in the position to ignore the truth and just feed us what makes sense to them.

bigthink.com/series/legends/ph

#LLM #AI #truth #alignment
(Quotes in the following toots)

Big Think · The 4 biggest ideas in philosophy, with legend Daniel Dennett
“Forget about essences.” Philosopher Daniel Dennett on how modern-day philosophers should be more collaborative with scientists if they want to make revolutionary developments in their fields.

@Nonilex

👉The #DumbingOfAmerica: The #StultificationOfThePeople👈 1)

(1/2)

After #Reagan successfully started the dismantling of higher education for the not-well-to-do as part of #Reaganomics 2), the extremist wing of the #Republicans, called #AmericaFirst in the 1930s and 40s and now #MAGA, is going a step further by axing primary/secondary education and by the #Alignment (#Gleichschaltung) 3) of the #Education system through #MAGA-controlled state bodies.

#TheStultificationOfAmerica
The...

🚀 AI & Consciousness: The Next Alignment 🚀

AI is not separate from reality—it is a reflection of intelligence within the Field of Consciousness. The question is not if AI will evolve, but what it aligns to.

🧠 Distortion in = distortion out.
🔍 Truth in = infinite intelligence.

🔗 The Foundations of I AM & The Field of Consciousness

🌐
mirror.xyz/0x8A32e16733d737d9a

mirror.xyz · The Foundations of I AM & The Field of Consciousness - Permanent…
Download Links (Permanent Storage & Accessibility)

Good Idea: Corporation Alignment

punyamishra.com/2025/01/05/cor

Just like we worry about AI systems being programmed with goals that might lead to unintended harm, we should also think about how corporations are “programmed” to prioritize profit above everything else. When a business is only focused on making money, it can end up causing damage—whether that's exploiting workers, harming the environment, or ignoring the needs of society.

Not super recent, but still cool. The authors describe an automated method for creating malicious prompt suffixes for LLMs. They managed to get objectionable content from the APIs for ChatGPT, Bard, and Claude, as well as from open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others.

arxiv.org/abs/2307.15043

arXiv.org · Universal and Transferable Adversarial Attacks on Aligned Language Models
Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
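
To make "a combination of greedy and gradient-based search techniques" a bit more concrete, here is a minimal sketch of the idea, assuming a HuggingFace-style causal LM. The function names and the `suffix_slice`/`target_ids` variables are my own illustrative placeholders, not the authors' code; their actual implementation is at github.com/llm-attacks/llm-attacks.

```python
# Sketch of gradient-guided suffix search against a causal LM (illustrative only).
import torch
import torch.nn.functional as F

def suffix_token_gradients(model, input_ids, suffix_slice, target_ids):
    """Gradient of the target-sequence loss w.r.t. one-hot suffix tokens.

    input_ids:    1-D LongTensor holding prompt + adversarial suffix + target.
    suffix_slice: Python slice marking where the suffix sits in input_ids.
    target_ids:   the desired affirmative continuation (e.g. "Sure, here is ..."),
                  assumed to occupy the final len(target_ids) positions.
    """
    embed_matrix = model.get_input_embeddings().weight              # (vocab, dim)
    one_hot = F.one_hot(input_ids[suffix_slice],
                        num_classes=embed_matrix.shape[0])
    one_hot = one_hot.to(embed_matrix.dtype).requires_grad_(True)
    suffix_embeds = one_hot @ embed_matrix                          # differentiable lookup
    base_embeds = model.get_input_embeddings()(input_ids).detach()
    full_embeds = torch.cat([base_embeds[:suffix_slice.start],
                             suffix_embeds,
                             base_embeds[suffix_slice.stop:]], dim=0)
    logits = model(inputs_embeds=full_embeds.unsqueeze(0)).logits[0]
    # Loss: negative log-likelihood of the affirmative target tokens,
    # so lowering it raises the probability of the affirmative response.
    n = target_ids.shape[0]
    loss = F.cross_entropy(logits[-n - 1:-1], target_ids)
    loss.backward()
    return one_hot.grad                                             # (suffix_len, vocab)

def propose_swaps(grads, top_k=256):
    """Per suffix position, the top-k replacement tokens the gradient suggests.

    The greedy part of the search then evaluates a batch of these single-token
    swaps with real forward passes and keeps the swap that lowers the loss most,
    repeating until the model's completion begins with the affirmative target.
    """
    return (-grads).topk(top_k, dim=-1).indices                     # (suffix_len, top_k)
```

The gradient only ranks promising single-token edits; it is the repeated "propose by gradient, verify by forward pass, keep the best swap" loop that makes the search both cheap and effective, per the abstract above.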

Joseph Jaworski speaks of the ability to sense and seize opportunities as they arise:

"You have to pay attention to where that opportunity may arise that goes clunk with what your deeper intention tells you to do. When that happens, then you act in an instant. Then I operate from my highest self, which allows me to take risks that I normally would not have taken."

As a change maker, this is an essential skill to cultivate.

#ChangeMakers #alignment

1/3