mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

#generalization

Continued thread

Grokking at Edge of Numerical Stability
arxiv.org/abs/2501.04697
old.reddit.com/r/MachineLearni
en.wikipedia.org/wiki/Grokking

* sudden generalization after prolonged overfitting
* a massively overtrained NN can acquire "emergent" or unexpected abilities
* an unexpected, accidental finding
* the mechanisms are starting to unravel

Grokked Transformers are Implicit Reasoners: Mechanistic Journey to Edge of Generalization
arxiv.org/abs/2405.15071
news.ycombinator.com/item?id=4

#LLM #ML #grokking

A post from August 2024 by @grimalkina, boosted by someone on another instance, about why to report demographics in research even when you're not studying those groups. This seems like a great primer for people who have little background in basic #sampling and #generalization (for some reason I can't link/boost from here, so):

mastodon.social/@grimalkina/11

My 2 cents (already at least partially covered by Dr. Hicks):

1. Your study is never just about your study. Good science is #open and reusable. e.g., maybe your study on tech-enabled healthcare access isn't specifically about LGBTQ+ or Hispanic people, but what are you doing to help a researcher who comes along in 10 years? That information will change what they find and report.

2. Marginalized groups are often minorities, meaning representative probability samples (or --uncomfortable gesture-- convenience samples) for bread-and-butter research frequently have subpopulations too small for reasonable power in correlations, group differences, etc. That's just reality. It's also a big problem for our understanding of #marginalized + #minority groups. Oversampling or targeted studies of those groups are important. It's also important to have a large number of less-targeted studies with relevant information that can be synthesized later (see #1): one study with 1.3% trans participants doesn't tell us much about the trans population, but 20 studies, each of which has 1.3% trans participants, could tell us meaningful things.

3. Representation is important. My belief is that #marginalized + minoritized people need their identities and existence to be public and constant. In #science, both they and other people consuming the research will benefit from being reminded that they are there, almost always, in our #research.
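The pooling arithmetic behind point 2 can be sketched with made-up numbers (the per-study sample size of 400 is purely illustrative; only the 1.3% figure comes from the post):

```python
# Point 2, illustrated: a small subpopulation in one study vs. pooled across many.
# Assumed numbers: 400 participants per study (hypothetical), 1.3% trans participants.
per_study_n = 400
prop = 0.013

one_study = per_study_n * prop       # subsample in a single study
pooled = 20 * per_study_n * prop     # subsample pooled across 20 comparable studies

print(round(one_study))  # about 5 people: far too few to analyze alone
print(round(pooled))     # about 104 people: enough for meaningful synthesis
```

The point is not the specific numbers but the order-of-magnitude gap: any one study's subsample is statistically useless on its own, while a later meta-analysis over many such studies is not, provided each study reported the demographics in the first place.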

Cat Hicks (@grimalkina@mastodon.social): Ok, so there is a certain criticism that people seem to make about including demographic characteristics in a research study (e.g., the breakdown of gender, or age or race), which is separate from other more meritorious critiques about HOW we measure those things (imperfectly!). This critique is not only logically flawed, it is a questionable scientific practice that can warp the scientific record, but it seems to be a pervasive misconception in software research so I'm going to break it down

Here's a very simple sequence (generalized from the Fibonacci sequence) to discourage students from generalizing a pattern too quickly. The sequence will look like the powers of 2 until it suddenly stops doing so.

1, 1, 2, 4, 8, 16, ..., 2ᵏ, 2ᵏ⁺¹−1, 2ᵏ⁺²−3, 2ᵏ⁺³−8, ...

By selecting a detail in the sequence's (recursive) formula, I can control what the value of 𝑘 will be. So, technically, this is a family of sequences with the Fibonacci sequence being the one with 𝑘=2.

Reasons this family of sequences is cool:

1. I can control exactly what the value of the last power of 2 is and can make the pattern break after 2, 3, 10, 20, or 100 consecutive powers of 2 showing up.

2. The formula for this sequence is very easy to describe:
Start with a 1 and, to find a new term, add up the last 𝑘 terms of the sequence (anything before the starting 1 is treated as 0 if needed). Note that the 𝑘 terms being added up will match the first 𝑘 powers of two (starting at 2⁰=1) showing up in the sequence before the pattern breaks.

3. If you know the Fibonacci sequence (which is the special case of 𝑘=2), then this family of sequences is a natural generalization to look at. See:
en.wikipedia.org/wiki/Generali

4. If we adjust it to say "sum of all previous terms", we do in fact get the powers of two sequence.
Proof (by induction):
Base case: 1 + 1 = 2.
Hypothesis: Assume that up to now, adding all previous terms has produced a power of two, say 2ᵏ.
Inductive step: For the next term, adding all previous terms means adding the terms that summed to 2ᵏ, and then the term 2ᵏ itself, giving 2ᵏ + 2ᵏ = 2ᵏ⁺¹.
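The recursion described above is easy to play with in code. A minimal Python sketch (function names are mine) of the "sum the last 𝑘 terms" family and the "sum all previous terms" variant from point 4:

```python
def k_bonacci(k, n):
    """First n terms: start with 1; each new term is the sum of the
    previous k terms (terms before the initial 1 count as 0)."""
    seq = [1]
    for _ in range(n - 1):
        seq.append(sum(seq[-k:]))  # slicing handles the implicit leading zeros
    return seq

def sum_all_previous(n):
    """Variant from point 4: each new term is the sum of ALL previous
    terms, which yields the powers of two after the two initial 1s."""
    seq = [1]
    for _ in range(n - 1):
        seq.append(sum(seq))
    return seq

# k = 2 recovers the Fibonacci sequence
print(k_bonacci(2, 8))    # [1, 1, 2, 3, 5, 8, 13, 21]

# larger k: looks like powers of 2, then the pattern breaks
print(k_bonacci(5, 10))   # [1, 1, 2, 4, 8, 16, 31, 61, 120, 236]

# summing everything so far gives powers of 2 forever
print(sum_all_previous(7))  # [1, 1, 2, 4, 8, 16, 32]
```

Note how the k = 5 run breaks exactly as the post describes: after the last power of 2 (here 16) come 31 = 2⁵−1, 61 = 2⁶−3, and 120 = 2⁷−8, so choosing 𝑘 controls how long the fake "powers of 2" pattern survives.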


en.wikipedia.org · Generalizations of Fibonacci numbers - Wikipedia

Sometimes I read an article twice; this was such an article. It explains why, even in 2024, we don't fully understand LLMs. They are not "just statistics", as some argue, simply because some aspects of generalisation and overfitting seem to work differently. Working on those models is still "more alchemy than chemistry".
technologyreview.com/2024/03/0
#AI #LLM #generativeAI #statistics #generalization

MIT Technology Review · Large language models can do jaw-dropping things. But nobody knows exactly why. By Will Douglas Heaven