mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

#generalization

Continued thread

Grokking at Edge of Numerical Stability
arxiv.org/abs/2501.04697
old.reddit.com/r/MachineLearni
en.wikipedia.org/wiki/Grokking

* sudden generalization after prolonged overfitting
* a massively overtrained NN can acquire "emergent" or unexpected abilities
* an unexpected, accidental finding
* the mechanisms are starting to unravel

Grokked Transformers are Implicit Reasoners: Mechanistic Journey to Edge of Generalization
arxiv.org/abs/2405.15071
news.ycombinator.com/item?id=4

#LLM #ML #grokking

A post from August 2024 by @grimalkina, boosted by someone on another instance, about why to report demographics in research even when you're not studying those groups. This seems like a great primer for people who have little background in basic #sampling and #generalization (for some reason I can't link/boost from here, so):

mastodon.social/@grimalkina/11

My 2 cents (already at least partially covered by Dr. Hicks):

1. Your study is never just about your study. Good science is #open and reusable. e.g., maybe your study on tech-enabled healthcare access isn't specifically about LGBTQ+ or Hispanic people, but what are you doing to help a researcher who comes along in 10 years? That information will change what they find and report.

2. Marginalized groups are often minorities, meaning representative probability samples (or --uncomfortable gesture-- convenience samples) for bread-and-butter research frequently have subpopulations too small for reasonable power in correlations, group differences, etc. That's just reality. It's also a big problem for our understanding of #marginalized + #minority groups. Oversampling or targeted studies of those groups are important. It's also important to have a large number of less-targeted studies with relevant information that can be synthesized later (see #1): one study with 1.3% trans participants doesn't tell us much about the trans population, but 20 studies, each of which has 1.3% trans participants, could tell us meaningful things.

3. Representation is important. My belief is that #marginalized + minoritized people need their identities and existence to be public and constant. In #science, both they and other people consuming the research will benefit from being reminded that they are there, almost always, in our #research.
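The pooling arithmetic behind point 2 can be sketched with made-up numbers (the per-study sample size of 400 is purely illustrative; only the 1.3% figure comes from the post):

```python
# Point 2, illustrated: a small subpopulation in one study vs. pooled across many.
# Assumed numbers: 400 participants per study (hypothetical), 1.3% trans participants.
per_study_n = 400
prop = 0.013

one_study = per_study_n * prop       # subsample in a single study
pooled = 20 * per_study_n * prop     # subsample pooled across 20 comparable studies

print(round(one_study))  # about 5 people: far too few to analyze alone
print(round(pooled))     # about 104 people: enough for meaningful synthesis
```

The point is not the specific numbers but the order-of-magnitude gap: any one study's subsample is statistically useless on its own, while a later meta-analysis over many such studies is not, provided each study reported the demographics in the first place.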

Cat Hicks (@grimalkina@mastodon.social): Ok, so there is a certain criticism that people seem to make about including demographic characteristics in a research study (e.g., the breakdown of gender, or age or race), which is separate from other more meritorious critiques about HOW we measure those things (imperfectly!). This critique is not only logically flawed, it is a questionable scientific practice that can warp the scientific record, but it seems to be a pervasive misconception in software research so I'm going to break it down

Here's a very simple sequence (generalized from the Fibonacci sequence) to discourage students from generalizing a pattern too quickly. The sequence will look like the powers of 2 until it suddenly stops doing so.

1, 1, 2, 4, 8, 16, ..., 2ᵏ, 2ᵏ⁺¹−1, 2ᵏ⁺²−3, 2ᵏ⁺³−8, ...

By selecting a detail in the sequence's (recursive) formula, I can control what the value of 𝑘 will be. So, technically, this is a family of sequences with the Fibonacci sequence being the one with 𝑘=2.

Reasons this family of sequences is cool:

1. I can control exactly what the value of the last power of 2 is and can make the pattern break after 2, 3, 10, 20, or 100 consecutive powers of 2 showing up.

2. The formula for this sequence is very easy to describe:
Start with a 1 and, to find a new term, add up the last 𝑘 terms of the sequence (anything before the starting 1 is treated as 0 if needed). Note that the 𝑘 terms being added up will match the first 𝑘 powers of two (starting at 2⁰=1) showing up in the sequence before the pattern breaks.

3. If you know the Fibonacci sequence (which is the special case of 𝑘=2), then this family of sequences is a natural generalization to look at. See:
en.wikipedia.org/wiki/Generali

4. If we adjust it to say "sum of all previous terms", we do in fact get the powers of two sequence.
Proof (by induction):
Base case: 1 + 1 = 2.
Hypothesis: Assume that up to now, adding all previous terms has produced a power of two, say 2ᵏ.
Inductive step: For the next term, adding all previous terms means adding the terms that summed to 2ᵏ, and then the term 2ᵏ itself, giving 2ᵏ + 2ᵏ = 2ᵏ⁺¹.
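The recursion described above is easy to play with in code. A minimal Python sketch (function names are mine) of the "sum the last 𝑘 terms" family and the "sum all previous terms" variant from point 4:

```python
def k_bonacci(k, n):
    """First n terms: start with 1; each new term is the sum of the
    previous k terms (terms before the initial 1 count as 0)."""
    seq = [1]
    for _ in range(n - 1):
        seq.append(sum(seq[-k:]))  # slicing handles the implicit leading zeros
    return seq

def sum_all_previous(n):
    """Variant from point 4: each new term is the sum of ALL previous
    terms, which yields the powers of two after the two initial 1s."""
    seq = [1]
    for _ in range(n - 1):
        seq.append(sum(seq))
    return seq

# k = 2 recovers the Fibonacci sequence
print(k_bonacci(2, 8))    # [1, 1, 2, 3, 5, 8, 13, 21]

# larger k: looks like powers of 2, then the pattern breaks
print(k_bonacci(5, 10))   # [1, 1, 2, 4, 8, 16, 31, 61, 120, 236]

# summing everything so far gives powers of 2 forever
print(sum_all_previous(7))  # [1, 1, 2, 4, 8, 16, 32]
```

Note how the k = 5 run breaks exactly as the post describes: after the last power of 2 (here 16) come 31 = 2⁵−1, 61 = 2⁶−3, and 120 = 2⁷−8, so choosing 𝑘 controls how long the fake "powers of 2" pattern survives.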


en.wikipedia.org · Generalizations of Fibonacci numbers - Wikipedia

Sometimes I read an article twice; this was such an article. It explains why, even in 2024, we don't fully understand LLMs. They are not "just statistics", as some argue, simply because some aspects of generalisation and overfitting seem to work differently. Working on those models is still "more alchemy than chemistry".
technologyreview.com/2024/03/0
#AI #LLM #generativeAI #statistics #generalization

MIT Technology Review · Large language models can do jaw-dropping things. But nobody knows exactly why. By Will Douglas Heaven