π0.5: A VLA with open-world generalization
#HackerNews #π0.5 #VLA #openworld #generalization #machinelearning #AI

e509 — Maverick and Marbles
e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.
https://gamesatwork.biz/2025/04/14/e509-maverick-and-marbles/
Pipeline release! nf-core/drugresponseeval v1.0.0!
Please see the changelog: https://github.com/nf-core/drugresponseeval/releases/tag/1.0.0
Grokking at the Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)
* sudden generalization after prolonged overfitting
* a massively overtrained neural network can acquire "emergent"/above-expected performance and unexpected abilities
* an unexpected/accidental finding originally
* the underlying mechanisms are only now starting to be unraveled
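A minimal sketch of the classic setting where grokking shows up (modular addition, a small network, heavy weight decay, training far past memorization); the setup and hyperparameters below are illustrative, not the paper's code, and the late jump in validation accuracy is not guaranteed for every configuration:

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
p = 97  # the task: predict (a + b) mod p
pairs = torch.tensor([(a, b) for a in range(p) for b in range(p)])
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train, val = perm[: len(pairs) // 2], perm[len(pairs) // 2:]

def encode(x):
    # One-hot encode both operands and concatenate them.
    return torch.cat([F.one_hot(x[:, 0], p), F.one_hot(x[:, 1], p)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
# Strong weight decay is one of the ingredients commonly tied to grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(50_000):  # keep training long after the train set is fit
    loss = F.cross_entropy(model(encode(pairs[train])), labels[train])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            acc = (model(encode(pairs[val])).argmax(1) == labels[val]).float().mean()
        print(f"step {step}: train loss {loss.item():.4f}, val acc {acc.item():.3f}")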
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
https://arxiv.org/abs/2405.15071
https://news.ycombinator.com/item?id=40495149
A post from August 2024 by @grimalkina, boosted by someone on another instance, about why to report demographics in research even when you're not studying those groups. This seems like a great primer for people who have little background in basic #sampling and #generalization (for some reason I can't link/boost from here, so):
https://mastodon.social/@grimalkina/112966685297897685
My 2 cents (already at least partially covered by Dr. Hicks):
1. Your study is never just about your study. Good science is #open and reusable. e.g., maybe your study on tech-enabled healthcare access isn't specifically about LGBTQ+ or Hispanic people, but what are you doing to help a researcher who comes along in 10 years? That information will change what they find and report.
2. Marginalized groups are often minorities, meaning representative probability samples (or --uncomfortable gesture-- convenience samples) for bread-and-butter research frequently have subpopulations too small for reasonable power in correlations, group differences, etc. That's just reality, and it's a big problem for our understanding of #marginalized + #minority groups. Oversampling or targeted studies of those groups are important. It's also important to have a large number of less-targeted studies with relevant information that can be synthesized later (see #1): one study with 1.3% trans participants doesn't tell us much about the trans population, but 20 studies, each with 1.3% trans participants, could tell us meaningful things (see the back-of-envelope sketch after this list).
3. Representation is important. My belief is that #marginalized+minoritized people need their identities and existence public and constant. In #science, both they and other people consuming the research will benefit from being reminded that they are there, almost always, in our #research.
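A hedged back-of-envelope sketch of the pooling argument in point 2 (all numbers illustrative): the precision of a subgroup estimate scales roughly with 1/√n, so twenty small subsamples pooled together are far more informative than one.

import math

n_per_study, subgroup_rate = 500, 0.013  # illustrative study size and rate
n_single = n_per_study * subgroup_rate   # ~6.5 subgroup participants per study
n_pooled = 20 * n_single                 # ~130 across 20 comparable studies
# The standard error of a mean shrinks as 1/sqrt(n):
print(f"relative SE, one study:  {1 / math.sqrt(n_single):.2f}")  # ~0.39
print(f"relative SE, 20 pooled:  {1 / math.sqrt(n_pooled):.2f}")  # ~0.09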
'Generalization on the Unseen, Logic Reasoning and Degree Curriculum', by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk.
http://jmlr.org/papers/v25/24-0220.html
#sparse #learns #generalization
'Mentored Learning: Improving Generalization and Convergence of Student Learner', by Xiaofeng Cao, Yaming Guo, Heng Tao Shen, Ivor W. Tsang, James T. Kwok.
http://jmlr.org/papers/v25/23-1213.html
#learners #learner #generalization
'Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK', by Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Yingbin Liang, Zhangyang Wang.
http://jmlr.org/papers/v25/23-0831.html
#sparse #gradient #generalization
@schizanon @strebski @fossdd I think #nationalism and #generalization are important factors for war and killing. I try to treat living beings as #individuals.
#STARTREK #LogicalThinking #70 - Proof By Example (Inappropriate #Generalization)
#8
The benefits of #Multitask studies are huge!
Most importantly, they allow testing the prevalent assumption of #generalization, yielding results with a high chance of generalizing beyond the lab. What's more, they even enable the discovery of *new concepts*!
'Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance', by Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen.
http://jmlr.org/papers/v25/23-1287.html
#objectives #objective #generalization
@markstos Impressive work. Connectivity, to me, implies network / topological metrics. I’ve experimented a bit with betweenness centrality (https://en.wikipedia.org/wiki/Betweenness_centrality) in Python and found it promising (also, e.g., for #network #generalization). However, it’s computationally expensive. #gis
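A minimal sketch of that kind of experiment with networkx (the built-in graph is just a stand-in for a real street or route network):

import networkx as nx

G = nx.les_miserables_graph()  # stand-in; load your own network here
bc = nx.betweenness_centrality(G, normalized=True)
print(sorted(bc.items(), key=lambda kv: kv[1], reverse=True)[:5])
# Brandes' exact algorithm is roughly O(V*E); on large networks, estimate
# instead by sampling k pivot nodes:
bc_approx = nx.betweenness_centrality(G, k=30, seed=42)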
Here's a very simple sequence (generalized from the Fibonacci sequence) to discourage students from generalizing a pattern too quickly. It will look like the powers of 2 until, suddenly, it doesn't:
1, 1, 2, 4, 8, 16, ..., 2ᵏ⁻¹, 2ᵏ−1, 2ᵏ⁺¹−3, 2ᵏ⁺²−8, ...
By selecting a detail in the sequence's (recursive) formula, I can control what the value of 𝑘 will be. So, technically, this is a family of sequences with the Fibonacci sequence being the one with 𝑘=2.
Reasons this family of sequences is cool:
1. I can control exactly what the value of the last power of 2 is and can make the pattern break after 2, 3, 10, 20, or 100 consecutive powers of 2 showing up.
2. The formula for this sequence is very easy to describe:
Start with a 1 and, to find each new term, add up the last 𝑘 terms of the sequence (anything before the starting 1 counts as 0 where needed). Note that the 𝑘 terms being added up right before the pattern breaks are exactly the first 𝑘 powers of two (starting at 2⁰=1) appearing in the sequence.
3. If you know the Fibonacci sequence (which is the special case of 𝑘=2), then this family of sequences is a natural generalization to look at. See:
https://en.wikipedia.org/wiki/Generalizations_of_Fibonacci_numbers#Higher_orders
4. If we adjust it to say "sum of all previous terms", we do in fact get the powers of two sequence.
Proof (by induction):
Base case: 1 + 1 = 2
Hypothesis: Assume that up to now, adding up all previous terms has given a power of two, say 2ᵏ.
Inductive step: For the next term, adding all previous terms means adding the terms that summed to 2ᵏ and then the term 2ᵏ itself, giving 2ᵏ + 2ᵏ = 2ᵏ⁺¹.
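A minimal sketch of the family in Python (the function name is mine):

def kbonacci(k, n):
    # Each new term is the sum of the previous k terms; terms "before"
    # the starting 1 count as 0, which the slice handles implicitly.
    terms = []
    for _ in range(n):
        terms.append(sum(terms[-k:]) if terms else 1)
    return terms

print(kbonacci(2, 10))  # Fibonacci: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
print(kbonacci(5, 10))  # powers of 2 up to 2^4, then the pattern breaks:
                        # 1, 1, 2, 4, 8, 16, 31, 61, 120, 236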
#math; #pattern in a #sequence; #PowersOf2; #generalization of #Fibonacci.
'Generalization and Stability of Interpolating Neural Networks with Minimal Width', by Hossein Taheri, Christos Thrampoulidis.
http://jmlr.org/papers/v25/23-0422.html
#classifiers #generalization #minimization
Museum Human’s newest interview checks in with longtime cultural-sector tech guru Matt Morgan on what’s changed—and hasn’t—about museum tech careers and concerns in 30 years. Is time on our side? Subscribing is free: https://www.museumhuman.com/the-cultural-and-nonprofit-sector-technology-journey-an-interview-with-matt-morgan/ #museum #museums #technology #careers #generalization
Pleased to share my latest research "Zero-shot counting with a dual-stream neural network model" about a glimpsing neural network model that learns visual structure (here, number) in a way that generalises to new visual contents. The model replicates several neural and behavioural hallmarks of numerical cognition.
#neuralnetworks #cognition #neuroscience #generalization #vision #enactivism #enactiveCognition #cognitivescience #CognitiveNeuroscience #computationalneuroscience
Sometimes I read an article twice; this was such an article. It explains why, even in 2024, we don't fully understand LLMs. They are not "just statistics," as some argue, simply because some aspects of generalisation and overfitting seem to work differently than expected. Working on these models is still "more alchemy than chemistry".
https://www.technologyreview.com/2024/03/04/1089403/large-language-models-amazing-but-nobody-knows-why/
#AI #LLM #generativeAI #statistics #generalization
'Effect-Invariant Mechanisms for Policy Generalization', by Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters.
http://jmlr.org/papers/v25/23-0802.html
#causal #generalization #invariance
'On the Generalization of Stochastic Gradient Descent with Momentum', by Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang.
http://jmlr.org/papers/v25/22-0068.html
#sgd #epochs #generalization