**Leshem Choshen** @LChoshen@sigmoid.social · Oct 13, 2023

Back in the days of 2021
there was a lovely evaluation paper:
Automatically identifying label errors
Improving scores' reliability
Finding examples' difficulty
Active Learning

https://aclanthology.org/2021.acl-long.346/

@par @hoyle
#machinelearning #evaluation #IRT #LLM #deepRead
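Going by the #IRT tag, the underlying idea is item response theory: each evaluation example gets a difficulty and each model gets an ability. Below is a minimal sketch of the simplest such model, a 1PL (Rasch) model fitted by gradient ascent in plain Python. It assumes a binary matrix recording which model got which example right. The function name `fit_rasch`, the toy data, and the hyperparameters are hypothetical, and the paper itself may use richer IRT variants (for example, with per-example discrimination).

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_rasch(
    responses: List[List[int]], epochs: int = 2000, lr: float = 0.05
) -> Tuple[List[float], List[float]]:
    """Fit a 1PL (Rasch) IRT model by gradient ascent on the log-likelihood.

    responses[j][i] is 1 if model j answered example i correctly, else 0.
    Model: P(correct) = sigmoid(theta[j] - b[i]).
    Returns (abilities theta, difficulties b).
    """
    n_models, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_models  # model ability
    b = [0.0] * n_items       # example difficulty
    for _ in range(epochs):
        grad_theta = [0.0] * n_models
        grad_b = [0.0] * n_items
        for j in range(n_models):
            for i in range(n_items):
                # observed minus predicted probability of a correct answer
                r = responses[j][i] - sigmoid(theta[j] - b[i])
                grad_theta[j] += r
                grad_b[i] -= r
        # a small L2 penalty keeps estimates finite when a row/column is all 0s or all 1s
        for j in range(n_models):
            theta[j] += lr * (grad_theta[j] - 0.01 * theta[j])
        for i in range(n_items):
            b[i] += lr * (grad_b[i] - 0.01 * b[i])
    return theta, b


# Hypothetical toy data: rows are models, columns are evaluation examples.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
abilities, difficulties = fit_rasch(RESPONSES)
print([round(d, 2) for d in difficulties])
```

Examples that fewer models get right come out with higher difficulty, and stronger models with higher ability; richer variants add parameters on top of this same logistic form.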

**Leshem Choshen** @LChoshen@sigmoid.social · Aug 30, 2023

Did you know:
Evaluating a single model on HELM took
4K GPU hours or $10K+ in API calls?!
Flash-HELM can reduce costs by 200×!
https://arxiv.org/abs/2308.11696

#deepRead #machinelearning #evaluation
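As a back-of-the-envelope check, assume the 200× factor applies uniformly to both figures in the post. It need not, since the actual saving depends on what is being evaluated. Under that assumption:

```latex
\frac{4000\ \text{GPU hours}}{200} = 20\ \text{GPU hours},
\qquad
\frac{\$10{,}000}{200} = \$50
```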

**Leshem Choshen** @LChoshen@sigmoid.social · Aug 9, 2023

The newFormer is introduced,
but what do we really know about it?

@ari and others
imagine a new large-scale architecture &
ask how you would interpret its abilities and behaviours
https://arxiv.org/abs/2308.00189
#deepRead #NLProc #MachineLearning

**Leshem Choshen** @LChoshen@sigmoid.social · Mar 20, 2023

@mega Linear transformations can skip over layers, even all the way to the last one

We can see what the network thought!
We can stop generating at early layers!

https://arxiv.org/abs/2303.09435v1

#NLProc #deepRead
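Here, early exit means stopping at an intermediate layer once the prediction read off that layer is confident enough. Below is a minimal sketch of one common rule, a confidence threshold on the top next-token probability. It assumes each layer's hidden state has already been mapped to the final layer and turned into a next-token distribution. `early_exit`, the threshold, and the toy distributions are hypothetical, and the paper's exact exit criterion may differ.

```python
from typing import List, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda k: values[k])


def early_exit(
    per_layer_probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> Tuple[int, int]:
    """Return (layer_index, token_id) for the first layer whose top
    probability reaches `threshold`; otherwise fall back to the last layer.

    per_layer_probs[l] is a (hypothetical) next-token distribution obtained
    by mapping layer l's hidden state forward and applying the output head.
    """
    for layer, probs in enumerate(per_layer_probs):
        top = argmax(probs)
        if probs[top] >= threshold:
            return layer, top  # confident enough: skip the remaining layers
    last = len(per_layer_probs) - 1
    return last, argmax(per_layer_probs[last])


# Hypothetical distributions over a 3-token vocabulary, one per layer.
PER_LAYER_PROBS: List[List[float]] = [
    [0.40, 0.35, 0.25],
    [0.20, 0.70, 0.10],
    [0.05, 0.92, 0.03],
    [0.01, 0.98, 0.01],
]
print(early_exit(PER_LAYER_PROBS, threshold=0.9))
```

A better mapping from intermediate to final layers makes these early distributions closer to the final one, which is what lets a rule like this stop sooner without changing the answer.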

**Leshem Choshen** @LChoshen@sigmoid.social · Mar 20, 2023

What's in a layer?

Representations are vectors
If only they were words...

Finding:
Any layer can be mapped well to another with a linear transformation
Simple, efficient & interpretable
& improves early exit

https://arxiv.org/abs/2303.09435v1
Story and
#nlproc #deepRead #MachineLearning
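Mapping one layer to another linearly amounts to a least-squares regression. The inputs are hidden states at a source layer, the targets are hidden states at a target layer, and both are collected on the same inputs. Below is a minimal plain-Python sketch via the normal equations, assuming rows are examples and columns are hidden dimensions. `fit_linear_map` and the toy matrices are hypothetical, and real hidden states would call for a numerical library and likely a ridge term.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A @ W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a ridge term > 0")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(h_src: Matrix, h_dst: Matrix, ridge: float = 0.0) -> Matrix:
    """Least-squares W minimizing ||h_src @ W - h_dst||^2 + ridge * ||W||^2."""
    xt = transpose(h_src)
    gram = matmul(xt, h_src)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(xt, h_dst))


# Hypothetical "layer l" states and "last layer" states related by W_TRUE.
H_SRC: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
W_TRUE: Matrix = [[2.0, 0.5], [-1.0, 3.0]]
H_DST: Matrix = matmul(H_SRC, W_TRUE)
print(fit_linear_map(H_SRC, H_DST))
```

Once fitted, multiplying an intermediate hidden state by `W` gives an estimate of the target-layer state. For the last layer, that estimate can be fed to the model's output head, which is not shown here.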

**Leshem Choshen** @LChoshen@sigmoid.social · Mar 15, 2023

Mind-blowing pretraining paradigm

Train the same model to predict in both directions (left-to-right and right-to-left) separately
Better results, more parallelization

https://arxiv.org/abs/2303.07295
#deepRead #nlproc #pretraining #machinelearning
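Predicting both directions with the same model means every sequence yields next-token (left-to-right) and previous-token (right-to-left) training targets for one shared set of weights. Below is a minimal sketch of that data construction only, assuming a simple direction tag. `bidirectional_examples` and the tag strings are hypothetical, and the paper's actual training objective may add more, such as encouraging the two directions to agree.

```python
from typing import List, Tuple

# (direction tag, context tokens in reading order, target token)
Example = Tuple[str, Tuple[str, ...], str]


def bidirectional_examples(tokens: List[str]) -> List[Example]:
    """Build left-to-right and right-to-left prediction examples from one
    sequence, to be consumed by a single shared model.

    The direction tag is a stand-in for however the model is told which way
    it is predicting (an assumption of this sketch).
    """
    examples: List[Example] = []
    for i in range(1, len(tokens)):
        # left-to-right: predict tokens[i] from everything before it
        examples.append(("l2r", tuple(tokens[:i]), tokens[i]))
    for i in range(len(tokens) - 2, -1, -1):
        # right-to-left: predict tokens[i] from everything after it, read backwards
        examples.append(("r2l", tuple(reversed(tokens[i + 1:])), tokens[i]))
    return examples


for ex in bidirectional_examples(["the", "cat", "sat"]):
    print(ex)
```

Because the two directions are separate prediction problems over the same data, their examples can be processed independently, which is one place the extra parallelism can come from.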

**Leshem Choshen** @LChoshen@sigmoid.social · Jan 23, 2023

They started with 3 suspected causes of hallucinations;
only 2 held up

Having found how networks behave while hallucinating, they
use it to filter hallucinations (with great success)

https://arxiv.org/abs/2301.07779
#NLProc #neuralEmpty #NLP #deepRead

**Otte Oldschool** @edudoc@mstdn.ca · Dec 27, 2022

I’ve just spent the morning going through the #mastodon news feed, and I thoroughly enjoyed it. I honestly can’t remember the last time I got so completely immersed in behind-the-story analysis. Well done, #mastodon, and thank you.
#mastodonnews #deepread #news #newsanalysis

**Leshem Choshen** @LChoshen@sigmoid.social · Dec 7, 2022

What neurons determine agreement in multilingual LLMs?

Worth a #deepRead, but some answers:
Across languages: 2 distinct ways to encode syntax
Share neurons, not info

Autoregressive models have dedicated syntax neurons (MLMs just spread it across)

@amuuueller@twitter.com Yu Xia @tallinzen@twitter.com #conllLivetweet2022
