
Whenever anyone is talking about "machine learning" I usually just mentally replace it with "statistics" and it works just as well.

At least we all know that statistics always lies, statistics requires good input data to get good conclusions, and statistics can be distorted to come up with almost any conclusion you want.

And it really is just statistics, but rebranded.

@JordiGH And, as has been proven time and time again, is entirely subject to the prejudices of the person creating it. If we ever did create true, sci-fi-like AIs (which I truly believe is an idiotic notion), they'd be racist af.

@JordiGH this kind of assumes people understand statistics better than machine learning

@arxivfever I mostly just want them to distrust it the same way, not necessarily understand it.

@JordiGH machine learning is a lot less careful (even less) about what they are actually measuring

@JordiGH The whole point of the 'learning' part of 'machine learning' is the adaptation of new information.

I imagine there are static statistical models that don't fare well in an ever-changing environment.

ML heavily relies on statistics, dgmw, but to say one is a rebrand of the other is just kinda boneheaded imo

@JordiGH
> statistics can be distorted to come up with almost any conclusion you want

well yeah, lying is always fun

@somarasu It's not that adaptive. It's usually just a regression with a specific kind of function you're fitting. We've been doing regression for at least a hundred years. When you get new data you do a regression again. That's also at least a hundred years old.

We do have more computers now, but the fundamental process isn't different.
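
A minimal sketch of the "fit, then fit again when new data arrives" process described above, assuming ordinary least squares on a straight line. The `fit_line` helper and all the data points are made up for illustration; they are not from the thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up data, purely illustrative.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)        # first fit ("training")

# New observations arrive: append them and run the same regression again.
xs = xs + [4.0, 5.0]
ys = ys + [9.5, 10.5]
a2, b2 = fit_line(xs, ys)      # refit; the parameters shift slightly
```

Nothing in the second call is different from the first; the "adaptation" is just the same computation on a longer list.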

@JordiGH You're right, but my understanding of ML is that that regression happens 'dynamically' (in this context i take that to mean 'the system can ascertain when that regression needs to occur')

@JordiGH my point was that one needs the other, but they ought not be treated *the same*

@somarasu I suppose there's some degree of adaptability, but I do see a lot of people "train" a model once (or as I like to call it, compute a local optimum) and use the parameters to predict for a long time without refitting the model to new data.

I guess Kálmán filters and the like let you make small and continual improvements to your model, but they aren't an essential part of what most people call machine learning, are they?
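
For what "small and continual improvements" can look like, here is a rough sketch of a scalar Kalman measurement update, assuming the hidden quantity is a constant and each reading has known noise variance. The function name `kalman_update`, the prior, and the readings are all hypothetical.

```python
def kalman_update(estimate, variance, measurement, meas_variance):
    """One scalar Kalman measurement update for a constant hidden state."""
    gain = variance / (variance + meas_variance)      # how much to trust the new reading
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance            # uncertainty shrinks after each update
    return new_estimate, new_variance

# Assumed vague prior: estimate 0 with a very large variance.
estimate, variance = 0.0, 1000.0

# Made-up noisy readings of a quantity near 5.
for z in [5.1, 4.9, 5.3, 4.8, 5.0]:
    estimate, variance = kalman_update(estimate, variance, z, meas_variance=1.0)
```

Each reading nudges the estimate a little instead of triggering a full refit, which is the contrast with the train-once workflow.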

@JordiGH And i'd say that *that's* also a bonehead move. Models change as data -- and time -- changes. You cant train a model once, dust your collar and go "i did it".

In my world, machine learning evolves just as much as human learning does. Perhaps not at the exact same pace, but there needs to be the same amount of discipline insomuch as "shit can change tomorrow, maybe we oughta adapt our models to reflect that".

But i dunno, im just a JS n-word who dont no nuthin, boss.

@JordiGH They made a Disney movie about 'Heuristics' one time didnt they?! I saw that movie it was a good movie!

@somarasu Maybe I would like machine learning more if I knew of more ethical applications of it.

@JordiGH Well, shit! That's incredibly apropos of Statistics!

They used """StAtiStiCs""" when they were infecting negros with Syphilis for what turned out to be "no real fucking reason".

If you're looking for the ethical application of either, you're gonna have to think of that yourself

@somarasu I believe my intent in my original toot was to equally smear both the same, to make people realise they're both the same kind of lies.

@somarasu Also, that human experimentation crap is horrific, yeah, I've heard about it.

Every bigot always fancies themself a scientist and a statistician.

@JordiGH It's not the tools it's their application.

I think that's what you were getting at. Which, absolutely -- you're right. But dont let the application cloud your judgement as far as the tools go. That's all i was getting at.

@JordiGH And they arent lies, they're tools. If you wanna paint them as lies, fine, but it's my job to let you know you're full of it, if that's the case

@somarasu Yeah, I get that, it's just a tool, but when so many people use the tool for harm, I do start questioning the tool itself.

I have been on the lookout for ethical machine learning jobs, and they just seem so hard to come by. It's all ads or tracking or surveillance or worse.

My last job was using it for detecting brain lesions in MS patients, and that was fun and good, but I haven't found anything comparable since.

@somarasu Did you see the guys that reinvented phrenology? Wow! Much machine! Such learn!

faception.com/

And this isn't the only company of the sort.

@JordiGH has anyone ever said to you: "you cant see the forest for the trees"?

@somarasu Yeah, Marilyn Manson. The next lyric is "you can't smell your own shit on your knees".

But really, if you have a good job idea, let me know.

@somarasu I mean, Gauss was doing "machine learning" to predict planetary trajectories. That's over 200 years old by now.

@JordiGH I wouldnt equate Gauss with 'machine learning' in the same way i wouldnt equate JSB with 'electronic music'

@JordiGH it's a cool new approach for computing statistics - but it's still statistics.

@JordiGH

If you publish something about it, it's artificial intelligence.

If you want to impress your investors, it's machine learning.

If you want to hire people to do it, it's Bayesian statistics.

If you are actually programming it, it's just python.

@JordiGH the longstanding joke is that it's AI when you're pitching it, ML when you're hiring for it and linear regressions when you're implementing it.

@JordiGH

I, a professional statistician, rise respectfully to challenge your propagation of misinformation about what statisticians are and do.

Do people manipulate numbers? Yes. Those people do not have Master's degrees or PhDs in statistics, and what they're doing is not something that any statistician would recognize.

The discovery of gravitational waves by fitting a 17-parameter nonlinear model to data (using Bayesian methods)? Developed by statisticians (physicists developed the model, of course). Survey sampling (why we know the US census undercounts!), optimizing processes through efficient experimentation, designing Covid-19 vaccine (and hydroxychloroquine) trials--statisticians.

Misleading use of numbers--we don't have time for that actually (and it could ruin our careers).

Pick a random academic statistics department and a random professor's bio to find out what people actually do.

I'm puzzled that your profile professes an interest in math (yay!) yet asserts that statisticians always lie. Apparently there's a mathematical discipline that you're not very familiar with--check us out!

@JimG No, I do know some stats. And I was being hyperbolic about statisticians always lying. I was relying on the popular conception of deceiving statistics (raw numbers, not the practice of inferential statistics) being reported in the media.

I know basically all science is done with statistics. A lot of pseudoscience too.

For example, modern phrenology.

faception.com/

@JimG @JordiGH similarly to this statistician over here, I would like to respectfully rise (😀) to challenge your propagation of misinformation of what machine learners do.

While the statisticians are busy fitting 17 parameter models to discover gravitational waves, machine learners are busy fitting 17 Billion parameter models to produce text that is often indistinguishable from human-written.

(The 17 B number wasn't exaggeration by the way, the numbers just aligned perfectly here).

@JordiGH @JimG
I guess if you really want, you can force this opinion. It's not technically wrong, it's just a useless way to think about it. It's like saying that biology is the same as chemistry because all biological entities consist of atoms.

@maltimore @JimG It's not, it's really not any different. Regression is an old topic, invented by statisticians. Cross-validation is an old topic, invented by statisticians. Optimisation (or "learning") is an old topic, mostly invented by operations research but also used by statisticians.

Machine learning has taken statistical tools, packaged them in software, and given themselves fancy names in order to attract more money and distance themselves from their intellectual ancestors.

@maltimore @JimG It's a silly marketing term, just like "dynamic programming".

@maltimore @JimG And usually, fewer attempts to formally justify the mathematics.

Just because your model has 17 billion parameters instead of 17 doesn't mean you should stop trying to write proofs about it, although in practice that's what happens: nobody knows how it works. It's just hidden voodoo we attribute to the wisdom of the machine, who has learned.

@JordiGH @JimG
you've had a statistician tell you you were wrong about the statistics part, now I (Machine Learning PhD student) tell you that you're wrong on the machine learning part.

You even use terms in an underspecific way. For instance, what do you even mean by regression? Do you mean linear regression, or do you mean regression as in regression vs. classification?

Anyways, I think we should put the discussion to rest, as nobody here will change their mind.

@maltimore @JimG Regression as opposed to interpolation: approximation of many data points with a function that goes as close as possible to those points without necessarily going through them. All of those kinds were invented long ago; machine learning came up with a few new functions to use as approximations, but not a fundamentally different method.
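
To make the regression-versus-interpolation distinction concrete, here is a small sketch under assumed conditions: Lagrange polynomial interpolation stands in for "goes through every point" and a least-squares line stands in for "goes close to the points". Both helpers and the four data points are invented for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial through all (xs, ys) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up noisy points lying roughly on a line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]

a, b = fit_line(xs, ys)

# Interpolation reproduces every data point; regression only gets close.
interp_residuals = [abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys)]
line_residuals = [abs(a + b * x - y) for x, y in zip(xs, ys)]
```

Swapping the straight line for some other family of functions changes what is being fitted, not the idea of minimising the misfit.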

@maltimore @JimG The statistician objected to statistics being used misleadingly. The machine learner objected to being called statistics. These are more tribal distinctions than a description of the methods.

If it's a matter of throwing university degrees around, my masters in applied mathematics tells me you're both full of it and should stop acting like you're so different.

@maltimore @JimG And what do you mean by regression vs classification? The usual thing is to perform a regression and then just take a threshold of some values to classify, right? Or do you have other kinds of classification in mind, such as clustering?

@maltimore @JimG Oh, I see, looks like the established practice is to not even call it a regression if you happen to fit, say, a neural network to some data and then just take thresholds of the outputs to classify. Do I have that right?
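
A toy sketch of "perform a regression, then threshold the output to classify", assuming the simplest possible stand-in: a least-squares line fitted to 0/1 labels with a cut-off at 0.5. In practice a logistic function or a neural network would usually replace the line; the helper names, labels, and threshold here are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def classify(x, a, b, threshold=0.5):
    """Turn the continuous regression output into a 0/1 class label."""
    return 1 if a + b * x >= threshold else 0

# Made-up one-dimensional data with binary labels.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]

a, b = fit_line(xs, labels)                    # the regression step
predicted = [classify(x, a, b) for x in xs]    # the thresholding step
```

The fitting step is an ordinary regression; only the final comparison against the threshold makes it a classifier.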

@maltimore Also, if you are involved in good and ethical applications of machine learning that aren't ads, spying, emotional/psychological manipulation or worse, I'd be quite interested in knowing what you're using it for.

@JordiGH
I already said that it's technically true that ML is statistics, but that that is not a useful way to think about it. I think I'll stick with this statement and unsubscribe from this thread.

@JordiGH @maltimore
Sorry for the late response, I've been offline for several days.

Regarding what is stats and what is AI...in my opinion I'm afraid this is more cultural than principled. On the one hand anyone who creates a device that makes decisions based on variable data is doing something statistical. Such devices should be objects of study by statisticians (not that we should wholly own it). On the other hand, a majority of statisticians have (I think) unilaterally decided that statistical models have additive components, and voluntarily ceded AI to computer science. There are notable exceptions (e.g. Breiman, Friedman). I agree with you that a neural net is regression but I think most statisticians would not, and would then struggle to find a meaningful distinction. Incidentally, I fit a single-hidden-layer nnet last week, because I needed a quick flexible surface smoother and that fit the bill.

This is even now the object of much discussion in the statistics community.

So I'm defending my discipline's ethics generally (we don't lie, don't twist numbers), but on AI, I'm the first to criticize us for failing to embrace it and characterize it.

Mathstodon
