**Jeremy Kun** @j2kun · Apr 29, 2023

This week on Saturday Morning Math Writer's Club, I've got an interview to prepare for with Rob Schapire, inventor of boosting in machine learning. I think this is going to be a great story about a preeminently practical result that came from pure theory. I have a slew of questions to ask Rob, but in the meantime I want to find examples of people using boosting in production.

**Jeremy Kun** @j2kun@mathstodon.xyz · Apr 29, 2023

Now I know there's a long list of Kaggle competition winners who use XGBoost: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

But I don't consider this a production setting.

The same doc has a list of "use cases" and "integrations," but the only two that seem like they count are brief notes from the "Tencent data platform team" and the "autohome.com ad platform team." I will have to dig through the integrated tools to see if they list any compelling users.

Apr 29, 2023, 05:07 PM

**Jeremy Kun** @j2kun · Apr 29, 2023

A friend also brought up the possibility of writing about cuckoo hashing, but I don't know of anyone who uses this in prod. Plus, it seems like a relatively minor upgrade over something like linear probing, so I'm not sure in what context this would be particularly useful.

**Jeremy Kun** @j2kun · Apr 29, 2023

Apparently boosted decision trees are popular in particle physics? Like LHC, identify-the-Higgs-boson kinds of problems.

**Jeremy Kun** @j2kun · Apr 29, 2023

Ok, will file away some particle physics stuff for later. In the meantime: does anyone know of production systems that use boosting? Like, not just an ML framework that supports it, or a Kaggle competition that was won with it, but an actual production system that uses it?

**Jeremy Kun** @j2kun · Apr 29, 2023

Got sidetracked by some network science applications. Apparently Twitter runs a community detection routine via matrix factorization to help with tweet recommendations.

**Jeremy Kun** @j2kun · Apr 29, 2023

Had to take a break to take the kid on the world's slowest bike ride around the block. Writing will continue during his nap.

**Jeremy Kun** @j2kun · Apr 29, 2023

I emailed an academic who published some papers in particle physics about the use of boosted decision trees. In a 2005 paper he predicted they would become widely used in particle physics data analysis. I asked if he felt that prediction had come true.

He responded with "here's what Chat gpg [sic] replied," followed by the sort of generic LLM text that is completely useless.

What a trashy decision.

**Jeremy Kun** @j2kun · Apr 30, 2023

He followed up with a quote from a book he wrote, which was equally unhelpful. Better than ChatGPT, but still. The gall.

**Blake C. Stacey** @bstacey@icosahedron.website · Apr 29, 2023

@j2kun wat

**Carlos Scheidegger** @scheidegger@mastodon.social · Apr 29, 2023

@j2kun but also, “in a 2005 paper at journal xyz with doi abc, this douchebag…” well played lol

**theHigherGeometer** @highergeometer · Apr 30, 2023

@j2kun that seems like a new version of a lmgtfy link. Certainly it's rude in response to your specific question. Somehow asking GPT is better than an actual human-written source found by a search engine? I'm amazed at how fast people are adopting this technique as if it's magic and always true.

**davidlowryduda** @davidlowryduda · May 1, 2023

@j2kun lmgpttfy is the next lmgtfy?
Boo them.

When I was a grad student, I was reading an arXiv preprint from an academic and got confused. I sent him an email about his paper (the first cold email I ever sent), and he responded with roughly "I don't have time to talk to students. Bye."

That was also off-putting, especially to young me.

**ksg** @ksg · May 1, 2023

@j2kun Yandex uses (or at least used) it a lot: see MatrixNet and CatBoost.

**0xDE** @11011110 · Apr 30, 2023

@j2kun When cuckoo hashing might be better than linear probing: (1) when some kind of real-time response rate is needed and you're willing to pay two cache misses instead of one in order to guarantee that you will find your key in constant time, compared to the logarithmic worst-case behavior of linear probing; (2) when you're dealing with adversarial input that will find the worst-case keys of your hash table and repeatedly hit on them; (3) if you want all cache operations to take the same time as protection against timing attacks.

But I think in the vast majority of real-world applications none of these apply.
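To make point (1) concrete, here is a minimal Python sketch of textbook two-table cuckoo hashing: every key has exactly one candidate slot in each table, so a lookup inspects at most two slots no matter how full the table is. The class name `CuckooHashTable`, salting Python's built-in `hash` to get two hash functions, the starting capacity, and the `max_kicks` limit are all illustrative assumptions, not anything from the thread or a particular library.

```python
import random


class CuckooHashTable:
    """Hypothetical two-table cuckoo hash map (illustrative sketch only)."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self._reseed()
        self.tables = [[None] * capacity, [None] * capacity]

    def _reseed(self):
        # Assumption: two hash functions made by salting Python's hash().
        self.seeds = (random.getrandbits(64), random.getrandbits(64))

    def _slot(self, which, key):
        return hash((self.seeds[which], key)) % self.capacity

    def get(self, key):
        # A lookup probes exactly two slots: worst-case constant time.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def put(self, key, value):
        # If the key is already stored, overwrite it in place.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        item = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, item[0])
            # Place the item, evicting whatever occupied the slot.
            item, self.tables[which][i] = self.tables[which][i], item
            if item is None:
                return
            # The evicted item moves to its slot in the other table.
            which = 1 - which
        # Too many evictions: grow, rehash with fresh seeds, then retry.
        self._rebuild(self.capacity * 2)
        self.put(*item)

    def _rebuild(self, new_capacity):
        old = [e for table in self.tables for e in table if e is not None]
        self.capacity = new_capacity
        self._reseed()
        self.tables = [[None] * new_capacity, [None] * new_capacity]
        for k, v in old:
            self.put(k, v)


table = CuckooHashTable()
for n in range(100):
    table.put(n, n * n)
print(table.get(7))  # 49
```

The design choice is the trade 0xDE describes: lookups are bounded at two probes, and all the variable cost is pushed into insertion, which may cascade evictions and occasionally rebuild the whole table.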

**Jeremy Kun** @j2kun · Apr 30, 2023

@11011110 Thanks for the tips! I suspect you wrote the Wikipedia page on this, because it sounds familiar :) Anyway, I could see it being useful in some kind of global-scale something-or-other that Google or Facebook has, and the math analysis is nice, but unless the story is a bit richer I don't think it's the best fit. It could suffice if I'm struggling to fill out the book, though.

**0xDE** @11011110 · Apr 30, 2023

@j2kun Cuckoo filters in place of Bloom filters might have greater practicality. So that's the other reason for talking about cuckoo hashing: so you can use it to explain cuckoo filters.
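A rough Python sketch of how a cuckoo filter builds on cuckoo hashing: each item is reduced to a short fingerprint that lives in one of two buckets, and the second bucket is computed from the first bucket and the fingerprint alone (partial-key cuckoo hashing), so fingerprints can be relocated without the original item. One practical difference from a plain Bloom filter is that it supports deletion. The class name `CuckooFilter`, the power-of-two bucket count, 4-slot buckets, 8-bit fingerprints, the `max_kicks` limit, the single "victim" stash, and the use of `hashlib.blake2b` are all assumptions for illustration.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    # 64-bit hash via BLAKE2b (an arbitrary choice for this sketch).
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class CuckooFilter:
    """Hypothetical cuckoo filter: approximate set membership with deletion."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power of two keeps the XOR-based alternate index an involution.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket index, fingerprint) that found no home

    def _fingerprint(self, item: str) -> int:
        # 8-bit fingerprint in 1..255 (0 is avoided, as many designs reserve it).
        return (_h(b"fp" + item.encode()) & 0xFF) or 1

    def _buckets_for(self, item: str, fp: int):
        i1 = _h(item.encode()) % self.num_buckets
        return i1, self._alt_index(i1, fp)

    def _alt_index(self, index: int, fp: int) -> int:
        # Depends only on the bucket and fingerprint; alt(alt(i)) == i.
        return (index ^ _h(bytes([fp]))) % self.num_buckets

    def _victim_matches(self, fp, i1, i2) -> bool:
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (i1, i2)

    def insert(self, item: str) -> bool:
        if self.victim is not None:
            return False  # already full: a homeless fingerprint is stashed
        fp = self._fingerprint(item)
        i1, i2 = self._buckets_for(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        self.victim = (i, fp)  # keep it so nothing is falsely reported absent
        return False

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._buckets_for(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2] or self._victim_matches(fp, i1, i2)

    def delete(self, item: str) -> bool:
        # Only safe for items that were actually inserted.
        fp = self._fingerprint(item)
        i1, i2 = self._buckets_for(item, fp)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        if self._victim_matches(fp, i1, i2):
            self.victim = None
            return True
        return False


cf = CuckooFilter()
for word in ["cuckoo", "bloom", "hash"]:
    cf.insert(word)
print(cf.contains("cuckoo"))  # True
```

As with a Bloom filter, `contains` can return false positives (two items may share a fingerprint and bucket pair) but, for successfully inserted items, never false negatives.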

**0xDE** @11011110 · May 30, 2023

@j2kun Saw this today as a supposed practical application and deployment of cuckoo hashing in the TikTok recommendation system: https://gantry.io/blog/papers-to-know-20230110/, via https://en.wikipedia.org/wiki/Special:Diff/1157764695

gantry.io: "Monolith: The Recommendation System Behind TikTok"

**Jeremy Kun** @j2kun · May 30, 2023

@11011110 Oh boy, if I can get an interview with someone from TikTok... that would be wild. Maybe also get me in trouble with various state entities???
