@dlzv Yeah, I kinda dismissed pretrained models because everything is in English and I work in French... besides, our data is kinda specific, but I may try them at some point in the future.
I think a nice panorama is given in "Two decades of statistical language modeling: Where do we go from here?" and some of the references there, although my favorite is "Latent Dirichlet Allocation" (which came later). Those are two of the ones I read lol.
@jt There are plenty of pretrained models for French! I think it's probably one of the most well-studied languages: https://huggingface.co/models?language=fr&sort=downloads
Thanks for the refs!