TIL that modern CPUs have an $F_2$-polynomial multiplication intrinsic operation: https://en.wikipedia.org/wiki/CLMUL_instruction_set
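For anyone who hasn't seen it, a small worked example of what multiplying polynomials over $F_2$ means (the specific operands are just an illustration): coefficients are single bits, so addition is XOR and any term that shows up twice cancels.

$$
(x^2 + x + 1)(x + 1) = x^3 + x^2 + x^2 + x + x + 1 = x^3 + 1 \pmod 2
$$

In bit form that is `0b111` carryless-times `0b011` giving `0b1001`, whereas ordinary integer multiplication would give `0b10101` because of the carries.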
@j2kun Yeah, it's surprisingly useful. Aside from the classic "algebraic" use cases, there are some handy bit tricks, like computing the running bit parity (prefix XOR) by carryless-multiplying by all ones (-1).
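A minimal sketch of the running-parity trick, using a plain software carryless multiply rather than the hardware instruction. The names `clmul` and `running_parity` are made up for this illustration, and the 64-bit default width is an assumption chosen to match one half of a PCLMULQDQ operand.

```python
def clmul(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) product of two non-negative ints."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR instead of add: no carries
        b >>= 1
        shift += 1
    return result


def running_parity(x: int, width: int = 64) -> int:
    """Bit i of the result is the XOR of bits 0..i of x (prefix XOR)."""
    mask = (1 << width) - 1          # all ones, i.e. -1 at this width
    return clmul(x & mask, mask) & mask  # keep only the low `width` bits


if __name__ == "__main__":
    # Bits 1 and 4 set: parity turns on at bit 1 and back off at bit 4.
    print(f"{running_parity(0b00010010, 8):08b}")
```

It works because bit i of the product is the XOR of every `x` bit at position j <= i, each paired with a 1 bit of the all-ones operand.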
@j2kun For example, if you mark the start and end of each range with a 1 bit, then the running parity is a mask that selects the bits inside those ranges. You can even use this to compute rasterization coverage masks for potentially overlapping polygons, where overlaps are resolved with the "mod 2" (even-odd) rule.
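Here is a rough sketch of the range-mask idea on one row of bits. `range_mask` and its half-open `(start, end)` convention are assumptions for the example, not anything from the thread; the carryless multiply is again done in software.

```python
def clmul(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) product of two non-negative ints."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def prefix_xor(x: int, width: int) -> int:
    """Running parity: carryless multiply by all ones, truncated to width."""
    mask = (1 << width) - 1
    return clmul(x & mask, mask) & mask


def range_mask(ranges, width: int = 64) -> int:
    """Coverage mask for half-open bit ranges [start, end), even-odd rule."""
    markers = 0
    for start, end in ranges:
        markers ^= 1 << start  # toggle "inside" on at start
        markers ^= 1 << end    # toggle it back off at end
    return prefix_xor(markers, width)


if __name__ == "__main__":
    print(f"{range_mask([(3, 9)], 16):016b}")           # one range
    print(f"{range_mask([(2, 10), (5, 7)], 16):016b}")  # overlap cancels
```

The markers are XORed rather than ORed so that two ranges sharing an endpoint cancel correctly. With this convention the start bit is included in the mask and the end bit is not; OR the markers back in if you want both endpoints.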
@j2kun And here's a fun application to parsing quoted strings: https://github.com/simdjson/simdjson/blob/cab383e1de7385c6460b66e5fad25a116d750402/src/generic/stage1/json_string_scanner.h#L67
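A simplified sketch of the quoted-string trick: build a bitmask of quote positions, then take the running parity to get an "inside a string" mask. This is not the simdjson code; as far as I can tell the linked scanner also handles backslash-escaped quotes and carries state across 64-byte blocks, both of which are left out here. `in_string_mask` is a hypothetical name.

```python
def clmul(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) product of two non-negative ints."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def in_string_mask(text: str) -> int:
    """Bit i is set when character i lies inside a double-quoted string.

    Assumes no escaped quotes. The opening quote is included in the mask,
    the closing quote is not.
    """
    width = len(text)
    quote_bits = 0
    for i, ch in enumerate(text):
        if ch == '"':
            quote_bits |= 1 << i
    all_ones = (1 << width) - 1
    return clmul(quote_bits, all_ones) & all_ones  # prefix XOR of quote bits


if __name__ == "__main__":
    text = 'a "bc" d'
    mask = in_string_mask(text)
    print(text)
    print("".join("1" if (mask >> i) & 1 else "0" for i in range(len(text))))
```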
@pervognsen @j2kun speeding up multi-block CRC32C is another example:
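To give a feel for why carryless multiplication helps with multi-block CRCs, here is a sketch of the underlying algebra only. It uses a deliberately simplified "raw" CRC (MSB-first, no bit reflection, no initial value, no final XOR), so its outputs will not match real CRC32C values, and `raw_crc` and `combine` are hypothetical names. The point is that CRCs of independently processed blocks can be merged with one carryless multiply by a precomputable constant.

```python
POLY = 0x11EDC6F41  # CRC-32C (Castagnoli) generator, normal form, with x^32 term


def clmul(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) product of two non-negative ints."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def polymod(a: int, p: int = POLY) -> int:
    """Remainder of GF(2) polynomial a divided by p."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)
    return a


def raw_crc(data: bytes) -> int:
    """Simplified CRC: (message polynomial * x^32) mod POLY, nothing else."""
    return polymod(int.from_bytes(data, "big") << 32)


def combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """raw_crc(A + B) from raw_crc(A), raw_crc(B) and len(B) in bytes."""
    shift = polymod(1 << (8 * len_b))  # x^(8*len_b) mod POLY, precomputable
    return polymod(clmul(crc_a, shift)) ^ crc_b


if __name__ == "__main__":
    a, b = b"hello, ", b"world"
    print(hex(combine(raw_crc(a), raw_crc(b), len(b))), hex(raw_crc(a + b)))
```

Since the blocks no longer depend on each other, several of them can be checksummed in parallel and merged afterwards.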
@j2kun A bit of an aside, but when you have instructions named PCLMULQDQ, PCLMULLQLQDQ, PCLMULHQLQDQ, PCLMULLQHQDQ and PCLMULHQHQDQ, I start to question the use of the term "mnemonics" for assembler instruction names.
@j2kun More recent x86 CPUs also have GF2P8AFFINEQB, which multiplies each byte (treated as a vector of 8 bits) by an 8x8 GF(2) bit matrix and XORs in a constant. It has so many uses!
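A small software emulation of what GF2P8AFFINEQB does to a single byte, following my reading of Intel's pseudocode: output bit i is the parity of (matrix byte 7-i AND the input byte), XORed with bit i of the immediate. The function name `gf2p8affine_byte` is made up, and the row ordering is worth double-checking against the manual before relying on it. The two constants are the commonly quoted identity and bit-reversal matrices.

```python
def gf2p8affine_byte(matrix: int, x: int, imm8: int = 0) -> int:
    """Apply a 64-bit packed 8x8 GF(2) matrix plus constant to one byte."""
    out = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF      # row for output bit i
        parity = bin(row & x).count("1") & 1        # GF(2) dot product
        out |= (parity ^ ((imm8 >> i) & 1)) << i    # XOR in the constant bit
    return out


IDENTITY = 0x0102040810204080     # leaves every byte unchanged
BIT_REVERSE = 0x8040201008040201  # reverses the bit order within a byte

if __name__ == "__main__":
    print(f"{gf2p8affine_byte(BIT_REVERSE, 0b11010000):08b}")
    print(f"{gf2p8affine_byte(IDENTITY, 0x0F, 0xFF):08b}")  # identity, then NOT
```

The real instruction applies the same transform to every byte of a vector register in one go, which is where uses like per-byte bit reversal and arbitrary bit permutations come from.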