#floatingpoint


So, as a follow-up to signed zeros in complex numbers: Consider a+b, a-b, 1/a+1/b, 1/a-1/b where a and b are 0. or -0. Some obvious ones:
0. + 0. -> 0.
1/0. + 1/0. -> inf
1/0. + 1/-0. -> NaN
-0. - 0. -> -0.
But what is 0. + -0. and 0. - 0.?

#C #CPlusPlus #cpp
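The answer, per IEEE 754: under the default round-to-nearest mode, a sum of opposite-signed zeros and an exact difference x - x both produce +0., so 0. + -0. -> 0. and 0. - 0. -> 0.; only round-toward-negative yields -0. in those cases. A minimal C sketch to check this on your machine (assumes IEEE 754 doubles and default rounding):

#include <stdio.h>
#include <math.h>

/* Print a double with an explicit sign, so -0.0 is distinguishable. */
static void show(const char *expr, double v)
{
    printf("%-14s -> %s%g\n", expr, signbit(v) ? "-" : "+", fabs(v));
}

int main(void)
{
    double pz = 0.0, nz = -0.0;
    show("0. + -0.", pz + nz);              /* +0: opposite-signed zeros sum to +0 */
    show("0. - 0.",  pz - pz);              /* +0: exact x - x is +0, not -0       */
    show("-0. - 0.", nz - pz);              /* -0: both terms effectively -0       */
    show("1/0. + 1/-0.", 1.0/pz + 1.0/nz);  /* inf + -inf -> NaN                   */
    return 0;
}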

FWIW: There's a freely available 2023 paper that's basically a "CliffsNotes" version of the book "Handbook of Floating-Point Arithmetic":

hal.science/hal-04095151v1

hal.science: Floating-point arithmetic

Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scraping the surface, and floating-point arithmetic offers much more. In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.

Another #FloatingPoint refinement of pair arithmetic, this time on the so-called "sloppy add":

arxiv.org/abs/2404.05948

AFAIK the latest refinement of the accurate building blocks is in this:

hal.science/hal-02972245

And there are also weakened-constraint versions by Rump & Lang:

tuhh.de/ti3/paper/rump/LaRu201

arXiv.org: On the robustness of double-word addition algorithms

We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when the rounding function is round-to-nearest. The relative error bound also holds in directed rounding, but certain additional conditions are required. Consequently, in double-word multiplication and addition operations, we can safely omit the normalization step of double-word multiplication and replace the accurate addition algorithm with the sloppy one. Numerical experiments confirm that this approach nearly doubles the performance of double-word multiplication and addition operations, with negligible precision costs. Moreover, in directed rounding mode, the signs of the errors of the two algorithms are consistent with the rounding direction, even in the presence of input overlap. This allows us to avoid changing the rounding mode in interval arithmetic. We also prove that the relative error bound of the sloppy addition algorithm exceeds $3u^2$ if and only if the input meets the condition of Sterbenz's Lemma when rounding to nearest. These findings suggest that the two addition algorithms are more robust than previously believed.
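For context, a minimal C sketch of the two pieces under discussion, in the style of the QD library (names and test values are mine; IEEE 754 doubles with round-to-nearest are assumed):

#include <stdio.h>

/* A double-word ("pair") number: value = hi + lo with |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum, an error-free transformation: s + e == a + b exactly. */
static dd two_sum(double a, double b)
{
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return (dd){ s, e };
}

/* "Sloppy" double-word addition: one TwoSum on the high parts, the low
 * parts folded in with plain adds, then a Fast2Sum-style renormalization.
 * Cheap, and historically distrusted when the high parts cancel; that is
 * the case the paper above re-examines. The accurate variant spends an
 * extra TwoSum on the low parts to cover it. */
static dd dd_add_sloppy(dd x, dd y)
{
    dd     s  = two_sum(x.hi, y.hi);
    double v  = s.lo + (x.lo + y.lo);
    double hi = s.hi + v;                  /* Fast2Sum: exact when |s.hi| >= |v| */
    return (dd){ hi, v - (hi - s.hi) };
}

int main(void)
{
    /* 1 + 2^-80 is not representable in one double; a dd holds it exactly. */
    dd a = { 1.0, 0x1p-80 };
    dd b = { 1.0, 0.0 };
    dd r = dd_add_sloppy(a, b);
    printf("hi = %.17g, lo = %.17g\n", r.hi, r.lo);  /* 2 and ~8.27e-25 (= 2^-80) */
    return 0;
}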

🎉 🎉 C23 and C++23 are finally joining the quadruple-precision club by bringing a standard way to handle 128-bit floating-point numbers!
(FP16 is also here if you need it)

Here's hoping that a future Fortran standard adopts the C_Float128 kind specifier that gcc/gfortran already provides as an extension.
en.cppreference.com/w/cpp/type

en.cppreference.com: Fixed width floating-point types (since C++23)
#c23 #cpp23 #cpp
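A quick sketch of what this looks like in C23 (assumptions: a toolchain where _Float128 is available, which the __STDC_IEC_60559_TYPES__ macro signals, plus the strfromf128 conversion function from TS 18661-3, provided by e.g. glibc, for printing):

/* Ask the C library for the TS 18661-3 / C23 interchange-type extras. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    _Float128 third = 1.0f128 / 3.0f128;  /* ~33 significant decimal digits */
    char buf[64];

    /* printf has no standard length modifier for _Float128, so format the
     * value with strfromf128 instead of casting away the extra precision. */
    strfromf128(buf, sizeof buf, "%.36g", third);
    puts(buf);

    /* For contrast, the same value squeezed through a 64-bit double: */
    printf("%.17g\n", (double)third);
    return 0;
}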

I am excited to read about numpy_quaddtype, a project to include quad precision in numpy. The standard precision in numpy (and most other places) is double precision: numbers are stored in 64 bits and the precision is about 16 decimal digits. This is usually enough but not always.

Numpy does have longdouble, which may or may not increase precision depending on your platform (for example, it's 80-bit extended precision on x86 Linux but plain 64-bit double on Windows), and even when it does, the increase is very modest. If I need more precision, I typically use FLINT, but that is meant for super-high-precision and rigorous computations. It will be very good to have another tool.

More details in this blog post: labs.quansight.org/blog/numpy-

labs.quansight.org: Numpy QuadDType: Quadruple Precision for Everyone
Introducing the new data-type for Numpy providing cross-platform support of quadruple precision.

Byte-sized floating point: the HiF8 format. Intricate!

Zero is 0x00, so that's nice. And there's only one zero.

There are two infinities, which could be handy.

Range is 2^15 down to 2^-15 for positive numbers (it's a sign-and-magnitude system, mostly).

Up to 4 bits of precision, variable. Best precision around 1.0, tapering off for larger and smaller values.

Denormals go down to 2^-22

arxiv.org/pdf/2409.16626

Right, that's quite enough time spent remote-#debugging a weird #FloatingPoint edge case. Now time to watch rugby. My apologies to all the #Windows users who have to suffer this bug. Owing to your OS vendor's refusal to let authors use their proprietary platform, what would take me 5 minutes to investigate on Mac or Linux takes 2 weeks on Windows. Consider upgrading. To anything. Literally anything. I have recently proven that the international postal service is #TuringComplete, try that.