#floatingpoint


So, as a follow-up to signed zeros in complex numbers: Consider a+b, a-b, 1/a+1/b, 1/a-1/b where a and b are 0. or -0. Some obvious ones:
0. + 0. -> 0.
1/0. + 1/0. -> inf
1/0. + 1/-0. -> NaN
-0. - 0. -> -0.
But what is 0. + -0. and 0. - 0.?

#C #CPlusPlus #cpp
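The answer, per IEEE 754: under the default round-to-nearest mode, a sum of opposite-signed zeros and an exact difference x - x both produce +0., so 0. + -0. -> 0. and 0. - 0. -> 0.; only round-toward-negative yields -0. in those cases. A minimal C sketch to check this on your machine (assumes IEEE 754 doubles and default rounding):

#include <stdio.h>
#include <math.h>

/* Print a double with an explicit sign, so -0.0 is distinguishable. */
static void show(const char *expr, double v)
{
    printf("%-14s -> %s%g\n", expr, signbit(v) ? "-" : "+", fabs(v));
}

int main(void)
{
    double pz = 0.0, nz = -0.0;
    show("0. + -0.", pz + nz);              /* +0: opposite-signed zeros sum to +0 */
    show("0. - 0.",  pz - pz);              /* +0: exact x - x is +0, not -0       */
    show("-0. - 0.", nz - pz);              /* -0: both terms effectively -0       */
    show("1/0. + 1/-0.", 1.0/pz + 1.0/nz);  /* inf + -inf -> NaN                   */
    return 0;
}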

FWIW: There's a freely available 2023 paper that's basically a "CliffsNotes" version of the book "Handbook of Floating-Point Arithmetic":

hal.science/hal-04095151v1

hal.science: Floating-point arithmetic

Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scraping the surface, and floating-point arithmetic offers much more. In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.

Another #FloatingPoint refinement of pair arithmetic, this time on the so-called "sloppy add":

arxiv.org/abs/2404.05948

AFAIK the latest refinement of the accurate building blocks is in this:

hal.science/hal-02972245

And there are also weakened-constraint versions by Rump & Lang:

tuhh.de/ti3/paper/rump/LaRu201

arXiv.org: On the robustness of double-word addition algorithms

We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when the rounding function is round-to-nearest. The relative error bound also holds in directed rounding, but certain additional conditions are required. Consequently, in double-word multiplication and addition operations, we can safely omit the normalization step of double-word multiplication and replace the accurate addition algorithm with the sloppy one. Numerical experiments confirm that this approach nearly doubles the performance of double-word multiplication and addition operations, with negligible precision costs. Moreover, in directed rounding mode, the signs of the errors of the two algorithms are consistent with the rounding direction, even in the presence of input overlap. This allows us to avoid changing the rounding mode in interval arithmetic. We also prove that the relative error bound of the sloppy addition algorithm exceeds $3u^2$ if and only if the input meets the condition of Sterbenz's Lemma when rounding to nearest. These findings suggest that the two addition algorithms are more robust than previously believed.
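For context, a minimal C sketch of the two pieces under discussion, in the style of the QD library (names and test values are mine; IEEE 754 doubles with round-to-nearest are assumed):

#include <stdio.h>

/* A double-word ("pair") number: value = hi + lo with |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum, an error-free transformation: s + e == a + b exactly. */
static dd two_sum(double a, double b)
{
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return (dd){ s, e };
}

/* "Sloppy" double-word addition: one TwoSum on the high parts, the low
 * parts folded in with plain adds, then a Fast2Sum-style renormalization.
 * Cheap, and historically distrusted when the high parts cancel; that is
 * the case the paper above re-examines. The accurate variant spends an
 * extra TwoSum on the low parts to cover it. */
static dd dd_add_sloppy(dd x, dd y)
{
    dd     s  = two_sum(x.hi, y.hi);
    double v  = s.lo + (x.lo + y.lo);
    double hi = s.hi + v;                  /* Fast2Sum: exact when |s.hi| >= |v| */
    return (dd){ hi, v - (hi - s.hi) };
}

int main(void)
{
    /* 1 + 2^-80 is not representable in one double; a dd holds it exactly. */
    dd a = { 1.0, 0x1p-80 };
    dd b = { 1.0, 0.0 };
    dd r = dd_add_sloppy(a, b);
    printf("hi = %.17g, lo = %.17g\n", r.hi, r.lo);  /* 2 and ~8.27e-25 (= 2^-80) */
    return 0;
}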

🎉 🎉 C23 and C++23 are finally joining the quadruple-precision club by bringing a standard way to handle 128-bit floating-point numbers!
(FP16 is also here if you need it)

Here's hoping that a future Fortran standard adopts the C_Float128 kind specifier that gcc/gfortran already provides as an extension.
en.cppreference.com/w/cpp/type

en.cppreference.com: Fixed width floating-point types (since C++23)
#c23 #cpp23 #cpp
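A quick sketch of what this looks like in C23 (assumptions: a toolchain where _Float128 is available, which the __STDC_IEC_60559_TYPES__ macro signals, plus the strfromf128 conversion function from TS 18661-3, provided by e.g. glibc, for printing):

/* Ask the C library for the TS 18661-3 / C23 interchange-type extras. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    _Float128 third = 1.0f128 / 3.0f128;  /* ~33 significant decimal digits */
    char buf[64];

    /* printf has no standard length modifier for _Float128, so format the
     * value with strfromf128 instead of casting away the extra precision. */
    strfromf128(buf, sizeof buf, "%.36g", third);
    puts(buf);

    /* For contrast, the same value squeezed through a 64-bit double: */
    printf("%.17g\n", (double)third);
    return 0;
}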

I am excited to read about numpy_quaddtype, a project to include quad precision in numpy. The standard precision in numpy (and most other places) is double precision: numbers are stored in 64 bits and the precision is about 16 decimal digits. This is usually enough but not always.

Numpy does have longdouble, which may or may not increase precision depending on your platform (for example, it's 80-bit extended precision on x86 Linux but plain 64-bit double on Windows), and even when it does, the increase is very modest. If I need more precision, I typically use FLINT, but that is meant for super-high-precision and rigorous computations. It will be very good to have another tool.

More details in this blog post: labs.quansight.org/blog/numpy-

labs.quansight.org: Numpy QuadDType: Quadruple Precision for Everyone
Introducing the new data-type for Numpy providing cross-platform support of quadruple precision.

Byte-sized floating point: the HiF8 format. Intricate!

Zero is 0x00, so that's nice. And there's only one zero.

There are two infinities, which could be handy.

Range is 2^15 down to 2^-15 for positive numbers (it's a sign-and-magnitude system, mostly).

Up to 4 bits of precision, variable. Best precision around 1.0, tapering off for larger and smaller values.

Denormals go down to 2^-22

arxiv.org/pdf/2409.16626

Right, that's quite enough time spent remote-#debugging a weird #FloatingPoint edge case. Now time to watch rugby. My apologies to all the #Windows users who have to suffer this bug. Owing to your OS vendor's refusal to let authors use their proprietary platform, what would take me 5 minutes to investigate on Mac or Linux takes 2 weeks on Windows. Consider upgrading. To anything. Literally anything. I have recently proven that the international postal service is #TuringComplete, try that.