Floating-Point Types
Description
The library provides safe floating-point types that detect IEEE 754 exceptional conditions at runtime: overflow, underflow, division by zero, invalid operations, and NaN propagation.
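These types are intended to replace the standard floating-point types float and double with added safety guarantees. They are not drop-in replacements in every respect: construction and conversion are explicit, and several operators are deliberately omitted (see Operations Not Provided).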
| Type | Underlying Type | Width | IEEE 754 Format | Approximate Max Finite Magnitude |
|---|---|---|---|---|
| f32 | float | 32 bits | binary32 (single precision) | 3.40e+38 |
| f64 | double | 64 bits | binary64 (double precision) | 1.80e+308 |
The types assume an IEEE 754 binary32 / binary64 representation for the underlying float and double.
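Code that relies on this assumption can check it at compile time. The sketch below uses only std::numeric_limits from the standard library; none of it is a Boost.SafeNumbers API, and the exact set of checks is one reasonable choice rather than what the library itself asserts.

```cpp
#include <cassert>
#include <limits>

// Compile-time checks for the representation that f32 and f64 assume.
// Only the standard library is used; these are not Boost.SafeNumbers APIs.
constexpr int bits_per_byte = std::numeric_limits<unsigned char>::digits;

static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(sizeof(float) * bits_per_byte == 32, "float is not 32 bits wide");
static_assert(sizeof(double) * bits_per_byte == 64, "double is not 64 bits wide");
static_assert(std::numeric_limits<float>::digits == 24, "binary32 has a 24-bit significand");
static_assert(std::numeric_limits<double>::digits == 53, "binary64 has a 53-bit significand");
```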
Each type exposes a basis_type member type alias that refers to the underlying floating-point type, allowing conversion back to built-in types when needed.
#include <boost/safe_numbers/floats.hpp>
namespace boost::safe_numbers {
using f32 = detail::float_basis<float>;
using f64 = detail::float_basis<double>;
template <compatible_float_type BasisType>
class float_basis {
public:
using basis_type = BasisType;
// Construction
constexpr float_basis() noexcept = default;
explicit constexpr float_basis(BasisType val) noexcept;
// Reject construction from any non-basis type (prevents narrowing and silent widening)
template <typename T>
requires (!std::is_same_v<T, BasisType>)
explicit constexpr float_basis(T) noexcept = delete;
// Conversion to the underlying type
explicit constexpr operator BasisType() const noexcept;
// Comparison operators
friend constexpr auto operator==(float_basis lhs, float_basis rhs) noexcept -> bool;
friend constexpr auto operator<=>(float_basis lhs, float_basis rhs) noexcept
-> std::partial_ordering = default;
}; // class float_basis
// Arithmetic operators (throw on IEEE 754 exceptional results)
template <compatible_float_type BasisType>
constexpr auto operator+(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator-(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator*(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator/(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator%(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
} // namespace boost::safe_numbers
f32 and f64 can store any value of the underlying type, including signed zeros, infinities, and NaN; construction does not validate the value.
The safety guarantees are enforced at the point of arithmetic, where exceptional IEEE 754 results are turned into exceptions rather than being silently propagated.
This differs from Bounded Floating-Point Types, which reject NaN and out-of-range values at construction time.
Operator Behavior
Default Construction
constexpr float_basis() noexcept = default;
A default-constructed value holds positive zero (+0.0).
Construction from the Underlying Type
explicit constexpr float_basis(BasisType val) noexcept;
Construction from the underlying type is explicit to prevent accidental conversions.
For f32 the argument type is float, and for f64 it is double.
Construction from Other Types
template <typename T>
requires (!std::is_same_v<T, BasisType>)
explicit constexpr float_basis(T) noexcept = delete;
Construction from any type other than the exact underlying type is a compile-time error.
This deleted catch-all eliminates both narrowing (for example double into f32) and silent widening (for example float into f64), as well as construction from integers or bool.
auto a = f32{1.5f}; // OK: float into f32
auto b = f64{1.5}; // OK: double into f64
// auto c = f32{1.5}; // Compile error: double into f32 (would narrow)
// auto d = f64{1.5f}; // Compile error: float into f64 (deleted to avoid silent widening)
// auto e = f32{1}; // Compile error: int into f32
// auto f = f32{true}; // Compile error: bool into f32
Conversion to the Underlying Type
explicit constexpr operator BasisType() const noexcept;
Conversion back to the underlying floating-point type is explicit.
f32 converts to float and f64 converts to double.
auto x = f32{3.5f};
auto raw = static_cast<float>(x); // 3.5f
Comparison Operators
friend constexpr auto operator==(float_basis lhs, float_basis rhs) noexcept -> bool;
friend constexpr auto operator<=>(float_basis lhs, float_basis rhs) noexcept
-> std::partial_ordering = default;
Three-way comparison is supported via operator<=>, which returns std::partial_ordering because floating-point values are not totally ordered.
All comparison operators (<, <=, >, >=, ==, !=) are available.
A comparison involving NaN yields std::partial_ordering::unordered, so <, <=, >, >=, and == all return false when either operand is NaN, and != returns true, exactly as for the built-in types.
Equality is defined explicitly rather than via a defaulted operator so that the -Wfloat-equal diagnostic stays contained within the library and does not leak into user code.
Arithmetic Operators
template <compatible_float_type BasisType>
constexpr auto operator+(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator-(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator*(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator/(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
template <compatible_float_type BasisType>
constexpr auto operator%(float_basis<BasisType> lhs,
float_basis<BasisType> rhs) -> float_basis<BasisType>;
Each arithmetic operator computes the result, classifies it according to IEEE 754-2008 sections 6 and 7, and throws when the result is an exceptional value:
- +, -: Throw std::overflow_error on saturation to positive infinity and std::underflow_error on saturation to negative infinity. Subtracting like-signed infinities, or adding opposite-signed infinities, is an invalid operation and throws std::domain_error.
- *: Throws std::overflow_error or std::underflow_error on saturation to an infinity, according to its sign. Multiplying zero by an infinity is an invalid operation and throws std::domain_error.
- /: Throws std::overflow_error or std::underflow_error on saturation to an infinity, according to its sign. Dividing zero by zero or infinity by infinity is an invalid operation and throws std::domain_error. Dividing a finite non-zero value by zero also throws std::domain_error.
- %: Computes the floating-point remainder with std::fmod, whose quotient is truncated toward zero; this is not the IEEE 754 remainder operation, which std::remainder provides. Modulo by zero, or modulo of an infinite dividend, throws std::domain_error. The remainder cannot overflow or underflow.
In every operation, an operand that is a quiet or signaling NaN causes the operation to throw std::domain_error.
The % operator borrows the spelling of the integer modulo operator, but it does not convert its operands to integers: like std::fmod, it returns lhs - n * rhs, where n is the quotient lhs / rhs truncated toward zero, so the result has the sign of the dividend.
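The difference between std::fmod and the IEEE 754 remainder (std::remainder) shows up whenever the exact quotient is closer to the next integer up. The sketch below uses only the standard library; remainders and compute_remainders are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>

// Contrasts std::fmod, which the % operator is described as using,
// with std::remainder, the IEEE 754 remainder operation.
struct remainders {
    double truncated; // std::fmod: quotient truncated toward zero
    double nearest;   // std::remainder: quotient rounded to nearest, ties to even
};

inline remainders compute_remainders(double x, double y)
{
    return {std::fmod(x, y), std::remainder(x, y)};
}
```

For 5.5 and 2.0 the exact quotient is 2.75: std::fmod uses n = 2 and returns 1.5, while std::remainder uses n = 3 and returns -0.5.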
Exception Behavior
The following table summarizes the exceptional conditions and the exception each one produces.
| Condition | Operators | Exception Type |
|---|---|---|
| Result saturates to positive infinity | +, -, *, / | std::overflow_error |
| Result saturates to negative infinity | +, -, *, / | std::underflow_error |
| Either operand is a quiet NaN | +, -, *, /, % | std::domain_error |
| Either operand is a signaling NaN | +, -, *, /, % | std::domain_error |
| Addition of opposite-signed infinities, or subtraction of like-signed infinities | +, - | std::domain_error |
| Multiplication of zero by an infinity | * | std::domain_error |
| Division of zero by zero, or infinity by infinity | / | std::domain_error |
| Division of a finite non-zero value by zero | / | std::domain_error |
| Modulo of zero by zero, or with an infinite dividend | % | std::domain_error |
| Modulo by zero with a finite non-zero dividend | % | std::domain_error |

Saturation to an infinity (IEEE 754 section 6.1) maps to std::overflow_error or std::underflow_error according to sign. The invalid-operation cases (section 7.2), NaN propagation (section 6.2), and division by zero of a finite numerator (section 7.3) are all reported as std::domain_error.
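Because the underlying value is not validated at construction, a non-finite value stored in an f32 or f64 is detected only when arithmetic uses it. A NaN operand always causes a throw, and an infinite operand causes a throw whenever the result is an infinity or the operation is invalid: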
auto nan = f32{std::numeric_limits<float>::quiet_NaN()}; // OK: stored, not validated
auto inf = f32{std::numeric_limits<float>::infinity()}; // OK: stored, not validated
// auto bad = nan + f32{1.0f}; // Throws std::domain_error: NaN operand
// auto big = inf + f32{1.0f}; // Throws std::overflow_error: result is positive infinity
Operations Not Provided
The floating-point types deliberately expose a smaller surface than the integer types. The following operations are not provided:
- Compound assignment (+=, -=, *=, /=, %=)
- Increment and decrement (++, --)
- Unary + and unary -
- Bitwise operators (~, &, |, ^, <<, >>, and their compound forms)
Bitwise operators have no meaning for floating-point values, and the remaining operations are omitted to keep the type minimal: every value change goes through one of the five checked binary operators, so there is a single place where IEEE 754 exceptional results are intercepted.
The policy-based free functions offered for the integer types (saturating_*, overflowing_*, checked_*, strict_*, and widening_*) are likewise not provided for floating-point types.
IEEE 754 already defines saturation to infinity and the propagation of NaN; the safe floating-point types intercept exactly those exceptional results and report them, rather than offering alternative numeric policies.
See Overflow Policies for the policy model as it applies to the integer types.
Mixed-Width Operations
Operations between f32 and f64 are compile-time errors.
The operands must be promoted to a common type explicitly before performing the operation.
auto a = f32{1.0f};
auto b = f64{2.0};
// auto result = a + b; // Compile error: mismatched types
// Promote a to f64 explicitly, then add
auto a_wide = f64{static_cast<double>(static_cast<float>(a))};
auto result = a_wide + b; // OK
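Widening from float to double is exact, so the explicit promotion above never changes the stored value. The widened value is the float's value, however, not the decimal the original literal named, which is one way silent widening can surprise. A minimal standard-C++ sketch, where promote is a hypothetical helper rather than a library function:

```cpp
#include <cassert>

// Every float value is exactly representable as a double,
// so this conversion is lossless.
constexpr double promote(float x) noexcept
{
    return static_cast<double>(x);
}
```

For example, promote(0.1f) is the binary32 approximation of 0.1 carried into a double, which is not equal to the double literal 0.1.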
Literals
The user-defined literals _f32 and _f64 construct f32 and f64 values from floating-point literals, range-checking against the target’s maximum.
See User-Defined Literals for details.
using namespace boost::safe_numbers::literals;
constexpr auto pi = 3.14_f32;
constexpr auto e = 2.718281828459045_f64;
Standard Library Support
<boost/safe_numbers/limits.hpp> specializes std::numeric_limits for f32 and f64, so traits such as max(), min(), epsilon(), infinity(), and is_iec559 are available.
The types also satisfy the same library_type concept as the integer types, so the iostream operators and the std::formatter and fmt::formatter specializations work with them transparently.
See <limits> Support, Stream I/O Support, and Formatting Support.
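As a rough illustration of what such a specialization involves, the sketch below specializes std::numeric_limits for a hypothetical toy_f32 wrapper by forwarding to std::numeric_limits<float>. It is not the contents of <boost/safe_numbers/limits.hpp>, and it defines only a subset of the members a complete specialization would provide.

```cpp
#include <cassert>
#include <limits>

// Hypothetical wrapper, used only to demonstrate the specialization.
struct toy_f32 {
    float value;
};

namespace std {

template <>
class numeric_limits<toy_f32> {
public:
    static constexpr bool is_specialized = true;
    static constexpr bool is_iec559 = std::numeric_limits<float>::is_iec559;
    static constexpr bool has_infinity = std::numeric_limits<float>::has_infinity;
    static constexpr bool has_quiet_NaN = std::numeric_limits<float>::has_quiet_NaN;
    static constexpr int digits = std::numeric_limits<float>::digits;

    // Each function wraps the corresponding float value.
    static constexpr toy_f32 min() noexcept { return {std::numeric_limits<float>::min()}; }
    static constexpr toy_f32 max() noexcept { return {std::numeric_limits<float>::max()}; }
    static constexpr toy_f32 lowest() noexcept { return {std::numeric_limits<float>::lowest()}; }
    static constexpr toy_f32 epsilon() noexcept { return {std::numeric_limits<float>::epsilon()}; }
    static constexpr toy_f32 infinity() noexcept { return {std::numeric_limits<float>::infinity()}; }
    static constexpr toy_f32 quiet_NaN() noexcept { return {std::numeric_limits<float>::quiet_NaN()}; }
};

} // namespace std
```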