The characteristics of floating types are defined in terms of a model that describes a representation of floating-point numbers and values that provide information about an implementation's floating-point arithmetic.
The following parameters are used to define the model for each floating-point type:
A floating-point number x is defined by the following model:
$x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}$,
$e_{min} \le e \le e_{max}$
In addition to normalized floating-point numbers ($f_1 > 0$ if $x \ne 0$), floating types may be able to contain other kinds of floating-point numbers, such as subnormal floating-point numbers ($x \ne 0$, $e = e_{min}$, $f_1 = 0$) and unnormalized floating-point numbers ($x \ne 0$, $e > e_{min}$, $f_1 = 0$), and values that are not floating-point numbers, such as infinities and NaNs. A NaN is an encoding signifying Not-a-Number. A quiet NaN propagates through almost every arithmetic operation without raising a floating-point exception; a signaling NaN generally raises a floating-point exception when occurring as an arithmetic operand.
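As an illustration of the model, the following minimal sketch evaluates the sum directly and labels a digit string using the conditions quoted above. It takes s as the sign (+1 or -1), b as the radix, e as the exponent, p as the number of digits, and the array f as the digits $f_1$ to $f_p$. The names model_value and model_kind are hypothetical and are not part of <float.h>.

```c
#include <assert.h>

/* Evaluate x = s * b^e * sum_{k=1..p} f[k-1] * b^(-k).
 * Hypothetical helper: the result is accumulated in double, so it is
 * only exact when every intermediate value is representable. */
double model_value(int s, int b, int e, const int *f, int p)
{
    assert((s == 1 || s == -1) && b >= 2 && p >= 1);

    double sum = 0.0;
    double weight = 1.0;              /* becomes b^(-k) inside the loop */
    for (int k = 1; k <= p; k++) {
        assert(f[k - 1] >= 0 && f[k - 1] < b);
        weight /= b;
        sum += f[k - 1] * weight;
    }

    double scale = 1.0;               /* b^e by repeated multiply or divide */
    for (int i = 0; i < e; i++)
        scale *= b;
    for (int i = 0; i > e; i--)
        scale /= b;

    return s * sum * scale;
}

/* Label a digit string with the conditions from the text:
 * normalized (f_1 > 0), subnormal (e == e_min, f_1 == 0),
 * unnormalized (e > e_min, f_1 == 0), or zero (all digits 0). */
const char *model_kind(int e, int e_min, const int *f, int p)
{
    assert(p >= 1 && e >= e_min);

    int all_zero = 1;
    for (int k = 0; k < p; k++)
        if (f[k] != 0)
            all_zero = 0;

    if (all_zero)
        return "zero";
    if (f[0] > 0)
        return "normalized";
    if (e == e_min)
        return "subnormal";
    return "unnormalized";
}
```

For example, with b = 2, e = 1, and digits 1, 1, 0, model_value yields 2 × (1/2 + 1/4) = 1.5, and model_kind reports a normalized number because $f_1$ is non-zero.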
An implementation may give zero and non-numeric values, such as infinities and NaNs, a sign, or may leave them unsigned. Wherever such values are unsigned, any requirement in POSIX.1-2008 to retrieve the sign shall produce an unspecified sign and any requirement to set the sign shall be ignored.
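Whether zero carries a sign can be probed at run time. The sketch below is one possible check, using the signbit() macro from <math.h>. The function name zero_is_signed is hypothetical, and the result depends on the implementation.

```c
#include <math.h>

/* Return 1 if negating zero yields a value whose sign signbit() can
 * observe, 0 otherwise. Hypothetical helper for illustration only. */
int zero_is_signed(void)
{
    volatile double zero = 0.0;   /* volatile discourages compile-time folding */
    double neg_zero = -zero;

    return (signbit(neg_zero) != 0 && signbit(zero) == 0) ? 1 : 0;
}
```

On implementations with unsigned zeros the function is expected to return 0, consistent with the rule that retrieving the sign of an unsigned value produces an unspecified result.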
The accuracy of the floating-point operations ('+', '-', '*', '/') and of the functions in <math.h> and <complex.h> that return floating-point results is implementation-defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the accuracy is unknown.
All integer values in the <float.h> header, except FLT_ROUNDS, shall be constant expressions suitable for use in #if preprocessing directives; all floating values shall be constant expressions. All except DECIMAL_DIG, FLT_EVAL_METHOD, FLT_RADIX, and FLT_ROUNDS have separate names for all three floating-point types. The floating-point model representation is provided for all values except FLT_EVAL_METHOD and FLT_ROUNDS.
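The practical consequence is that the integer-valued macros can steer conditional compilation, while FLT_ROUNDS has to be read in ordinary code. The following sketch illustrates this. The names RADIX_NOTE, DOUBLE_HAS_53_DIGITS, and current_flt_rounds are hypothetical.

```c
#include <float.h>

/* Integer-valued <float.h> macros may appear in #if directives. */
#if FLT_RADIX == 2
#define RADIX_NOTE "binary radix"
#elif FLT_RADIX == 10
#define RADIX_NOTE "decimal radix"
#else
#define RADIX_NOTE "other radix"
#endif

#if DBL_MANT_DIG >= 53
#define DOUBLE_HAS_53_DIGITS 1
#else
#define DOUBLE_HAS_53_DIGITS 0
#endif

/* FLT_ROUNDS is excluded from the #if guarantee, so this sketch
 * evaluates it at run time instead. */
int current_flt_rounds(void)
{
    return FLT_ROUNDS;
}
```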
The rounding mode for floating-point addition is characterized by the implementation-defined value of FLT_ROUNDS:
All other values for FLT_ROUNDS characterize implementation-defined rounding behavior.
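A program can turn the FLT_ROUNDS value into a readable description. The value list is not reproduced in this text, so the sketch below uses the meanings the ISO C standard assigns to -1 through 3. The function name rounding_name is hypothetical.

```c
#include <float.h>

/* Map a FLT_ROUNDS value to a description. The meanings for -1..3 are
 * those given by the ISO C standard; anything else is reported as
 * implementation-defined, as stated in the text. */
const char *rounding_name(int flt_rounds)
{
    switch (flt_rounds) {
    case -1: return "indeterminable";
    case 0:  return "toward zero";
    case 1:  return "to nearest";
    case 2:  return "toward positive infinity";
    case 3:  return "toward negative infinity";
    default: return "implementation-defined";
    }
}
```

A typical call would be rounding_name(FLT_ROUNDS), evaluated at run time because the value may change when the rounding mode changes.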
The values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD:
All other negative values for FLT_EVAL_METHOD characterize implementation-defined behavior.
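The same approach works for FLT_EVAL_METHOD. The value list is not reproduced in this text, so the sketch below uses the meanings the ISO C standard assigns to -1, 0, 1, and 2. The function name eval_method_name is hypothetical, and the label for unlisted non-negative values is an assumption of this sketch.

```c
#include <float.h>

/* Map a FLT_EVAL_METHOD value to a description. The meanings for
 * -1, 0, 1, and 2 follow the ISO C standard. */
const char *eval_method_name(int method)
{
    switch (method) {
    case -1: return "indeterminable";
    case 0:  return "each operation evaluated in the range and precision of its type";
    case 1:  return "float and double evaluated as double, long double as long double";
    case 2:  return "all operations evaluated as long double";
    default:
        /* Other negative values are implementation-defined per the text;
         * other non-negative values are simply not recognized here. */
        return (method < 0) ? "implementation-defined" : "unrecognized";
    }
}
```

A typical call would be eval_method_name(FLT_EVAL_METHOD).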
The <float.h> header shall define the following values as constant expressions with implementation-defined values that are greater than or equal in magnitude (absolute value) to those shown, with the same sign.
$\begin{cases} p_{max} \log_{10} b & \text{if } b \text{ is a power of 10} \\ \lceil 1 + p_{max} \log_{10} b \rceil & \text{otherwise} \end{cases}$
$\begin{cases} p \log_{10} b & \text{if } b \text{ is a power of 10} \\ \lfloor (p - 1) \log_{10} b \rfloor & \text{otherwise} \end{cases}$
$\lceil \log_{10} b^{e_{min} - 1} \rceil$
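The three formulas above can be checked numerically. In the ISO C standard they describe DECIMAL_DIG, the *_DIG macros, and the *_MIN_10_EXP macros respectively. The following is a minimal sketch that assumes b = 2, so only the "otherwise" branches apply, and that uses a hard-coded value of log10(2). All function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

#define LOG10_2 0.30102999566398120   /* log10(2); this sketch assumes b == 2 */

/* Integer floor and ceiling for values well inside the range of int. */
static int floor_to_int(double v)
{
    int i = (int)v;                   /* truncates toward zero */
    return ((double)i > v) ? i - 1 : i;
}

static int ceil_to_int(double v)
{
    int i = (int)v;
    return ((double)i < v) ? i + 1 : i;
}

/* ceil(1 + p * log10(2)) */
int round_trip_digits(int p)
{
    assert(p >= 1);
    return ceil_to_int(1.0 + p * LOG10_2);
}

/* floor((p - 1) * log10(2)) */
int preserved_digits(int p)
{
    assert(p >= 1);
    return floor_to_int((p - 1) * LOG10_2);
}

/* ceil(log10(2^(e_min - 1))) = ceil((e_min - 1) * log10(2)) */
int min_decimal_exponent(int e_min)
{
    return ceil_to_int((e_min - 1) * LOG10_2);
}
```

With p = 53 and e_min = -1021, the parameters of a 64-bit IEEE 754 double, the functions give 17, 15, and -307.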
Additionally, FLT_MAX_EXP shall be at least as large as FLT_MANT_DIG, DBL_MAX_EXP shall be at least as large as DBL_MANT_DIG, and LDBL_MAX_EXP shall be at least as large as LDBL_MANT_DIG; this has the effect that FLT_MAX, DBL_MAX, and LDBL_MAX are integral.
$\lfloor \log_{10} ((1 - b^{-p}) \, b^{e_{max}}) \rfloor$
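The maximum decimal exponent formula can be approximated without a logarithm routine. The sketch below assumes b = 2, uses hard-coded values of log10(2) and ln(10), and replaces log10(1 - 2^-p) with its first-order approximation. It is therefore an estimate that could be off by one if e_max × log10(2) lay extremely close to an integer. The function name max_decimal_exponent is hypothetical.

```c
#include <assert.h>
#include <float.h>

#define LOG10_2 0.30102999566398120   /* log10(2); this sketch assumes b == 2 */
#define LN_10   2.302585092994046     /* natural logarithm of 10 */

/* floor(log10((1 - 2^-p) * 2^e_max)), computed as
 * floor(e_max * log10(2) + log10(1 - 2^-p)). */
int max_decimal_exponent(int p, int e_max)
{
    assert(p >= 1 && e_max >= 1);

    double two_to_minus_p = 1.0;
    for (int i = 0; i < p; i++)
        two_to_minus_p /= 2.0;

    /* First-order approximation: log10(1 - t) is close to -t / ln(10). */
    double correction = -two_to_minus_p / LN_10;

    double v = e_max * LOG10_2 + correction;
    int i = (int)v;                   /* v is positive, so this truncates down */
    return ((double)i > v) ? i - 1 : i;
}
```

With p = 53 and e_max = 1024, the parameters of a 64-bit IEEE 754 double, the result is 308.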
The <float.h> header shall define the following values as constant expressions with implementation-defined values that are greater than or equal to those shown:
$(1 - b^{-p}) \, b^{e_{max}}$
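The expression above is the largest finite value of the model: every digit equals b - 1 and the exponent is e_max. The following minimal sketch evaluates it in double arithmetic. The function name model_max is hypothetical, and the result is exact only when each intermediate value is representable in double, as is the case for b = 2 with the parameters of float or double on IEEE 754 systems.

```c
#include <assert.h>
#include <float.h>

/* Evaluate (1 - b^-p) * b^e_max. The factor (1 - b^-p) is formed first
 * and then scaled up one power of b at a time, so the intermediate
 * values never exceed the final result. */
double model_max(int b, int p, int e_max)
{
    assert(b >= 2 && p >= 1 && e_max >= 0);

    double b_to_minus_p = 1.0;
    for (int i = 0; i < p; i++)
        b_to_minus_p /= b;

    double x = 1.0 - b_to_minus_p;
    for (int i = 0; i < e_max; i++)
        x *= b;

    return x;
}
```

For example, model_max(2, 3, 2) is (1 - 1/8) × 4 = 3.5, and on a binary implementation model_max(FLT_RADIX, FLT_MANT_DIG, FLT_MAX_EXP) is expected to reproduce FLT_MAX.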
The <float.h> header shall define the following values as constant expressions with implementation-defined (positive) values that are less than or equal to those shown:
The following sections are informative.
