Ain't that beautiful / 2

Discussion:

(too old to reply)

Bonita Montero

2024-04-10 12:21:10 UTC

Unfortunately C++20's from_chars doesn's support wide characters.
So I implemented my own from_chars called parse_double generically
as a template which can handle any ASCII-based character set whose
characters are integers.
Another requirement for me was that the code should support zero
-terminated strings as well as flat memory ranges with a beginning
and an end. This is done with a function-object which derermines
if a iterator points to a terminating character or address. The
default type for this function object is parse_never_ends(), which
results in any invalid character for the text text value to be a
termination character and as the zero is always an invalid charac-
ter for a floating point value parsing doens't end because the end
function object reports an end but because parsing can't find fur-
ther caracters.
And there's an overload of parse_double which is given a start and
and end iterator and this overload internally has its own end-func-
tion-object which compares the current reading position against the
end iterator.
My code scans the digits beyond the comma first and stores it in a
thead_local vector of doubles whose capacity only grows across mul-
tiple calls of my function for optimal performance. To maximize per-
formance I'm using my own union of doubles to suppress default-ini-
tialization of the vector's elements. The digits are appended until
their value becomes zero or there are no further digits.
The values in the suffix-vector are added in reverse order. If I'd
add them in forward order the precision would be less since there
would be mantissa digits dropped right from the final mantissa.
That's while there's the vector of doubles.
Each digit's valus is multiplied by a 10 ^ N value. This value is
calculated incrementally by successive / 10.0 or * 10.0 operations.
This successive calculations may lead to less precision than if
this value is calculated for each digit with binary exponentation.
So there's a precise mode with my code which is activated through
the first template parameter of my function which defaults to false.
With that each digit's value is calculated with binary exponenta-
tion of 10 ^ N, but this also gives less performance. With my test
code this gives up to four bits of additional precision.

So here's the code:

template<std::random_access_iterator Iterator>
requires std::integral<std::iter_value_t<Iterator>>
struct parse_result
{
std::errc ec;
Iterator next;
};

// parse ends at first invalid character, at least at '\0'
auto parse_never_ends = []( auto ) { return false; };

template<bool Precise = false, std::random_access_iterator Iterator,
typename End = decltype(parse_never_ends)>
requires std::integral<std::iter_value_t<Iterator>>
&& requires( End end, Iterator it ) { { end( it ) } ->
std::convertible_to<bool>; }
parse_result<Iterator> parse_double( Iterator str, double &result, End
end = End() )
{
using namespace std;
static_assert(sizeof(double) == sizeof(uint64_t) &&
numeric_limits<double>::is_iec559, "double must be IEEE-754 double
precision");
// mask to a double's exponent
constexpr uint64_t EXP_MASK = 0x7FFull << 52;
// calculate 10 ^ exp in double
auto pow10 = []( int64_t exp ) -> double
{
// table for binary exponentation with 10 ^ (2 ^ N)
static array<double, 64> tenPows;
// table initialized ?
if( static atomic_bool once( false ); !once.load( memory_order_acquire ) )
{
// weakly no: test locked again
static mutex onceMtx;
lock_guard lock( onceMtx );
if( !once.load( memory_order_relaxed ) )
{
// no: calculate table
for( double p10x2xN = 10.0; double &pow : tenPows )
pow = p10x2xN,
p10x2xN *= p10x2xN;
// set initialized flag with release semantics
once.store( true, memory_order_release );
}
}
// begin with 1.0 since x ^ 0 = 1
double result = 1.0;
// unsigned exponent
uint64_t uExp = exp >= 0 ? exp : -exp;
// highest set bit of exponent
size_t bit = 63 - countl_zero( uExp );
// bit mask to highest set bit
uint64_t mask = 1ull << bit;
// loop as long as there are bits in unsigned exponent
for( ; uExp; uExp &= ~mask, mask >>= 1, --bit )
// bit set ?
if( uExp & mask )
{
// yes: multiply result by 10 ^ (bit + 1)
result *= tenPows[bit];
// overlow ?
if( (bit_cast<uint64_t>( result ) & EXP_MASK) == EXP_MASK )
// yes: result wouldn't change furhter; stop
break;
}
// return 1 / result if exponent is negative
return exp >= 0 ? result : 1.0 / result;
};
Iterator scn = str;
// ignore-case compare of a string with arbitrary with with a C-string
auto xstricmp = [&]( Iterator str, char const *second ) -> bool
{
// unsigned character-type
using uchar_t = make_unsigned_t<iter_value_t<Iterator>>;
auto toLower = []( uchar_t c ) -> uchar_t
{
return c - (c >= 'a' && c <= 'a' ? 'a' - 'A' : 0);
};
for( ; ; ++str, ++second )
if( !*second ) [[unlikely]]
return true;
else if( end( str ) ) [[unlikely]]
return false;
else if( toLower( *str ) != (unsigned char)*second ) [[unlikely]]
return false;
};
// at end ?
if( end( scn ) )
// yes: err
return { errc::invalid_argument, scn };
// double's binary representation sign
uint64_t binSign = 0;
// positive sign ?
if( *scn == '+' ) [[unlikely]]
// at end ?
if( end( ++scn ) ) [[unlikely]]
// yes: err
return { errc::invalid_argument, str };
else;
// negative sign ?
else if( *scn == '-' )
{
// yes: remember sign
binSign = 1ull << 63;
// at end ?
if( end( ++scn ) )
// yes: err
return { errc::invalid_argument, str };
}
// apply binSign to a double
auto applySign = [&]( double d )
{
return bit_cast<double>( binSign | bit_cast<uint64_t>( d ) );
};
// NaN ?
if( xstricmp( scn, "nan" ) ) [[unlikely]]
{
// yes: apply sign to NaN
result = applySign( numeric_limits<double>::quiet_NaN() );
return { errc(), scn + 3 };
}
// SNaN ?
if( xstricmp( scn, "snan" ) ) [[unlikely]]
{
// yes: apply sign to NaN
result = applySign( numeric_limits<double>::signaling_NaN() );
return { errc(), scn + 4 };
}
// Inf
if( xstricmp( scn, "inf" ) ) [[unlikely]]
{
// yes: apply sign to Inf
result = applySign( numeric_limits<double>::infinity() );
return { errc(), scn + 3 };
}
// begin of prefix
Iterator prefixBegin = scn;
while( *scn >= '0' && *scn <= '9' && !end( ++scn ) );
Iterator
// end of prefix
prefixEnd = scn,
// begin and end of suffix initially empty
suffixBegin = scn,
suffixEnd = scn;
// has comma for suffix ?
if( !end( scn ) && *scn == '.' )
{
// suffix begin
suffixBegin = ++scn;
for( ; !end( scn ) && *scn >= '0' && *scn <= '9'; ++scn );
// suffix end
suffixEnd = scn;
}
// prefix and suffix empty ?
if( prefixBegin == prefixEnd && suffixBegin == suffixEnd ) [[unlikely]]
// yes: err
return { errc::invalid_argument, str };
// exponent initially zero
int64_t exp = 0;
// has 'e' for exponent ?
if( !end( scn ) && (*scn == 'e' || *scn == 'E') )
// yes: scan exponent
if( auto [ec, next] = parse_int( ++scn, exp, end ); ec == errc() )
[[likely]]
// succeeded: rembember end of exponent
scn = next;
else
// failed: 'e' without actual exponent
return { ec, scn };
// number of suffix digits
size_t nSuffixes;
if( exp >= 0 ) [[likely]]
// suffix is within suffix or right from suffix
if( suffixEnd - suffixBegin - exp > 0 ) [[likely]]
// suffix is within suffix
nSuffixes = suffixEnd - suffixBegin - (ptrdiff_t)exp;
else
// there are no suffixes
nSuffixes = 0;
else
if( prefixEnd - prefixBegin + exp >= 0 ) [[likely]]
// suffix is within prefix
nSuffixes = suffixEnd - suffixBegin - (ptrdiff_t)exp;
else
// there are no prefixes, all digits are suffixes
nSuffixes = suffixEnd - suffixBegin + (prefixEnd - prefixBegin);
// have non-default initialized doubles to save CPU-time
union ndi_dbl { double d; ndi_dbl() {} };
// thread-local vector with suffixes
thread_local vector<ndi_dbl> ndiSuffixDbls;
// resize suffixes vector, capacity will stick to the maximum
ndiSuffixDbls.resize( nSuffixes );
// have range checking with suffixes on debugging
span suffixDbls( &to_address( ndiSuffixDbls.begin() )->d, &to_address(
ndiSuffixDbls.end() )->d );
// iterator after last suffix
auto suffixDblsEnd = suffixDbls.begin();
double digMul;
int64_t nextExp;
auto suffix = [&]( Iterator first, Iterator end )
{
while( first != end ) [[likely]]
{
// if we're having maximum precision calculate digMul with pow10 for
every iteration
if constexpr( Precise )
digMul = pow10( nextExp-- );
// pow10-value of digit becomes zero ?
if( !bit_cast<uint64_t>( digMul ) )
// yes: no further suffix digits to calculate
return false;
// append suffix double
*suffixDblsEnd++ = (int)(*first++ - '0') * digMul;
// if we're having less precision calculate digMul cascaded
if constexpr( !Precise )
digMul /= 10.0;
}
// further suffix digits to calculate
return true;
};
// flag that signals that is suffix beyond the suffix in prefix
bool furtherSuffix;
if( exp >= 0 ) [[likely]]
// there's no suffix in prefix
nextExp = -1,
digMul = 1.0 / 10.0,
furtherSuffix = true;
else
{
// there's suffix in prefix
Iterator suffixInPrefixBegin;
if( prefixEnd - prefixBegin + exp >= 0 )
// sufix begins within prefix
suffixInPrefixBegin = prefixEnd + (ptrdiff_t)exp,
nextExp = -1,
digMul = 1.0 / 10.0;
else
{
// suffix begins before prefix
suffixInPrefixBegin = prefixBegin;
nextExp = (ptrdiff_t)exp + (prefixEnd - prefixBegin) - 1;
if constexpr( !Precise )
digMul = pow10( nextExp );
}
furtherSuffix = suffix( suffixInPrefixBegin, prefixEnd );
}
if( furtherSuffix && exp < suffixEnd - suffixBegin )
// there's suffix in suffix
if( exp <= 0 )
// (remaining) suffix begins at suffix begin
suffix( suffixBegin, suffixEnd );
else
// suffix begins at exp in suffix
suffix( suffixBegin + (ptrdiff_t)exp, suffixEnd );
result = 0.0;
// add suffixes from the tail to the beginning
for( ; suffixDblsEnd != suffixDbls.begin(); result += *--suffixDblsEnd );
// add prefix digits from end reverse to first
auto prefix = [&]( Iterator end, Iterator first )
{
while( end != first ) [[likely]]
{
// if we're having maximum precision calculate digMul with pow10 for
every iteration
if constexpr( Precise )
digMul = pow10( nextExp++ );
// pow10-value of digit becomes infinte ?
if( (bit_cast<uint64_t>( digMul ) & EXP_MASK) == EXP_MASK ) [[unlikely]]
{
// yes: infinte result, no further suffix digits to calculate
result = numeric_limits<double>::infinity();
return false;
}
// add digit to result
result += (int)(*--end - '0') * digMul;
// if we're having less precision calculate digMul cascaded
if constexpr( !Precise )
digMul *= 10.0;
}
return true;
};
// flag that signals that prefix digits are finite so far, i.e. not Inf
bool prefixFinite = true;
if( !exp ) [[likely]]
// prefix ends at suffix
nextExp = 0,
digMul = 1.0;
else if( exp > 0 ) [[likely]]
{
// there's prefix in suffix
Iterator prefixInSuffixEnd;
if( exp <= suffixEnd - suffixBegin )
// prefix ends within suffix
prefixInSuffixEnd = suffixBegin + (ptrdiff_t)exp,
nextExp = 0,
digMul = 1.0;
else
{
// prefix ends after suffix end
prefixInSuffixEnd = suffixEnd;
nextExp = exp - (suffixEnd - suffixBegin);
if constexpr( !Precise )
digMul = pow10( nextExp );
}
prefixFinite = prefix( prefixInSuffixEnd, suffixBegin );
}
else if( exp < 0 )
{
// prefix ends before suffix
nextExp = -exp;
if constexpr( !Precise )
digMul = pow10( -exp );
}
if( prefixFinite && prefixEnd - prefixBegin + exp > 0 ) [[likely]]
// prefix has prefix
if( exp >= 0 ) [[likely]]
// there's full prefix in prefix
prefixFinite = prefix( prefixEnd, prefixBegin );
else
// remaining prefix is within prefix
prefixFinite = prefix( prefixEnd + (ptrdiff_t)exp, prefixBegin );
if( !prefixFinite ) [[unlikely]]
{
// result is Inf or NaN
// if there we had (digit = 0) * (digMul = Inf) == NaN:
// make result +/-Inf
result = bit_cast<double>( binSign | EXP_MASK );
return { errc::result_out_of_range, scn };
}
result = applySign( result );
return { errc(), scn };
}

template<bool Precise = false, std::random_access_iterator Iterator>
requires std::integral<std::iter_value_t<Iterator>>
parse_result<Iterator> parse_double( Iterator str, Iterator end, double
&result )
{
return parse_double<Precise>( str, result, [&]( Iterator it ) { return
it == end; } );
}

wij

2024-04-11 00:21:16 UTC

Permalink

Post by Bonita Montero
Unfortunately C++20's from_chars doesn's support wide characters.
So I implemented my own from_chars called parse_double generically
as a template which can handle any ASCII-based character set whose
characters are integers.
Another requirement for me was that the code should support zero
-terminated strings as well as flat memory ranges with a beginning
and an end. This is done with a function-object which derermines
if a iterator points to a terminating character or address. The
default type for this function object is parse_never_ends(), which
results in any invalid character for the text text value to be a
termination character and as the zero is always an invalid charac-
ter for a floating point value parsing doens't end because the end
function object reports an end but because parsing can't find fur-
ther caracters.
And there's an overload of parse_double which is given a start and
and end iterator and this overload internally has its own end-func-
tion-object which compares the current reading position against the
end iterator.
My code scans the digits beyond the comma first and stores it in a
thead_local vector of doubles whose capacity only grows across mul-
tiple calls of my function for optimal performance. To maximize per-
formance I'm using my own union of doubles to suppress default-ini-
tialization of the vector's elements. The digits are appended until
their value becomes zero or there are no further digits.
The values in the suffix-vector are added in reverse order. If I'd
add them in forward order the precision would be less since there
would be mantissa digits dropped right from the final mantissa.
That's while there's the vector of doubles.
Each digit's valus is multiplied by a 10 ^ N value. This value is
calculated incrementally by successive / 10.0 or * 10.0 operations.
This successive calculations may lead to less precision than if
this value is calculated for each digit with binary exponentation.
So there's a precise mode with my code which is activated through
the first template parameter of my function which defaults to false.
With that each digit's value is calculated with binary exponenta-
tion of 10 ^ N, but this also gives less performance. With my test
code this gives up to four bits of additional precision.
template<std::random_access_iterator Iterator>
requires std::integral<std::iter_value_t<Iterator>>
struct parse_result
{
std::errc ec;
Iterator next;
};
// parse ends at first invalid character, at least at '\0'
auto parse_never_ends = []( auto ) { return false; };
template<bool Precise = false, std::random_access_iterator Iterator,
typename End = decltype(parse_never_ends)>
requires std::integral<std::iter_value_t<Iterator>>
&& requires( End end, Iterator it ) { { end( it ) } ->
std::convertible_to<bool>; }
parse_result<Iterator> parse_double( Iterator str, double &result, End
end = End() )
{
using namespace std;
static_assert(sizeof(double) == sizeof(uint64_t) &&
numeric_limits<double>::is_iec559, "double must be IEEE-754 double
precision");
// mask to a double's exponent
constexpr uint64_t EXP_MASK = 0x7FFull << 52;
// calculate 10 ^ exp in double
auto pow10 = []( int64_t exp ) -> double
{
// table for binary exponentation with 10 ^ (2 ^ N)
static array<double, 64> tenPows;
// table initialized ?
if( static atomic_bool once( false ); !once.load( memory_order_acquire ) )
{
// weakly no: test locked again
static mutex onceMtx;
lock_guard lock( onceMtx );
if( !once.load( memory_order_relaxed ) )
{
// no: calculate table
for( double p10x2xN = 10.0; double &pow : tenPows )
pow = p10x2xN,
p10x2xN *= p10x2xN;
// set initialized flag with release semantics
once.store( true, memory_order_release );
}
}
// begin with 1.0 since x ^ 0 = 1
double result = 1.0;
// unsigned exponent
uint64_t uExp = exp >= 0 ? exp : -exp;
// highest set bit of exponent
size_t bit = 63 - countl_zero( uExp );
// bit mask to highest set bit
uint64_t mask = 1ull << bit;
// loop as long as there are bits in unsigned exponent
for( ; uExp; uExp &= ~mask, mask >>= 1, --bit )
// bit set ?
if( uExp & mask )
{
// yes: multiply result by 10 ^ (bit + 1)
result *= tenPows[bit];
// overlow ?
if( (bit_cast<uint64_t>( result ) & EXP_MASK) == EXP_MASK )
// yes: result wouldn't change furhter; stop
break;
}
// return 1 / result if exponent is negative
return exp >= 0 ? result : 1.0 / result;
};
Iterator scn = str;
// ignore-case compare of a string with arbitrary with with a C-string
auto xstricmp = [&]( Iterator str, char const *second ) -> bool
{
// unsigned character-type
using uchar_t = make_unsigned_t<iter_value_t<Iterator>>;
auto toLower = []( uchar_t c ) -> uchar_t
{
return c - (c >= 'a' && c <= 'a' ? 'a' - 'A' : 0);
};
for( ; ; ++str, ++second )
if( !*second ) [[unlikely]]
return true;
else if( end( str ) ) [[unlikely]]
return false;
else if( toLower( *str ) != (unsigned char)*second ) [[unlikely]]
return false;
};
// at end ?
if( end( scn ) )
// yes: err
return { errc::invalid_argument, scn };
// double's binary representation sign
uint64_t binSign = 0;
// positive sign ?
if( *scn == '+' ) [[unlikely]]
// at end ?
if( end( ++scn ) ) [[unlikely]]
// yes: err
return { errc::invalid_argument, str };
else;
// negative sign ?
else if( *scn == '-' )
{
// yes: remember sign
binSign = 1ull << 63;
// at end ?
if( end( ++scn ) )
// yes: err
return { errc::invalid_argument, str };
}
// apply binSign to a double
auto applySign = [&]( double d )
{
return bit_cast<double>( binSign | bit_cast<uint64_t>( d ) );
};
// NaN ?
if( xstricmp( scn, "nan" ) ) [[unlikely]]
{
// yes: apply sign to NaN
result = applySign( numeric_limits<double>::quiet_NaN() );
return { errc(), scn + 3 };
}
// SNaN ?
if( xstricmp( scn, "snan" ) ) [[unlikely]]
{
// yes: apply sign to NaN
result = applySign( numeric_limits<double>::signaling_NaN() );
return { errc(), scn + 4 };
}
// Inf
if( xstricmp( scn, "inf" ) ) [[unlikely]]
{
// yes: apply sign to Inf
result = applySign( numeric_limits<double>::infinity() );
return { errc(), scn + 3 };
}
// begin of prefix
Iterator prefixBegin = scn;
while( *scn >= '0' && *scn <= '9' && !end( ++scn ) );
Iterator
// end of prefix
prefixEnd = scn,
// begin and end of suffix initially empty
suffixBegin = scn,
suffixEnd = scn;
// has comma for suffix ?
if( !end( scn ) && *scn == '.' )
{
// suffix begin
suffixBegin = ++scn;
for( ; !end( scn ) && *scn >= '0' && *scn <= '9'; ++scn );
// suffix end
suffixEnd = scn;
}
// prefix and suffix empty ?
if( prefixBegin == prefixEnd && suffixBegin == suffixEnd ) [[unlikely]]
// yes: err
return { errc::invalid_argument, str };
// exponent initially zero
int64_t exp = 0;
// has 'e' for exponent ?
if( !end( scn ) && (*scn == 'e' || *scn == 'E') )
// yes: scan exponent
if( auto [ec, next] = parse_int( ++scn, exp, end ); ec == errc() )
[[likely]]
// succeeded: rembember end of exponent
scn = next;
else
// failed: 'e' without actual exponent
return { ec, scn };
// number of suffix digits
size_t nSuffixes;
if( exp >= 0 ) [[likely]]
// suffix is within suffix or right from suffix
if( suffixEnd - suffixBegin - exp > 0 ) [[likely]]
// suffix is within suffix
nSuffixes = suffixEnd - suffixBegin - (ptrdiff_t)exp;
else
// there are no suffixes
nSuffixes = 0;
else
if( prefixEnd - prefixBegin + exp >= 0 ) [[likely]]
// suffix is within prefix
nSuffixes = suffixEnd - suffixBegin - (ptrdiff_t)exp;
else
// there are no prefixes, all digits are suffixes
nSuffixes = suffixEnd - suffixBegin + (prefixEnd - prefixBegin);
// have non-default initialized doubles to save CPU-time
union ndi_dbl { double d; ndi_dbl() {} };
// thread-local vector with suffixes
thread_local vector<ndi_dbl> ndiSuffixDbls;
// resize suffixes vector, capacity will stick to the maximum
ndiSuffixDbls.resize( nSuffixes );
// have range checking with suffixes on debugging
span suffixDbls( &to_address( ndiSuffixDbls.begin() )->d, &to_address(
ndiSuffixDbls.end() )->d );
// iterator after last suffix
auto suffixDblsEnd = suffixDbls.begin();
double digMul;
int64_t nextExp;
auto suffix = [&]( Iterator first, Iterator end )
{
while( first != end ) [[likely]]
{
// if we're having maximum precision calculate digMul with pow10 for
every iteration
if constexpr( Precise )
digMul = pow10( nextExp-- );
// pow10-value of digit becomes zero ?
if( !bit_cast<uint64_t>( digMul ) )
// yes: no further suffix digits to calculate
return false;
// append suffix double
*suffixDblsEnd++ = (int)(*first++ - '0') * digMul;
// if we're having less precision calculate digMul cascaded
if constexpr( !Precise )
digMul /= 10.0;
}
// further suffix digits to calculate
return true;
};
// flag that signals that is suffix beyond the suffix in prefix
bool furtherSuffix;
if( exp >= 0 ) [[likely]]
// there's no suffix in prefix
nextExp = -1,
digMul = 1.0 / 10.0,
furtherSuffix = true;
else
{
// there's suffix in prefix
Iterator suffixInPrefixBegin;
if( prefixEnd - prefixBegin + exp >= 0 )
// sufix begins within prefix
suffixInPrefixBegin = prefixEnd + (ptrdiff_t)exp,
nextExp = -1,
digMul = 1.0 / 10.0;
else
{
// suffix begins before prefix
suffixInPrefixBegin = prefixBegin;
nextExp = (ptrdiff_t)exp + (prefixEnd - prefixBegin) - 1;
if constexpr( !Precise )
digMul = pow10( nextExp );
}
furtherSuffix = suffix( suffixInPrefixBegin, prefixEnd );
}
if( furtherSuffix && exp < suffixEnd - suffixBegin )
// there's suffix in suffix
if( exp <= 0 )
// (remaining) suffix begins at suffix begin
suffix( suffixBegin, suffixEnd );
else
// suffix begins at exp in suffix
suffix( suffixBegin + (ptrdiff_t)exp, suffixEnd );
result = 0.0;
// add suffixes from the tail to the beginning
for( ; suffixDblsEnd != suffixDbls.begin(); result += *--suffixDblsEnd );
// add prefix digits from end reverse to first
auto prefix = [&]( Iterator end, Iterator first )
{
while( end != first ) [[likely]]
{
// if we're having maximum precision calculate digMul with pow10 for
every iteration
if constexpr( Precise )
digMul = pow10( nextExp++ );
// pow10-value of digit becomes infinte ?
if( (bit_cast<uint64_t>( digMul ) & EXP_MASK) == EXP_MASK ) [[unlikely]]
{
// yes: infinte result, no further suffix digits to calculate
result = numeric_limits<double>::infinity();
return false;
}
// add digit to result
result += (int)(*--end - '0') * digMul;
// if we're having less precision calculate digMul cascaded
if constexpr( !Precise )
digMul *= 10.0;
}
return true;
};
// flag that signals that prefix digits are finite so far, i.e. not Inf
bool prefixFinite = true;
if( !exp ) [[likely]]
// prefix ends at suffix
nextExp = 0,
digMul = 1.0;
else if( exp > 0 ) [[likely]]
{
// there's prefix in suffix
Iterator prefixInSuffixEnd;
if( exp <= suffixEnd - suffixBegin )
// prefix ends within suffix
prefixInSuffixEnd = suffixBegin + (ptrdiff_t)exp,
nextExp = 0,
digMul = 1.0;
else
{
// prefix ends after suffix end
prefixInSuffixEnd = suffixEnd;
nextExp = exp - (suffixEnd - suffixBegin);
if constexpr( !Precise )
digMul = pow10( nextExp );
}
prefixFinite = prefix( prefixInSuffixEnd, suffixBegin );
}
else if( exp < 0 )
{
// prefix ends before suffix
nextExp = -exp;
if constexpr( !Precise )
digMul = pow10( -exp );
}
if( prefixFinite && prefixEnd - prefixBegin + exp > 0 ) [[likely]]
// prefix has prefix
if( exp >= 0 ) [[likely]]
// there's full prefix in prefix
prefixFinite = prefix( prefixEnd, prefixBegin );
else
// remaining prefix is within prefix
prefixFinite = prefix( prefixEnd + (ptrdiff_t)exp, prefixBegin );
if( !prefixFinite ) [[unlikely]]
{
// result is Inf or NaN
// make result +/-Inf
result = bit_cast<double>( binSign | EXP_MASK );
return { errc::result_out_of_range, scn };
}
result = applySign( result );
return { errc(), scn };
}
template<bool Precise = false, std::random_access_iterator Iterator>
requires std::integral<std::iter_value_t<Iterator>>
parse_result<Iterator> parse_double( Iterator str, Iterator end, double
&result )
{
return parse_double<Precise>( str, result, [&]( Iterator it ) { return
it == end; } );
}

I saw several design/implement issues.

Refer Wy::strnum, 10 years before std::from_chars

---------------------------------------
NAME
strnum - scan and convert string to a number

SYNOPSIS
Except POD types, C structures, all types are declared in namespace Wy.

#include <Wy.stdlib.h>

template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const char*& endptr,
const int& radix=10,
NumInterpFlag cflags=Interp_C_Notation ) ;

template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const char*& endptr,
int& radix,
NumInterpFlag cflags=Interp_C_Notation ) ;

template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const int& radix=10,
NumInterpFlag cflags=Interp_C_Notation ) ;

template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
int& radix,
NumInterpFlag cflags=Interp_C_Notation ) ;

DESCRIPTION
Function templates strnum accepts two kinds of number strings, one is C
string (zero terminated), another is specified by range pointer [be‐
gin,end). For example:
const char nstr[]="34567";
int n;
const char *beg=nstr, *end=nstr+sizeof(nstr)-1;

strnum(n,"12345"); // C string, radix=10, format=C
strnum(n,"F2345",16,No_Flag);// C string, radix=16, format=exact

strnum(n,beg,end); // [beg,end), radix=10, format=C
strnum(n,beg,end,16,No_Flag);// [beg,end), radix=16, format=exact

strnum scan the string pointed by [nptr,endptr) as if the number is in
radix representation, and store the result into value. If endptr is
not specified, nptr is assumed pointing to a zero-terminated string. If
specified, endptr will be read and modified pointing to the last char‐
acter recognized plus one.

radix should be in the range [2-36] or 0. If Interp_C_Radix is set and
radix is 0, the radix in effect depends on the prefix of the string and
is written back if radix is a non-const reference type. If the radix
prefix is not recognized, radix 10 is assumed.

cflags is a bit-OR'd value of the enum NumInterpFlag members, control‐
ling the string-to-number conversion.

No_Flag ............ No flag is set.
All_Flag ........... All flags are set.
Skip_Leading_Space . Skip leading space characters as determined by
isspace(3wy)
Interp_Sign ........ Interpret sign symbol (i.e. '+' '-')
Interp_C_Radix ..... Interpret C radix prefix
Skip_Space_After_Sign Skip space characters as determined by
isspace(3wy). No effect if Interp_Sign is not
set.
Interp_C_Exponent .. Interpret C exponent suffix
Interp_C_NanInf .... Interpret string "nan" or "[+-]inf",
disregarding case.
Skip_Less_Significant Skip long and less significant digit characters
that would cause ERANGE otherwise.

Interp_C_Notation= Skip_Leading_Space |
Interp_Sign |
Interp_C_Radix |
Skip_Less_Significant |
Interp_C_Exponent |
Interp_C_NanInf

If ValueType is an integral type, the accepted string is in the form:

<S>::= [<b1>][+-][<b2>][]<ddd>

::= 0(xX) --> hexidecimal (radix=16)
::= 0 --> octal (radix=8)
<b1>::= <b2> --> space characters (isspace(3wy))
<ddd> --> non-empty digit string (_charnum(3wy))

<b1> is interpreted iff Skip_Leading_Space is set.
[-+] is interpreted iff Interp_Sign is set.
<b2> is interpreted iff Skip_Space_After_Sign and
Interp_Sign are set.
 is interpreted iff Interp_C_Radix is set.

Flags Skip_Less_Significant, Interp_C_Exponent, Interp_C_NanInf
have no effect for conversion of integral types.

If ValueType is a floating-point type, the accepted string is in the
form:

<S>::= [<b1>][+-][<b2>][](<S1>|<S2>|<S3>)[<Exp>]
<S1>::= <ddd>[<S3>]
<S2>::= <ddd>.
<S3>::= .<ggg>
<Exp>::= (eEpP)[+-]<kkk>

::= 0(xX) --> hexidecimal (radix=16)
<b1>::= <b2> --> space characters (isspace(3wy))
<ddd>::=<ggg>::=<kkk> --> non-empty digit string (_charnum(3wy))

<Exp> is interpreted iff Interp_C_Exponent is set.

If Interp_C_NanInf is set, string for nan,inf is interpreted.
If Skip_Less_Significant is set, less significant digits can be
recognized and ignored.

The reset flags are the same as for integral type.

RETURN
Ok Succeed

EINVAL radix is not in range [2-36], or is 0 while
Interp_C_Radix is not set.

ENOENT No leading sub-string can be converted to a number

EBADMSG An unrecognized character is encountered

ERANGE Result would be unrepresentable or the interpretation
gives up because of long string input.

EFAULT nptr is NULL.

The function template tries to recognize as many characters as possi‐
ble.

When returning Ok or EBADMSG, value is the converted value of the re‐
turned string [nptr,endptr).

SPECIALIZATION
The following lists types that are specialized by this library (type
char is not implemented, cast it to signed char or unsigned char):

signed char, signed short, signed int, signed long, signed long long
unsigned char, unsigned short, unsigned int, unsigned long,
unsigned long long, VLInt<T>.

Supported radix is in range [0,2-36].
For all replies, integral value is the accurate conversion of the re‐
turned string [nptr,endptr) if endptr is specified and not indicating
an empty range.

float, double, long double

Supported radix is in range [0,2-36].
If the string indicates a big number, value may be set to
[-]huge_value<ValueType>() (equal to HUGE_VAL, HUGE_VALF or HUGE_VALL).

[Implementation Note] Conversion accuracy may be limited due to using
C/C++ math functions to compose the result.

Wy.Timespec

Supported radix is [0,2,4,8,10,16,32].
String representation in radix 10 can be accurately converted.

SEE ALSO
Wy Wy.Timespec Wy.String Wy.ByteFlow Wy.VLInt

NOTE
Project is in development. https://sourceforge.net/projects/cscall

Bonita Montero

2024-04-11 06:57:44 UTC

Permalink

Post by wij
Refer Wy::strnum, 10 years before std::from_chars
---------------------------------------
NAME
strnum - scan and convert string to a number
SYNOPSIS
Except POD types, C structures, all types are declared in namespace Wy.
#include <Wy.stdlib.h>
template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const char*& endptr,
const int& radix=10,
NumInterpFlag cflags=Interp_C_Notation ) ;
template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const char*& endptr,
int& radix,
NumInterpFlag cflags=Interp_C_Notation ) ;
template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
const int& radix=10,
NumInterpFlag cflags=Interp_C_Notation ) ;
template<typename ValueType>
Errno strnum(
ValueType& value,
const char* nptr,
int& radix,
NumInterpFlag cflags=Interp_C_Notation ) ;

Do these function also have my performance and my precision ?

wij

2024-04-11 10:54:20 UTC

Permalink

Post by Bonita Montero

Post by wij
Refer Wy::strnum, 10 years before std::from_chars
---------------------------------------
NAME
 strnum - scan and convert string to a number
SYNOPSIS
 Except POD types, C structures, all types are declared in namespace Wy.
 #include <Wy.stdlib.h>
 template<typename ValueType>
 Errno strnum(
 ValueType& value,
 const char* nptr,
 const char*& endptr,
 const int& radix=10,
 NumInterpFlag cflags=Interp_C_Notation ) ;
 template<typename ValueType>
 Errno strnum(
 ValueType& value,
 const char* nptr,
 const char*& endptr,
 int& radix,
 NumInterpFlag cflags=Interp_C_Notation ) ;
 template<typename ValueType>
 Errno strnum(
 ValueType& value,
 const char* nptr,
 const int& radix=10,
 NumInterpFlag cflags=Interp_C_Notation ) ;
 template<typename ValueType>
 Errno strnum(
 ValueType& value,
 const char* nptr,
 int& radix,
 NumInterpFlag cflags=Interp_C_Notation ) ;

Do these function also have my performance and my precision ?

'My performance and my precision' should be less optimal than libc or libc++,
but should not be worse than yours. But these are implement problems, not the
main issue of text-number conversion functions. (I don't have cases that 2x
slower (for example) can really impact the performance of any of my applicat