Bonita Montero
2024-05-04 12:16:47 UTC
With a vector a reverse iterator is derived from the normal iterator
and accesses with an offset of -1. With today's CPUs this isn't slower.
I wondered if the performance is still the same with ordered maps or
sets, where this would mean a real penalty since offsetting is done
twice, once for the access and once for iteration to the next element.
So I wrote a small benchmark for that:
#include <iostream>
#include <set>
#include <chrono>
#include <atomic>
using namespace std;
using namespace chrono;
atomic_int aSum;
int main()
{
set<int> st;
for( int i = 0; i != 100; ++i )
st.emplace( i );
auto timeLoop = []( char const *what, auto fn )
{
auto start = high_resolution_clock::now();
int sum = 0;
for( size_t i = 1'000'000; i; --i )
fn( sum );
cout << what << duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 100e6 << endl;
::aSum.store( sum, memory_order_relaxed );
};
timeLoop( "forward: ",
[&]( int &sum )
{
for( auto it = st.begin(); it != st.end(); ++it )
sum += *it;
} );
timeLoop( "reverse: ",
[&]( int &sum )
{
for( auto it = st.rbegin(); it != st.rend(); ++it )
sum += *it;
} );
timeLoop( "manual reverse: ",
[&]( int &sum )
{
for( auto it = st.end(); it != st.begin(); )
sum += *--it;
} );
}
This is the result with MSVC++ 18 on a 5.7GHz Zen4-CPU.
forward: 1.70247
reverse: 2.05701
manual reverse: 1.77813
The first result is from the bidirectional iterator, the second from the
reverse-iterator and the third from a bidirectional iterator iterating
reverse.
So iterate reverse only if necessary.
and accesses with an offset of -1. With today's CPUs this isn't slower.
I wondered if the performance is still the same with ordered maps or
sets, where this would mean a real penalty since offsetting is done
twice, once for the access and once for iteration to the next element.
So I wrote a small benchmark for that:
#include <iostream>
#include <set>
#include <chrono>
#include <atomic>
using namespace std;
using namespace chrono;
atomic_int aSum;
int main()
{
set<int> st;
for( int i = 0; i != 100; ++i )
st.emplace( i );
auto timeLoop = []( char const *what, auto fn )
{
auto start = high_resolution_clock::now();
int sum = 0;
for( size_t i = 1'000'000; i; --i )
fn( sum );
cout << what << duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 100e6 << endl;
::aSum.store( sum, memory_order_relaxed );
};
timeLoop( "forward: ",
[&]( int &sum )
{
for( auto it = st.begin(); it != st.end(); ++it )
sum += *it;
} );
timeLoop( "reverse: ",
[&]( int &sum )
{
for( auto it = st.rbegin(); it != st.rend(); ++it )
sum += *it;
} );
timeLoop( "manual reverse: ",
[&]( int &sum )
{
for( auto it = st.end(); it != st.begin(); )
sum += *--it;
} );
}
This is the result with MSVC++ 18 on a 5.7GHz Zen4-CPU.
forward: 1.70247
reverse: 2.05701
manual reverse: 1.77813
The first result is from the bidirectional iterator, the second from the
reverse-iterator and the third from a bidirectional iterator iterating
reverse.
So iterate reverse only if necessary.