Post by Ian Collins
Post by Alf P. Steinbach
For Visual C++, or more generally Microsoft's runtime library, i/o to
the console is immediate, and is flushed one character at a time. That
plays havoc with UTF-8 encoding for output. Which is not as bad as it
sounds, because Windows consoles, down at the API level, do not support
UTF-8 input, so it would be one-way i/o anyway, only output. In typical
Microsoft fashion this is fixed by having some extended functionality
that sets up a special mode for the C and C++ standard streams, via their
_setmode function. And /that/ again is not as bad as it sounds, because
even in Unix-land one has to do some configuration to make the wide
streams work, as a minimum calling setlocale( LC_ALL, "" ), so, to sum
up, console i/o doesn't work in C or C++ by default. Well, except ASCII.
Not necessarily.
#include <iostream>
int main()
{
    const uint32_t value = 0xa9;
    const char lower = 0x80|(value&0x3f);
    const char upper = 0xc0|(value >> 6);
    std::cout << upper << lower << std::endl;
}
©
This is a nice example of what I wrote (so I don't understand the "not
necessarily"), to wit, it avoids using wide streams and therefore
doesn't work in Windows, i.e. it's not portable:
<example>
C:\my\forums\clc++\012>g++ foo.cpp
foo.cpp: In function 'int main()':
foo.cpp:7:38: warning: overflow in implicit constant conversion [-Woverflow
const char lower = 0x80|(value&0x3f);
^
foo.cpp:8:38: warning: overflow in implicit constant conversion [-Woverflow
const char upper = 0xc0|(value >> 6);
^
C:\my\forums\clc++\012>chcp 65001 & a
Active code page: 65001
��
C:\my\forums\clc++\012>_
</example>
The reason for the /two/ odd result characters is that each byte is sent
separately to the console, which therefore interprets each byte separately.
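For reference, the two bytes in question are the UTF-8 encoding of U+00A9, namely 0xC2 0xA9. The following is only a minimal sketch of a general encoder, not code from the posts above; the name utf8_from is hypothetical. It uses explicit casts, which should also avoid the -Woverflow warnings shown in the g++ output.

```cpp
#include <string>

// Hypothetical helper (not from the posted code): encodes one Unicode code
// point as UTF-8. Returns an empty string for surrogates and for values
// above U+10FFFF, which have no valid UTF-8 encoding.
std::string utf8_from( char32_t cp )
{
    std::string s;
    // Every value passed to put() fits in 8 bits, so the casts lose nothing.
    auto put = [&s]( char32_t v )
    {
        s += static_cast<char>( static_cast<unsigned char>( v ) );
    };

    if( cp < 0x80 )                     // 1 byte: plain ASCII
    {
        put( cp );
    }
    else if( cp < 0x800 )               // 2 bytes: the case in the example
    {
        put( 0xC0 | (cp >> 6) );
        put( 0x80 | (cp & 0x3F) );
    }
    else if( cp < 0x10000 )             // 3 bytes, excluding surrogates
    {
        if( cp >= 0xD800 && cp <= 0xDFFF ) { return s; }
        put( 0xE0 | (cp >> 12) );
        put( 0x80 | ((cp >> 6) & 0x3F) );
        put( 0x80 | (cp & 0x3F) );
    }
    else if( cp <= 0x10FFFF )           // 4 bytes
    {
        put( 0xF0 | (cp >> 18) );
        put( 0x80 | ((cp >> 12) & 0x3F) );
        put( 0x80 | ((cp >> 6) & 0x3F) );
        put( 0x80 | (cp & 0x3F) );
    }
    return s;
}
```

Building the whole sequence in a std::string first also makes it easy to hand all the bytes of a character to a single output call.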
If the program had used wide streams it /could/ have worked in Windows
by adding a translation unit with proper system-specific wide stream
configuration, which in Unix-land would be a call to setlocale.
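As a rough sketch of what such a system-specific translation unit might contain: the function name configure_wide_stdout is hypothetical, and the choice of _O_U16TEXT as the Windows mode is an assumption on my part, since the text above only names _setmode.

```cpp
#include <clocale>
#include <cstdio>
#ifdef _WIN32
#   include <fcntl.h>   // _O_U16TEXT
#   include <io.h>      // _setmode
#endif

// Hypothetical helper: prepares standard output for wide-stream output.
// Returns false if the underlying call reports failure.
bool configure_wide_stdout()
{
#ifdef _WIN32
    // Microsoft-specific extension; assumes UTF-16 text mode is the mode wanted.
    return _setmode( _fileno( stdout ), _O_U16TEXT ) != -1;
#else
    // Unix-land: adopt the user's locale from the environment.
    return std::setlocale( LC_ALL, "" ) != nullptr;
#endif
}
```

A program would call configure_wide_stdout() once at the start of main and then write through std::wcout, e.g. std::wcout << L"\u00A9" << std::endl. Whether the character displays correctly still depends on the user's locale and on the console font.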
Here's a program, based on yours, that illustrates the problem for byte
level streams in Windows. To understand it, one needs to know that
the version of g++ I use relies on Microsoft's runtime library, or else on an
emulation of that runtime library's quirks and faults. That also has
other problems, but the main problem here is the lack of buffering:
<file "oops.cpp">
#include <stdio.h>
#include <windows.h>

auto main() -> int
{
    unsigned const value = 0xA9;
    const char utf8_bytes[] = { char( 0xC0|(value >> 6) ), char( 0x80|(value&0x3f) ) };
    int const n_bytes = sizeof( utf8_bytes );

    // Sending separate bytes to the console:
    fwrite( utf8_bytes, 1, n_bytes, stdout );
    printf( "\n" );

    // Sending all the bytes of a UTF-8 character together:
    HANDLE const winout = GetStdHandle( STD_OUTPUT_HANDLE );
    DWORD n_written = 0;    // Must be non-null when the overlapped argument is null.
    WriteFile( winout, utf8_bytes, n_bytes, &n_written, nullptr );
    printf( "\n" );
}
</file>
<example>
C:\my\forums\clc++\012>g++ oops.cpp
C:\my\forums\clc++\012>chcp 1252 & a
Active code page: 1252
©
©
C:\my\forums\clc++\012>chcp 65001 & a
Active code page: 65001
��
©
C:\my\forums\clc++\012> _
</example>
Cheers!,
- Alf