Discussion:
Serializing bit field structures
(2b|!2b)==?
2008-10-21 09:18:39 UTC
I have a struct declared as follows:

struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};

I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...

I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
Thomas J. Gritzan
2008-10-21 15:22:07 UTC
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
Huh?
Post by (2b|!2b)==?
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
In what endianness do you want to store it?

Let's assume 8-bit bytes, an unsigned int of at least 24 bits, and that
you want to output in network byte order (big endian).

Here's a quick'n'dirty solution just to show you the main idea:

// helper functions (these need <cassert> and <ostream>)
void put8(std::ostream& out, unsigned int val)
{
assert(val <= 0xFF);
out.put(static_cast<char>(val)); // write one raw byte
}
void put16(std::ostream& out, unsigned int val)
{
assert(val <= 0xFFFF);
put8(out, val >> 8); // high byte first (big endian)
put8(out, val & 0xFF);
}
void put24(std::ostream& out, unsigned int val)
{
assert(val <= 0xFFFFFF);
put8(out, val >> 16); // most significant byte first
put16(out, val & 0xFFFF);
}

// could be an operator<<, too.
void serialize(std::ostream& out, const RecordType1& data)
{
put24(out, data.dt);
put16(out, data.ts);
put24(out, data.lsp);
// and so on...
}

To read them back, you would read two or three bytes, left shift the
high bytes and binary-OR them together.
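
The read side described above could look roughly like this. It is only a sketch: get8, get16 and get24 are hypothetical names chosen to mirror the put functions, the input is assumed to be big endian, and a premature end of input is caught by nothing more than an assert.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical read-side counterparts of put8/put16/put24.
unsigned int get8(std::istream& in)
{
    const int c = in.get();                       // next byte, or EOF
    assert(c != std::char_traits<char>::eof());   // sketch only: no real error handling
    return static_cast<unsigned int>(c) & 0xFFu;
}

unsigned int get16(std::istream& in)
{
    const unsigned int high = get8(in);           // big endian: high byte arrives first
    const unsigned int low  = get8(in);
    return (high << 8) | low;                     // shift the high byte, OR in the low one
}

unsigned int get24(std::istream& in)
{
    const unsigned int high = get8(in);
    const unsigned int low  = get16(in);
    return (high << 16) | low;
}
```

Reading the two halves into named variables before combining them keeps the order of the get8 calls well defined. For a quick check, the functions can be fed from a std::istringstream built over a few known bytes.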

If you don't want to use ostream/istream, you would have to track the
current position in the array (in the put functions). An output iterator
might be an elegant solution.
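
One possible shape for the output iterator idea, as a sketch only: each put function takes the iterator by value and returns the advanced iterator, so the caller always knows the current position. The template parameter name OutIt is an assumption, and nothing here checks that the destination is large enough.

```cpp
#include <cassert>

// Writes one byte through any output iterator and returns the new position.
template <typename OutIt>
OutIt put8(OutIt out, unsigned int val)
{
    assert(val <= 0xFF);
    *out++ = static_cast<unsigned char>(val);
    return out;
}

template <typename OutIt>
OutIt put16(OutIt out, unsigned int val)
{
    assert(val <= 0xFFFF);
    out = put8(out, val >> 8);        // high byte first (big endian)
    return put8(out, val & 0xFF);
}

template <typename OutIt>
OutIt put24(OutIt out, unsigned int val)
{
    assert(val <= 0xFFFFFF);
    out = put8(out, val >> 16);
    return put16(out, val & 0xFFFF);
}
```

With a plain array, the iterator is just an unsigned char*, and the difference between the returned pointer and the start of the array is the number of bytes written. The same functions should also accept a std::back_inserter over a std::vector<unsigned char>.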
--
Thomas
(2b|!2b)==?
2008-10-22 10:56:23 UTC
Post by Thomas J. Gritzan
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
Huh?
Post by (2b|!2b)==?
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
In what endianness do you want to store it?
Let's assume 8-bit bytes, an unsigned int of at least 24 bits, and that
you want to output in network byte order (big endian).
// helper functions
void put8(std::ostream& out, unsigned int val)
{
assert(val <= 0xFF);
out.put(val);
}
void put16(std::ostream& out, unsigned int val)
{
assert(val <= 0xFFFF);
put8(out, val >> 8);
put8(out, val & 0xFF);
}
void put24(std::ostream& out, unsigned int val)
{
assert(val <= 0xFFFFFF);
put8(out, val >> 16);
put16(out, val & 0xFFFF);
}
// could be an operator<<, too.
void serialize(std::ostream& out, const RecordType1& data)
{
put24(out, data.dt);
put16(out, data.ts);
put24(out, data.lsp);
// and so on...
}
This is (almost) *EXACTLY* what I want to do. Thank you, thank you,
thank you. I have relaxed my endianness requirements - and will now be
using network byte order - since this covers all the machines I envisage
running this on.

I do have a more complicated struct which packs several fields in 2
bytes (please see fields flag, xmo and dxp below):

struct RecordType5 : public DbRecord
{
unsigned int dt : 24;
unsigned int ts : 16;
unsigned int stl : 24;
unsigned int lsp : 24;
unsigned int lst : 16;
unsigned int lsv : 16;
unsigned int bd : 24;
unsigned int bv : 16;
unsigned int ak : 24;
unsigned int av : 16;
unsigned int cv : 24;
unsigned int lvl : 24;
unsigned int strk : 24;
unsigned int flag: 1;
unsigned int xmo : 4;
unsigned int dxp : 11;
unsigned int its : 24;
unsigned int tb : 24;
unsigned int ta : 24;
unsigned int dl : 24;
unsigned int gm : 24;
unsigned int vg : 24;
unsigned int ro : 24;
unsigned int iv : 24;
};

I suppose I will need additional helper functions put1(), put4() and
put11(). Since these functions "straddle" 2 bytes, I am not sure how to
implement them, but I'd like to use similar putXbits() helper functions
as they are very elegant, simple and "do what it says on the tin".
Could you please show how put1(), put4() and put11() could be written?
Post by Thomas J. Gritzan
To read them back, you would read two or three bytes, left shift the
high bytes and binary-OR them together.
If you don't want to use ostream/istream, you would have to track the
current position in the array (in the put functions). An output iterator
might be an elegant solution.
Yes. I want to "stack the bits" (i.e. serialize the bit field struct) to
a char* (char array or byte string). Once I have the bits stacked/packed
into a byte array, I know the number of bytes that have been used up by
the bits, which means I know the size of the memory block. Armed with
a memory block (char array) and its size, I can use memcpy, memmove etc.
to my heart's content. Once I can do that, I can do the rest.

(I must admit that I don't know too much about C++ streams.) Maybe there
is a way to direct bytes from an ostream to a char array? Or maybe it's
better to serialize directly to a char array?
Thomas J. Gritzan
2008-10-22 16:45:45 UTC
Post by (2b|!2b)==?
Post by Thomas J. Gritzan
Post by (2b|!2b)==?
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
In what endianness do you want to store it?
Let's assume 8-bit bytes, an unsigned int of at least 24 bits, and that
you want to output in network byte order (big endian).
[...]
Post by (2b|!2b)==?
Post by Thomas J. Gritzan
void put16(std::ostream& out, unsigned int val)
{
assert(val <= 0xFFFF);
put8(out, val >> 8);
put8(out, val & 0xFF);
}
[...]
Post by (2b|!2b)==?
This is (almost) *EXACTLY* what I want to do. Thank you, thank you,
thank you. I have relaxed my endianness requirements - and will now be
using network byte order - since this covers all the machines I envisage
running this on.
This code doesn't care what byte order your machine uses. It just stores
the data as big-endian.
Post by (2b|!2b)==?
I do have a more complicated struct which packs several fields in 2
bytes (please see fields flag, xmo and dxp below):
struct RecordType5 : public DbRecord
{
[... more ...]
Post by (2b|!2b)==?
unsigned int flag: 1;
unsigned int xmo : 4;
unsigned int dxp : 11;
[...]
Post by (2b|!2b)==?
};
I suppose I will need additional helper functions put1(), put4() and
put11(). Since these functions "straddle" 2 bytes, I am not sure how to
implement them, but I'd like to use similar putXbits() helper functions
as they are very elegant, simple and "do what it says on the tin".
Could you please show how put1(), put4() and put11() could be written?
In this case, you would have to cache fractions of a byte. That would
make the functions more complicated.

An easy way would be a function to handle all three variables at once,
like this:

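void put1_4_11(std::ostream& out, unsigned flag, unsigned int4, unsigned int11)
{
// each value must fit its field: 1, 4 and 11 bits
assert(flag <= 0x1 && int4 <= 0xF && int11 <= 0x7FF);
put16(out, (flag << 15) | (int4 << 11) | int11);
}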
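
For comparison, the "cache fractions of a byte" approach could be sketched as a small bit writer. BitWriter and its member names are hypothetical, the bits are emitted most significant first to match the big-endian layout used so far, and field widths are limited to 24 bits so that the accumulator never needs more than 32 bits.

```cpp
#include <cassert>
#include <vector>

class BitWriter
{
public:
    BitWriter() : acc_(0), nbits_(0) {}

    // Append the low `width` bits of `val`, most significant bit first.
    void put(unsigned int val, unsigned int width)
    {
        assert(width >= 1 && width <= 24);
        assert(val < (1UL << width));
        acc_ = (acc_ << width) | val;     // at most 7 pending + 24 new bits
        nbits_ += width;
        while (nbits_ >= 8) {             // emit every complete byte
            nbits_ -= 8;
            bytes_.push_back(static_cast<unsigned char>((acc_ >> nbits_) & 0xFF));
        }
        acc_ &= (1UL << nbits_) - 1;      // keep only the still-pending bits
    }

    // Pad the last partial byte with zero bits and emit it.
    void flush()
    {
        if (nbits_ > 0) {
            bytes_.push_back(static_cast<unsigned char>((acc_ << (8 - nbits_)) & 0xFF));
            acc_ = 0;
            nbits_ = 0;
        }
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    unsigned long acc_;                   // pending bits, right-aligned
    unsigned int nbits_;                  // number of pending bits (0..7 between calls)
    std::vector<unsigned char> bytes_;
};
```

Writing flag, xmo and dxp with put(flag, 1), put(xmo, 4), put(dxp, 11) should produce the same two bytes as the combined put1_4_11 approach, at the cost of carrying state between calls.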
Post by (2b|!2b)==?
Post by Thomas J. Gritzan
If you don't want to use ostream/istream, you would have to track the
current position in the array (in the put functions). An output iterator
might be an elegant solution.
Yes. I want to "stack the bits" (i.e. serialize the bit field struct) to
a char* (char array or byte string). Once I have the bits stacked/packed
into a byte array, I know the number of bytes that have been used up by
the bits, which means I know the size of the memory block. Armed with
a memory block (char array) and its size, I can use memcpy, memmove etc.
to my heart's content. Once I can do that, I can do the rest.
(I must admit that I don't know too much about C++ streams.) Maybe there
is a way to direct bytes from an ostream to a char array? Or maybe it's
better to serialize directly to a char array?
There is. You could use an ostringstream (which derives from ostream),
then call its str() member function to get a std::string from it.
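
A minimal sketch of the ostringstream route, repeating put8 and put16 so that it stands alone. The helper name bytesOf16 is made up for the illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

void put8(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFF);
    out.put(static_cast<char>(val));
}

void put16(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFFFF);
    put8(out, val >> 8);
    put8(out, val & 0xFF);
}

// Hypothetical helper: serialize one 16-bit value and return the raw bytes.
std::string bytesOf16(unsigned int val)
{
    std::ostringstream out;   // an ostream that collects its output in memory
    put16(out, val);
    return out.str();         // may contain '\0' bytes: use size(), not strlen()
}
```

The returned string holds binary data, so its length has to come from size() and its contents from data(); treating it as a C string would stop at the first zero byte.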

But in this case it might be better to use a std::vector<unsigned char>,
since you will end up with an array and not with a string.
Change every ostream& parameter (in my example code) to
std::vector<unsigned char>& and modify put8 to call push_back on it:

void put8(std::vector<unsigned char>& out, unsigned int val)
{
assert(val <= 0xFF);
out.push_back(val);
}

Then call serialize with an empty vector:

std::vector<unsigned char> buffer;
serialize(buffer, yourRecord);

The size is in buffer.size() and the data can be accessed by &buffer[0].

Ideally, this should be encapsulated into a class with the put*
functions as members.

For learning more about the C++ standard library (streams, std::vector
etc.) there is a book recommended multiple times in this group:
Accelerated C++, by A. Koenig and B. Moo.
I didn't read it myself, but I gave it to my brother for another try to
learn C++.
--
Thomas
diamondback
2008-10-21 17:04:30 UTC
Post by (2b|!2b)==?
struct RecordType1
{
        unsigned int dt : 24;           //3 bytes
        unsigned int ts : 16;           //2 bytes
        unsigned int lsp : 24;          //3 bytes (float value represented as int)
        unsigned int lst : 16;          //2 bytes
        unsigned int lsv : 16;          //2 bytes
        unsigned int x1 : 24;           //3 bytes (float value represented as int)
        unsigned int x2 : 24;           //3 bytes (float value represented as int)
        unsigned int x3 : 24;           //3 bytes (float value represented as int)
        unsigned int x4 : 24;           //3 bytes (float value represented as int)
        unsigned int bv : 16;           //2 bytes
        unsigned int ak : 24;           //3 bytes (float value represented as int)
        unsigned int av : 16;           //2 bytes
        unsigned int cv : 24;           //3 bytes
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
First of all, there is no way to get around the endian-ness issue. Any
client that reads this data needs to know what order the bytes are
arriving in. There is simply no way around it. The bytes arrive
serialized, "one-at-a-time" if you will.

But, I'll get to that in a moment. A quick and dirty way of dealing
with serialization is a trick with unions. So:

union RecSerializer
{
RecordType1 record;
unsigned char stream[sizeof(RecordType1)];
};

Now, record and stream both occupy the same memory, so the data can be
accessed via either member, depending on what you are doing. So, you
load the memory using the structure (record):

RecSerializer m_rs;
m_rs.record.dt = 1;
m_rs.record.ts = 2;
m_rs.record.lsp = 3;
...

Then you send it using the byte array (stream):

<networkConnection>.send( m_rs.stream, sizeof(RecordType1) );

Reading and de-serializing is simply a reverse of the sending process.

However, this does not take into account cross-platform endianness
issues. Like I said above, this is the language barrier that confronts
anyone who does cross-platform network communication. You must deal
with it. Sorry. Luckily, you have some choices on how to do this:

The easiest(?) way is to just insist that everyone play nice and use
the same endianness. If you can accomplish this, please run for
President. I will vote for you...twice. Otherwise, you need to agree
to disagree and standardize on something. Luckily, the Internet
protocols use big-endian byte order and the POSIX byte order
functions htons, htonl, ntohs, and ntohl can be used for marshalling
and demarshalling data. These are platform independent functions that
reorder the bytes in standard data to conform to the Internet byte
order and back. All clients on your network must agree to conform to
the standard, obviously. However, these functions work on standard 2
or 4 byte boundaries only. So, these will not work for you in your
current design. My initial reaction, not knowing the details of your
system, would be to question if you absolutely must use bit-fields in
the structure? Processing would be easier, and potentially faster, if
you stuck with standard byte boundaries. But, I will assume you have
considered this and I will proceed under the assumption that the odd
byte boundaries are required.

A clever method of dealing with byte order could be to take a cue from
Unicode encoded files and include a Byte Order Mark (BOM) as the first
two bytes of the message. The BOM would have a value that could not be
accidentally inverted. Something simple like 0xFFEE, for example,
would work fine. With the BOM in place, you simply serialize and send
the message, ignoring byte order. However, the receiving client would
de-serialize and read the first two bytes. If the bytes are in the
expected order (0xFFEE), the de-serialization can continue with no
further processing. But, if the BOM is read backwards (0xEEFF), the
client knows that the message was sent with a different endianness and
must be further processed to extract the data.
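
The marker check could be sketched as follows. This is an illustration under assumptions: unsigned short is taken to be exactly two bytes, the names kOrderMark, writeOrderMark and checkOrderMark are invented here, and memcpy is used to move the marker in and out of the byte buffer.

```cpp
#include <cassert>
#include <cstring>

const unsigned short kOrderMark = 0xFFEE;   // marker value suggested above

enum ByteOrder { SameOrder, SwappedOrder, NotAMark };

// Sender: store the marker in the sender's native byte order.
void writeOrderMark(unsigned char* dest)
{
    assert(sizeof kOrderMark == 2);          // assumption of this sketch
    std::memcpy(dest, &kOrderMark, sizeof kOrderMark);
}

// Receiver: interpret the first two bytes in the receiver's native order.
ByteOrder checkOrderMark(const unsigned char* src)
{
    unsigned short mark = 0;
    std::memcpy(&mark, src, sizeof mark);
    if (mark == 0xFFEE) return SameOrder;    // no further processing needed
    if (mark == 0xEEFF) return SwappedOrder; // sender used the other byte order
    return NotAMark;                         // not a message in this format
}
```

A receiver that gets SwappedOrder would then have to byte-swap every multi-byte field that follows, which only works if it knows where the field boundaries are.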

So, your options are:
1) Get everyone to agree on endianness (and bring world peace)
2) Change your data definition to facilitate the use of POSIX byte
order conversion.
3) Use a "BOM" (or some other order marker) in your data definition.

I hope that helps. If not, I hope someone else has a better idea.
James Kanze
2008-10-22 08:32:48 UTC
Post by diamondback
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
I need to serialize this struct by packing the bits into a
contiguous byte array, and then read it back from the byte
array. I can't use memcpy/sizeof because of boundary
alignment ...
I'd appreciate it if anyone can show me how to do this. Ideally,
I would like to do this in a cross-platform (i.e. "ENDIAN-ness"
agnostic) way.
First of all, there is no way to get around the endian-ness
issue. Any client that reads this data needs to know what
order the bytes are arriving in. There is simply no way around
it. The bytes arrive serialized, "one-at-a-time" if you will.
More generally, he really has to define a serialization format,
period. Of course, for unsigned, endianness is about the only
issue. And he's done part of the work already, since he's
defined how to represent floats, except for the endianness.
Post by diamondback
But, I'll get to that in a moment. A quick and dirty way of dealing
with serialization is a trick with unions. So:
union RecSerializer
{
RecordType1 record;
unsigned char stream[sizeof(RecordType1)];
};
Now, record and stream both occupy the same memory, so the
data can be accessed via either member, depending on what you
are doing.
Read access can only access the last member written; otherwise,
you have undefined behavior. Formally, a compiler is allowed to
arrange for some sort of secondary store to remember the last
field written, and check it when reading. I think that there
was once a compiler which did this, but it's certainly not
frequent. And of course, reading a record when you stored
random data through stream could result in a core dump or the
equivalent on some architectures (Unisys MCP, for example).
Post by diamondback
RecSerializer m_rs;
m_rs.record.dt = 1;
m_rs.record.ts = 2;
m_rs.record.lsp = 3;
...
<networkConnection>.send( m_rs.stream, sizeof(RecordType1) );
Reading and de-serializing is simply a reverse of the sending
process.
All of which is undefined behavior, and can in practice generate
a core dump on some less common architectures.
Post by diamondback
However, this does not take into account cross-platform
endianness issues. Like I said above, this is the language
barrier that confronts anyone who does cross-platform network
communication. You must deal with it. Sorry. Luckily, you have
some choices on how to do this:
The easiest(?) way is to just insist that everyone play nice
and use the same endianness. If you can accomplish this,
please run for President. I will vote for you...twice.
Otherwise, you need to agree to disagree and standardize on
something. Luckily, the Internet protocols use big-endian byte
order and the POSIX byte order functions htons, htonl, ntohs,
and ntohl can be used for marshalling and demarshalling data.
These are platform independent functions[...]
They're not portable, and they aren't really meaningful for some
(many) platforms, since they consider that there can only be two
possible byte orders (there are 24 possible orderings for 4
bytes, and I've seen at least three in actual practice), and
they ignore all other representation issues (and possibly
alignment issues).

Repeat after me: endianness is just the tip of the iceberg. The
htonxxx and ntohxxx functions are just hacks, designed as a
quick work-around in order to communicate between two fixed
architectures, and are not generally useful (except perhaps when
addressing the system API---a system dependent context).

Given his description of the floating point format in another
thread, I would imagine something like:

oxxxstream&
oxxxstream::operator<<(
float value )
{
assert( value >= 0.0 && value < 8 ) ;
int exp ;
int mant
= frexp( value, &exp ) * (1 << 21) ;
std::streambuf* sb = rdbuf() ;
sb->sputc( (exp << 5) | (mant >> 16) ) ;
sb->sputc( (mant >> 8) & 0xFF ) ;
sb->sputc( mant & 0xFF ) ;
return *this ;
}

(This code lacks any error handling; you need to verify the
return value of sb->sputc, and set badbit in the stream if it is
EOF. And not do any further output if the stream has failed. I
generally use a special class for this, which maintains a
reference to the stream and the pointer to the streambuf, and
has a single put function:

void
GuardedOutput::put( unsigned char ch )
{
if ( myStream && myStreambuf->sputc( ch ) == EOF ) {
myStream.setstate( std::ios::badbit ) ;
}
}

Also, I normally avoid bitwise operators on signed types. In
this case, however, the types are partially conditioned by the
signature of frexp, and the precondition check guarantees that
I'll never get a negative value with the operations I do, so the
signed int behaves exactly like an unsigned int.)

--
James Kanze (GABI Software) email:***@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
(2b|!2b)==?
2008-10-22 10:34:19 UTC
Post by diamondback
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
First of all, there is no way to get around the endian-ness issue. Any
client that reads this data needs to know what order the bytes are
arriving in. There is simply no way around it. The bytes arrive
serialized, "one-at-a-time" if you will.
But, I'll get to that in a moment. A quick and dirty way of dealing
with serialization is a trick with unions. So:
union RecSerializer
{
RecordType1 record;
unsigned char stream[sizeof(RecordType1)];
};
This would have been an elegant solution, but unfortunately, it defeats
(to some extent) the purpose of the exercise, which is to reduce the
footprint of the data when stored in a database. For example, a 1 bit
field (declared as unsigned int bitflag:1) would still occupy one byte.
What I want to do is to 'stack' the bits in the structure into a
contiguous byte array, so that the space (in bytes) occupied by the
serialized bit field structure is the "same" (i.e. up to the nearest
byte) as the byte array. Maybe I should have titled this post
"Serializing/deserializing a bit field struct to/from a byte array",
because that is, in essence, what I am really trying to do.
Post by diamondback
Now, record and stream both occupy the same memory, so the data can be
accessed via either member, depending on what you are doing. So, you
load the memory using the structure (record):
RecSerializer m_rs;
m_rs.record.dt = 1;
m_rs.record.ts = 2;
m_rs.record.lsp = 3;
...
<networkConnection>.send( m_rs.stream, sizeof(RecordType1) );
Reading and de-serializing is simply a reverse of the sending process.
However, this does not take into account cross-platform endianness
issues. Like I said above, this is the language barrier that confronts
anyone who does cross-platform network communication. You must deal
with it.
For the sake of simplicity, I will relax the requirement of being
"endian agnostic". I will simply use network byte ordering. That will
cover the vast majority of platforms I anticipate running this on anyway.
Nick Keighley
2008-10-23 08:18:39 UTC
Post by (2b|!2b)==?
Post by diamondback
Post by (2b|!2b)==?
struct RecordType1
{
        unsigned int dt : 24;           //3 bytes
        unsigned int ts : 16;           //2 bytes
<snip>
Post by (2b|!2b)==?
Post by diamondback
Post by (2b|!2b)==?
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.
First of all, there is no way to get around the endian-ness issue. Any
client that reads this data needs to know what order the bytes are
arriving in. There is simply no way around it. The bytes arrive
serialized, "one-at-a-time" if you will.
But, I'll get to that in a moment. A quick and dirty way of dealing
with serialization is a trick with unions. So:
union RecSerializer
{
   RecordType1 record;
   unsigned char stream[sizeof(RecordType1)];
};
This would have been an elegant solution, but unfortunately, it defeats
(to some extent) the purpose of the exercise, which is to reduce the
footprint of the data when stored in a database. For example, a 1 bit
field (declared as unsigned int bitflag:1) would still occupy one byte.
What I want to do is to 'stack' the bits in the structure into a
contiguous byte array, so that the space (in bytes) occupied by the
serialized bit field structure is the "same" (i.e. up to the nearest
byte) as the byte array. Maybe I should have titled this post
"Serializing/deserializing a bit field struct to/from a byte array",
because that is, in essence, what I am really trying to do.
This is an entirely sensible thing to do. I just don't think
bit fields are the way to do it. Write functions to compress
and uncompress your bit-oriented data into structures.

Say a byte (assume bytes are 8 bits) looks like this:

field   size  offset
flag    1     7
thingy  3     4
nib     4     0

struct Record
{
unsigned char nib;
unsigned char thingy;
unsigned char flag;
};

/* stuff a record struct into a byte (written through the pointer) */
void pack_record (unsigned char* packed, const Record* rec);

/* unpack a byte into a record */
void unpack_record (Record* rec, unsigned char packed);

This is all very C-ish and only works on a byte. The pack
and unpack functions could be member functions of Record.
They could operate on arrays (or vectors) of bytes.
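
For the one-byte layout in the table above, the two functions might be implemented along these lines. This is a sketch that assumes the packed byte is passed through a pointer so that pack_record can write to it, and that each field value already fits its declared size.

```cpp
#include <cassert>

struct Record
{
    unsigned char nib;      /* 4 bits, offset 0 */
    unsigned char thingy;   /* 3 bits, offset 4 */
    unsigned char flag;     /* 1 bit,  offset 7 */
};

/* stuff a record struct into a byte */
void pack_record(unsigned char* packed, const Record* rec)
{
    assert(rec->flag <= 0x1 && rec->thingy <= 0x7 && rec->nib <= 0xF);
    *packed = static_cast<unsigned char>(
        (rec->flag << 7) | (rec->thingy << 4) | rec->nib);
}

/* unpack a byte into a record */
void unpack_record(Record* rec, unsigned char packed)
{
    rec->flag   = (packed >> 7) & 0x1;
    rec->thingy = (packed >> 4) & 0x7;
    rec->nib    = packed & 0xF;
}
```

Each shift and mask pair comes straight from the offset and size columns, which is what makes this kind of code a reasonable candidate for generation from a field description.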

The code is tedious to write, but if you had lots
of it you could generate the code from descriptions
like the above (or maybe someone could do something
really hairy with template metaprogramming).

<snip>
Post by (2b|!2b)==?
For the sake of simplicity, I will relax the requirement of being
"endian agnostic". I will simply use network byte ordering. That will
cover the vast majority of platforms I anticipate running this on anyway.
I don't think this buys you much. Getting the endianness right
isn't that hard.


<snip>

--
Nick Keighley

James Kanze
2008-10-22 08:03:25 UTC
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
Note that on a 32 bit machine, the only effect your bit fields
are likely to have here is to slow things down, since generally,
the compiler won't allocate a bit field in a way that would
cross a 32 bit boundary. lst and lsv will be put into a single
word, but that's about it, and you could get that by declaring
them as unsigned short. If you're really concerned about memory
use, you probably need to declare each field to be an array of
unsigned char of the correct size, and use memcpy for copying in
and out. Otherwise, just drop the bit fields---they don't buy
you anything. (Here---if you had 8 or ten in a row, of just a few
bits, they could make a difference. But there's rarely any
sense in having bit fields larger than 8 bits.)
Post by (2b|!2b)==?
I need to serialize this struct by packing the bits into a
contiguous byte array, and then read it back from the byte
array. I can't use memcpy/sizeof because of boundary alignment
...
I'd appreciate it if anyone can show me how to do this. Ideally, I
would like to do this in a cross-platform (i.e. "ENDIAN-ness"
agnostic) way.
The first thing you'll have to do is define the format you want
for the serialized data. Once you've done that, you need to
process each field separately. If we assume that you have 16
bit unsigned integral values, and 24 bit custom floating point,
kept internally in an unsigned int, and that you decide to use
the standard network byte order (for unsigned values, byte order
is the only concern, and you've already specified a floating
point format which maps it to an unsigned value), then something
like:

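void
putIntValue( std::ostream& dest, unsigned value )
{
// put() writes one raw byte; << on an integer would write formatted text
dest.put( static_cast< char >( (value >> 8) & 0xFF ) ) ;
dest.put( static_cast< char >( (value ) & 0xFF ) ) ;
}

void
putFloatValue( std::ostream& dest, unsigned value )
{
dest.put( static_cast< char >( (value >> 16) & 0xFF ) ) ;
dest.put( static_cast< char >( (value >> 8) & 0xFF ) ) ;
dest.put( static_cast< char >( (value ) & 0xFF ) ) ;
}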

would do the trick; even cleaner would be to define your own
stream types for this (inheriting from std::ios, but not from
std::istream or std::ostream), with << and >> operators for the
basic types you're concerned with (with the conversions between
float and your representation also taking place in the << and >>
operators).
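
A very reduced sketch of such a stream type, covering only a 16-bit big-endian output operator. The class name obinstream and the private putByte helper are hypothetical, and mapping unsigned short to the 16-bit field is an assumption made for the example.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Binary output stream: reuses std::ios for state and streambuf handling,
// but offers none of the formatted << operators of std::ostream.
class obinstream : public std::ios
{
public:
    explicit obinstream(std::streambuf* sb) : std::ios(sb) {}

    // 16-bit value, high byte first (network byte order).
    obinstream& operator<<(unsigned short value)
    {
        putByte((value >> 8) & 0xFF);
        putByte(value & 0xFF);
        return *this;
    }

private:
    void putByte(unsigned int byte)
    {
        typedef std::streambuf::traits_type traits;
        // Skip the write once the stream has failed; set badbit on error.
        if (good() && traits::eq_int_type(
                rdbuf()->sputc(static_cast<char>(byte)), traits::eof())) {
            setstate(std::ios::badbit);
        }
    }
};
```

Any streambuf can sit underneath, for instance a std::stringbuf for in-memory output or a std::filebuf opened in binary mode. An input counterpart would follow the same pattern with sbumpc and failbit.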

--
James Kanze (GABI Software) email:***@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
(2b|!2b)==?
2008-10-22 10:22:48 UTC
Permalink
Post by James Kanze
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};
Note that on a 32 bit machine, the only effect your bit fields
are likely to have here is to slow things down, since generally,
the compiler won't allocate a bit field in a way that would
cross a 32 bit boundary. lst and lsv will be put into a single
word, but that's about it, and you could get that by declaring
them as unsigned short. If you're really concerned about memory
use, you probably need to declare each field to be an array of
unsigned char of the correct size, and use memcpy for copying in
and out. Otherwise, just drop the bit fields---they don't buy
you anything. (Here---if you had 8 or ten in a row, of just a few
bits, they could make a difference. But there's rarely any
sense in having bit fields larger than 8 bits.)
Ah, but it's not memory use that I'm concerned with. It's disk space. The
structs are formats for a database I am writing. I am receiving an
additional 150Mb of data each day into the database, and using bit
fields to pack the data offers approx a 35-40% reduction in the storage
space required.
Post by James Kanze
Post by (2b|!2b)==?
I need to serialize this struct by packing the bits into a
contiguous byte array, and then read it back from the byte
array. I can't use memcpy/sizeof because of boundary alignment
...
I'd appreciate if anyone can show me how to do this. Ideally, I
would like to do this in a cross platform (i.e. "ENDIAN-ness"
agnostic) way.
The first thing you'll have to do is define the format you want
for the serialized data. Once you've done that, you need to
process each field separately. If we assume that you have 16
bit unsigned integral values, and 24 bit custom floating point,
kept internally in an unsigned int, and that you decide to use
the standard network byte order (for unsigned values, byte order
is the only concern, and you've already specified a floating
point format which maps it to an unsigned value), then something like:
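void
putIntValue( std::ostream& dest, unsigned value )
{
    // put() writes one raw byte; << would format the number as decimal text
    dest.put( static_cast< char >( (value >> 8) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value     ) & 0xFF ) ) ;
}
void
putFloatValue( std::ostream& dest, unsigned value )
{
    // three bytes, most significant first (network byte order)
    dest.put( static_cast< char >( (value >> 16) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value >>  8) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value      ) & 0xFF ) ) ;
}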
Good idea to use network byte order (thanks). The functions above will
do the trick and are a good starting point...
Post by James Kanze
would do the trick; even cleaner would be to define your own
stream types for this (inheriting from std::ios, but not from
std::istream or std::ostream), with << and >> operators for the
basic types you're concerned with (with the conversions between
float and your representation also taking place in the << and >>
operator).
Now this would really be cool. Alas, my C++ knowledge falls short (I
have avoided streams as much as possible in the past because I never really
understood them). Can you recommend a good book? (or maybe provide
boilerplate code I could expand on)?

What I really want to do is this:

1). Write a serialize() function that will return a char* (a char array
or byte string), which contains the bits sequentially packed into a byte
string.
2). Write a deserialize() function that will accept a bytestring (char*)
of previously serialized bytes, and read bits sequentially (in reverse
order) and use that to populate the record

Since the size of the structure is fixed (I know how many bytes it would
take to hold the bits in the struct), once I have serialized the bit
structure to a char array, I can use memcpy etc. to copy the memory block
around.
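
The two functions described above could be sketched roughly as follows, assuming network byte order, 8-bit bytes and an unsigned int of at least 24 bits. The helper names putBytes and getBytes and the constant kRecordBytes are hypothetical, and the buffer is unsigned char rather than char to avoid sign surprises. Note that the fields are read back in the same order they were written, not in reverse.

```cpp
#include <cassert>
#include <cstddef>

// Record from the original post. Its in-memory layout is never relied on.
struct RecordType1 {
    unsigned int dt  : 24;
    unsigned int ts  : 16;
    unsigned int lsp : 24;
    unsigned int lst : 16;
    unsigned int lsv : 16;
    unsigned int x1  : 24;
    unsigned int x2  : 24;
    unsigned int x3  : 24;
    unsigned int x4  : 24;
    unsigned int bv  : 16;
    unsigned int ak  : 24;
    unsigned int av  : 16;
    unsigned int cv  : 24;
};

// Eight 3-byte fields plus five 2-byte fields: 34 bytes on disk.
const std::size_t kRecordBytes = 8 * 3 + 5 * 2;

// Write the low `count` bytes of `value`, most significant byte first,
// and return the advanced cursor.
unsigned char* putBytes(unsigned char* p, unsigned value, int count) {
    for (int shift = 8 * (count - 1); shift >= 0; shift -= 8)
        *p++ = static_cast<unsigned char>((value >> shift) & 0xFF);
    return p;
}

// Read `count` bytes, most significant byte first, advancing the cursor.
unsigned getBytes(const unsigned char*& p, int count) {
    unsigned value = 0;
    while (count-- > 0)
        value = (value << 8) | *p++;
    return value;
}

// `out` must point to at least kRecordBytes bytes.
void serialize(const RecordType1& r, unsigned char* out) {
    unsigned char* p = out;
    p = putBytes(p, r.dt, 3);   p = putBytes(p, r.ts, 2);
    p = putBytes(p, r.lsp, 3);  p = putBytes(p, r.lst, 2);
    p = putBytes(p, r.lsv, 2);
    p = putBytes(p, r.x1, 3);   p = putBytes(p, r.x2, 3);
    p = putBytes(p, r.x3, 3);   p = putBytes(p, r.x4, 3);
    p = putBytes(p, r.bv, 2);   p = putBytes(p, r.ak, 3);
    p = putBytes(p, r.av, 2);   p = putBytes(p, r.cv, 3);
    assert(p == out + kRecordBytes);
}

// `in` must point to kRecordBytes bytes produced by serialize().
void deserialize(const unsigned char* in, RecordType1& r) {
    const unsigned char* p = in;
    r.dt  = getBytes(p, 3);  r.ts  = getBytes(p, 2);
    r.lsp = getBytes(p, 3);  r.lst = getBytes(p, 2);
    r.lsv = getBytes(p, 2);
    r.x1  = getBytes(p, 3);  r.x2  = getBytes(p, 3);
    r.x3  = getBytes(p, 3);  r.x4  = getBytes(p, 3);
    r.bv  = getBytes(p, 2);  r.ak  = getBytes(p, 3);
    r.av  = getBytes(p, 2);  r.cv  = getBytes(p, 3);
    assert(p == in + kRecordBytes);
}
```

With the record reduced to a fixed 34-byte array, the buffer can be copied or written to disk as a block. Compared with thirteen plain 4-byte integers (52 bytes) that is a saving of about 35%, which matches the figure quoted above.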
Post by James Kanze
--
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
.rhavin grobert
2008-10-22 16:31:34 UTC
Permalink
[...] since generally the compiler won't allocate a bit field
in a way that would cross a 32 bit boundary. [...] But there's
rarely any sense in having bit fields larger than 8 bits.)
Sure? Consider:

typedef unsigned __int64 QUAD;

#pragma pack (push, 1)
struct foo {
union {
QUAD qData;
struct {
QUAD nFirstNibble : 4;
QUAD nSecondNibble : 4;
QUAD nThirdNibble : 4;
QUAD nBloodyRest : 52; // 4 + 4 + 4 + 52 = 64 bits
};
};
};
#pragma pack (pop)
Nick Keighley
2008-10-23 08:06:42 UTC
Permalink
Post by .rhavin grobert
[...] since generally the compiler won't allocate a bit field
in a way that would cross a 32 bit boundary. [...] But there's
rarely any sense in having bit fields larger than 8 bits.)
typedef unsigned __int64 QUAD;
eek! I suppose since you've hidden __int64 behind
a typedef you can live with that
Post by .rhavin grobert
#pragma pack (push, 1)
eek! ** 2

the guy wants *portable* stuff! How many platforms
does this work on?
Post by .rhavin grobert
struct foo {
  union {
    QUAD qData;
    struct {
      QUAD nFirstNibble  :  4;
      QUAD nSecondNibble :  4;
      QUAD nThirdNibble  :  4;
      QUAD nBloodyRest   : 52;
    };
  };
};
#pragma pack (pop)
--
Nick Keighley
Nick Keighley
2008-10-22 10:12:37 UTC
Permalink
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as
<snip>

(all examples given are 16 or 24 bits wide)
Post by (2b|!2b)==?
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
I'd appreciate if anyone can show me how to do this. Ideally, I would
like to do this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.
1. you can't write this in an endian agnostic manner
"but that's the worst thing that could possibly happen!".
As others have said, decide on an endianness and write platform
specific code to read/write the data in its correct endianness.

2. bitfields are even less portable than the above implies.
"but it's worse than that!"


I'd check the standard but I believe almost nothing
can be assumed about bitfield alignment or padding.
I'm not sure even order is guaranteed.

K&R section 6.9 (and I doubt C++ has changed anything) has this to
say:

"Almost everything about [bit] fields is implementation-dependent.
Whether a field may overlap a word boundary is [implementation-defined]. [...] Fields are
assigned left to right on some machines and right to left on others.
This means that although fields are useful for maintaining
internally-defined data structures, the question as to which end
comes first has to be carefully considered when picking apart
externally-defined data; [...]"

(typos and layout mangling in the above are my fault)
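
One way to see what a particular compiler does with the struct from this thread is to compare its size with the 34 bytes of actual field data. This is only a sketch: the names kPackedBytes, paddingBytes and reportLayout are hypothetical, 8-bit bytes are assumed, and the number printed will differ between compilers and platforms, which is exactly the point of the passage quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Record from the original post.
struct RecordType1 {
    unsigned int dt  : 24;
    unsigned int ts  : 16;
    unsigned int lsp : 24;
    unsigned int lst : 16;
    unsigned int lsv : 16;
    unsigned int x1  : 24;
    unsigned int x2  : 24;
    unsigned int x3  : 24;
    unsigned int x4  : 24;
    unsigned int bv  : 16;
    unsigned int ak  : 24;
    unsigned int av  : 16;
    unsigned int cv  : 24;
};

// Eight 24-bit fields and five 16-bit fields, counted in 8-bit bytes.
const std::size_t kPackedBytes = 8 * 3 + 5 * 2;

// Bytes the compiler's layout uses beyond the raw field data.
// Assumes 8-bit bytes, so the struct cannot be smaller than kPackedBytes.
std::size_t paddingBytes() {
    assert(sizeof(RecordType1) >= kPackedBytes);
    return sizeof(RecordType1) - kPackedBytes;
}

// Print the sizes for the compiler this is built with.
void reportLayout(std::ostream& os) {
    os << "in memory: " << sizeof(RecordType1) << " bytes, "
       << "field data: " << kPackedBytes << " bytes, "
       << "padding: " << paddingBytes() << " bytes\n";
}
```

Calling reportLayout(std::cout) on each target platform shows how much of the struct is padding there. Since the result is not guaranteed to agree between platforms, the struct itself is not a usable file format.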


--
Nick Keighley

People who love sausages, respect the law,
and work with IT standards
shouldn't watch any of them being made.
Thomas J. Gritzan
2008-10-22 16:07:55 UTC
Permalink
Post by Nick Keighley
Post by (2b|!2b)==?
struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as
<snip>
(all examples given are 16 or 24 bits wide)
Post by (2b|!2b)==?
};
I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...
I'd appreciate if anyone can show me how to do this. Ideally, I would
like to do this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.
1. you can't write this in an endian agnostic manner
"but that's the worst thing that could possibly happen!".
As others have said, decide on an endianness and write platform
specific code to read/write the data in its correct endianness.
What makes you think so?

You have to decide on an endianness your data is stored in, of course,
but you don't have to care for your platform's byte order at all.

unsigned int val = /* some 2-byte value */;

unsigned char high = (val >> 8);
unsigned char low = val & 0xFF;

The more significant bits are in high, the other in low. You can store
them in big-endian (high, low) or little-endian (low, high) order,
depending on what data format you decided on, but your platform only has
to have 8-bit-bytes.
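
As a small round trip sketch of the same idea, assuming 8-bit bytes and values that fit in 16 bits (the function names storeBE16, storeLE16, loadBE16 and loadLE16 are hypothetical):

```cpp
#include <cassert>

// Store a 16-bit value most significant byte first (big-endian).
void storeBE16(unsigned char* out, unsigned val) {
    assert(val <= 0xFFFFu);
    out[0] = static_cast<unsigned char>((val >> 8) & 0xFF);  // high
    out[1] = static_cast<unsigned char>(val & 0xFF);         // low
}

// Store a 16-bit value least significant byte first (little-endian).
void storeLE16(unsigned char* out, unsigned val) {
    assert(val <= 0xFFFFu);
    out[0] = static_cast<unsigned char>(val & 0xFF);         // low
    out[1] = static_cast<unsigned char>((val >> 8) & 0xFF);  // high
}

// Reassemble the value from its two bytes; only shifts and ORs are used,
// so the byte order of the machine running this never matters.
unsigned loadBE16(const unsigned char* in) {
    return (static_cast<unsigned>(in[0]) << 8) | in[1];
}

unsigned loadLE16(const unsigned char* in) {
    return (static_cast<unsigned>(in[1]) << 8) | in[0];
}
```

The shifts operate on the value, not on its storage, which is why the same source gives the same bytes on a big-endian and a little-endian host. Only the choice between the BE and LE pair is part of the file format.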
Post by Nick Keighley
2. bitfields are even less portable than the above implies.
"but it's worse than that!"
[...]

That is another reason why you shouldn't store a memory dump of the
struct, but rather format the values in some specific format.

The in-memory representation of data structures is platform specific,
but you don't care, because the compiler handles them. You only have to
agree on an on-disk representation, so that another platform's programs
can use the data, and your programs are resistant against compiler changes.
--
Thomas