how to convert 3 byte to float

Mario M. Mueller · Dec 8, 2007

Hi,

I have a binary file containing 3 byte float values (big endian). How can I
read them into python? The struct module does not work, since it expects 4
byte floats.

Any hints?

Mario

John Machin · Dec 8, 2007

Mario said:
Hi,

I have a binary file containing 3 byte float values (big endian). How can I
read them into python? The struct module does not work, since it expects 4
byte floats.

Any hints?

Mario

What does a three-byte float look like? To write an unpack routine
from scratch, one would need to know position & size of the mantissa
and exponent, position of sign bit, how infinities, NaN, -0 etc are
represented (if at all) ...

Bjoern Schliessmann · Dec 8, 2007

Mario said:
I have a binary file containing 3 byte float values (big endian).
How can I read them into python? The struct module does not work,
since it expects 4 byte floats.

Since the module crystalball is still in development, you'll have to
analyze your three byte float format and convert it either to a
IEEE 754 "single" float and use struct, or convert manually using
bitwise operators.

BTW, who in his mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

Regards,

Björn

Mario M. Mueller · Dec 8, 2007

John said:
What does a three-byte float look like? To write an unpack routine
from scratch, one would need to know position & size of the mantissa
and exponent, position of sign bit, how infinities, NaN, -0 etc are
represented (if at all) ...

Unfortunately I don't know anything in detail about these floats (yet). Any
documentation about them seems to be lost...

Maybe I can get some working C code in the next few days.

Mario

Mario M. Mueller · Dec 8, 2007

Bjoern Schliessmann wrote:

[...]

BTW, who in his mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

It's output of a digitizer (but not that old). I was also wondering about
the reason for this limitation (maybe the design is ~20 years old).

Mario

Bjoern Schliessmann · Dec 8, 2007

Mario said:
It's output of a digitizer (but not that old). I was also
wondering about the reason for this limitation (maybe the design
is ~20 years old).

Uh, that's weird. Since Python cannot guess its format, you'll have
to try it out. Why don't you try to let the device output
well-known values and write a python script to display them
bitwise? With some luck you can reverse engineer the format.

Regards,

Björn
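A minimal sketch of the bitwise display Björn suggests, written for Python 3. The function name `dump_samples` and the example bytes are made up for illustration; they are not taken from the digitizer or from Mario's file.

```python
def dump_samples(data, width=3):
    """Return one text line per complete `width`-byte sample: hex, then bits."""
    lines = []
    usable = len(data) - len(data) % width  # ignore a trailing partial sample
    for i in range(0, usable, width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        bit_part = " ".join(f"{b:08b}" for b in chunk)
        lines.append(f"{hex_part}  {bit_part}")
    return lines


# Made-up bytes, for illustration only.
example = bytes([0x00, 0x01, 0x2C, 0xFF, 0xFE, 0xD4])
for line in dump_samples(example):
    print(line)
```

Feeding the device known inputs and comparing the printed bit patterns is one way to spot which bits act as sign, exponent, or plain count.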

Tommy Nordgren · Dec 8, 2007

Bjoern Schliessmann wrote:

[...]

BTW, who in his mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

It's output of a digitizer (but not that old). I was also wondering about
the reason for this limitation (maybe the design is ~20 years old).

Mario

One thing to consider: it is possible that one of the bytes contributes bits
to BOTH the mantissa and the exponent. Do you know the relative accuracy of
the digitizer?

Tommy Nordgren · Dec 8, 2007

Bjoern Schliessmann wrote:
Uh, that's weird. Since Python cannot guess its format, you'll have
to try it out. Why don't you try to let the device output
well-known values and write a python script to display them
bitwise? With some luck you can reverse engineer the format.

Regards,

Björn

It will probably require a high-quality voltage standard to do this.
There are special high-precision voltage standard chips for this purpose.
To produce voltages other than the chip's standard voltage, use
pulse-width modulation with a low-pass filter.

Mario M. Mueller · Dec 8, 2007

Tommy Nordgren wrote:
[...]

One thing to consider: It is possible that one of the bytes
contributes bits to BOTH the mantissa and the exponent ;

From today's point of view I cannot exclude this.

Do you know the relative
accuracy of the digitizer?

Not yet. It's seismic data, that implies:

- values will be positive and negative
- value range should cover several orders of magnitude

Mario

jdsahr · Dec 8, 2007

Tommy Nordgren wrote:

[...]

One thing to consider: It is possible that one of the bytes
contributes bits to BOTH the mantissa and the exponent ;

From today's point of view I cannot exclude this.

Do you know the relative
accuracy of the digitizer?

Not yet. It's seismic data, that implies:

- values will be positive and negative
- value range should cover several orders of magnitude

Mario

What a strange thread.

However, I have had experience with a computer that had 3-byte words.

The Harris "H" series running VULCAN and VOS (yup) had three byte
words.

As I recall, the single-precision floats used two words, and ignored
two of the bytes. The double precision floats used all six bytes.
There was also a 12 byte quad precision (which is sort of
impressive).

However, ... are you *sure* that the digitizer was floating point?
There are a few "floating point" ADCs out there today, but not very
many, and I'd be amazed if there was a 20 year old one that was part
of a seismic instrument. A 12 bit ADC has over 60 dB of dynamic range
(in power) even with int encoding.
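As a rough check of that figure, here is a small Python 3 sketch of the textbook estimate for an ideal converter: the ratio of full scale to one count, expressed in dB. The function name is made up for illustration, and the estimate ignores noise and other real-world limits.

```python
import math


def ideal_dynamic_range_db(bits):
    """Full scale relative to one count for an ideal N-bit converter, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (12, 24):
    print(bits, "bits:", round(ideal_dynamic_range_db(bits), 1), "dB")
```

Under this estimate a 12-bit integer encoding gives roughly 72 dB, consistent with "over 60 dB", and 24 bits gives roughly twice that.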

Hendrik van Rooyen · Dec 9, 2007

Tommy Nordgren said:
Bjoern Schliessmann wrote:

[...]

BTW, who in his mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

It's output of a digitizer (but not that old). I was also wondering about
the reason for this limitation (maybe the design is ~20 years old).

Mario

One thing to consider: it is possible that one of the bytes contributes bits
to BOTH the mantissa and the exponent. Do you know the relative accuracy of
the digitizer?

What is it digitising? If it's an Analogue to Digital converter, then the
24 bits may not be floats at all, but simple integer counts.

Is there no Fine Manual documenting the output format?

- Hendrik

Tim Roberts · Dec 9, 2007

Mario M. Mueller said:
Bjoern Schliessmann wrote:

[...]

BTW, who in his mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

It's output of a digitizer (but not that old). I was also wondering about
the reason for this limitation (maybe the design is ~20 years old).

The IEEE-754 standard was adopted in 1985. Before that (and after that,
too), many people used whatever bit layout they wanted for floating point
numbers.

Mario M. Mueller · Dec 9, 2007

Hendrik van Rooyen wrote:

[...]

What is it digitising - if its an Analogue to Digital converter, then the
24 bits may not be floats at all, but simple integer counts.

Personally I would expect simple counts (since other seismic formats don't
even think of using floats because most digitizers deliver counts). But I
was told that there are floats inside.

But if I assume counts I get some reasonable numbers out of the file.

import struct

def convert(sample):
    s0 = ord(sample[0])
    s1 = ord(sample[1])
    s2 = ord(sample[2])

    sign = (s0 >> 7) & 1

    if sign:
        s = struct.unpack('>i','%c%c%c%c' % (s0,s1,chr(0xFF),s2))[0]
    else:
        s = struct.unpack('>i','%c%c%c%c' % (s0,s1,chr(0x00),s2))[0]
    return s

f=open('test.bin', 'rb')
data=f.read()
f.close()

data_len = len(data)
sample_count = data_len/3

samples = []
for i in range(0,data_len,3):
    samples.append(data[i:i+3])

for sample in samples:
    print convert(sample)

But I'm experiencing some strange jumps in the data (seismic data is mostly
quite smooth at 40 Hz sampling rate). I think I made a mistake in the
byte order...

I uploaded a short sample data file under
http://www.FastShare.org/download/test.bin - maybe one can give me another
hint... In a full data example max value is 1179760 (in case one looks only
at the eye-catching "65535"+- values).

Is there no Fine Manual documenting the output format?

No, that's the challenge.

Mario

PS: It seems that we are heading straight off-topic, but where should we
switch to?

marek.rocki · Dec 9, 2007

Mario M. Mueller wrote:

Personally I would expect simple counts (since other seismic formats don't
even think of using floats because most digitizers deliver counts). But I
was told that there are floats inside.

But if I assume counts I get some reasonable numbers out of the file.

I looked at the data and it seems to be 24-bit big-endian integers. I
even plotted it and the graph looks quite reasonable (though I have no
expertise in seismic data at all).

But I'm experiencing some strange jumps in the data (seismic data is mostly
quite smooth at 40 Hz sampling rate). I think I made a mistake in the
byte order...

Probably. In your code sample, when you pad it to 32-bits, why are you
inserting every third byte, instead of the most significant one? Maybe
the following will work:

if sign:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
else:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]

Regards,
Marek
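For readers on Python 3, the same sign extension can be sketched with `int.from_bytes`, which handles two's complement directly. This assumes, as the data suggests, that every sample is a 24-bit big-endian signed integer; the function names are made up for illustration, and the struct variant mirrors the padding idea above using bytes instead of `%c` formatting.

```python
import struct


def decode_int24_be(data):
    """Decode consecutive 3-byte big-endian two's-complement integers."""
    usable = len(data) - len(data) % 3  # ignore a trailing partial sample
    return [
        int.from_bytes(data[i:i + 3], byteorder="big", signed=True)
        for i in range(0, usable, 3)
    ]


def decode_int24_be_struct(sample):
    """Same result for one sample: prepend a sign byte, unpack as '>i'."""
    pad = b"\xff" if sample[0] & 0x80 else b"\x00"
    return struct.unpack(">i", pad + sample)[0]


# Bytes constructed by hand for illustration, not read from test.bin.
print(decode_int24_be(b"\x00\x01\x2c\xff\xfe\xd4"))
print(decode_int24_be_struct(b"\x12\x00\x70"))
```

The key point is that the padding byte goes in front of the most significant byte, so the three data bytes keep their original order.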

John Machin · Dec 9, 2007

marek.rocki wrote:

I looked at the data and it seems to be 24-bit big-endian integers.

Agreed. The first byte is only 0x00 or 0xFF. Where the first byte is
0x00, the 2nd byte is only 0 or 1. Where the first byte is 0xFF, the
second byte is only 0xFE or 0xFF. The 3rd byte is more-or-less
uniformly distributed. In other words, I see no indications that the
data is other than 24-bit big-endian twos-complement integers. The
actual range of the sample needs only 10 bits.

*If* it is floating point, it's either unnormalised or a weird format
or both.
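A check like the one described above can be sketched in Python 3 by counting which byte values occur at each position within a sample. The function name and the sample bytes are hypothetical; this is not the script that was actually used on the file.

```python
from collections import Counter


def byte_value_counts(data, width=3):
    """Count the byte values seen at each position of a `width`-byte sample."""
    usable = len(data) - len(data) % width  # ignore a trailing partial sample
    return [Counter(data[pos:usable:width]) for pos in range(width)]


# Made-up bytes, for illustration only.
counts = byte_value_counts(b"\x00\x01\x2c\xff\xfe\xd4\x00\x00\x10")
for pos, counter in enumerate(counts):
    print(pos, sorted(hex(value) for value in counter))
```

A first byte that only ever takes the values 0x00 and 0xFF is what sign extension of small two's-complement integers would produce, whereas a float exponent field would normally vary more.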

I even plotted it and the graph looks quite reasonable (though I have no
expertise in seismic data at all).

Same here.

But I'm experiencing some strange jumps in the data (seismic data is mostly
quite smooth at 40 Hz sampling rate). I think I made a mistake in the
byte order...

Probably. In your code sample, when you pad it to 32-bits, why are you
inserting every third byte, instead of the most significant one? Maybe
the following will work:

if sign:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
else:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]

s/Probably/Definitely/
s/Maybe/Definitely/

Cheers,
John

Hendrik van Rooyen · Dec 10, 2007

Mario said:
I uploaded a short sample data file under
http://www.FastShare.org/download/test.bin - maybe one can give me another
hint... In a full data example max value is 1179760 (in case one looks only
at the eye-catching "65535"+- values).

I clicked on the link and got nothing but rubbish trying to predict how old
I would get, so I gave up and am flying blind.

Some A to D's are not two's complement, but have strange formats with an
independent sign bit in the highest order.

And of course there is big and little endian - so there are something like
2x2 = 4 things to try - twos compl. big and little, and signed big and little.

Unless there is a protocol interfering - how do you know, in the byte stream,
where a value starts and stops - is it cr,lf delimited?

- Hendrik
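The four candidates listed above can be tried side by side. The following Python 3 sketch decodes one 3-byte sample as two's complement and as sign-magnitude (top bit as an independent sign), each in big- and little-endian order. The function name and dictionary keys are made up for illustration.

```python
def interpretations(sample):
    """Decode one 3-byte sample under four candidate integer encodings."""
    results = {}
    for order in ("big", "little"):
        # Two's complement: let int.from_bytes do the sign handling.
        results[f"twos_complement_{order}"] = int.from_bytes(
            sample, order, signed=True
        )
        # Sign-magnitude: bit 23 is the sign, the low 23 bits the magnitude.
        raw = int.from_bytes(sample, order, signed=False)
        magnitude = raw & 0x7FFFFF
        results[f"sign_magnitude_{order}"] = (
            -magnitude if raw & 0x800000 else magnitude
        )
    return results


# Made-up sample, for illustration only.
for name, value in interpretations(b"\xff\xfe\xd4").items():
    print(name, value)
```

Plotting each interpretation over a whole file and looking for the smooth one is a quick way to pick between them when no documentation exists.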

Mario M. Mueller · Dec 10, 2007

Marek said:
Probably. In your code sample, when you pad it to 32-bits, why are you
inserting every third byte, instead of the most significant one?

Hm, I'm not that good when we go to the very basics. A good friend of mine
(Robert, now you don't need to go on reading this thread) helped me a lot,
but finally I got a bit confused.

Maybe the following will work:

if sign:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
else:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]

Perfect! Today I got a bit of C code - the results are identical.

Thanks to all of you,
Mario

Mario M. Mueller · Dec 10, 2007

Hendrik said:
I clicked on the link and got nothing but rubbish trying to predict how
old I would get, so I gave up and am flying blind.

Sorry, it's a one click hoster. I cannot tell anything about this rubbish -
I use Adblock Plus.

Some A to D's are not two's complement, but have strange formats with an
independent sign bit in the highest order.

And of course there is big and little endian - so there are something like
2x2 = 4 things to try - twos compl. big and little, and signed big and
little.

Unless there is a protocol interfering - how do you know, in the byte
stream, where a value starts and stops - is it cr,lf delimited?

I have files containing only the data. Marek's fix to my code solves the
challenge.

Mario
