How to convert 3 bytes to a float

  • Thread starter Mario M. Mueller

Mario M. Mueller

Hi,

I have a binary file containing 3 byte float values (big endian). How can I
read them into python? The struct module does not work, since it expects 4
byte floats.

Any hints?

Mario
 

John Machin

> Hi,
>
> I have a binary file containing 3 byte float values (big endian). How can I
> read them into python? The struct module does not work, since it expects 4
> byte floats.
>
> Any hints?
>
> Mario

What does a three-byte float look like? To write an unpack routine
from scratch, one would need to know position & size of the mantissa
and exponent, position of sign bit, how infinities, NaN, -0 etc are
represented (if at all) ...
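Once those details are known, the routine itself is short. Below is a minimal sketch under a purely hypothetical layout (1 sign bit, 7 exponent bits with a bias of 63, 16 mantissa bits with an implicit leading 1, and no infinities, NaN or denormals). The name `decode_float24` and the defaults are illustrative assumptions, not the digitizer's real format; changing `exp_bits` and `bias` lets you test other guesses.

```python
def decode_float24(b: bytes, exp_bits: int = 7, bias: int = 63) -> float:
    """Decode a big-endian 3-byte float with a HYPOTHETICAL layout:
    [sign:1][exponent:exp_bits][mantissa:rest], implicit leading 1,
    no denormals, infinities or NaN. Adjust once the real format is known."""
    if len(b) != 3:
        raise ValueError("expected exactly 3 bytes")
    word = int.from_bytes(b, "big")            # combine into one 24-bit unsigned int
    man_bits = 23 - exp_bits                   # bits left over for the mantissa
    sign = -1.0 if word >> 23 else 1.0         # top bit is the (assumed) sign bit
    exponent = (word >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = word & ((1 << man_bits) - 1)
    if exponent == 0 and mantissa == 0:
        return sign * 0.0                      # treat all-zero fields as (signed) zero
    return sign * (1.0 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)
```

Under these assumptions `b"\x3f\x00\x00"` decodes to 1.0 and `b"\xc0\x00\x00"` to -2.0.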
 

Bjoern Schliessmann

Mario said:
> I have a binary file containing 3 byte float values (big endian).
> How can I read them into python? The struct module does not work,
> since it expects 4 byte floats.

Since the module crystalball is still in development, you'll have to
analyze your three-byte float format and convert it either to an
IEEE 754 "single" float and use struct, or convert it manually using
bitwise operators.
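As an illustration of the struct route: if the device's format happened to be a truncated IEEE 754 single, i.e. the high-order 24 bits of a big-endian 32-bit float (an assumption; nothing in this thread confirms it), appending one zero byte restores the 4-byte layout that struct expects. A sketch, with `float24_to_float` as a hypothetical name:

```python
import struct

def float24_to_float(b: bytes) -> float:
    """ASSUMPTION: the 3 bytes are the high 24 bits of a big-endian IEEE 754
    single (sign, 8-bit exponent, top 15 mantissa bits). Appending a zero byte
    as the least significant byte gives the 4-byte form that '>f' unpacks."""
    if len(b) != 3:
        raise ValueError("expected exactly 3 bytes")
    return struct.unpack(">f", b + b"\x00")[0]
```

Values whose lowest 8 mantissa bits are zero (for example 1.0 or -2.5) come through exactly; anything else loses those bits.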

BTW, who in his right mind designs three byte floats? Memory isn't that
expensive anymore. Even C bool is four bytes long.

Regards,


Björn
 

Mario M. Mueller

John said:
> What does a three-byte float look like? To write an unpack routine
> from scratch, one would need to know position & size of the mantissa
> and exponent, position of sign bit, how infinities, NaN, -0 etc are
> represented (if at all) ...

Unfortunately I don't know anything in detail about these floats (yet). Any
documentation about them seems to be lost... :( Maybe I can get some
working C code in the next few days.

Mario
 

Mario M. Mueller

Bjoern Schliessmann wrote:

> [...]
> BTW, who in his right mind designs three byte floats? Memory isn't that
> expensive anymore. Even C bool is four bytes long.

It's output of a digitizer (but not that old). I was also wondering about
the reason for this limitation (maybe the design is ~20 years old).

Mario
 

Bjoern Schliessmann

Mario said:
> It's output of a digitizer (but not that old). I was also
> wondering about the reason for this limitation (maybe the design
> is ~20 years old).

Uh, that's weird. Since Python cannot guess its format, you'll have
to try it out. Why don't you try to let the device output
well-known values and write a python script to display them
bitwise? With some luck you can reverse engineer the format.
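A small helper for that kind of experiment prints each 3-byte record as bits, so fixed patterns (a constant sign position, an exponent-like field) stand out when known values are fed in. `show_bits` is a hypothetical helper; `test.bin` is the file name used later in this thread.

```python
def show_bits(data: bytes, width: int = 3) -> list[str]:
    """Render each complete `width`-byte record as space-separated 8-bit
    groups; a trailing partial record is ignored."""
    lines = []
    for i in range(0, len(data) - width + 1, width):
        rec = data[i:i + width]
        lines.append(" ".join(f"{byte:08b}" for byte in rec))
    return lines

# Hypothetical usage after recording known input values:
# with open("test.bin", "rb") as f:
#     for line in show_bits(f.read()):
#         print(line)
```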

Regards,


Björn
 

Tommy Nordgren

> Bjoern Schliessmann wrote:
>
>> [...]
>> BTW, who in his right mind designs three byte floats? Memory isn't that
>> expensive anymore. Even C bool is four bytes long.
>
> It's output of a digitizer (but not that old). I was also wondering
> about the reason for this limitation (maybe the design is ~20 years old).
>
> Mario
One thing to consider: it is possible that one of the bytes contributes bits
to BOTH the mantissa and the exponent. Do you know the relative
accuracy of the digitizer?
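Fields that straddle a byte boundary are easy to handle if the three bytes are first combined into one 24-bit integer and then sliced with shifts and masks. A sketch with a made-up layout (1 sign bit, 9 exponent bits, 14 mantissa bits, so the exponent spans the first two bytes); `split_fields` and `bit_field` are hypothetical names and the layout is only an example.

```python
def bit_field(word: int, lsb: int, width: int) -> int:
    """Return `width` bits of `word` starting at bit `lsb` (0 = least significant)."""
    return (word >> lsb) & ((1 << width) - 1)

def split_fields(b: bytes, layout):
    """Split a big-endian 3-byte record into named fields.
    `layout` lists (name, width) pairs from the most significant bit down;
    widths must total 24. Byte boundaries do not matter because the
    slicing happens on the combined 24-bit integer."""
    if sum(width for _, width in layout) != 24:
        raise ValueError("field widths must total 24 bits")
    word = int.from_bytes(b, "big")
    fields = {}
    lsb = 24
    for name, width in layout:
        lsb -= width
        fields[name] = bit_field(word, lsb, width)
    return fields

# Example with the hypothetical 1/9/14 layout:
# split_fields(b"\x80\xc0\x01", [("sign", 1), ("exponent", 9), ("mantissa", 14)])
# -> {'sign': 1, 'exponent': 3, 'mantissa': 1}
```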
 

Tommy Nordgren

> Uh, that's weird. Since Python cannot guess its format, you'll have
> to try it out. Why don't you try to let the device output
> well-known values and write a python script to display them
> bitwise? With some luck you can reverse engineer the format.
>
> Regards,
>
>
> Björn

It will probably require a high-quality voltage standard to do this.
There are special high-precision voltage standard chips for this. To
produce voltages other than the chip's standard voltage, use pulse-width
modulation with a low-pass filter.
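The arithmetic behind that suggestion, as a sketch under ideal assumptions (the PWM output switches cleanly between 0 V and the reference voltage, and a first-order RC low-pass averages it): the filtered output is the duty cycle times the reference, and the filter corner should sit well below the PWM frequency so that the ripple stays small. The function names and example component values are hypothetical.

```python
import math

def pwm_duty_for(target_v: float, ref_v: float) -> float:
    """Duty cycle D such that the averaged PWM output D * ref_v equals target_v
    (ideal switching between 0 V and ref_v)."""
    if not 0.0 <= target_v <= ref_v:
        raise ValueError("target must lie between 0 V and the reference voltage")
    return target_v / ref_v

def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)
```

For example, with a 2.5 V reference a 1.0 V output needs a duty cycle of 0.4, and 10 kOhm with 1 uF gives a corner near 15.9 Hz.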
 

Mario M. Mueller

Tommy Nordgren wrote:
> [...]
> One thing to consider: it is possible that one of the bytes
> contributes bits to BOTH the mantissa and the exponent.

From today's point of view I cannot exclude this.

> Do you know the relative
> accuracy of the digitizer?

Not yet. It's seismic data, which implies:

- values will be positive and negative
- the value range should cover several orders of magnitude

Mario
 

jdsahr

> Tommy Nordgren wrote:
>
>> [...]
>> One thing to consider: it is possible that one of the bytes
>> contributes bits to BOTH the mantissa and the exponent.
>
> From today's point of view I cannot exclude this.
>
>> Do you know the relative
>> accuracy of the digitizer?
>
> Not yet. It's seismic data, which implies:
>
> - values will be positive and negative
> - the value range should cover several orders of magnitude
>
> Mario

What a strange thread.

However, I have had experience with a computer that had 3-byte words.

The Harris "H" series running VULCAN and VOS (yup) had three byte
words.

As I recall, the single-precision floats used two words, and ignored
two of the bytes. The double precision floats used all six bytes.
There was also a 12 byte quad precision (which is sort of
impressive).

However, ... are you *sure* that the digitizer was floating point?
There are a few "floating point" ADCs out there today, but not very
many, and I'd be amazed if there was a 20 year old one that was part
of a seismic instrument. A 12 bit ADC has over 60 dB of dynamic range
(in power) even with int encoding.
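A quick check of that figure: an ideal N-bit integer ADC spans 2**N steps, so the ratio of full scale to one LSB is 20*log10(2**N), roughly 6.02*N dB (the same number whether you treat it as an amplitude ratio with 20*log10 or as a power ratio with 10*log10 of the square). For 12 bits that is about 72 dB; the 24-bit samples discussed in this thread would allow about 144 dB. This ignores noise; the familiar 6.02*N + 1.76 dB figure is the quantization SNR for a full-scale sine, a related but different measure. The function name is hypothetical.

```python
import math

def adc_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one LSB for an ideal integer-coded ADC, in dB.
    20*log10 is used for the amplitude ratio; 10*log10 of the squared
    (power) ratio gives the same value."""
    return 20.0 * math.log10(2 ** bits)

for n in (12, 16, 24):
    print(n, "bits:", round(adc_dynamic_range_db(n), 1), "dB")
```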
 

Hendrik van Rooyen

Tommy Nordgren said:
>> Bjoern Schliessmann wrote:
>>
>>> [...]
>>> BTW, who in his right mind designs three byte floats? Memory isn't that
>>> expensive anymore. Even C bool is four bytes long.
>>
>> It's output of a digitizer (but not that old). I was also wondering
>> about the reason for this limitation (maybe the design is ~20 years old).
>>
>> Mario
> One thing to consider: it is possible that one of the bytes contributes bits
> to BOTH the mantissa and the exponent. Do you know the relative
> accuracy of the digitizer?

What is it digitising? If it's an analogue-to-digital converter, then the 24
bits may not be floats at all, but simple integer counts.

Is there no Fine Manual documenting the output format?

- Hendrik
 

Tim Roberts

Mario M. Mueller said:
> Bjoern Schliessmann wrote:
>
>> [...]
>> BTW, who in his right mind designs three byte floats? Memory isn't that
>> expensive anymore. Even C bool is four bytes long.
>
> It's output of a digitizer (but not that old). I was also wondering about
> the reason for this limitation (maybe the design is ~20 years old).

The IEEE-754 standard was adopted in 1985. Before that (and after that,
too), many people used whatever bit layout they wanted for floating point
numbers.
 

Mario M. Mueller

Hendrik van Rooyen wrote:

> [...]
> What is it digitising? If it's an analogue-to-digital converter, then the
> 24 bits may not be floats at all, but simple integer counts.

Personally I would expect simple counts (since other seismic formats don't
even think of using floats because most digitizers deliver counts). But I
was told that there are floats inside.

But if I assume counts I get some reasonable numbers out of the file.

import struct

def convert(sample):
    s0 = ord(sample[0])
    s1 = ord(sample[1])
    s2 = ord(sample[2])

    sign = (s0 >> 7) & 1

    if sign:
        s = struct.unpack('>i','%c%c%c%c' % (s0,s1,chr(0xFF),s2))[0]
    else:
        s = struct.unpack('>i','%c%c%c%c' % (s0,s1,chr(0x00),s2))[0]
    return s

f=open('test.bin', 'rb')
data=f.read()
f.close()

data_len = len(data)
sample_count = data_len/3

samples = []
for i in range(0,data_len,3):
    samples.append(data[i:i+3])

for sample in samples:
    print convert(sample)

But I'm experiencing some strange jumps in the data (seismic data is mostly
quite smooth at a 40 Hz sampling rate). I think I made a mistake in the
byte order...

I uploaded a short sample data file to
http://www.FastShare.org/download/test.bin - maybe someone can give me another
hint... In a full data example the max value is 1179760 (in case one looks only
at the eye-catching "65535"+- values).
> Is there no Fine Manual documenting the output format?

No, that's the challenge.

Mario

PS: It seems that we are heading straight off-topic, but where should we move this?
 

marek.rocki

Mario M. Mueller wrote:
> Personally I would expect simple counts (since other seismic formats don't
> even think of using floats because most digitizers deliver counts). But I
> was told that there are floats inside.
>
> But if I assume counts I get some reasonable numbers out of the file.

I looked at the data and it seems to be 24-bit big-endian integers. I
even plotted it and the graph looks quite reasonable (though I have no
expertise in seismic data at all).
> But I'm experiencing some strange jumps in the data (seismic data is mostly
> quite smooth at a 40 Hz sampling rate). I think I made a mistake in the
> byte order...

Probably. In your code sample, when you pad it to 32 bits, why are you
inserting the padding byte in the third position instead of the most
significant one? Maybe the following will work:

if sign:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
else:
    s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]
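On Python 3 this code no longer runs as written: indexing `bytes` already yields integers, so `ord()` fails, and `struct.unpack` needs `bytes` rather than a `str` built with `%c`. A minimal Python 3 sketch of the same decoding, assuming (as this thread concludes) headerless 24-bit big-endian two's-complement samples: `int.from_bytes(..., signed=True)` does the sign extension directly, and the struct variant shows the corrected padding with the sign byte prepended as the most significant byte. `convert` mirrors Mario's function name; `convert_struct` and `read_samples` are hypothetical.

```python
import struct

def convert(sample: bytes) -> int:
    """Decode one 3-byte big-endian two's-complement sample."""
    return int.from_bytes(sample, "big", signed=True)

def convert_struct(sample: bytes) -> int:
    """Same result via struct: sign-extend by PREPENDING 0xFF or 0x00,
    so the pad becomes the most significant byte of a '>i'."""
    pad = b"\xff" if sample[0] & 0x80 else b"\x00"
    return struct.unpack(">i", pad + sample)[0]

def read_samples(path: str) -> list[int]:
    """Read a headerless file of consecutive 3-byte samples."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % 3     # ignore a trailing partial record
    return [convert(data[i:i + 3]) for i in range(0, usable, 3)]
```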

Regards,
Marek
 

John Machin

> Mario M. Mueller wrote:
>
> I looked at the data and it seems to be 24-bit big-endian integers.

Agreed. The first byte is only 0x00 or 0xFF. Where the first byte is
0x00, the 2nd byte is only 0 or 1. Where the first byte is 0xFF, the
second byte is only 0xFE or 0xFF. The 3rd byte is more-or-less
uniformly distributed. In other words, I see no indications that the
data is other than 24-bit big-endian two's-complement integers. The
actual range of the sample needs only 10 bits.

*If* it is floating point, it's either unnormalised or a weird format
or both.
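These observations can be reproduced with a few lines: tally the byte values seen at each position and compute the range of the records decoded as signed 24-bit integers. A sketch, assuming the headerless 3-byte records discussed here; the function names are hypothetical. For values between -512 and 511 (first byte 0x00 with second byte 0 or 1, or first byte 0xFF with second byte 0xFE or 0xFF), `bits_needed` returns 10.

```python
from collections import Counter

def byte_position_stats(data: bytes, width: int = 3):
    """Tally the values seen at each byte position of fixed-width records and
    return (counters, (min, max)) for the records decoded as signed
    big-endian integers; the range is None if there are no full records."""
    usable = len(data) - len(data) % width     # ignore a trailing partial record
    counters = [Counter() for _ in range(width)]
    values = []
    for i in range(0, usable, width):
        rec = data[i:i + width]
        for pos, byte in enumerate(rec):
            counters[pos][byte] += 1
        values.append(int.from_bytes(rec, "big", signed=True))
    value_range = (min(values), max(values)) if values else None
    return counters, value_range

def bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n
```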
> I even plotted it and the graph looks quite reasonable (though I have no
> expertise in seismic data at all).

Same here.
>> But I'm experiencing some strange jumps in the data (seismic data is mostly
>> quite smooth at a 40 Hz sampling rate). I think I made a mistake in the
>> byte order...

> Probably. In your code sample, when you pad it to 32 bits, why are you
> inserting the padding byte in the third position instead of the most
> significant one? Maybe the following will work:
>
> if sign:
>     s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
> else:
>     s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]

s/Probably/Definitely/
s/Maybe/Definitely/
:)
Cheers,
John
 

Hendrik van Rooyen

> I uploaded a short sample data file to
> http://www.FastShare.org/download/test.bin - maybe someone can give me another
> hint... In a full data example the max value is 1179760 (in case one looks only
> at the eye-catching "65535"+- values).

I clicked on the link and got nothing but rubbish trying to predict how old
I would get, so I gave up and am flying blind.

Some A to D's are not two's complement, but have strange formats with an
independent sign bit in the highest order.

And of course there is big and little endian - so there are something like
2x2 = 4 things to try - twos compl. big and little, and signed big and little.

Unless there is a protocol interfering - how do you know, in the byte stream,
where a value starts and stops - is it cr,lf delimited?
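Trying all four interpretations is cheap. A sketch, assuming "signed" here means sign-magnitude with the sign in the most significant bit of the 24-bit value; `decode_all` is a hypothetical helper. Decoding a stretch of samples each way and plotting them should make the right choice fairly obvious, since smooth seismic data should give a smooth trace under only one of them.

```python
def decode_all(b: bytes) -> dict:
    """Decode a 3-byte record under four interpretations:
    two's complement vs sign-magnitude, big vs little endian."""
    out = {}
    for order in ("big", "little"):
        raw = int.from_bytes(b, order)                       # unsigned 24-bit value
        out[f"twos_{order}"] = int.from_bytes(b, order, signed=True)
        magnitude = raw & 0x7FFFFF                           # low 23 bits
        out[f"signmag_{order}"] = -magnitude if raw & 0x800000 else magnitude
    return out
```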

- Hendrik
 

Mario M. Mueller

> Probably. In your code sample, when you pad it to 32 bits, why are you
> inserting the padding byte in the third position instead of the most
> significant one?

Hm, I'm not that good when we get down to the very basics. A good friend of mine
(Robert, now you don't need to go on reading this thread) helped me a lot,
but in the end I got a bit confused. ;)
> Maybe the following will work:
>
> if sign:
>     s = struct.unpack('>i','%c%c%c%c' % (chr(0xFF),s0,s1,s2))[0]
> else:
>     s = struct.unpack('>i','%c%c%c%c' % (chr(0x00),s0,s1,s2))[0]

Perfect! Today I got a bit of C code - the results are identical. :)

Thanks to all of you,
Mario
 

Mario M. Mueller

Hendrik said:
> I clicked on the link and got nothing but rubbish trying to predict how
> old I would get, so I gave up and am flying blind.

Sorry, it's a one-click hoster. I can't tell you anything about that rubbish -
I use Adblock Plus. :)
> Some A to D's are not two's complement, but have strange formats with an
> independent sign bit in the highest order.
>
> And of course there is big and little endian - so there are something like
> 2x2 = 4 things to try - twos compl. big and little, and signed big and
> little.
>
> Unless there is a protocol interfering - how do you know, in the byte
> stream, where a value starts and stops - is it cr,lf delimited?

I have files containing only the data. Marek's fix to my code solves the
challenge.

Mario
 
