Human readable number formatting

Alex Willmer

When reporting file sizes to the user, it's nice to print '16.1 MB'
rather than '16123270 B'. This is the behaviour that 'df -h'
implements. I couldn't find a Python function that performs this
formatting, so I've taken a stab at it:

import math

def human_readable(n, suffix='B', places=2):
    '''Return a human friendly approximation of n, using SI prefixes'''
    prefixes = ['','k','M','G','T']
    base, step, limit = 10, 3, 100

    if n == 0:
        magnitude = 0 #cannot take log(0)
    else:
        magnitude = math.log(n, base)

    order = int(round(magnitude)) // step
    return '%.1f %s%s' % (float(n)/base**(order*step), \
                          prefixes[order], suffix)

Example usage:

print [human_readable(x) for x in [0, 1, 23.5, 100, 1000/3, 500,
                                   1000000, 12.345e9]]
['0.0 B', '1.0 B', '23.5 B', '100.0 B', '0.3 kB', '0.5 kB', '1.0 MB',
 '12.3 GB']

I'd hoped to generalise this to base 2 (e.g. human_readable(1024, base=2)
== '1 KiB') and to enforce at most 3 digits (i.e. human_readable(100) ==
'0.1 KB' instead of '100 B'). However, I can't get the right results by
adapting the above code.

Here's where I'd like to ask for your help.
Am I chasing the right target, in basing my function on log()?
Does this function already exist in some Python module?
Any hints, or would anyone care to finish it off/enhance it?

With thanks

Alex
 
jepler

Compared to your program, I think the key to mine is to divide by "limit"
before taking the log. In this way, things below the "limit" go to the next lower integer.

I think that instead of having 'step' and 'base', there should be a single
value which would be 1000 or 1024.
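
A small sketch of why dividing by the limit helps, written for Python 3. The helper names order_rounded and order_with_limit are made up for this illustration: the first mirrors the rounding used in the original post, the second floors the log of n / limit. With rounding, the prefix switches near 10**2.5 (about 316) rather than at a round number, while the limit-based version switches exactly at the limit.

```python
import math

def order_rounded(n, base=10, step=3):
    """Prefix index as computed in the original post (rounds the log)."""
    return int(round(math.log(n, base))) // step

def order_with_limit(n, base=10, step=3, limit=100):
    """Prefix index when n is divided by the limit before taking the log.

    Assumes n > 0. Values just below the limit give a slightly negative
    log, which floors to -1 and so maps back to index 0 (no prefix).
    """
    return int(math.floor(math.log(n / limit, base) / step)) + 1

for n in (99, 100, 316, 317, 500):
    print(n, order_rounded(n), order_with_limit(n))
```

Here 316 and 317 get different prefixes under rounding, whereas the limit-based version keeps 99 unprefixed and moves 100 and everything up to 99999 to the 'k' prefix.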

import math

def MakeFormat(prefixes, step, limit, base):
    def Format(n, suffix='B', places=2):
        if abs(n) < limit:
            if n == int(n):
                return "%s %s" % (n, suffix)
            else:
                return "%.1f %s" % (n, suffix)
        magnitude = math.log(abs(n) / limit, base) / step
        magnitude = min(int(magnitude)+1, len(prefixes)-1)

        return '%.1f %s%s' % (
            float(n) / base ** (magnitude * step),
            prefixes[magnitude], suffix)
    return Format

DecimalFormat = MakeFormat(
    prefixes = ['', 'k', 'M', 'G', 'T'],
    step = 3,
    limit = 100,
    base = 10)

# IEC binary prefixes; kibi is written 'Ki'
BinaryFormat = MakeFormat(
    prefixes = ['', 'Ki', 'Mi', 'Gi', 'Ti'],
    step = 10,
    limit = 100,
    base = 2)

values = [0, 1, 23.5, 100, 1000/3, 500, 1000000, 12.345e9]
print [DecimalFormat(v) for v in values]
print [BinaryFormat(v) for v in values]
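
The single-value idea mentioned earlier (one factor of 1000 or 1024 instead of separate 'base' and 'step') might look like the following Python 3 sketch. The names make_format, decimal_format and binary_format are illustrative and not part of the code above. Because math.log works in floating point, a value sitting exactly on a prefix boundary can land on either side.

```python
import math

def make_format(prefixes, factor, limit):
    """Build a formatter that scales by a single factor (1000 or 1024)."""
    def fmt(n, suffix='B'):
        if abs(n) < limit:
            # Small values are printed without a prefix.
            return '%g %s' % (n, suffix)
        # Dividing by the limit first pushes values >= limit to the next prefix.
        order = int(math.floor(math.log(abs(n) / limit, factor))) + 1
        # Clamp to the largest prefix that is available.
        order = min(order, len(prefixes) - 1)
        return '%.1f %s%s' % (n / factor ** order, prefixes[order], suffix)
    return fmt

decimal_format = make_format(['', 'k', 'M', 'G', 'T'], 1000, 100)
binary_format = make_format(['', 'Ki', 'Mi', 'Gi', 'Ti'], 1024, 100)

values = [0, 1, 23.5, 100, 500, 1000000, 12.345e9]
print([decimal_format(v) for v in values])
print([binary_format(v) for v in values])
```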

 
Mike Meyer

Alex Willmer said:
Am I chasing the right target, in basing my function on log()?

I wouldn't have done it that way, but that's not worth very much. Can
you use the log() variation to change from proper scientific units
to the CS powers-of-two variation?

If not, I would do it this way:

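def human_readable(n, suffix = 'B', places = 2):
    prefixes = ['', 'K', 'M', 'G', 'T', 'P', 'E']

    top = 10 ** places
    index = 0
    n = float(n)
    # Each prefix covers a factor of 1000, so divide by 1000 per step
    # and stop at the last available prefix.
    while abs(n) > top and index < len(prefixes) - 1:
        n /= 1000
        index += 1
    return '%.1f %s%s' % (n, prefixes[index], suffix)

Alex Willmer said:
Does this function already exist in some Python module?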

humanize_number is a cross-platform C library function, about 150
lines of code. It uses the loop I gave above. It might be worthwhile
to swipe the code (it's BSD-licensed), wrap it, and submit a PR to add
it to the standard library - just so you get properly tested code.

<mike
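
As a rough sketch of how the loop can handle both conventions, the divisor can be made a parameter: 1000 for SI prefixes, 1024 for the powers-of-two variation. This is Python 3; the name human_readable_loop and the factor and prefixes parameters are assumptions made for illustration, and the code is not a port of humanize_number.

```python
SI_PREFIXES = ('', 'k', 'M', 'G', 'T', 'P', 'E')
IEC_PREFIXES = ('', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei')

def human_readable_loop(n, suffix='B', factor=1000, prefixes=SI_PREFIXES):
    """Scale n down by factor until it drops below factor, then format it."""
    n = float(n)
    index = 0
    # Stop at the last prefix so the index can never run off the end.
    while abs(n) >= factor and index < len(prefixes) - 1:
        n /= factor
        index += 1
    return '%.1f %s%s' % (n, prefixes[index], suffix)

print(human_readable_loop(16123270))
print(human_readable_loop(1024, factor=1024, prefixes=IEC_PREFIXES))
```

The loop avoids log() entirely, so there is no special case for zero and no floating-point surprise at exact powers of the factor. Rounding can still produce output such as '1000.0 kB' for values just under a boundary.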
 
Erik Max Francis

Alex said:
When reporting file sizes to the user, it's nice to print '16.1 MB'
rather than '16123270 B'. This is the behaviour that 'df -h'
implements. I couldn't find a Python function that performs this
formatting, so I've taken a stab at it:

BOTEC at

http://www.alcyone.com/software/botec/

contains a class called SI which does this formatting (and supports all
SI prefixes).
 
J Correia

This'll probably do what you want with some minor modifications.

def fmt3(num):
    for x in ['','Kb','Mb','Gb','Tb']:
        if num<1024:
            return "%3.1f%s" % (num, x)
        num /= 1024.0  # float division, so 1000000 gives 976.6Kb rather than 976.0Kb

print [fmt3(x) for x in [0, 1, 23.5, 100, 1000/3, 500, 1000000, 12.345e9]]
['0.0', '1.0', '23.5', '100.0', '333.0', '500.0', '976.6Kb', '11.5Gb']

HTH.
 
MrJean1

Here is another function for human formatting:

<pre>

def sistr(value, prec=None, K=1024.0, k=1000.0, sign='', blank=' '):
    '''
    Convert value to a signed string with an SI prefix.

    The 'prec' value specifies the number of fractional
    digits to be included.  Use 'prec=0' to omit any
    fraction.  If 'prec' is not specified or None, the
    precision is adjusted to make the returned string 6
    characters (without the sign).

    The 'sign' character is used for positive values.
    Negative values are always prefixed with '-'.

    Uppercase 'K' is the scale factor for values above
    1.0 and lowercase 'k' scales values below 1.0.

    The 'blank' character is used as the SI prefix for
    values between k and K, i.e. value without an SI
    prefix.  Set 'blank' to None, False or '' if no
    alignment is required.

      name    symbol   10**   symbol  name
      =====================================
      deca    da      +  1 -  d       deci
      hecto   h       +  2 -  c       centi
      - - - - - - - - - - - - - - - - - - -
      Kilo    K       +  3 -  m       milli
      Mega    M       +  6 -  /u      micro
      Giga    G       +  9 -  n       nano
      Tera    T       + 12 -  p       pico
      Peta    P       + 15 -  f       femto
      Exa     E       + 18 -  a       atto
      Zetta   Z       + 21 -  z       zepto
      Yotta   Y       + 24 -  y       yocto
      -------------------------------------
      Xona    X       + 27 -  x       xonto
      Weka    W       + 30 -  w       wekto
      Vunda   V       + 33 -  v       vunkto
      Uda     U       + 36 -  u*      unto
      Treda   TD*     + 39 -  td      trekto
      Sorta   S       + 42 -  s       sotro
      Rinta   R       + 45 -  r       rimto
      Quexa   Q       + 48 -  q       quekto
      Pepta   PP      + 51 -  pk      pekro
      Ocha    O       + 54 -  o       otro
      Nena    N       + 57 -  nk      nekto
      MInga   MI      + 60 -  mk      mikto
      Luma    L       + 63 -  l       lunto

    The prefixes below the solid line are not sanctioned
    by SI and are only used up to the symbols marked * to
    avoid ambiguity.  The symbols above the dotted
    line are not used and '/u' is returned as 'u'.

    See http://en.wikipedia.org/wiki/Binary_prefix or
    http://www.bipm.org/en/si/prefixes.html and maybe
    http://jimvb.home.mindspring.com/unitsystem.htm
    '''
    s, v, p = sign, float(value), None
    if v < 0.0:
        s, v = '-', -v
    if v < K:
        if v >= 1.0:
            # None or False means "no alignment", not "no prefix found"
            p = blank or ''
        elif k > 10.0:
            for f in iter('munpfazyxwv'):  # no unto, ...
                v *= k  # scale up
                if v >= 1.0:
                    p = f
                    break
    elif K > 10.0:
        for f in iter('KMGTPEZYXWVU'):  # no Treda, ...
            v /= K  # scale down
            if v < K:
                p = f
                break
    # format value
    if p is None:  # too large, small or invalid K, k
        return "%.0e*" % value
    elif prec is None:
        if v < 100.0:
            if v < 10.0:
                prec = 3
            else:
                prec = 2
        else:
            if v < 1000.0:
                prec = 1
            else:
                prec = 0
    elif prec < 0:
        prec = 0  # rounds
    return "%s%0.*f%s" % (s, prec, v, p)


if __name__ == '__main__':
    x = 17
    while x < 1.0e18:
        print sistr(x), x
        x *= 17
    x = 0.12
    while x > 1.0e-18:
        print sistr(x), x
        x *= 0.12

</pre>

/Jean Brouwers
 
