NEWBIE: ishexdigit revisited

engsolnom · Dec 29, 2003

After looking at the suggestions for a ishexdigit method, (thanks
again), I decided on the following, partly because I don't have to
import string, and I believe it's pretty readable by Python newbies,
which we (including myself) have at work:

def ishexdigit(sx):
    ix = 0
    for cx in sx:
        ix += 1
        if not cx in '0123456789abcdefABCDEF': return 0
    if ix % 2 == 0: return 1
    else: return 'Extra nibble'

# Try it out:

sx = '0123abcDEF' # 5 bytes
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

sx = 'S123abcDEF' # The 'S' is not hex
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

sx = '123abcDEF' # 4 bytes plus a nibble
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

Results:

1
All hex

0
Not hex

Extra nibble
All hex

Notice that the user is warned (if he/she cares to be) that the
string isn't on byte boundaries.

Norm

Jeff Epler · Dec 29, 2003

The only suggestion I would make is to skip calculating ix and instead
just use len(sx). The length of a string is already stored, so it
doesn't require Python to count the number of chars again (unlike C's
strlen()).

Well, actually, I'll make two suggestions: If it's an error in some
cases to have an extra nibble, I'd use a 'raise' statement instead of a
different return value, something like:

def ishexdigit(sx, silent=True):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return False
    if silent or len(sx) % 2 == 0: return True
    raise ValueError, "Extra nibble in '%s'" % sx

Now, ishexdigit('0') or ishexdigit('0', True) will return True,
and ishexdigit('0', False) will raise a ValueError.

Jeff
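
For readers on Python 3, the same idea might be sketched as below. This is an illustrative adaptation, not the code as posted: it uses the `raise ValueError(...)` call form in place of the Python 2 `raise ValueError, ...` statement, and the `describe` caller and the `HEX_DIGITS` constant are hypothetical names added for the example.

```python
HEX_DIGITS = '0123456789abcdefABCDEF'

def ishexdigit(sx, silent=True):
    """Return False on any non-hex character, True otherwise.

    With silent=False, an odd number of hex digits raises ValueError.
    """
    for cx in sx:
        if cx not in HEX_DIGITS:
            return False
    if silent or len(sx) % 2 == 0:
        return True
    raise ValueError("Extra nibble in %r" % sx)

def describe(sx):
    # Hypothetical caller showing how the exception could be handled.
    try:
        ok = ishexdigit(sx, silent=False)
    except ValueError as err:
        return 'error: %s' % err
    return 'all hex' if ok else 'not hex'

print(describe('0123abcDEF'))  # all hex
print(describe('S123abcDEF'))  # not hex
print(describe('123abcDEF'))   # error: Extra nibble in '123abcDEF'
```

Unlike the 'Extra nibble' return value, the exception cannot be mistaken for a true result in a plain `if` test.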

Paul Rubin · Dec 29, 2003

engsolnom said:
After looking at the suggestions for a ishexdigit method, (thanks
again), I decided on the following, partly because I don't have to
import string, and I believe it's pretty readable by Python newbies,
which we (including myself) have at work:

def ishexdigit(sx):
    ix = 0
    for cx in sx:
        ix += 1
        if not cx in '0123456789abcdefABCDEF': return 0
    if ix % 2 == 0: return 1
    else: return 'Extra nibble'

Some remarks:

1) I think the name is a misnomer: "ishexdigit" should test just one
digit, not a multi-digit string. This function should be called
"ishexnumber" instead.

2) I'm not sure why you return 'extra nibble' if there's an odd number
of digits. Isn't '123' a perfectly good hex number (= 291 decimal)?

3) Even if you do want to check that the length is even, you don't
need to count the chars in the loop:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

(the final 'else' is not needed)

4) You could also use a regular expression to test for hex digits:

def ishexnumber(sx):
    import re
    if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

5) If you want the number of bytes to be even because the hex string
is supposed to represent a character string, and you're going to
do the conversion next, there's already a library function for that:

import binascii
s = binascii.unhexlify(sx)

6) If you want it to be a number but still need the digit count to
be even for some reason, then checking for the special value
'Extra nibble' is messy. It's usually better to raise an exception
instead:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 != 0:
        raise ValueError, 'Extra nibble in hex string'
    return 1

The caller then has to catch the exception, of course.

7) If you want to just check that the hex string represents an
integer, possibly the most robust way is:

def ishexnumber(sx):
    try:
        n = int(sx, 16)
    except ValueError:
        return 0
    return 1

Note this will fail for hex strings that are too long to fit in a short int.
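
One thing worth checking before relying on the int() approach in remark 7 is that int() accepts more than bare hex digits. The sketch below, written for Python 3, compares it with a character-by-character test; the names `ishexnumber_int` and `ishexnumber_strict` are hypothetical and exist only for this comparison. In Python 3, integers are arbitrary precision, so the length caveat above should not apply there.

```python
def ishexnumber_int(sx):
    # Same approach as remark 7, written for Python 3.
    try:
        int(sx, 16)
    except ValueError:
        return False
    return True

def ishexnumber_strict(sx):
    # Character-by-character check, as in remark 3 but without the length test.
    return all(cx in '0123456789abcdefABCDEF' for cx in sx)

# int() tolerates a '0x' prefix, surrounding whitespace and a sign,
# and rejects the empty string; the strict check does the opposite.
for candidate in ['ff', '0xff', ' ff ', '-ff', '']:
    print(repr(candidate),
          ishexnumber_int(candidate),
          ishexnumber_strict(candidate))
```

Which behaviour is wanted depends on whether the input is meant to be a number or a raw string of hex digits.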

Paul Rubin · Dec 30, 2003

def ishexnumber(sx):
    import re
    if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

If I run this many times, as is likely in our application, does the
'import re' chew up memory?

Importing the re module does use some memory, but it's the same amount
of memory whether you call the function once or many times. Normally
you'd put 'import re' at the top of the file that the function is
defined in, by the way, rather than inside the function, but either
way works.

Our strings will almost always be long ones, and the byte values will
range from zero to 255, but I'll tuck this nugget into the archive.

This sounds like you almost certainly want to use the binascii module
and not write your own function.
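
A minimal sketch of what using binascii could look like, assuming Python 3 and assuming the goal is to turn the hex string into bytes while rejecting bad input. The helper name `hex_to_bytes` is hypothetical and not part of the binascii module.

```python
import binascii

def hex_to_bytes(sx):
    """Return the decoded bytes, or None if sx is not valid hex.

    hex_to_bytes is a hypothetical helper name, not part of binascii.
    """
    try:
        return binascii.unhexlify(sx)
    except (binascii.Error, ValueError):
        # Raised for non-hex characters and for an odd number of digits.
        return None

print(hex_to_bytes('0123abcDEF'))  # five bytes
print(hex_to_bytes('S123abcDEF'))  # None: 'S' is not a hex digit
print(hex_to_bytes('123abcDEF'))   # None: odd number of digits
```

Here the validity check and the conversion happen in one call, so there is no separate 'Extra nibble' case to handle.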

engsolnom · Dec 30, 2003

Some remarks:

1) I think the name is a misnomer: "ishexdigit" should test just one
digit, not a multi-digit string. This function should be called
"ishexnumber" instead.

I used 'isdigit' as a model, since it takes a string also. But
ishexnumber works for me too, except I'll usually pass long strings to
it..

2) I'm not sure why you return 'extra nibble' if there's an odd number
of digits. Isn't '123' a perfectly good hex number (= 291 decimal)?

3) Even if you do want to check that the length is even, you don't
need to count the chars in the loop:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

(the final 'else' is not needed)

Forehead slap! "if len(sx) % 2 == 0:" makes better sense...thanks!

4) You could also use a regular expression to test for hex digits:

def ishexnumber(sx):
    import re
    if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

If I run this many times, as is likely in our application, does the
'import re' chew up memory?

5) If you want the number of bytes to be even because the hex string
is supposed to represent a character string, and you're going to
do the conversion next, there's already a library function for that:

import binascii
s = binascii.unhexlify(sx)

6) If you want it to be a number but still need the digit count to
be even for some reason, then checking for the special value
'Extra nibble' is messy. It's usually better to raise an exception
instead:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 != 0:
        raise ValueError, 'Extra nibble in hex string'
    return 1

The caller then has to catch the exception, of course.

I do need to learn more about exceptions.

7) If you want to just check that the hex string represents an
integer, possibly the most robust way is:

def ishexnumber(sx):
    try:
        n = int(sx, 16)
    except ValueError:
        return 0
    return 1

Note this will fail for hex strings that are too long to fit in a short int.

Our strings will almost always be long ones, and the byte values will
range from zero to 255, but I'll tuck this nugget into the archive.

Thanks for the comments.
Norm

Kirk Strauser · Dec 30, 2003

At 2003-12-30T00:44:34Z, (e-mail address removed) writes:

4) You could also use a regular expression to test for hex digits:

def ishexnumber(sx):
    import re
    if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

If I run this many times, as is likely in our application, does the
'import re' chew up memory?

For production use, I'd factor out the 're' stuff (for performance) and add
descriptive variable names (it doesn't cost anything) like so:

import re
hexpattern = re.compile(r'[0123456789abcdefABCDEF]*$')

def ishexnumber(checkstring):
    if not hexpattern.match(checkstring): return 0
    if not len(checkstring) % 2: return 1
    return 'Extra nibble'

That way, the pattern is only compiled once.
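
One detail of the pattern itself may matter for input read from files: in Python's re module, `$` also matches just before a trailing newline, so a string ending in '\n' passes the check. The sketch below (Python 3) compares the thread's pattern with a variant anchored by `\Z`; the names `loose` and `strict` are made up for the illustration.

```python
import re

# Pattern from the thread: '$' also matches just before a trailing newline.
loose = re.compile(r'[0123456789abcdefABCDEF]*$')
# Variant anchored with \Z, which matches only at the very end of the string.
strict = re.compile(r'[0-9a-fA-F]*\Z')

for candidate in ['0123abcDEF', '0123abcDEF\n', '']:
    print(repr(candidate),
          bool(loose.match(candidate)),
          bool(strict.match(candidate)))
```

Both patterns accept the empty string, because `*` allows zero digits; using `+` instead would reject it if that is preferred.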
--
Kirk Strauser
The Strauser Group
Open. Solutions. Simple.
http://www.strausergroup.com/

Terry Reedy · Dec 30, 2003

def ishexnumber(sx):

If I run this many times, as is likely in our application, does the
'import re' chew up memory?

No. Imported modules are cached in sys.modules. After the first import,
'import x' is equivalent to "x = sys.modules['x']" (or something like
this).
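
A small check of this, as a sketch for Python 3: the function below repeats 'import re' on every call, and the module object found in sys.modules stays the same one throughout. The variable names `first` and `second` are just for the illustration.

```python
import sys

def ishexnumber(sx):
    import re  # After the first import this is a lookup in sys.modules.
    return re.match('[0123456789abcdefABCDEF]*$', sx) is not None

ishexnumber('0123abcDEF')
first = sys.modules['re']

# Call the function many more times; each call runs 'import re' again.
for _ in range(1000):
    ishexnumber('0123abcDEF')
second = sys.modules['re']

print(first is second)  # True: the same module object is reused
```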

Paul Rubin · Dec 30, 2003

Kirk Strauser said:
For production use, I'd factor out the 're' stuff (for performance) and add
descriptive variable names (it doesn't cost anything) like so:

import re
hexpattern = re.compile(r'[0123456789abcdefABCDEF]*$')
...
That way, the pattern is only compiled once.

It normally only gets compiled once anyway, then cached. If you
have a really large number of different regexps I guess the cache
can overflow and recompilation happens.

Kirk Strauser · Dec 30, 2003

At 2003-12-30T19:52:01Z, Paul Rubin said:
It normally only gets compiled once anyway, then cached. If you
have a really large number of different regexps I guess the cache
can overflow and recompilation happens.

I'm not seeing that here with 2.3.3:
... for i in xrange(1000000):
...     re.match(r'unlikely.*pat(..(.))?tern$', 'test')
...
... pat = re.compile(r'unlikely.*pat(..(.))?tern$')
... for i in xrange(1000000):
...     pat.match('test')
...
1.04278099537

Pre-compiling the pattern is a huge win on my system.
--
Kirk Strauser

Paul Rubin · Dec 30, 2003

Kirk Strauser said:
... for i in xrange(1000000):
...     re.match(r'unlikely.*pat(..(.))?tern$', 'test')
...
... pat = re.compile(r'unlikely.*pat(..(.))?tern$')
... for i in xrange(1000000):
...     pat.match('test')
...
1.04278099537

Pre-compiling the pattern is a huge win on my system.

You're using an extreme example and seeing the cost of the cache
lookup in the first example. You're certainly not seeing the pattern
get recompiled a million times in 5 seconds. Doing so would be many
times slower.
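
For anyone who wants to measure this on their own system, a rough sketch using the standard timeit module under Python 3 follows. The pattern and test string are the ones from the session quoted above; the function names and the number of calls are arbitrary choices for the illustration, and the timings will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r'unlikely.*pat(..(.))?tern$'
compiled = re.compile(PATTERN)

def match_uncompiled():
    # Goes through the re module's internal pattern cache on every call.
    return re.match(PATTERN, 'test')

def match_precompiled():
    # Uses the already compiled pattern object directly.
    return compiled.match('test')

calls = 100000  # arbitrary; raise it for steadier numbers
t_uncompiled = timeit.timeit(match_uncompiled, number=calls)
t_precompiled = timeit.timeit(match_precompiled, number=calls)

print('re.match:       %.3f s' % t_uncompiled)
print('compiled.match: %.3f s' % t_precompiled)
```

Any difference between the two figures reflects the per-call overhead of going through the module-level function, not a fresh compilation on each call.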

Kirk Strauser · Dec 30, 2003

At 2003-12-30T21:45:13Z, Paul Rubin said:
You're using an extreme example and seeing the cost of the cache lookup in
the first example.

The OP may be coding in an extreme situation. Given the free performance
gain without requiring any more work, resources, or syntax, I can't think of
a good reason *not* to pre-compile any regexp used more than a few times.

You're certainly not seeing the pattern get recompiled a million times in
5 seconds. Doing so would be many times slower.

Regardless of the cause, there's still a 400% increase in overhead in my
example by not pre-compiling the pattern.
--
Kirk Strauser
