NEWBIE: ishexdigit revisited


engsolnom

After looking at the suggestions for an ishexdigit function (thanks
again), I decided on the following, partly because I don't have to
import string, and I believe it's pretty readable by Python newbies,
which we (including myself) have at work:

def ishexdigit(sx):
    ix = 0
    for cx in sx:
        ix += 1
        if not cx in '0123456789abcdefABCDEF': return 0
    if ix % 2 == 0: return 1
    else: return 'Extra nibble'

# Try it out:

sx = '0123abcDEF' # 5 bytes
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

sx = 'S123abcDEF' # The 'S' is not hex
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

sx = '123abcDEF' # 4 bytes plus a nibble
print ishexdigit(sx)
if ishexdigit(sx):
    print 'All hex'
else:
    print 'Not hex'

Results:

1
All hex

0
Not hex

Extra nibble
All hex

Notice that the user is warned (if he/she cares to be) that the
string isn't on byte boundaries.

Norm
 

Jeff Epler

The only suggestion I would make is to skip calculating ix and instead
just use len(sx). The length of a string is already stored, so it
doesn't require Python to count the number of chars again (unlike C's
strlen()).

Well, actually, I'll make two suggestions: If it's an error in some
cases to have an extra nibble, I'd use a 'raise' statement instead of a
different return value, something like:

def ishexdigit(sx, silent=True):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return False
    if silent or len(sx) % 2 == 0: return True
    raise ValueError, "Extra nibble in '%s'" % sx

Now, ishexdigit('0') or ishexdigit('0', True) will return True,
and ishexdigit('0', False) will raise a ValueError.
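The same idea is sketched below in Python 3, together with a caller that catches the exception. Python 3 spells the raise as `raise ValueError(...)`. The `check` helper is hypothetical and exists only to show the calling side.

```python
HEX_CHARS = '0123456789abcdefABCDEF'

def ishexdigit(sx, silent=True):
    # Reject any character that is not a hex digit.
    for cx in sx:
        if cx not in HEX_CHARS:
            return False
    # In silent mode an odd digit count is accepted without complaint.
    if silent or len(sx) % 2 == 0:
        return True
    raise ValueError("Extra nibble in '%s'" % sx)

def check(sx):
    """Hypothetical caller: strict check, reporting the problem as text."""
    try:
        return 'All hex' if ishexdigit(sx, silent=False) else 'Not hex'
    except ValueError as err:
        return 'Error: %s' % err

print(ishexdigit('0'))        # True
print(check('0123abcDEF'))    # All hex
print(check('S123abcDEF'))    # Not hex
print(check('123abcDEF'))     # Error: Extra nibble in '123abcDEF'
```

A non-hex character still comes back as a plain False. Only the odd-length case, which the caller explicitly asked to be strict about, goes through the exception path.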

Jeff
 

Paul Rubin

> After looking at the suggestions for an ishexdigit function (thanks
> again), I decided on the following, partly because I don't have to
> import string, and I believe it's pretty readable by Python newbies,
> which we (including myself) have at work:
>
> def ishexdigit(sx):
>     ix = 0
>     for cx in sx:
>         ix += 1
>         if not cx in '0123456789abcdefABCDEF': return 0
>     if ix % 2 == 0: return 1
>     else: return 'Extra nibble'

Some remarks:

1) I think the name is a misnomer: "ishexdigit" should test just one
digit, not a multi-digit string. This function should be called
"ishexnumber" instead.

2) I'm not sure why you return 'extra nibble' if there's an odd number
of digits. Isn't '123' a perfectly good hex number (= 291 decimal)?

3) Even if you do want to check that the length is odd, you don't
need to count the chars in the loop:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

(the final 'else' is not needed)

4) You could also use a regular expression to test for hex digits:

def ishexnumber(sx):
    import re
    if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
    if len(sx) % 2 == 0: return 1
    return 'Extra nibble'

5) If you want the number of bytes to be even because the hex string
is supposed to represent a character string, and you're going to
do the conversion next, there's already a library function for that:

import binascii
s = binascii.unhexlify(sx)

6) If you want it to be a number but still need the digit count to
be even for some reason, then checking for the special value
'Extra nibble' is messy. It's usually better to raise an exception
instead:

def ishexnumber(sx):
    for cx in sx:
        if not cx in '0123456789abcdefABCDEF': return 0
    if len(sx) % 2 != 0:
        raise ValueError, 'Extra nibble in hex string'
    return 1

The caller then has to catch the exception, of course.

7) If you want to just check that the hex string represents an
integer, possibly the most robust way is:

def ishexnumber(sx):
    try:
        n = int(sx, 16)
    except ValueError:
        return 0
    return 1

Note this will fail for hex strings that are too long to fit in a short int.
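Two things are worth knowing before choosing the int() test. First, an odd digit count is no problem for it: int('123', 16) is 1*256 + 2*16 + 3 = 291, as remark 2 says. Second, it accepts more than bare hex digits. int() also allows surrounding whitespace, a leading sign and a '0x' prefix (and, from Python 3.6, single underscores between digits). In current Python 3 it returns a big integer for long strings rather than overflowing. The short sketch below contrasts it with a digit-only check. Both function names are hypothetical.

```python
HEX_DIGITS = '0123456789abcdefABCDEF'

def ishexnumber_int(sx):
    # The int()-based test from remark 7, returning a bool.
    try:
        int(sx, 16)
    except ValueError:
        return False
    return True

def only_hex_digits(sx):
    # Character test: rejects signs, '0x' prefixes, and whitespace.
    # The empty string is rejected too, matching int('', 16).
    return len(sx) > 0 and all(cx in HEX_DIGITS for cx in sx)

print(int('123', 16))               # 291
print(ishexnumber_int(' -0x1F '))   # True: int() parses it as -31
print(only_hex_digits(' -0x1F '))   # False
print(ishexnumber_int('ff' * 100))  # True: a 200-digit value is fine
```

If the input must contain nothing but hex digits, the int() test alone is too permissive.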
 

Paul Rubin

>> def ishexnumber(sx):
>>     import re
>>     if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
>>     if len(sx) % 2 == 0: return 1
>>     return 'Extra nibble'
>
> If I run this many times, as is likely in our application, does the
> 'import re' chew up memory?

Importing the re module does use some memory, but it's the same amount
of memory whether you call the function once or many times. Normally
you'd put 'import re' at the top of the file that the function is
defined in, by the way, rather than inside the function, but either
way works.

> Our strings will almost always be long ones, and the byte values will
> range from zero to 255, but I'll tuck this nugget into the archive.

This sounds like you almost certainly want to use the binascii module
and not write your own function.
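The Python 3 sketch below shows what that looks like. unhexlify does the validation itself and accepts any byte values from 0 to 255. It rejects odd-length input and non-hex characters by raising binascii.Error, which is a subclass of ValueError in Python 3. Python 2 raised TypeError for the same inputs, so the sketch catches both. The helper name `hex_to_bytes` is hypothetical.

```python
import binascii

def hex_to_bytes(sx):
    """Convert a hex string to bytes, or return None if it isn't valid.

    Hypothetical helper: unhexlify itself rejects odd-length strings
    and non-hex characters, so no separate digit check is needed.
    """
    try:
        return binascii.unhexlify(sx)
    except (TypeError, ValueError):  # binascii.Error subclasses ValueError
        return None

print(hex_to_bytes('0123abcDEF'))  # b'\x01#\xab\xcd\xef'
print(hex_to_bytes('123abcDEF'))   # None: odd number of digits
print(hex_to_bytes('S123abcDEF'))  # None: 'S' is not a hex digit
```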
 

engsolnom

> Some remarks:
>
> 1) I think the name is a misnomer: "ishexdigit" should test just one
> digit, not a multi-digit string. This function should be called
> "ishexnumber" instead.

I used 'isdigit' as a model, since it also takes a string. But
ishexnumber works for me too, except I'll usually pass long strings to
it. :)

> 2) I'm not sure why you return 'extra nibble' if there's an odd number
> of digits. Isn't '123' a perfectly good hex number (= 291 decimal)?

> 3) Even if you do want to check that the length is odd, you don't
> need to count the chars in the loop:
>
> def ishexnumber(sx):
>     for cx in sx:
>         if not cx in '0123456789abcdefABCDEF': return 0
>     if len(sx) % 2 == 0: return 1
>     return 'Extra nibble'
>
> (the final 'else' is not needed)

Forehead slap! "if len(sx) % 2 == 0:" makes more sense... thanks!

> 4) You could also use a regular expression to test for hex digits:
>
> def ishexnumber(sx):
>     import re
>     if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
>     if len(sx) % 2 == 0: return 1
>     return 'Extra nibble'

If I run this many times, as is likely in our application, does the
'import re' chew up memory?

> 5) If you want the number of bytes to be even because the hex string
> is supposed to represent a character string, and you're going to
> do the conversion next, there's already a library function for that:
>
> import binascii
> s = binascii.unhexlify(sx)
>
> 6) If you want it to be a number but still need the digit count to
> be even for some reason, then checking for the special value
> 'Extra nibble' is messy. It's usually better to raise an exception
> instead:
>
> def ishexnumber(sx):
>     for cx in sx:
>         if not cx in '0123456789abcdefABCDEF': return 0
>     if len(sx) % 2 != 0:
>         raise ValueError, 'Extra nibble in hex string'
>     return 1
>
> The caller then has to catch the exception, of course.

I do need to learn more about exceptions.

> 7) If you want to just check that the hex string represents an
> integer, possibly the most robust way is:
>
> def ishexnumber(sx):
>     try:
>         n = int(sx, 16)
>     except ValueError:
>         return 0
>     return 1
>
> Note this will fail for hex strings that are too long to fit in a
> short int.

Our strings will almost always be long ones, and the byte values will
range from zero to 255, but I'll tuck this nugget into the archive.

Thanks for the comments.
Norm
 

Kirk Strauser

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 2003-12-30T00:44:34Z, (e-mail address removed) writes:

>> 4) You could also use a regular expression to test for hex digits:
>>
>> def ishexnumber(sx):
>>     import re
>>     if not re.match('[0123456789abcdefABCDEF]*$', sx): return 0
>>     if len(sx) % 2 == 0: return 1
>>     return 'Extra nibble'
>
> If I run this many times, as is likely in our application, does the
> 'import re' chew up memory?

For production use, I'd factor out the 're' stuff (for performance) and add
descriptive variable names (it doesn't cost anything) like so:

import re
hexpattern = re.compile(r'[0123456789abcdefABCDEF]*$')

def ishexnumber(checkstring):
    if not hexpattern.match(checkstring): return 0
    if not len(checkstring) % 2: return 1
    return 'Extra nibble'

That way, the pattern is only compiled once.
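One caveat applies to this pattern and to the earlier regex versions alike. In Python's re module, `$` also matches just before a trailing newline, so a string such as 'abc\n' passes the test. Being four characters long, it even counts as an even number of digits. The sketch below anchors with `\Z` instead, which matches only at the very end of the string. The name `strictpattern` is hypothetical. On Python 3.4 and later, `re.fullmatch` is another option.

```python
import re

# Kirk's pattern: $ matches at the end *or* just before a final '\n'.
hexpattern = re.compile(r'[0123456789abcdefABCDEF]*$')
# Hypothetical stricter variant: \Z matches only at the very end.
strictpattern = re.compile(r'[0123456789abcdefABCDEF]*\Z')

def ishexnumber(checkstring):
    # Like the original, an empty string still counts as valid.
    if not strictpattern.match(checkstring): return 0
    if not len(checkstring) % 2: return 1
    return 'Extra nibble'

print(bool(hexpattern.match('abc\n')))   # True: the newline slips through
print(ishexnumber('abc\n'))              # 0
print(ishexnumber('0123abcDEF'))         # 1
print(ishexnumber('123abcDEF'))          # Extra nibble
```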
- --
Kirk Strauser
The Strauser Group
Open. Solutions. Simple.
http://www.strausergroup.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/8aa65sRg+Y0CpvERAlINAJoDMrDiTHdxtaF8aKNBf+KYjxdD6wCgm+UR
i4ZbaJklkaOzJ9TVQH5JMz0=
=bncS
-----END PGP SIGNATURE-----
 

Terry Reedy

>> def ishexnumber(sx):
> If I run this many times, as is likely in our application, does the
> 'import re' chew up memory?

No. Imported modules are cached in sys.modules. After the first import,
'import x' is equivalent to "x = sys.modules['x']" (or something like
this).
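The caching Terry describes is easy to observe. A function that imports re on every call gets the same module object back each time, taken from sys.modules. The sketch below shows this. The function name is hypothetical.

```python
import sys

def import_inside():
    # The import statement runs on every call, but after the first time
    # it just looks the module up in sys.modules and binds the name.
    import re
    return re

first = import_inside()
second = import_inside()
print(first is second)                # True
print(first is sys.modules['re'])     # True
```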
 

Paul Rubin

Kirk Strauser said:
> For production use, I'd factor out the 're' stuff (for performance) and add
> descriptive variable names (it doesn't cost anything) like so:
>
> import re
> hexpattern = re.compile(r'[0123456789abcdefABCDEF]*$')
> ...
> That way, the pattern is only compiled once.

It normally only gets compiled once anyway, then cached. If you
have a really large number of different regexps I guess the cache
can overflow and recompilation happens.
 

Kirk Strauser

At 2003-12-30T19:52:01Z said:
> It normally only gets compiled once anyway, then cached. If you
> have a really large number of different regexps I guess the cache
> can overflow and recompilation happens.

I'm not seeing that here with 2.3.3:
... for i in xrange(1000000):
...     re.match(r'unlikely.*pat(..(.))?tern$', 'test')
...
... pat = re.compile(r'unlikely.*pat(..(.))?tern$')
... for i in xrange(1000000):
...     pat.match('test')
...
1.04278099537

Pre-compiling the pattern is a huge win on my system.
 

Paul Rubin

Kirk Strauser said:
> ... for i in xrange(1000000):
> ...     re.match(r'unlikely.*pat(..(.))?tern$', 'test')
> ...
> ... pat = re.compile(r'unlikely.*pat(..(.))?tern$')
> ... for i in xrange(1000000):
> ...     pat.match('test')
> ...
> 1.04278099537
>
> Pre-compiling the pattern is a huge win on my system.

You're using an extreme example and seeing the cost of the cache
lookup in the first example. You're certainly not seeing the pattern
get recompiled a million times in 5 seconds. Doing that would be many
times slower.
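The distinction can be checked with the standard timeit module by timing three variants. The first calls a precompiled pattern. The second calls re.match, which looks the pattern up in re's internal cache. The third forces a real recompilation by clearing that cache with re.purge() before each compile. In the sketch below the iteration count is arbitrary. No timings are shown, since they vary by machine and Python version.

```python
import re
import timeit

PATTERN = r'unlikely.*pat(..(.))?tern$'
compiled = re.compile(PATTERN)

def use_precompiled():
    return compiled.match('test')

def use_module_match():
    # re.match compiles on first use, then fetches the pattern from re's cache.
    return re.match(PATTERN, 'test')

def force_recompile():
    # Clearing the cache makes re.compile do the full compilation every time.
    re.purge()
    return re.compile(PATTERN).match('test')

N = 10000
for func in (use_precompiled, use_module_match, force_recompile):
    seconds = timeit.timeit(func, number=N)
    print('%-18s %.4f s' % (func.__name__, seconds))
```

If the second timing sits close to the first and the third is far slower, re.match is paying for a cache lookup rather than a recompile.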
 

Kirk Strauser

At 2003-12-30T21:45:13Z said:
> You're using an extreme example and seeing the cost of the cache lookup in
> the first example.

The OP may be coding in an extreme situation. Given the free performance
gain without requiring any more work, resources, or syntax, I can't think of
a good reason *not* to pre-compile any regexp used more than a few times.

> You're certainly not seeing the pattern get recompiled a million times in
> 5 seconds. Doing that would be many times slower.

Regardless of the cause, there's still a 400% increase in overhead in my
example by not pre-compiling the pattern.
 
