String + number split

Graham Breed

Stevie_mac said:
Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

Hmm, well, there's always
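import re
def split_font(s):  # def line lost from the post; the name is a placeholder
    matcher = re.compile(r'\d+$|_')
    return matcher.sub('', s), int(matcher.findall(s)[-1])
# an input such as 'arial_bold1_14' (the original call was lost) gives
('arialbold1', 14)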

I'm scary enough that I probably would do it this way.


Graham
 
Stevie_mac

Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

Cheers
 
Peter Hansen

Stevie_mac said:
Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

It's impossible to say if it's more elegant, since you didn't
post your attempt. (Homework? I'll assume not.)
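>>> import re
>>> def split_font(fs):  # def line lost in the archive; the name is a placeholder
...     return tuple(re.split(r'(\d+)', fs.replace('_', ''))[:2])
...
>>> split_font('arial_14')  # call reconstructed from the output shown
('arial', '14')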

I wouldn't actually code it this way myself, but it meets
your requirements as stated.
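
Note that the result above carries the size as the string '14', not as a number. A minimal Python 3 sketch of the same split-based idea with the conversion added follows. The name `split_font_int` is made up for illustration, and like the one-liner it only strips underscores, so other separators are not handled.

```python
import re

def split_font_int(fs):
    # Hypothetical helper name, not from the thread.
    # Drop underscores, then split around the first run of digits.
    # The capturing group makes re.split keep the digits in its result.
    name, size = re.split(r'(\d+)', fs.replace('_', ''))[:2]
    return name, int(size)

print(split_font_int('font12'))
print(split_font_int('arial_14'))
```

If the string contains no digits, the unpacking fails with a ValueError, which is probably acceptable for input that breaks the stated format.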

-Peter
 
Stevie_mac

It's impossible to say if it's more elegant, since you didn't
post your attempt.
Yeh, sorry bout tha!
(Homework? I'll assume not.)
Only a student of life!

Heres my solution...

def makefont(sFontName):
    num = []
    nam = []
    acnt = 0
    # reverse it
    l = list(sFontName); l.reverse()
    sFontName = ''.join(l)
    # now loop it while isdigit(), store number, then store alphas
    for c in sFontName:
        if c.isdigit() and acnt == 0:
            num.append(c)
        elif c.isalpha() or acnt > 1:
            acnt += 1
            nam.append(c)
    nam.reverse()
    num.reverse()
    return (''.join(nam), int(''.join(num)))

Now you see why i was asking for a more elegant solution!

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)
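
For the four cases listed above, one possible regex-free sketch in Python 3 peels the trailing digits off with `str.rstrip` and keeps only the letters of what remains. The name `makefont_simple` is hypothetical, and the sketch assumes the number is written in ASCII digits. It is not equivalent to the loop above on every input, only on strings of the stated shape.

```python
def makefont_simple(font_name):
    # Hypothetical helper name, not from the thread.
    # Peel the trailing ASCII digits off the end of the string.
    head = font_name.rstrip('0123456789')
    digits = font_name[len(head):]
    if not digits:
        raise ValueError('no number at the end of %r' % font_name)
    # Keep only the alphabetic characters of what is left.
    name = ''.join(c for c in head if c.isalpha())
    return name, int(digits)

for s in ['font12', 'arial_14', 'arial__8', 'times 6']:
    print(s, '->', makefont_simple(s))
```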



 
Stevie_mac

now thats what im talking about!
note to self, learn reg exp!

Thing is, dont understand it one bit! any chance you comment it! (tell me where to go if you want!) I'd appreciate it
if you could - ta.

Graham Breed said:
Hmm, well, there's always
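import re
def split_font(s):  # def line lost from the post; the name is a placeholder
    matcher = re.compile(r'\d+$|_')
    return matcher.sub('', s), int(matcher.findall(s)[-1])
# an input such as 'arial_bold1_14' (the original call was lost) gives
('arialbold1', 14)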

I'm scary enough that I probably would do it this way.


Graham
 
Jack Diederich

Stevie_mac said:
Hello again, I can do this, but I'm sure there is a much more elegant
way...

A string, with a number on the end, strip off the number & discard any
underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

It's impossible to say if it's more elegant, since you didn't
post your attempt. (Homework? I'll assume not.)
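>>> import re
>>> def split_font(fs):  # def line lost in the archive; the name is a placeholder
...     return tuple(re.split(r'(\d+)', fs.replace('_', ''))[:2])
...
>>> split_font('arial_14')  # call reconstructed from the output shown
('arial', '14')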

There are many ways you could write the regexp; this is the one I find
easiest to read.

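import re

def mysplit(instr):
    # one or more alpha followed by zero or more non-digits followed by one or more digits
    grouper = re.compile(r'([a-zA-Z]+)\D*(\d+)')
    match_ob = grouper.match(instr)
    name = match_ob.group(1)   # group(n) returns one group; groups() returns them all
    num = int(match_ob.group(2))
    return (name, num)

>>> mysplit('arial_14')  # call reconstructed; the original input was lost
('arial', 14)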

I carry around a 'coerce' method that looks like this
def coerce(types, inputs):
    return list(map(lambda f, val: f(val), types, inputs))

which changes the last few lines in mysplit into
    return coerce([str, int], match_ob.groups())
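
A sketch of how the two pieces might fit together in Python 3. The list comprehension over `zip` stands in for the `map` call, and the `None` check and the final `tuple(...)` are additions for illustration, not part of the code above.

```python
import re

def coerce(types, inputs):
    # Apply each converter to the value in the same position.
    return [f(val) for f, val in zip(types, inputs)]

def mysplit(instr):
    # letters, then any non-digits, then digits
    grouper = re.compile(r'([a-zA-Z]+)\D*(\d+)')
    match_ob = grouper.match(instr)
    if match_ob is None:
        # Added check: match() returns None when the string does not fit.
        raise ValueError('badly formatted font string: %r' % instr)
    return tuple(coerce([str, int], match_ob.groups()))

print(mysplit('arial_14'))
```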

-jackdied
 
Graham Breed

Stevie_mac said:
now thats what im talking about!
note to self, learn reg exp!

Thing is, dont understand it one bit! any chance you comment it! (tell me where to go if you want!) I'd appreciate it
if you could - ta.

Yes, certainly. Well, where to go would be the standard documentation,
which you can now get from the command line:
Help on module sre:
....

but in this case, the expression is

\d+$|_

\d matches a digit
+ means match one or more digits
$ ensures the number is at the end of the string

| means either \d+$ or _ can match

_ is a literal underscore

All I did then is substitute any matches with an empty string, which is
the same as deleting all matches. Then (on the original string) return
the last match, which will be the number at the end, and convert it to
an integer. If the string isn't in the format you said it would be,
you'll get either an IndexError or a ValueError.
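
The two steps and the two failure modes described above can be seen separately in a short Python 3 sketch. The function name `split_name` is a placeholder chosen here; the pattern is the one under discussion.

```python
import re

matcher = re.compile(r'\d+$|_')

def split_name(s):
    # Placeholder name. Delete every match, then convert the last
    # match found in the original string to an integer.
    return matcher.sub('', s), int(matcher.findall(s)[-1])

s = 'arial_14'
print(matcher.findall(s))      # every underscore, plus the trailing number
print(repr(matcher.sub('', s)))
print(split_name(s))

# Strings that are not in the expected format:
for bad in ['arial', 'arial_']:
    try:
        split_name(bad)
    except (IndexError, ValueError) as exc:
        print(repr(bad), 'raises', type(exc).__name__)
```

With no match at all, `findall` returns an empty list and indexing it raises IndexError. When the last match is an underscore rather than a number, `int('_')` raises ValueError.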

What your code was actually doing is subtly different -- assuming an
uninterrupted string of letters exists at the front of the string, and
anything that isn't alphabetic before the number starts. The other
things people are suggesting are better tailored for this problem. The
expression(s) you should use depend on the exact problem you're trying
to solve.
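
To make the difference concrete, here is a small Python 3 comparison of the two families of pattern on an input outside the original examples. Both function names are invented for this sketch, and the second pattern is the "letters, non-digits, digits" shape suggested elsewhere in the thread.

```python
import re

def strip_everywhere(s):
    # Delete underscores anywhere and the number at the very end.
    matcher = re.compile(r'\d+$|_')
    return matcher.sub('', s), int(matcher.findall(s)[-1])

def leading_letters(s):
    # Take the first run of letters, skip non-digits, take the first number.
    m = re.match(r'([A-Za-z]+)\D*(\d+)', s)
    if m is None:
        raise ValueError('badly formatted font string: %r' % s)
    return m.group(1), int(m.group(2))

for s in ['arial_14', 'arial_bold1_14']:
    print(s, strip_everywhere(s), leading_letters(s))
```

Both agree on 'arial_14'. On 'arial_bold1_14' the first returns ('arialbold1', 14) and the second ('arial', 1), so which one is right depends on what the strings can actually look like.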


Graham
 
Joe Mason

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)
>>> import re
>>> def fontsplit(fs):  # def line lost in the archive; the name is a placeholder
...     r = re.compile("""
...         ([A-Za-z]+)  # group 1: 1 or more letters
...         [^0-9]*      # 0 or more non-digits; not a group because not in ()
...         ([0-9]+)     # group 2: 1 or more digits
...         """, re.VERBOSE)
...     m = r.match(fs)
...     if m: return (m.group(1), m.group(2))
...     else: raise ValueError("Badly formatted font string")
...
>>> fontsplit('times 6')  # call reconstructed from the output shown
('times', '6')

Joe
 
William Park

Stevie_mac said:
PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)

Assuming you only have 2 fields to worry about, play around with
1. re.split('[^a-z0-9]+', '...')
2. re.findall('[a-z]+|[0-9]+', '...')

Essentially, you want to pick out '[a-z]+' first and then '[0-9]+', i.e.
([a-z]+)[^0-9]*([0-9]+)
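
A quick way to play around with these in Python 3 is to run each suggestion over the four sample strings from the question. The variable names are arbitrary, and the combined pattern is written with an optional separator run (`*`) so that 'font12', which has no separator at all, still matches in full.

```python
import re

samples = ['font12', 'arial_14', 'arial__8', 'times 6']

for s in samples:
    by_split = re.split('[^a-z0-9]+', s)
    by_findall = re.findall('[a-z]+|[0-9]+', s)
    print(s, by_split, by_findall)

# Combined pattern: letters, optional non-digit separator, digits.
combined = re.compile(r'([a-z]+)[^0-9]*([0-9]+)')
pairs = [combined.match(s).groups() for s in samples]
print(pairs)
```

The split version leaves 'font12' in one piece because there is nothing to split on, while findall and the combined pattern separate all four samples. All three return the number as a string, so an `int()` call is still needed.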
 
Thomas stegen

Stevie_mac said:
now thats what im talking about!
note to self, learn reg exp!

While you are at it, I would recommend looking at finite state
automata, since they give a very simple theoretical view
of what regular expressions are capable and incapable of
doing. After that it is just syntax :) (They also give
a good hint at how you can implement regexes yourself.)
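
As an illustration of that viewpoint, here is a hand-written state machine in Python 3 for this thread's problem. It accepts one or more letters, then any number of non-alphanumeric separators, then one or more digits, and nothing else. The state names and the function name `split_font_fsm` are invented for this sketch, and `isalpha`/`isdigit` also accept some non-ASCII characters, which a real implementation might want to restrict.

```python
def split_font_fsm(s):
    # States: 'start' -> 'letters' -> 'sep' (optional) -> 'digits'
    state = 'start'
    name, digits = [], []
    for c in s:
        if state in ('start', 'letters') and c.isalpha():
            state = 'letters'
            name.append(c)
        elif state in ('letters', 'sep') and c.isdigit():
            state = 'digits'
            digits.append(c)
        elif state == 'digits' and c.isdigit():
            digits.append(c)
        elif state in ('letters', 'sep') and not c.isalnum():
            state = 'sep'          # separators are read and discarded
        else:
            raise ValueError('unexpected %r in %r' % (c, s))
    if state != 'digits':          # 'digits' is the only accepting state
        raise ValueError('no trailing number in %r' % s)
    return ''.join(name), int(''.join(digits))

for s in ['font12', 'arial_14', 'arial__8', 'times 6']:
    print(s, '->', split_font_fsm(s))
```

Each branch is one transition of the automaton, and the final check is the test for an accepting state. A regular expression engine builds an equivalent machine from the pattern instead of having it written out by hand.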
 
Paul McGuire

Stevie_mac said:
PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)

(You might find using pyparsing a little more readable than regexp...)

-------------------------------------------
# download pyparsing at http://pyparsing.sourceforge.net
from pyparsing import Word, alphas, CharsNotIn, nums, Optional

# parse action: convert the matched digit string to an int
str2int = lambda s,l,t: [int(t[0])]
fontNameSpec = ( Word(alphas).setResultsName("font") +
                 Optional( CharsNotIn(alphas+nums).suppress() ) +
                 Word(nums).setParseAction(str2int).setResultsName("size") )

testdata = [ "font12", "arial_14", "arial__8", "times 6" ]

for fname in testdata:
    results = fontNameSpec.parseString( fname )
    print(results)
    for k in results.keys():
        print("-", "results."+k, ":", results[k])
    print()

-------------------------------------------

prints:
['font', 12]
- results.font : font
- results.size : 12

['arial', 14]
- results.font : arial
- results.size : 14

['arial', 8]
- results.font : arial
- results.size : 8

['times', 6]
- results.font : times
- results.size : 6
 
