String + number split

Graham Breed · Apr 12, 2004

Stevie_mac said:
Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

Hmm, well, there's always
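import re

matcher = re.compile(r'\d+$|_')

def split_font(s):   # hypothetical name; the original def line was lost
    return matcher.sub('', s), int(matcher.findall(s)[-1])

# result shown in the original post: ('arialbold1', 14)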

I'm scary enough that I probably would do it this way.

Graham

Stevie_mac · Apr 12, 2004

Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

Cheers

Peter Hansen · Apr 12, 2004

Stevie_mac said:
Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)

It's impossible to say if it's more elegant, since you didn't
post your attempt. (Homework? I'll assume not.)
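import re

def split_font(fs):   # hypothetical name; the original def line was lost
    return tuple(re.split(r'(\d+)', fs.replace('_', ''))[:2])

# split_font('arial_14') returns ('arial', '14')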

I wouldn't actually code it this way myself, but it meets
your requirements as stated.

-Peter
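The `[:2]` slice works because of how `re.split` behaves when the pattern contains a capturing group: the captured digits are kept in the result list instead of being discarded. A minimal sketch, reusing the hypothetical name `split_font`, that also converts the number with `int()`, since the requested output was `('arial',14)` rather than `('arial','14')`:

```python
import re

def split_font(fs):
    # Hypothetical helper name, built on the one-liner above.
    # Because the pattern has a capturing group, re.split keeps the digits
    # in the result: 'arial14' -> ['arial', '14', ''].
    parts = re.split(r'(\d+)', fs.replace('_', ''))
    # parts[0] is the text before the first number, parts[1] the number.
    return parts[0], int(parts[1])

print(split_font('arial_14'))  # ('arial', 14)
print(split_font('font12'))    # ('font', 12)
```

Only underscores are removed, so `'times 6'` comes back as `('times ', 6)` with the space kept; a pattern that skips any non-digit separator avoids that.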

Stevie_mac · Apr 12, 2004

It's impossible to say if it's more elegant, since you didn't

post your attempt.

Yeh, sorry bout tha!

(Homework? I'll assume not.)

Only a student of life!

Here's my solution...

def makefont(sFontName):
    num = []
    nam = []
    acnt = 0
    #reverse it
    l = list(sFontName); l.reverse()
    sFontName = ''.join(l)
    #now loop it while isdigit(), store number, then store alphas
    for c in sFontName:
        if c.isdigit() and acnt == 0:
            num.append(c)
        elif c.isalpha() or acnt > 1:
            acnt += 1
            nam.append(c)
    nam.reverse()
    num.reverse()
    return (''.join(nam), int(''.join(num)))

Now you see why I was asking for a more elegant solution!

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)
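For comparison with the regex answers, the same "scan the trailing digits from the end" idea as the loop above can be written without reversing lists or using `re`. This is a sketch under the stated requirements (the string ends in at least one digit, and any non-letter in front of the number is discarded); the name `split_trailing_number` is hypothetical.

```python
def split_trailing_number(s):
    """Hypothetical helper: split 'arial__8' into ('arial', 8) without regexes.

    Assumes the string ends in at least one digit; the name part keeps
    only alphabetic characters, so any separators are discarded.
    """
    end = len(s)
    # Walk backwards over the trailing digits.
    while end > 0 and s[end - 1].isdigit():
        end -= 1
    digits = s[end:]
    if not digits:
        raise ValueError("no trailing number in %r" % s)
    # Keep only letters from the front part, dropping '_', ' ', etc.
    name = ''.join(ch for ch in s[:end] if ch.isalpha())
    return name, int(digits)

for example in ["font12", "arial_14", "arial__8", "times 6"]:
    print(example, "->", split_trailing_number(example))
```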

(Homework? I'll assume not.)

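import re

def split_font(fs):   # hypothetical name; the original def line was lost
    return tuple(re.split(r'(\d+)', fs.replace('_', ''))[:2])

# split_font('arial_14') returns ('arial', '14')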

I wouldn't actually code it this way myself, but it meets
your requirements as stated.

-Peter

Stevie_mac · Apr 12, 2004

Now that's what I'm talking about!
Note to self, learn reg exp!

Thing is, I don't understand it one bit! Any chance you could comment it? (Tell me where to go if you want!) I'd appreciate it
if you could - ta.

Graham Breed said:
Stevie_mac said:

Hello again, I can do this, but I'm sure there is a much more elegant way...

A string, with a number on the end, strip off the number & discard any underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)


Hmm, well, there's always
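import re

matcher = re.compile(r'\d+$|_')

def split_font(s):   # hypothetical name; the original def line was lost
    return matcher.sub('', s), int(matcher.findall(s)[-1])

# result shown in the original post: ('arialbold1', 14)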

I'm scary enough that I probably would do it this way.

Graham

Jack Diederich · Apr 12, 2004

Stevie_mac said:
Stevie_mac said:

Hello again, I can do this, but I'm sure there is a much more elegant
way...

A string, with a number on the end, strip off the number & discard any
underscores

eg...

font12 becomes ('font',12)
arial_14 becomes ('arial',14)


It's impossible to say if it's more elegant, since you didn't
post your attempt. (Homework? I'll assume not.)
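import re

def split_font(fs):   # hypothetical name; the original def line was lost
    return tuple(re.split(r'(\d+)', fs.replace('_', ''))[:2])

# split_font('arial_14') returns ('arial', '14')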

There are many ways you could write the regexp; this is the one I find
easiest to read.

import re

def mysplit(instr):
    # one or more alpha followed by zero or more non-digits followed by one or more digits
    grouper = re.compile(r'([a-zA-Z]+)\D*(\d+)')
    match_ob = grouper.match(instr)
    name = match_ob.group(1)
    num = int(match_ob.group(2))
    return (name, num)

# mysplit('arial_14') returns ('arial', 14)

I carry around a 'coerce' function that looks like this:
def coerce(types, inputs):
    # apply each conversion function to the matching value, pairwise
    return [f(val) for f, val in zip(types, inputs)]

which changes the last few lines in mysplit into
return coerce([str, int], match_ob.groups())

-jackdied
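A point that is easy to trip over with match objects is the difference between `groups()` and `group(n)`: `groups()` returns every group at once, and its optional argument is only the default for groups that did not take part in the match, not a group index. A short sketch using the pattern from the post, plus the `None` that `match()` returns when the input does not fit (the input `'12pt'` is just an illustrative bad value):

```python
import re

grouper = re.compile(r'([a-zA-Z]+)\D*(\d+)')

m = grouper.match('arial_14')
# groups() returns every group as a tuple.
print(m.groups())              # ('arial', '14')
# group(n) returns just group n.
print(m.group(1), m.group(2))  # arial 14

# match() returns None when the pattern does not fit at the start of the
# string, so check it before calling group() to avoid an AttributeError.
print(grouper.match('12pt'))   # None
```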

Graham Breed · Apr 12, 2004

Stevie_mac said:
Now that's what I'm talking about!
Note to self, learn reg exp!

Thing is, I don't understand it one bit! Any chance you could comment it? (Tell me where to go if you want!) I'd appreciate it
if you could - ta.

Yes, certainly. Well, where to go would be the standard documentation,
which you can now get from the command line:
Help on module sre:
....

but in this case, the expression is

\d+$|_

\d matches a digit
+ means match one or more digits
$ ensures the number is at the end of the string

| means either \d+$ or _ can match

_ is a literal underscore

All I did then is substitute any matches with an empty string, which is
the same as deleting all matches. Then (on the original string) return
the last match, which will be the number at the end, and convert it to
an integer. If the string isn't in the format you said it would be,
you'll get either an IndexError or a ValueError.
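The two failure modes can be checked directly. A small sketch, wrapping the expression in a function whose name `split_font` is hypothetical, with two malformed inputs chosen to trigger each exception:

```python
import re

matcher = re.compile(r'\d+$|_')

def split_font(s):
    # Hypothetical function name; the body is the expression from the post.
    # sub() deletes every match (the trailing number and any underscores);
    # findall() lists the matches, and the last one should be the number.
    return matcher.sub('', s), int(matcher.findall(s)[-1])

print(split_font('arial__8'))  # ('arial', 8)

# Two malformed inputs and the exception each one raises:
for bad in ['arial', 'arial_']:
    try:
        split_font(bad)
    except (IndexError, ValueError) as exc:
        print(repr(bad), '->', type(exc).__name__)
# 'arial' -> IndexError   (no matches at all, so [-1] fails)
# 'arial_' -> ValueError  (the last match is '_', and int('_') fails)
```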

What your code was actually doing is subtly different: it assumes an
uninterrupted string of letters at the front of the string, followed by
anything that isn't alphabetic before the number starts. The other
things people are suggesting are better tailored for this problem. The
expression(s) you should use depend on the exact problem you're trying
to solve.

Graham

Joe Mason · Apr 12, 2004

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)

import re

def split_font(fs):   # hypothetical name; the original def line was lost
    r = re.compile("""
        ([A-Za-z]+)   # group 1: 1 or more letters
        [^0-9]*       # 0 or more non-digits; not a group because not in ()
        ([0-9]+)      # group 2: 1 or more digits
        """, re.VERBOSE)
    m = r.match(fs)
    if m:
        return (m.group(1), m.group(2))
    else:
        raise ValueError("Badly formatted font string")

# split_font('times 6') returns ('times', '6')

Joe
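The same verbose pattern can be made a little more self-documenting with named groups (`(?P<name>...)`), which also lets the size be converted to an int in one place. A sketch with hypothetical names `FONT_RE` and `parse_font`; the trailing `$` is an addition not in Joe's version, so inputs with junk after the number (such as `'arial14x'`) are rejected rather than silently truncated.

```python
import re

FONT_RE = re.compile(r"""
    (?P<name>[A-Za-z]+)   # one or more letters
    [^0-9]*               # any separator characters (not captured)
    (?P<size>[0-9]+)      # one or more digits
    $                     # added here: nothing may follow the number
    """, re.VERBOSE)

def parse_font(fs):
    m = FONT_RE.match(fs)
    if m is None:
        raise ValueError("Badly formatted font string: %r" % fs)
    return m.group('name'), int(m.group('size'))

print(parse_font('times 6'))   # ('times', 6)
print(parse_font('arial__8'))  # ('arial', 8)
```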

William Park · Apr 12, 2004

Stevie_mac said:
Here's my solution...

def makefont(sFontName):
    num = []
    nam = []
    acnt = 0
    #reverse it
    l = list(sFontName); l.reverse()
    sFontName = ''.join(l)
    #now loop it while isdigit(), store number, then store alphas
    for c in sFontName:
        if c.isdigit() and acnt == 0:
            num.append(c)
        elif c.isalpha() or acnt > 1:
            acnt += 1
            nam.append(c)
    nam.reverse()
    num.reverse()
    return (''.join(nam), int(''.join(num)))

Now you see why I was asking for a more elegant solution!

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)

Assuming you only have 2 fields to worry about, play around with
1. re.split('[^a-z0-9]+', '...')
2. re.findall('[a-z]+|[0-9]+', '...')

Essentially, you want to pick out '[a-z]+' first and then '[0-9]+', i.e.
([a-z]+)[^0-9]+([0-9]+)
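Trying the `findall` suggestion on the four examples shows it picks out the name and number cleanly. One detail worth checking in the combined pattern: with `+` after the separator class, at least one non-digit is required between name and number, so for `'font12'` the engine backtracks and uses the last letter of the name as the "separator". Swapping `+` for `*` makes the separator optional. A quick sketch:

```python
import re

samples = ["font12", "arial_14", "arial__8", "times 6"]

for s in samples:
    # findall returns the runs of letters and the runs of digits in order.
    print(s, re.findall(r'[a-z]+|[0-9]+', s))

# '+' forces at least one separator character, so 'font12' is split oddly:
print(re.match(r'([a-z]+)[^0-9]+([0-9]+)', 'font12').groups())  # ('fon', '12')
# '*' makes the separator optional:
print(re.match(r'([a-z]+)[^0-9]*([0-9]+)', 'font12').groups())  # ('font', '12')
```

Since `[a-z]` only covers lowercase, a capitalised name would lose its first letter (`re.findall(r'[a-z]+|[0-9]+', 'Arial14')` gives `['rial', '14']`); `[A-Za-z]` avoids that.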

Thomas stegen · Apr 12, 2004

Stevie_mac said:
now thats what im talking about!
note to self, learn reg exp!

While you are at it, I would recommend looking at finite state
automata, since they give a very simple theoretical view of what
regular expressions can and cannot do. After that it is just syntax.

(They also give a good hint at how you could implement regexes
yourself.)
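To make the automaton idea concrete, here is a small hand-written state machine for the shape the regex answers describe: letters, then optional non-digits, then digits, with the digits required to end the string. The function name `dfa_split` and the state labels are arbitrary choices for this sketch, and `str.isalpha()` stands in for `[A-Za-z]`. Each character moves the machine between states, and the input is accepted only if it finishes in the `NUM` state.

```python
def dfa_split(s):
    """Split a font string with an explicit finite-state machine (a sketch).

    States: START -> NAME (letters) -> SEP (other non-digits) -> NUM (digits).
    Only NUM is accepting.
    """
    state = "START"
    name, digits = [], []
    for ch in s:
        if state == "START":
            if not ch.isalpha():
                raise ValueError("must start with a letter: %r" % s)
            state = "NAME"
            name.append(ch)
        elif state == "NAME":
            if ch.isalpha():
                name.append(ch)
            elif ch.isdigit():
                state = "NUM"
                digits.append(ch)
            else:
                state = "SEP"
        elif state == "SEP":
            if ch.isdigit():
                state = "NUM"
                digits.append(ch)
            # any other character keeps the machine in SEP
        else:  # state == "NUM"
            if not ch.isdigit():
                raise ValueError("junk after the number: %r" % s)
            digits.append(ch)
    if state != "NUM":  # only NUM is an accepting state
        raise ValueError("no trailing number: %r" % s)
    return "".join(name), int("".join(digits))

for example in ["font12", "arial_14", "arial__8", "times 6"]:
    print(example, "->", dfa_split(example))
```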

Stevie_mac · Apr 13, 2004

Thank you all very much.

Paul McGuire · Apr 13, 2004

Stevie_mac said:
It's impossible to say if it's more elegant, since you didn't
post your attempt.


Yeh, sorry bout tha!

(Homework? I'll assume not.)


Only a student of life!

Here's my solution...

def makefont(sFontName):
    num = []
    nam = []
    acnt = 0
    #reverse it
    l = list(sFontName); l.reverse()
    sFontName = ''.join(l)
    #now loop it while isdigit(), store number, then store alphas
    for c in sFontName:
        if c.isdigit() and acnt == 0:
            num.append(c)
        elif c.isalpha() or acnt > 1:
            acnt += 1
            nam.append(c)
    nam.reverse()
    num.reverse()
    return (''.join(nam), int(''.join(num)))

Now you see why I was asking for a more elegant solution!

PS, the number on the end may vary & the _ could be any non alpha char!

font12 becomes ('font',12)
arial_14 becomes ('arial',14)
arial__8 becomes ('arial',8)
times 6 becomes ('times',6)

(You might find using pyparsing a little more readable than regexp...)

-------------------------------------------
# download pyparsing at http://pyparsing.sourceforge.net
from pyparsing import Word, alphas, CharsNotIn, nums, Optional

str2int = lambda s,l,t: [int(t[0])]
fontNameSpec = ( Word(alphas).setResultsName("font") +
                 Optional( CharsNotIn(alphas+nums).suppress() ) +
                 Word(nums).setParseAction(str2int).setResultsName("size") )

testdata = [ "font12", "arial_14", "arial__8", "times 6" ]

for fname in testdata:
    results = fontNameSpec.parseString( fname )
    print(results)
    for k in results.keys():
        print("-", "results."+k, ":", results[k])
    print()

-------------------------------------------

prints:
['font', 12]
- results.font : font
- results.size : 12

['arial', 14]
- results.font : arial
- results.size : 14

['arial', 8]
- results.font : arial
- results.size : 8

['times', 6]
- results.font : times
- results.size : 6

Similar Threads

Dont work, it´s something whit the loops?	1	Jun 30, 2021
Inexplicable Compilation Error: Boost's String Split function	6	Oct 22, 2011
String#split regex \W on non-ASCII text	1	Nov 9, 2010
split a string of space separated substrings - elegant solution?	6	Jul 31, 2007
KirbyBase : replacing string exceptions	2	Nov 23, 2009
TF-IDF	2	Aug 19, 2021
A way of checking if a string contains a number	6	Dec 12, 2007
Elementary string-parsing	16	Feb 4, 2008
