Split a string based on change of character


Andrew Savige

Python beginner here.

For a string 'ABBBCC', I want to produce a list ['A', 'BBB', 'CC'].
That is, break the string into pieces based on change of character.
What's the best way to do this in Python?

Using Python 2.5.1, I tried:

import re
s = re.split(r'(?<=(.))(?!\1)', 'ABBBCC')
for e in s: print e

but was surprised when it printed:

ABBBCC

I expected something like:

A
A
BBB
B
CC
C

(the extra fields because of the capturing parens).

Thanks,
/-\




attn.steven.kuo


Using itertools:

import itertools

s = 'ABBBCC'
print [''.join(grp) for key, grp in itertools.groupby(s)]


Using re:

import re

pat = re.compile(r'((\w)\2*)')
print [t[0] for t in re.findall(pat, s)]
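
For readers on Python 3, where print is a function, the two approaches above might be wrapped up roughly as follows. This is only a sketch: the function names are made up for illustration, and the regex variant swaps \w for "." with re.DOTALL so that runs of punctuation or whitespace are also picked up, which is a change from the pattern shown above.

```python
import itertools
import re


def split_runs_groupby(s):
    # groupby yields one (key, group) pair per run of equal consecutive characters
    return [''.join(grp) for _key, grp in itertools.groupby(s)]


def split_runs_regex(s):
    # (.) captures one character and \1* extends the match over its repeats;
    # re.DOTALL lets "." match a newline as well
    return [m.group(0) for m in re.finditer(r'(.)\1*', s, re.DOTALL)]


print(split_runs_groupby('ABBBCC'))  # ['A', 'BBB', 'CC']
print(split_runs_regex('A  --B'))    # ['A', '  ', '--', 'B']
```

Both functions return an empty list for an empty string.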


By the way, your pattern seems to work in Perl:

$ perl -le '$, = " "; print split(/(?<=(.))(?!\1)/, "ABBBCC");'
A A BBB B CC C

Is that the regular expression behavior you were expecting?
 

attn.steven.kuo

(snipped)
Yes. Here's a simpler example without any backreferences:

s = re.split(r'(?<=\d)(?=\D)', '1B2D3')

That works in Perl but not in Python.
Is it that "chaining" assertions together like this is not supported by
Python's re module? Or is that the case only for the split function?


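The match objects returned by finditer have
the expected span positions:

>>> for mobj in re.finditer(r'(?<=\d)(?=\D)', '1B2D3'):
...     print mobj.span()
...
(1, 1)
(3, 3)

From your original post:

>>> for mobj in re.finditer(r'(?<=(.))(?!\1)', 'ABBBCC'):
...     print mobj.span()
...
(1, 1)
(4, 4)
(6, 6)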


So the patterns do match at the expected positions;
it seems that re.split simply doesn't split on a
zero-width (empty) match. I couldn't find this
explained in a quick look at the documentation, however.
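
If split won't cut at zero-width matches, one workaround is to take the positions reported by finditer and slice the string by hand. The sketch below is written for Python 3 and assumes the pattern only ever matches the empty string; the helper name split_at_positions is hypothetical. (The re documentation for Python 3.7 and later notes that split was changed to support patterns that can match an empty string, so this helper mainly matters on older versions.)

```python
import re


def split_at_positions(pattern, text):
    """Cut text at every position where a zero-width pattern matches."""
    pieces = []
    start = 0
    for mobj in re.finditer(pattern, text):
        cut = mobj.start()
        if cut > start:          # skip empty pieces, e.g. a match at position 0
            pieces.append(text[start:cut])
            start = cut
    if start < len(text):        # whatever remains after the last cut
        pieces.append(text[start:])
    return pieces


print(split_at_positions(r'(?<=(.))(?!\1)', 'ABBBCC'))  # ['A', 'BBB', 'CC']
print(split_at_positions(r'(?<=\d)(?=\D)', '1B2D3'))    # ['1', 'B2', 'D3']
```

Unlike split, this does not add the captured groups to the result, so the first call yields the ['A', 'BBB', 'CC'] list asked for in the original post.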
 
