Jeffrey Barish
I have a regular expression that I use to extract the surname:
surname = r'(?u).+ (\w+)'
However, when I apply it to this Unicode string, I get only the first 3
letters of the surname:
name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'
surname_re = re.compile(surname)
m = surname_re.search(name)
m.groups()
('Dvo\xc5',)
I suppose that there is an encoding problem, but I don't understand Unicode
well enough to know how to handle the Unicode characters in the surname
properly.
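
For reference, here is a minimal sketch of one way the problem might be handled. It assumes the name actually arrives as UTF-8 encoded bytes and that the code runs on Python 3; the variable raw_name is a hypothetical name introduced for illustration. The idea is to decode the bytes into a text string before matching, so that \w sees whole characters rather than individual bytes of a multi-byte sequence.

```python
import re

# Assumption: the input is UTF-8 encoded bytes (raw_name is a hypothetical name).
raw_name = b'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

# Decode to text first, so each accented letter becomes a single character
# instead of two separate bytes.
name = raw_name.decode('utf-8')

# Same pattern as in the post; on a Python 3 text string \w is Unicode-aware.
surname_re = re.compile(r'(?u).+ (\w+)')
m = surname_re.search(name)

# The captured group now spans the whole last word of the decoded name.
surname = m.group(1)
print(surname)
```

If the bytes are in some other encoding, the decode step would need that encoding's name instead of 'utf-8'.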