split a string of space separated substrings - elegant solution?

Helmut Jarausch

Hi,

I'm looking for an elegant solution to the following (quite common)
problem:

Given a string of substrings separated by white space,
split it into a tuple/list of elements.
The problem is quoted substrings like

abc "xy z" "1 2 3" "a \" x"

should be split into ('abc','xy z','1 2 3','a " x')

For that, one probably has to protect the white space between
quotes, then split by white space, and finally convert the
'protected white space' back to normal white space.
Is there an elegant solution, perhaps without using a lexer
or something similar? With regular expressions alone it seems
clumsy.

Many thanks for a hint,

Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
 
Wojciech Muła


import csv

s = 'abc "xy z" "1 2 3" "a \\" x"'
# csv.reader expects an iterable of lines, so wrap the string in a list
r = csv.reader([s], delimiter=" ", escapechar="\\")
print(next(r))

w.
 
Carsten Haese

['abc', 'xy z', '1 2 3', 'a " x']
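The code that produced this list did not survive in the archived post. A later reply in this thread mentions the standard shlex module, so it was presumably something along these lines (a reconstruction under that assumption, not necessarily the code originally posted):

```python
import shlex

# Raw string, so the backslash reaches shlex unchanged.
text = r'abc "xy z"  "1 2 3"  "a \" x"'

# shlex.split uses POSIX shell rules by default: whitespace separates
# tokens, double quotes group them, and \" inside quotes is a literal quote.
parts = shlex.split(text)
print(parts)
```

Runs of several spaces between items are treated as a single separator, so no empty elements appear.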

I hope that's elegant enough ;)
 
Jerry Hill


Using the csv module gets you most of the way there. For instance:
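import csv
# Note the double spaces between the quoted items; they cause the
# empty elements in the output below.
text = r'abc "xy z"  "1 2 3"  "a \" x"'
reader = csv.reader([text], delimiter=" ", escapechar='\\')
for row in reader:
    print(row)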

['abc', 'xy z', '', '1 2 3', '', 'a " x']
That does leave you with empty elements where you had double spaces
between items, though. You could fix that with something like:
row = [element for element in row if element != '']
print(row)

['abc', 'xy z', '1 2 3', 'a " x']
The csv module can handle lots of delimited data other than quote- and
comma-delimited. See the docs at:
http://docs.python.org/lib/module-csv.html and PEP 305:
http://www.python.org/dev/peps/pep-0305/
 
Paul McGuire


Pyparsing has built-in support for special treatment of quoted
strings. Observe:

from pyparsing import *

data = r'abc "xy z" "1 2 3" "a \" x"'

# Strip the surrounding quotes from every matched quoted string
quotedString.setParseAction(removeQuotes)
print(OneOrMore(quotedString |
                Word(printables)).parseString(data))

prints:

['abc', 'xy z', '1 2 3', 'a \\" x']

Or perhaps a bit trickier, do the same while skipping items inside /*
*/ comments:

data = r'abc /* 456 "xy z" */ "1 2 3" "a \" x"'

quotedString.setParseAction(removeQuotes)
# ignore() makes the parser skip over C-style comments between items
expr = OneOrMore(quotedString |
                 Word(printables)).ignore(cStyleComment)
print(expr.parseString(data))

prints:

['abc', '1 2 3', 'a \\" x']


-- Paul
 
Helmut Jarausch

Many thanks to all of you!
It's amazing how many elegant solutions there are in Python.


 
Karthik Gurusamy

Here is yet another solution.

pexpect.split_command_line()
From the documentation:
split_command_line(command_line)
This splits a command line into a list of arguments.
It splits arguments on spaces, but handles
embedded quotes, doublequotes, and escaped characters.
It's impossible to do this with a regular expression, so
I wrote a little state machine to parse the command line.

http://pexpect.sourceforge.net/pexpect.html

But I am surprised to see there is already a standard module doing
this (shlex).
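To illustrate the "little state machine" idea from the pexpect documentation quoted above, here is a minimal sketch. It is not pexpect's code: the function name split_quoted and its exact rules (a backslash escapes the next character everywhere, including inside single quotes) are assumptions made for illustration.

```python
def split_quoted(text):
    """Split text on whitespace, honouring quotes and backslash escapes."""
    args = []
    current = []      # characters of the argument being built
    in_arg = False    # True once an argument has started (even an empty "")
    quote = None      # the currently open quote character, or None
    escaped = False   # True immediately after a backslash

    for ch in text:
        if escaped:
            # Character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
            in_arg = True
        elif quote:
            if ch == quote:
                quote = None          # closing quote
            else:
                current.append(ch)    # anything else inside quotes is kept
        elif ch in "\"'":
            quote = ch                # opening quote
            in_arg = True
        elif ch.isspace():
            if in_arg:                # whitespace ends the current argument
                args.append("".join(current))
                current = []
                in_arg = False
        else:
            current.append(ch)
            in_arg = True

    if quote or escaped:
        raise ValueError("unterminated quote or trailing backslash")
    if in_arg:
        args.append("".join(current))
    return args


result = split_quoted(r'abc "xy z"  "1 2 3"  "a \" x"')
print(result)
```

In practice shlex.split from the standard library already covers this case; the sketch only shows how few states are needed.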

Karthik
 
