Array of Chars to String

James Stroud

Hello,

I am looking for a nice way to take only those characters from a string that
are in another string and make a new string. For example, keeping only the
letters "adB" from "Bob Carol Ted Alice" should give:
"Bad"

I can write this like this:

astr = "Bob Carol Ted Alice"
letters = "adB"

from sets import Set
alist = [lttr for lttr in astr if lttr in Set(letters)]
newstr = ""
for lttr in alist:
    newstr += lttr

But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

Any ideas?

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
Alexander Schmolck

James Stroud said:
But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

"".join

'as
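
Spelled out, the suggestion might look like the sketch below. It is written for Python 3, which is an assumption (the thread itself uses Python 2), and the name keep_only is made up for illustration.

```python
def keep_only(s, letters):
    """Return the characters of s that also occur in letters, in order."""
    allowed = set(letters)  # build the lookup set once, before the loop
    # join builds the result string once instead of growing it with +=
    return "".join(ch for ch in s if ch in allowed)


print(keep_only("Bob Carol Ted Alice", "adB"))  # Bad
```

The filtering is still done character by character in Python code; only the string building changes.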
 
Bengt Richter

I think this will be worth it if your string to modify is _very_ long:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

see help(str.translate)

If you want to use it in a loop, with the same "letters" I'd want to eliminate the repeated
calculation of the deletions. You could make a factory function that returns a function
that uses deletions from a closure cell. But don't optimize prematurely ;-)
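
A minimal sketch of such a factory, under two assumptions: it targets Python 3, where the two-argument translate lives on bytes rather than str, and the names make_filter and keep_letters are invented here.

```python
def make_filter(letters):
    """Return a function that keeps only the bytes present in letters."""
    keep = set(letters)  # iterating over bytes yields ints in Python 3
    # Computed once here, then remembered by the closure below.
    deletions = bytes(b for b in range(256) if b not in keep)

    def keep_letters(s):
        # table=None means no remapping; the second argument lists bytes to drop
        return s.translate(None, deletions)

    return keep_letters


keep_adB = make_filter(b"adB")
print(keep_adB(b"Bob Carol Ted Alice"))  # b'Bad'
```

Each call to keep_adB reuses the same deletions value, so the 256-entry scan happens only when the filter is created.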

Regards,
Bengt Richter
 
Michael Spencer

Bengt said:
> I think this will be worth it if your string to modify is _very_ long:
> >>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
> ...     return s.translate(table,
> ...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
> ...
> >>> some_func("Bob Carol Ted Alice", 'adB')
> 'Bad'
According to my measurements the string doesn't have to be long at all before
your method is faster - cool use of str.translate:
>>> def func_join(s, letters):
...     return "".join(letter for letter in s if letter in set(letters))
...
>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_join, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_join(...) 11267 iterations, 44.38usec per call
func_join1(...) 38371 iterations, 13.03usec per call
some_func(...) 1230 iterations, 406.69usec per call
List multiplier: 10
func_join(...) 1381 iterations, 362.40usec per call
func_join1(...) 7984 iterations, 62.63usec per call
some_func(...) 1226 iterations, 407.94usec per call
List multiplier: 100
func_join(...) 140 iterations, 3.59msec per call
func_join1(...) 873 iterations, 0.57msec per call
some_func(...) 1184 iterations, 422.42usec per call
List multiplier: 1000
func_join(...) 15 iterations, 35.50msec per call
func_join1(...) 90 iterations, 5.57msec per call
some_func(...) 949 iterations, 0.53msec per call
List multiplier: 10000
func_join(...) 2 iterations, 356.53msec per call
func_join1(...) 9 iterations, 55.59msec per call
some_func(...) 313 iterations, 1.60msec per call
Michael
 
Michael Spencer

Michael said:
According to my measurements the string doesn't have to be long at all
before your method is faster - cool use of str.translate:
...and here's a version that appears faster than "".join across all lengths of
strings:
>>> import string
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
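
The trick is that the inner translate deletes the wanted letters from the 256-character identity table, which leaves exactly the characters to remove; the outer translate then deletes those from s. The sketch below restates this for Python 3 bytes (an assumption, since the two-argument form no longer exists on str there); ALL_BYTES, complement and keep_only_bytes are illustrative names.

```python
ALL_BYTES = bytes(range(256))  # identity table: every byte value once


def complement(letters):
    """Return all byte values except those in letters."""
    return ALL_BYTES.translate(None, letters)


def keep_only_bytes(s, letters):
    # Deleting the complement of letters leaves only letters.
    return s.translate(None, complement(letters))


deletions = complement(b"adB")
print(len(deletions))                                   # 253
print(keep_only_bytes(b"Bob Carol Ted Alice", b"adB"))  # b'Bad'
```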
Timings follow:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func1, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
some_func(...) 1224 iterations, 408.57usec per call
some_func1(...) 61035 iterations, 8.19usec per call
List multiplier: 10
some_func(...) 1223 iterations, 408.95usec per call
some_func1(...) 54420 iterations, 9.19usec per call
List multiplier: 100
some_func(...) 1190 iterations, 420.48usec per call
some_func1(...) 23436 iterations, 21.34usec per call
List multiplier: 1000
some_func(...) 951 iterations, 0.53msec per call
some_func1(...) 3870 iterations, 129.21usec per call
List multiplier: 10000
some_func(...) 309 iterations, 1.62msec per call
some_func1(...) 417 iterations, 1.20msec per call
 
Bengt Richter

Michael said:
...and here's a version that appears faster than "".join across all lengths of
strings:
...     return s.translate(table, table.translate(table, letters))
...
Good one! ;-)

BTW, since str has .translate, why not .maketrans?

Anyway, this will be something to keep in mind when doing character-based joinery ;-)
Michael said:
Timings follow:
Let's just say improved ;-)
(or see parent post)

Regards,
Bengt Richter
 
Peter Otten

Michael said:
...     return "".join(letter for letter in s if letter in set(letters))

Make that

def func_join(s, letters):
    letter_set = set(letters)
    return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.
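
To see how often the set is actually built, one can swap in a counting stand-in. This is only an illustrative sketch for Python 3 (an assumption); counting_set and construction_log are hypothetical helpers, not part of the timed code.

```python
construction_log = []


def counting_set(letters):
    """Stand-in for set() that records every call."""
    construction_log.append(letters)
    return set(letters)


s = "Bob Carol Ted Alice"

# Set built inside the condition: rebuilt for every character of s.
"".join(ch for ch in s if ch in counting_set("adB"))
calls_inside = len(construction_log)

# Set built beforehand: constructed exactly once.
construction_log.clear()
letter_set = counting_set("adB")
"".join(ch for ch in s if ch in letter_set)
calls_outside = len(construction_log)

print(calls_inside, calls_outside)  # 19 1
```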

Peter
 
Scott David Daniels

Bengt said:
... BTW, since str has .translate, why not .maketrans?
Probably because, while I can imagine u'whatever'.translate using a
256-wide table (and raising exceptions for the rest), I have
more problems imagining the size of the table for a UCS-4 unicode
setup (32 bits per character). I suppose it could be done, but a
naïve program might be in for a big shock about memory consumption.
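
For comparison, and as a note about a later language version rather than anything available in this thread: Python 3 does offer str.maketrans, and it avoids the table-size problem by returning a sparse dict keyed by code point. A small sketch, assuming Python 3:

```python
# The third argument lists characters to delete; nothing is remapped here.
table = str.maketrans("", "", "ob")

print(table)                         # {111: None, 98: None}
print("Bob Carol".translate(table))  # B Carl
```

Only the characters actually mentioned get an entry, so the mapping stays small regardless of how wide the character set is.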

--Scott David Daniels
 
Michael Spencer

Peter said:
Make that

def func_join(s, letters):
    letter_set = set(letters)
    return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.

Peter
Sorry - yes! I trip up over the early binding of the outer loop, but the
late binding of the condition.
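
A small experiment makes the distinction concrete. It is a sketch written for Python 3 (an assumption), with throwaway variable names: the iterable of the outermost for is evaluated when the generator is created, while the condition is evaluated afresh on every step.

```python
letters = "adB"
s = "Bob Carol Ted Alice"

gen = (ch for ch in s if ch in letters)

s = "xyz"       # no effect: the outermost iterable was already evaluated
letters = "oe"  # takes effect: the condition looks letters up on each step

result = "".join(gen)
print(result)   # ooee
```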

Anyway, here are the revised timings, which confirm the speed-advantage of the
translate approach. And, as before, with such a short list of white-listed
letters, it does not pay to create a set at all, even outside the loop. Note
the speed advantage of func_translate1 is 50:1 for long strings, so as Bengt
pointed out, it's worth keeping this in mind for character-based filtering/joining.
>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> def func_join2(s, letters):
...     letter_set = set(letters)
...     return "".join(letter for letter in s if letter in letter_set)
...
>>> def func_translate1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_translate1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join2, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_translate1(...) 62295 iterations, 8.03usec per call
func_join1(...) 36510 iterations, 13.69usec per call
func_join2(...) 30139 iterations, 16.59usec per call
List multiplier: 10
func_translate1(...) 53145 iterations, 9.41usec per call
func_join1(...) 7821 iterations, 63.93usec per call
func_join2(...) 7031 iterations, 71.12usec per call
List multiplier: 100
func_translate1(...) 23170 iterations, 21.58usec per call
func_join1(...) 858 iterations, 0.58msec per call
func_join2(...) 777 iterations, 0.64msec per call
List multiplier: 1000
func_translate1(...) 3761 iterations, 132.96usec per call
func_join1(...) 87 iterations, 5.76msec per call
func_join2(...) 81 iterations, 6.18msec per call
List multiplier: 10000
func_translate1(...) 407 iterations, 1.23msec per call
func_join1(...) 9 iterations, 56.27msec per call
func_join2(...) 8 iterations, 64.76msec per call
 
Kent Johnson

Michael said:
Anyway, here are the revised timings...
... print shell.timefunc(func_translate1, "Bob Carol Ted Alice" *
multiplier, 'adB')

What is shell.timefunc?

Thanks,
Kent
 
Michael Spencer

Kent said:
What is shell.timefunc?

This snippet, which I attach to my interactive shell, since I find timeit
awkward to use in that context:

import sys
import time

def _get_timer():
    # pick the wall-clock timer with the better resolution on this platform
    if sys.platform == "win32":
        return time.clock
    else:
        return time.time

def timefunc(func, *args, **kwds):
    timer = _get_timer()
    count, totaltime = 0, 0
    # keep calling func until about half a second of run time has accumulated
    while totaltime < 0.5:
        t1 = timer()
        res = func(*args, **kwds)
        t2 = timer()
        totaltime += (t2-t1)
        count += 1
    # report in microseconds for fast functions, milliseconds otherwise
    if count > 1000:
        unit = "usec"
        timeper = totaltime * 1000000 / count
    else:
        unit = "msec"
        timeper = totaltime * 1000 / count
    return "%s(...) %s iterations, %.2f%s per call" % \
        (func.__name__, count, timeper, unit)
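
For readers on Python 3, where time.clock is no longer available, a rough equivalent could look like the sketch below. It is an adaptation, not the snippet above: time.perf_counter replaces the platform switch, and min_time is a hypothetical extra parameter added so that short runs are possible.

```python
import time


def timefunc(func, *args, min_time=0.5, **kwds):
    """Call func repeatedly for about min_time seconds; report the mean time."""
    count, total = 0, 0.0
    while total < min_time:
        t1 = time.perf_counter()
        func(*args, **kwds)
        t2 = time.perf_counter()
        total += t2 - t1
        count += 1
    # microseconds for fast functions, milliseconds otherwise
    if count > 1000:
        unit, per_call = "usec", total * 1e6 / count
    else:
        unit, per_call = "msec", total * 1e3 / count
    return "%s(...) %d iterations, %.2f%s per call" % (
        func.__name__, count, per_call, unit)


def func_join1(s, letters):
    return "".join(ch for ch in s if ch in letters)


print(timefunc(func_join1, "Bob Carol Ted Alice" * 10, "adB", min_time=0.05))
```

Like the original, this times each call individually, so the timer overhead is included in very short measurements.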

Michael
 
Michael Spencer

Martin said:
Apparently nobody has proposed this yet:
'Bad'


Everyone is seeking early PEP 3000 compliance ;-)

filter wins on conciseness - it's short enough to use in-line, but for a fair
speed comparison, I've wrapped it in a function, below; str.translate is far
ahead on speed for all but the shortest strings:

import string

def func_translate1(s, letters, table=string.maketrans("","")):
    return s.translate(table, table.translate(table, letters))

def func_filter1(s, letters):
    in_set = letters.__contains__
    return filter(in_set, s)

def func_filter2(s, letters):
    in_set = set(letters).__contains__
    return filter(in_set, s)

>>> for m in (1, 10, 100, 1000, 10000):
...     s = "Bob Carol Ted Alice" * m
...     letters = "adB"
...     print "List length: %s" % len(s)
...     print shell.timefunc(func_translate1, s, letters)
...     print shell.timefunc(func_filter1, s, letters)
...     print shell.timefunc(func_filter2, s, letters)
...
List length: 19
func_translate1(...) 64179 iterations, 7.79usec per call
func_filter1(...) 63706 iterations, 7.85usec per call
func_filter2(...) 45336 iterations, 11.03usec per call
List length: 190
func_translate1(...) 54950 iterations, 9.10usec per call
func_filter1(...) 12224 iterations, 40.90usec per call
func_filter2(...) 10737 iterations, 46.57usec per call
List length: 1900
func_translate1(...) 22760 iterations, 21.97usec per call
func_filter1(...) 1293 iterations, 386.87usec per call
func_filter2(...) 1184 iterations, 422.52usec per call
List length: 19000
func_translate1(...) 3713 iterations, 134.67usec per call
func_filter1(...) 137 iterations, 3.67msec per call
func_filter2(...) 124 iterations, 4.05msec per call
List length: 190000
func_translate1(...) 426 iterations, 1.18msec per call
func_filter1(...) 14 iterations, 38.29msec per call
func_filter2(...) 13 iterations, 40.59msec per call
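
One caveat for anyone trying this on Python 3 (an assumption; the timings above are from Python 2): there filter returns a lazy iterator instead of a string, so the result has to be joined. A minimal sketch, with func_filter3 as a made-up name:

```python
def func_filter3(s, letters):
    """Python 3 counterpart of func_filter2 (illustrative only)."""
    in_set = set(letters).__contains__
    # filter yields the matching characters lazily; join makes a string again
    return "".join(filter(in_set, s))


print(func_filter3("Bob Carol Ted Alice", "adB"))  # Bad
```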
Michael
 
Bengt Richter

Martin said:
Apparently nobody has proposed this yet:

'Bad'
Baaad ;-)
But since I'm playing the other side of the table for
the moment, isn't filter to be deprecated?

Regards,
Bengt Richter
 
Martin v. Löwis

Bengt said:
But since I'm playing the other side of the table for
the moment, isn't filter to be deprecated?

How could we know? It might be removed in P3k, but does that
mean it is deprecated, as in "being disapproved", i.e. "being
passed unfavorable judgement on"?

In Python, I consider something deprecated when the documentation
says it is deprecated, or when using it raises a DeprecationWarning.
Neither is the case for filter.

That it is going to be removed in P3k does not bother me much:
I wouldn't be surprised if Python 3000 is released 995 years from
now :)

Regards,
Martin
 
