Split a string based on change of character


Andrew Savige

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python's itertools?

Thanks,
/-\




 

Peña, Botp

From: Andrew Savige [mailto:[email protected]]
# s = "ZBBBCZZ"
# x = s.scan(/((.)\2*)/).map {|i| i[0]}

when it comes to string patterns like this, nothing beats regex

# import itertools
# s = "ZBBBCCZZ"
# x = [''.join(g) for k, g in itertools.groupby(s)]
# Does anyone know if Ruby has a similar library to Python's itertools?

hmm, you seem to like this more than your previous regex+map solution, why? (i ask because i prefer your first solution --not that it's ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

but unfortunately i don't have a 1.9 build here to test (grrr, shouldn't have deleted that vm).

kind regards -botp
 

Jeremy Hinegardner

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python's itertools?

Nothing off the top of my head, but how does this work for you ?

in_str.split('').inject([]) do |m,l|
  if m.last and m.last[0].chr == l
    m[-1] += l
  else
    m << l
  end
  m
end

It's not two lines, but it will return the same array
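
For anyone who wants to run the fold above as-is, here is a minimal sketch wrapped in a method. The method name runs_by_inject is made up for illustration, and it assumes Ruby 1.9 or later, where String#[] with an index returns a one-character string (so the .chr call is not needed):

```ruby
# Hypothetical helper name; the fold itself follows the snippet above.
def runs_by_inject(in_str)
  in_str.each_char.inject([]) do |runs, char|
    if runs.last && runs.last[0] == char
      runs[-1] += char   # same character as the current run: extend it
    else
      runs << char       # different character: start a new run
    end
    runs                 # inject needs the accumulator returned each time
  end
end

p runs_by_inject("ZBBBCZZ")
```

An empty input simply yields an empty array, since inject never calls the block.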

enjoy

-jeremy
 

Wolfgang Nádasi-donner

Andrew said:
s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Maybe this is faster:

result = []
"ZBBBCZZ".scan(/((.)\2*)/){erg.push [$~[0]]}
p erg # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Wolfgang Nádasi-Donner
 

Wolfgang Nádasi-donner

Maybe this is faster:

result = []
"ZBBBCZZ".scan(/((.)\2*)/){erg.push [$~[0]]}
p erg # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Wolfgang Nádasi-Donner

result = []
"ZBBBCZZ".scan(/((.)\2*)/){result.push [$~[0]]}
p result # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Sorry - typo by translation of variable name :-(
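
Whether the block form really is faster is easy to check with the standard Benchmark module. The following is only a rough sketch: the method names, the input size and the repeat count are arbitrary assumptions, and the block version pushes $~[0] directly so that both methods return the same flat array.

```ruby
require "benchmark"

# Andrew's original: scan into an array of capture pairs, then map.
def runs_scan_map(s)
  s.scan(/((.)\2*)/).map { |i| i[0] }
end

# Block form: collect each whole match while scanning.
def runs_scan_block(s)
  result = []
  s.scan(/((.)\2*)/) { result.push($~[0]) }
  result
end

input = "ZBBBCZZ" * 1_000   # arbitrary test input for this sketch

t_map   = Benchmark.realtime { 100.times { runs_scan_map(input) } }
t_block = Benchmark.realtime { 100.times { runs_scan_block(input) } }

puts format("scan + map:   %.4fs", t_map)
puts format("scan + block: %.4fs", t_block)
```

The numbers will vary by Ruby version and machine, so treat any difference as indicative only.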

Wolfgang Nádasi-Donner
 

Simon Kröger

Andrew said:
For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

you may want to write it as ...map{|i,|i}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]
Does anyone know if Ruby has a similar library to Python's itertools?


No idea, here is another variant to play with:

x = /#{s.gsub(/(.)\1*/, '(\1+)')}/.match(s).captures

funny little problem.
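
The variant above builds a regex with one capture group per run, e.g. "(Z+)(B+)(C+)(Z+)" for "ZBBBCZZ", and then reads the runs from the captures. One caveat: if the string contains regex metacharacters such as "(" or "[", the generated pattern may not compile. A sketch that guards against that with Regexp.escape (the method name is hypothetical, and the /m flag is my addition so that "." also matches newlines):

```ruby
# Hypothetical helper; same idea as the one-liner above, with escaping added.
def runs_by_generated_regex(s)
  # Turn every run into a group such as "(Z+)", escaping the character
  # so that metacharacters like "(" or "." are matched literally.
  pattern = s.gsub(/(.)\1*/m) { "(#{Regexp.escape($1)}+)" }
  Regexp.new(pattern).match(s).captures
end

p runs_by_generated_regex("ZBBBCZZ")
p runs_by_generated_regex("a((b")
```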

cheers

Simon
 

dblack

Hi --

From: Andrew Savige [mailto:[email protected]]
# s = "ZBBBCZZ"
# x = s.scan(/((.)\2*)/).map {|i| i[0]}

when it comes to string patterns like this, nothing beats regex

# import itertools
# s = "ZBBBCCZZ"
# x = [''.join(g) for k, g in itertools.groupby(s)]
# Does anyone know if Ruby has a similar library to Python's itertools?

hmm, you seem to like this more than your previous regex+map solution, why? (i ask because i prefer your first solution --not that it's ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

I'm going to have to get special glasses that can read invisible
ink.... :)


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 

dblack

Hi --

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

Probably not better, but just for fun, here's a way using the strscan
extension. I'd be very interested if anyone can get this to be less
clunky -- in particular, the - [""] at the end.

require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 

Xavier Noria

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Yeah, it's short but I agree with things you dislike about it. My
approach was essentially the same as Jeremy's;

s.split(//).inject([]) {|g, c| (g.last && g.last[c] ? g.last : g) << c; g}

That's just playing around though, I think that approach is not better.

In my view a better idiom would be to split on character switches.
That would be concise. But as you know if you put groups you get them
back. I see no way to express the condition for boundaries without
using groups.
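
For what it's worth, later Ruby versions let you state the "split on character switches" condition directly on the characters rather than in a regex. A minimal sketch, assuming Ruby 2.2 or later for Enumerable#slice_when; the method name is made up for illustration:

```ruby
# Hypothetical helper: cut the character sequence wherever two
# neighbouring characters differ, then glue each slice back together.
def runs_by_slice_when(s)
  s.each_char.slice_when { |prev, cur| prev != cur }.map(&:join)
end

p runs_by_slice_when("ZBBBCZZ")
```

No capture groups are involved, so nothing extra comes back in the result.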

-- fxn
 

James Edward Gray II

Hi --

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

Probably not better, but just for fun, here's a way using the strscan
extension. I'd be very interested if anyone can get this to be less
clunky -- in particular, the - [""] at the end.

require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]

My best effort:

>> require "strscan"
=> true
>> scanner = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">
>> char_runs = Array.new
=> []
>> char_runs << scanner.matched while scanner.scan(/(.)\1*/m)
=> nil
>> char_runs
=> ["Z", "BBB", "C", "ZZ"]

James Edward Gray II
 

botp

I'm going to have to get special glasses that can read invisible
ink.... :)

whoops, sorry =)
that should be

fr
x = s.split('').group_by{|x| x}.entries.map{|x| x.join}

to
x = s.split('').group_by.entries.map{|x| x.join}

i assume that group_by without a block would group the elements by
themselves. maybe i should name it group not group_by :)

kind regards -botp
 

William James

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python's itertools?

Thanks,
/-\


s = "ZBBBCZZ"
==>"ZBBBCZZ"
s.scan( /((.)\2*)/ ).transpose.first
==>["Z", "BBB", "C", "ZZ"]
s.gsub( /(.)(?!\1)/, "\\1\n" ).split
==>["Z", "BBB", "C", "ZZ"]
 

Peña, Botp

From: William James [mailto:[email protected]]
# s = "ZBBBCZZ"
# ==>"ZBBBCZZ"
# s.scan( /((.)\2*)/ ).transpose.first
# ==>["Z", "BBB", "C", "ZZ"]
# s.gsub( /(.)(?!\1)/, "\\1\n" ).split
# ==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp
 

Simon Kröger

Peña said:
From: William James [mailto:[email protected]]
# s = "ZBBBCZZ"
# ==>"ZBBBCZZ"
# s.scan( /((.)\2*)/ ).transpose.first
# ==>["Z", "BBB", "C", "ZZ"]
# s.gsub( /(.)(?!\1)/, "\\1\n" ).split
# ==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

Yeah, nice!

i think one can simplify from

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

to

s.gsub(/(.)\1*/, '\0 ').split

?

cheers

Simon
 

dblack

Hi --

whoops, sorry =)
that should be

fr
x = s.split('').group_by{|x| x}.entries.map{|x| x.join}

to
x = s.split('').group_by.entries.map{|x| x.join}

i assume that group_by without a block would group the elements by
themselves. maybe i should name it group not group_by :)

Actually I think group_by with nothing specified just returns an
enumerator over the array itself, so it probably will never be used (I
hope :)

I don't think group_by will work for this problem, though, because it
groups everything together:

irb(main):014:0> s
=> "AABCDAAE"
irb(main):015:0> s.split(//).group_by {|x| x }.map {|e| e.join }
=> ["AAAAA", "CC", "EE", "DD", "BB"]

Notice how all the A's got put in one result.


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 

botp

irb(main):015:0> s.split(//).group_by {|x| x }.map {|e| e.join }
=> ["AAAAA", "CC", "EE", "DD", "BB"]
Notice how all the A's got put in one result.

arrghh, sorry, yes. it's really grouping w no regards to sequence.
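
For grouping that does respect sequence, Enumerable#chunk looks like the closest counterpart to Python's itertools.groupby: it only groups consecutive elements that share a key. A minimal sketch, assuming a Ruby that provides chunk (1.9.2 or later, as far as I know); the method name is hypothetical:

```ruby
# Hypothetical helper: chunk yields [key, consecutive_elements] pairs,
# much like itertools.groupby yields (key, group) pairs.
def runs_by_chunk(s)
  s.each_char.chunk { |c| c }.map { |_key, group| group.join }
end

p runs_by_chunk("ZBBBCZZ")
p runs_by_chunk("AABCDAAE")   # the two runs of A stay separate
```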
thank you for the update
kind regards -botp
 

Peña, Botp

From: James Edward Gray II [mailto:[email protected]]
# >> require "strscan" # => true
# >> scanner = StringScanner.new("ZBBBCZZ") # => #<StringScanner 0/7 @ "ZBBBC...">
# >> char_runs = Array.new # => []
# >> char_runs << scanner.matched while scanner.scan(/(.)\1*/m) # => nil
# >> char_runs # => ["Z", "BBB", "C", "ZZ"]

i just started playing w string scan after getting a hint fr dblack and reading this rubyish example fr James. i think stringscanner is an ideal solution for string scanning related problems. I noticed that stringscanner#scan returns the match, so,

irb> s = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">
irb> a=[]
=> []
irb> a << x while x=s.scan(/(.)\1*/m)
=> nil
irb> a
=> ["Z", "BBB", "C", "ZZ"]

again, short and readable. ruby rocks.
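
Outside irb, the same loop can be wrapped in a small method. This is just a sketch of the session above as a script; the method name is made up for illustration:

```ruby
require "strscan"

# Hypothetical helper: StringScanner#scan returns the matched text,
# or nil once no further run matches at the current position.
def runs_by_scanner(str)
  scanner = StringScanner.new(str)
  runs = []
  while (run = scanner.scan(/(.)\1*/m))
    runs << run
  end
  runs
end

p runs_by_scanner("ZBBBCZZ")
```

The /m flag lets "." match newlines too, so a run of newlines is collected like any other run.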
kind regards -botp

ps: stringscanner docs are here http://www.ruby-doc.org/stdlib/libdoc/strscan/rdoc/index.html
 

William James

Peña said:
From: William James [mailto:[email protected]]
# s = "ZBBBCZZ"
# ==>"ZBBBCZZ"
# s.scan( /((.)\2*)/ ).transpose.first
# ==>["Z", "BBB", "C", "ZZ"]
# s.gsub( /(.)(?!\1)/, "\\1\n" ).split
# ==>["Z", "BBB", "C", "ZZ"]
ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

Yeah, nice!

i think one can simplify from

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

to

s.gsub(/(.)\1*/, '\0 ').split

?

cheers

Simon

Yes, with the possible exception of
"\\1\n" . I was anticipating the need to allow
the string to contain any character but a
newline.

s = 'ZBBBC ZZ'
==>"ZBBBC ZZ"
s.gsub(/(.)\1*/, "\\0\n").split("\n")
==>["Z", "BBB", "C", " ", "ZZ"]
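
To make the difference concrete, here is a small sketch comparing the two delimiters side by side on a string that contains a space. The method names are hypothetical; the expressions are the ones quoted above.

```ruby
# Space as delimiter: split with no argument splits on any whitespace,
# so a run of spaces in the input disappears from the result.
def runs_space_delim(s)
  s.gsub(/(.)\1*/, '\0 ').split
end

# Newline as delimiter: works for any input that contains no newline.
def runs_newline_delim(s)
  s.gsub(/(.)\1*/, "\\0\n").split("\n")
end

p runs_space_delim("ZBBBC ZZ")     # the " " run is missing
p runs_newline_delim("ZBBBC ZZ")   # the " " run is kept
```

Either way some character has to be reserved as the delimiter, which is the trade-off of the gsub-then-split approach.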
 

Brad Phelan

Andrew said:
For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Another variant which gets rid of one of the capture
groups and does not introduce an artificial split character

Enumerator.new(s, :scan, /(.)\1*/).map {$&}

Note the $& will not work in the example
x = s.scan(/((.)\2*)/).map {|i| i[0]}

because the map is run on an array after the
scan has happened. To run the map inline with
the scan you need the Enumerator object.

I doubt using Enumerator is any faster though.

Wouldn't it be nicer if scan returned an
enumerable instead of an array. We could
define

class String
  def scan_enum regexp
    Enumerator.new self, :scan, regexp
  end
end

and then be able to do

s.scan_enum(/(.)\1*/).map {$&}
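
On newer Ruby versions the Enumerator.new(object, :method) form may no longer be available, but Object#enum_for gives the same kind of enumerator. A sketch under that assumption (method names are hypothetical). As far as I know, String#gsub called with only a pattern also returns an enumerator over the matched strings, which gets the same result without touching the match globals:

```ruby
# Hypothetical helper: run map "inline" with scan via enum_for,
# reading the whole match from the last-match data in each iteration.
def runs_by_scan_enum(s)
  s.enum_for(:scan, /(.)\1*/m).map { Regexp.last_match(0) }
end

# Hypothetical helper: gsub with a pattern and no replacement or block
# returns an Enumerator that yields each matched string.
def runs_by_gsub_enum(s)
  s.gsub(/(.)\1*/m).to_a
end

p runs_by_scan_enum("ZBBBCZZ")
p runs_by_gsub_enum("ZBBBCZZ")
```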
 
