Regular expressions and long text

Guillermo.Acilu · Jun 20, 2008

[Note: parts of this message were removed to make it a legal post.]

Hello guys,

I started with Ruby a month ago and I am doing some work with strings
and regular expressions. I am trying to take a long text and store the
individual sentences in an array. I can split a sentence into words and
store them in an array, but I cannot manage to do the same with sentences.

I have used the following assignment to work with the words:

str = "Ruby is great"
words = []
words = str.scan(/\w+/)

The result is words[0]="Ruby", words[1]="is" and words[2]="great"

I would like to do the following:

str = "Ruby is great. We all know that."

and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Thanks,

Guillermo

Sandro Paganotti · Jun 20, 2008

I am not sure whether you want to split the string on the full stop:

str.split(".")

or divide the string into words and split them into two groups:

str = "Ruby is great. We all know that."
v = str.split(" ")
k = v.size / 2
[v[0...k], v[k..-1]].map { |e| e.join(" ") }
=> ["Ruby is great.", "We all know that."]

Bryan JJ Buckley · Jun 21, 2008

You can split on a regex for a full-stop followed by (optional) whitespace.
=> ["Ruby is great", "We all know that"]
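A minimal sketch of the split described above, assuming the pattern is a literal full stop followed by zero or more whitespace characters (the helper name `split_sentences` is made up for illustration):

```ruby
# Hypothetical helper: split text into sentences on a full stop
# followed by optional whitespace.
def split_sentences(text)
  # \. matches a literal full stop; \s* swallows any whitespace after it.
  text.split(/\.\s*/)
end

p split_sentences("Ruby is great. We all know that.")
```

String#split drops trailing empty strings by default, so the final full stop does not leave an empty last element.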

Zhukov Pavel · Jun 24, 2008

(e-mail address removed) writes:

Hello guys,
[cut]

I would like to do the following:
str = "Ruby is great. We all know that."
and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Hi,
maybe you should try this: words = str.split(/\.\s*/)

it works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

Even simpler:

irb(main):001:0> "Ruby is great. We all know that.".split(".")
=> ["Ruby is great", " We all know that"]

Raveendran Jazzez · Jun 25, 2008

Hi,

I think you expect this output, so please try it:

str = "Ruby is great. We all know that."
a = str.split('.').join(' ')
words = a.scan(/\w+/)

=> words=["Ruby","is","great","We","all","know","that"]

Regards,
P.Raveendran
http://raveendran.wordpress.com

Caike · Jul 4, 2008

If you want to stick to a regex-based solution:

str = "one one one. two. three."
str.scan(/\w[\s\w]*\./)
=> ["one one one.", "two.", "three."]

And you can keep adding more sentences in the same pattern:

str = "one one one. two. three. four. five."
str.scan(/\w[\s\w]*\./)
=> ["one one one.", "two.", "three.", "four.", "five."]

It may not be the best solution to this problem, but it is always good to
keep your regexp skills up to date.

Hassan Schroeder · Jul 4, 2008

Very late to this thread, but...

You can split on a regex for a full-stop followed by (optional) whitespace.
=> ["Ruby is great", "We all know that"]

str="Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.
tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5%
interest you owe."
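A short sketch of why that string is awkward, assuming the `/\.\s*/` split suggested earlier in the thread; the constant and method names are made up for illustration. Requiring whitespace (or the end of the string) after the full stop keeps "$2.98" and "5.5%" intact, but abbreviations such as "Dr." and "St." still break the sentence apart, so this is only a heuristic and not a real sentence splitter.

```ruby
# Sample text from the post above, joined onto one line.
TRICKY = "Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr. " \
         "tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5% " \
         "interest you owe."

# Split on every full stop, as suggested earlier in the thread.
def naive_split(text)
  text.split(/\.\s*/)
end

# Hypothetical stricter variant: only split when the full stop is followed
# by whitespace or ends the string.
def stricter_split(text)
  text.split(/\.(?:\s+|\z)/)
end

puts naive_split(TRICKY).size     # the single sentence comes back as 6 fragments
puts stricter_split(TRICKY).size  # still 4 fragments because of "Dr." and "St."
```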

Jun Young Kim · Dec 11, 2008

Hi,

I have a program that replaces parts of a text file's contents.

def replace(aPattern, aReplace)
  # I need some logic here to convert the string into a pattern

  contents = File.read("data")
  contents.gsub!(aPattern, aReplace)
  File.open("result", "w") do |file|
    file << contents
  end
end

Another class passes aPattern as "/[aeiou]/" and aReplace as "*". Both
of them are Strings.

I know I get the expected result when I pass /[aeiou]/ instead of
"/[aeiou]/".

Any ideas on how to convert the string into a pattern?

Robert Klemme · Dec 11, 2008

2008/12/11 Jun Young Kim said:
Any ideas on how to convert the string into a pattern?

How about looking at the documentation?

http://www.ruby-doc.org/core/classes/Regexp.html

Btw, I rather tend to make it a requirement that the argument has the
appropriate type. Since #gsub is capable of working with String and
Regexp as pattern, I would not change your method's implementation but
the code invoking it.
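To illustrate the point that #gsub accepts either kind of pattern, here is a small sketch (the method name `mask` is made up). A String pattern is matched literally, while a Regexp is matched as a pattern, which is why passing the String "/[aeiou]/" leaves ordinary text unchanged.

```ruby
# Hypothetical helper that simply forwards to String#gsub.
def mask(text, pattern, replacement = "*")
  text.gsub(pattern, replacement)
end

p mask("regular expression", /[aeiou]/)    # Regexp: every vowel is replaced
p mask("regular expression", "e")          # String: only the literal "e" is replaced
p mask("regular expression", "/[aeiou]/")  # String: no literal match, text unchanged
```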

Taking this one step further: I would choose a different abstraction:

def transform(from_file, to_file)
  repl = yield(File.read(from_file)) and
    File.open(to_file, "w") do |io|
      io.write(repl)
    end
end

Then you can do

transform "data", "result" do |content|
  content.gsub!(/[aeiou]/, "*")
  content
end

Cheers

robert

Brian Candler · Dec 11, 2008

Any ideas on how to convert the string into a pattern?

irb(main):001:0> Regexp.new("[aeiou]")
=> /[aeiou]/

Jun Young Kim · Dec 11, 2008

Thanks for your reply, Brian.

How about Regexp.new("/[aeiou]/")?
=> /\/[aeiou]\//

Jun Young Kim · Dec 12, 2008

I mean that I have a regular expression as a string.

puts aPattern
=> "/[aeiou]/"

When I convert it to a Regexp instance, the result is
=> /\/[aeiou]\//

At this point, the given pattern is not a regular expression anymore;
it is just a string.

Pena, Botp · Dec 12, 2008

From: Jun Young Kim [mailto:[email protected]]
# I mean I have a regular expression as a string.
# puts aPattern
# => "/[aeiou]/"
# When I convert it to a Regexp instance, the result is
# => /\/[aeiou]\//
# At this point, the given pattern is not a regular expression
# anymore, it's just a string.

It is still a regex, just not the regex that you expected.

You can either remove the surrounding slashes

s = "/[aeiou]/"
Regexp.new s[1..-2]
#=> /[aeiou]/

or you can just eval it straight away

eval(s)
#=> /[aeiou]/
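Since eval runs whatever Ruby code the string contains, a variant of the slash-stripping approach may be preferable when the pattern comes from outside the program. The sketch below is one possible way to do that; `regexp_from_literal` is a made-up name, and it only understands the `i`, `m`, and `x` flags.

```ruby
# Hypothetical helper: turn a string such as "/[aeiou]/i" into a Regexp
# without eval. Strings without surrounding slashes are used as the
# pattern source unchanged.
def regexp_from_literal(str)
  match = %r{\A/(.*)/([imx]*)\z}m.match(str)
  return Regexp.new(str) unless match

  source, flags = match[1], match[2]
  options = 0
  options |= Regexp::IGNORECASE if flags.include?("i")
  options |= Regexp::MULTILINE  if flags.include?("m")
  options |= Regexp::EXTENDED   if flags.include?("x")
  Regexp.new(source, options)
end

p regexp_from_literal("/[aeiou]/")
p "Ruby is great".gsub(regexp_from_literal("/[aeiou]/i"), "*")
```

Regexp.new raises RegexpError for an invalid source, so a caller handling user input would probably want to rescue that.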

Jun Young Kim · Dec 12, 2008

Hi all,

As you know, there is a Ruby parsing library called "Treetop".

Part of the logic in my program tries to parse a regular expression as a
single token.

Let me give an example to make this easier to understand:

translate /[aeiou]/ "*"

This means: translate every character matching /[aeiou]/ to *.

Any ideas on how to write a rule to parse it?
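This is not a Treetop grammar, but the token shapes can be sketched in plain Ruby with a single regular expression, which may help when writing the rules. It assumes the command is the word `translate`, then a slash-delimited pattern, then a double-quoted replacement; `TRANSLATE_COMMAND` and `parse_translate` are made-up names.

```ruby
# Assumed command shape: translate /PATTERN/ "REPLACEMENT"
# Backslash-escaped characters are skipped over, not interpreted.
TRANSLATE_COMMAND = %r{
  \A translate \s+
  / ( (?: \\. | [^/\\] )* ) /      # pattern: everything up to an unescaped slash
  \s+
  " ( (?: \\. | [^"\\] )* ) "      # replacement: everything up to an unescaped quote
  \z
}x

# Hypothetical helper: returns [Regexp, String], or nil if the line does not match.
def parse_translate(line)
  match = TRANSLATE_COMMAND.match(line.strip)
  return nil unless match

  [Regexp.new(match[1]), match[2]]
end

pattern, replacement = parse_translate('translate /[aeiou]/ "*"')
puts "regular expression".gsub(pattern, replacement)
```

In a grammar, the two parenthesised groups would correspond to separate rules, one for the regex token and one for the quoted string.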
