Regular expressions and long text

Guillermo.Acilu

[Note: parts of this message were removed to make it a legal post.]

Hello guys,

I started with Ruby a month ago and I am doing some work with strings
and regular expressions. I am trying to take a long text and store the
individual sentences in an array. I can split a sentence into words and
store them in an array, but I cannot manage to do it with sentences.

I have used the following assignment to work with the words:

str = "Ruby is great"
words = []
words = str.scan(/\w+/)

The result is words[0]="Ruby", words[1]="is" and words[2]="great"

I would like to do the following:

str = "Ruby is great. We all know that."

and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Thanks,

Guillermo
 
Sandro Paganotti

I am not sure whether you want to split the string on the full stop:
str.split(".")
or divide the string into words and split them into two groups:


str = "Ruby is great. We all know that."
([(v=str.split(" "))[0...k=((l=(v.size))/2)]]+[v[k..l]]).map{|e|e.join(" ")}
=> ["Ruby is great.", "We all know that."]


Bryan JJ Buckley

You can split on a regex for a full-stop followed by (optional) whitespace.
=> ["Ruby is great", "We all know that"]
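The code for this reply did not survive in the post. A minimal sketch, assuming it was the same `split(/\.\s*/)` call that appears in a later reply in this thread:

```ruby
str = "Ruby is great. We all know that."

# Split on a literal full stop followed by optional whitespace.
# String#split drops the trailing empty string left by the final ".".
sentences = str.split(/\.\s*/)

p sentences
```

Note that this discards the full stops themselves; whether that is acceptable depends on what the sentences are used for.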
 
Zhukov Pavel

(e-mail address removed) writes:

Hello guys,
[cut]

I would like to do the following:
str = "Ruby is great. We all know that."
and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Hi,
maybe you should try this: words = str.split(/\.\s*/)

it works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

Even simpler:

irb(main):001:0> "Ruby is great. We all know that.".split(".")
=> ["Ruby is great", " We all know that"]
 
Raveendran Jazzez

Hi,

I think you expect this output, so please try it:

str="Ruby is great. We all know that."
a= str.split('.').join(' ')
words=[]
words=a.scan(/\w+/)


=> words=["Ruby","is","great","We","all","know","that"]

Regards,
P.Raveendran
http://raveendran.wordpress.com


Caike

If you want to stick to a regex-based solution:

str = "one one one. two. three."
=> "one one one. two. three."
str.scan(/\w[\s\w]*\./)
=> ["one one one.", "two.", "three."]

And it keeps working as you add more sentences in the same pattern:

str = "one one one. two. three. four. five."
=> "one one one. two. three. four. five."
str.scan(/\w[\s\w]*\./)
=> ["one one one.", "two.", "three.", "four.", "five."]


It may not be the best solution to this problem, but it is always good
to keep your regexp skills up to date ;)
 
Hassan Schroeder

Very late to this thread, but...

You can split on a regex for a full-stop followed by (optional) whitespace.
=> ["Ruby is great", "We all know that"]

str="Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.
tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5%
interest you owe."

:)
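To see why this example is a problem, here is a small sketch that runs the `split(/\.\s*/)` approach from earlier in the thread over that text (joined onto one line here for convenience):

```ruby
str = "Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr. " \
      "tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5% " \
      "interest you owe."

# The naive split treats every full stop as a sentence boundary,
# including those in abbreviations and decimal numbers.
pieces = str.split(/\.\s*/)

puts pieces.size            # 6 fragments for what is really one sentence
pieces.each { |piece| p piece }
```

The single sentence comes back as six fragments ("Dr", "and Bar Dr", "98 -- exact change ...", and so on), so a plain regex on "." is only safe for text without abbreviations or decimals.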
 
Jun Young Kim

Hi,

I have a program that replaces the contents of a text file.

def replace(aPattern, aReplace)
  # I need some logic here to translate the string into a pattern

  contents = File.read("data")
  contents.gsub!(aPattern, aReplace)
  File.open("result", "w") do |file|
    file << contents
  end
end


Another class passes the aPattern argument as "/[aeiou]/" and aReplace
as "*". Both of them are Strings.

I know I get the expected result when I pass /[aeiou]/ instead
of "/[aeiou]/".

Any ideas on how to convert the string to a pattern?
 
Robert Klemme

2008/12/11 Jun Young Kim said:
Any ideas on how to convert the string to a pattern?

How about looking at the documentation?

http://www.ruby-doc.org/core/classes/Regexp.html

Btw, I rather tend to make it a requirement that the argument has the
appropriate type. Since #gsub is capable of working with String and
Regexp as pattern, I would not change your method's implementation but
the code invoking it.

Taking this one step further: I would choose a different abstraction:

def transform(from_file, to_file)
  repl = yield(File.read(from_file)) and
    File.open(to_file, "w") do |io|
      io.write(repl)
    end
end

Then you can do

transform "data", "result" do |content|
  content.gsub!(/[aeiou]/, "*")
  content
end

Cheers

robert
 
Brian Candler

Any ideas on how to convert the string to a pattern?

irb(main):001:0> Regexp.new("[aeiou]")
=> /[aeiou]/
 
Jun Young Kim

I mean I have a regular expression as a string.

puts aPattern
=> "/[aeiou]/"

When I convert it to a Regexp instance, the result is
=> /\/[aeiou]\//

At this point, the given pattern is not a regular expression
anymore, it's just a string.

2008-12-11, 6:06 PM, Jun Young Kim wrote:
Thanks for your reply, Brian.

How about Regexp.new("/[aeiou]/") ?
=> /\/[aeiou]\//

2008-12-11, 5:38 PM, Brian Candler wrote:
Any ideas on how to convert the string to a pattern?

irb(main):001:0> Regexp.new("[aeiou]")
=> /[aeiou]/

Pena, Botp

From: Jun Young Kim [mailto:[email protected]]
# I mean I have a regular expression as a string.
# puts aPattern
# => "/[aeiou]/"
# When I convert it as a Regexp instance, the result is
# => /\/[aeiou]\//
# At this point, the given regular pattern is not regular expression
# anymore, it's just a string.

It is still a regex, just not the regex that you expected.

You can either remove the surrounding slashes

s = "/[aeiou]/"
Regexp.new s[1..-2]
#=> /[aeiou]/

or you can just eval it straight away

eval(s)
#=> /[aeiou]/
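Since `eval` will execute whatever Ruby code the string happens to contain, the slash-stripping route may be preferable when the string comes from elsewhere. A minimal sketch along those lines, where `regexp_from_literal` and the `FLAGS` table are hypothetical names made up for this example, and the assumption is that the string looks like "/body/" with optional i, m, x flags:

```ruby
# Hypothetical helper: build a Regexp from a string such as "/[aeiou]/i"
# without using eval. Strings that are not wrapped in slashes are passed
# to Regexp.new unchanged.
FLAGS = {
  "i" => Regexp::IGNORECASE,
  "m" => Regexp::MULTILINE,
  "x" => Regexp::EXTENDED
}.freeze

def regexp_from_literal(source)
  match = source.match(%r{\A/(.*)/([imx]*)\z}m)
  return Regexp.new(source) unless match

  body, flags = match[1], match[2]
  # Combine the option bits for each trailing flag character.
  options = flags.chars.reduce(0) { |acc, flag| acc | FLAGS[flag] }
  Regexp.new(body, options)
end

pattern = regexp_from_literal("/[aeiou]/")
puts "hello world".gsub(pattern, "*")
```

With something like this in the calling code, the `replace` method itself can keep accepting whatever `gsub!` accepts.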
 
Jun Young Kim

Hi all,

As you know, there is a Ruby parser library called "Treetop".

Part of the logic in my program tries to parse a regular expression as a
single token.

Let me give an example to make this easier to understand:

translate /[aeiou]/ "*"

This means: translate every character matching /[aeiou]/ to *.

Any idea how to write a rule to parse it?
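This is not a Treetop grammar, but the token shapes involved can be sketched with `StringScanner` from Ruby's standard library, which may help when writing the equivalent grammar rules. `parse_translate` is a hypothetical name, and the sketch assumes the line has exactly the form `translate /regex/ "replacement"`:

```ruby
require "strscan"

# Hypothetical helper: parse a line like  translate /[aeiou]/ "*"
# and return [Regexp, replacement string], or nil if it does not match.
def parse_translate(line)
  scanner = StringScanner.new(line)

  # Keyword followed by whitespace.
  return nil unless scanner.scan(/translate\s+/)

  # Regex token: a slash, then escaped characters or anything that is
  # not a slash or backslash, then the closing slash.
  return nil unless scanner.scan(%r{/((?:\\.|[^/\\])*)/})
  pattern = Regexp.new(scanner[1])

  # Double-quoted replacement, allowing backslash escapes inside.
  return nil unless scanner.scan(/\s+"((?:\\.|[^"\\])*)"\s*\z/)
  [pattern, scanner[1]]
end

pattern, replacement = parse_translate('translate /[aeiou]/ "*"')
puts "hello".gsub(pattern, replacement)
```

The key idea is that the regex token is "slash, then a run of escaped characters or non-slash characters, then slash"; a grammar rule for it would have the same shape.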
 
