Adam Akhtar
Hi, I've been hacking away at this all morning and getting nowhere fast.
I'm relatively new to Ruby and I'm not so hot at regex.
I'm trying to grab text data from a website that shows events and then
put each event into its own class. I figured out how to get the
screen-scraped stuff into a clean state. It's just processing it into my
class that I'm having problems with.
Here are a few events in their natural format:
---start of file----
Toto and Boz Scaggs
Seminal American rock band with the talented blues-rock musician. Mar
21, 7pm, ¥13,000. JCB Hall, Suidobashi. Tel: Udo 03-3402-5999.
Kreva
Hip-hop track maker. Mar 21, 7pm, ¥5,000. Akasaka Blitz.
Tel: Disk Garage 03-5436-9600.
Blood Red Shoes
Rock duo from the UK. Mar 21, 7pm, ¥5,000. Shibuya Club Quattro.
Tel: Creativeman 03-3462-6969.
etcetcetc
---end-----
First I grab the file into a string. As all the concerts are separated
by 4 newlines, I use
concertevents = filetext.split(/\n\n\n\n/)
to get an array of events.
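As a rough sketch of that split step: the sample string below is a shortened, hypothetical stand-in for the scraped text, and its blank-line layout is an assumption based on the description in this post. Matching four or more newlines and stripping each chunk makes the split a little more tolerant of stray trailing newlines.

```ruby
# Hypothetical, shortened sample; the real scraped text may be laid out differently.
filetext = "Toto and Boz Scaggs\n\nSeminal American rock band. Mar 21, 7pm.\n\n\n\n" \
           "Kreva\n\nHip-hop track maker. Mar 21, 7pm.\n\nTel: Disk Garage 03-5436-9600.\n\n\n\n" \
           "Blood Red Shoes\n\nRock duo from the UK. Mar 21, 7pm.\n"

# Split on runs of four or more newlines, then tidy up each chunk.
concertevents = filetext.split(/\n{4,}/).map(&:strip).reject(&:empty?)

# Show each event as an escaped string so the inner newlines are visible.
concertevents.each { |conevt| puts conevt.inspect }
```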
I'd then like to process these further by keeping the group name separate
from the rest of the details. So I thought I'd do
artist = conevt.slice(/[^\n]*/) #get artist info
which assumes the group name will only be on one line. Fine for this
prototype.
The details are a bit trickier, as some spill onto a second line (but
separated by a blank line). The second event is like this. I tried
description = conevt.slice(/.*\n\n(.*\n\n.*)/,1) #get desc
Although my RegexCoach program says it works with the first event, when
I run the program, slice returns nil for description. It
definitely works for the second event, which takes up 3 lines.
So my first question is: how should I alter the above regex to make it work
for both cases above? Any hints, tips or, if you feel like it, answers are welcome.
At this stage I'm up for easier, longer ways rather than shorter, more
cryptic ones.
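One possible explanation for the nil, assuming the first event's details sit on a single line after the artist: the pattern requires a second blank line, so it cannot match an event that has only one. The sketch below shows two ways around that. The sample strings and the helper name `split_event` are hypothetical.

```ruby
# Assumed layout: artist line, blank line, then one or two detail lines
# that are themselves separated by a blank line.
one_line  = "Toto and Boz Scaggs\n\nSeminal American rock band. Tel: Udo 03-3402-5999."
two_lines = "Kreva\n\nHip-hop track maker. Akasaka Blitz.\n\nTel: Disk Garage 03-5436-9600."

# Option 1: keep the original pattern but make the second detail line optional.
desc_pattern = /.*\n\n(.*(?:\n\n.*)?)/

# Option 2: skip captures and split once on the first blank line.
# The limit of 2 keeps any later blank lines inside the description.
def split_event(conevt)
  artist, description = conevt.split("\n\n", 2)
  [artist, description]
end

[one_line, two_lines].each do |conevt|
  puts conevt.slice(desc_pattern, 1).inspect
  puts split_event(conevt).inspect
end
```

The second option is longer but arguably easier to read, and it returns the artist and the description in one step.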
Second, am I going about this the right way? Should I have just avoided
regex and simply read the file line by line, using if statements to
figure out which lines belong to which event?
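For comparison, here is a minimal sketch of the line-by-line alternative. It assumes that three blank lines in a row (four newlines) mark the start of a new event and that the first non-blank line of each event is the artist. `ConcertEvent`, its fields, and `parse_events` are hypothetical names, not anything from an existing library.

```ruby
# Hypothetical container for one event; the field names are assumptions.
ConcertEvent = Struct.new(:artist, :description)

def parse_events(text)
  events = []
  current = nil
  blank_run = 0

  text.each_line do |line|
    line = line.strip
    if line.empty?
      blank_run += 1
      next
    end

    if current.nil? || blank_run >= 3
      # First line of the text, or first line after a long gap: a new artist.
      current = ConcertEvent.new(line, "")
      events << current
    else
      # Any other non-blank line is appended to the current description.
      current.description = [current.description, line].reject(&:empty?).join(" ")
    end
    blank_run = 0
  end

  events
end

sample = "Toto and Boz Scaggs\n\nSeminal band.\n\n\n\n" \
         "Kreva\n\nHip-hop track maker.\n\nTel: Disk Garage 03-5436-9600.\n"
parse_events(sample).each { |event| puts "#{event.artist}: #{event.description}" }
```

This version needs no regex at all, at the cost of tracking a little state between lines.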
Does anyone know of any good resources, e.g. tutorials, on this subject,
i.e. screen scraping, cleaning the grabbed text and then processing it
into your own classes?
Wow, it's a long post... I'll leave it at that.