Total Beginner - Extracting Data from a Database Online (Screenshot)

logan.c.graham · May 24, 2013

Hey guys,

I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:

http://i.imgur.com/KgvSKWk.jpg

What this is is a publicly-accessible webpage that's a simple database of people who have used the website. Ideally what I'd like to end up with is an Excel spreadsheet with data from the columns # fb, # vids, fb sent?, # email tm.

I'd like to use Python to do it -- crawl the page and extract the data in a usable way.

I'd love your input! I'm just a learner.

Dave Angel · May 24, 2013

Hey guys,

I'm learning Python

Welcome.

and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:

http://i.imgur.com/KgvSKWk.jpg

What this is is a publicly-accessible webpage

No, it's just a jpeg file, an image.

that's a simple database of people who have used the website. Ideally what I'd like to end up with is an Excel spreadsheet with data from the columns # fb, # vids, fb sent?, # email tm.

I'd like to use Python to do it -- crawl the page and extract the data in a usable way.

But there's no page to crawl. You may have to start by finding an OCR
tool to interpret the image as characters, or find some other source
for your data.
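
If the screenshot really were the only source, one possible route would be to run it through an OCR tool and then split the recognized text into rows and cells. The sketch below covers only that second step. The function name and the sample text are made up for illustration, and the commented-out OCR call assumes the third-party pytesseract and Pillow packages plus a Tesseract install, none of which come from this thread.

```python
import re

def rows_from_ocr_text(text):
    """Split OCR output into rows of cells.

    Assumes one table row per line and cells separated by a tab or by
    two or more spaces; real OCR output is often messier than this.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows

# Hypothetical way to obtain the text (needs pytesseract, Pillow and Tesseract):
#   import pytesseract
#   from PIL import Image
#   text = pytesseract.image_to_string(Image.open("KgvSKWk.jpg"))

# Made-up sample standing in for the OCR result.
sample_text = "name    # fb   # vids\nalice   3      1\n\nbob     0      2\n"
rows = rows_from_ocr_text(sample_text)
```

Splitting on runs of two or more spaces keeps a header such as "# fb" in one piece, but it is only a heuristic and would likely need tuning for a real image.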

Carlos Nepomuceno · May 25, 2013

### table_data_extraction.py ###
# Usage: tables[table][row][column]
# tables[0]        : 1st table
# tables[1][2]     : 3rd row of 2nd table
# tables[3][4][5]  : cell content of 6th column of 5th row of 4th table
# len(tables)      : quantity of tables
# len(tables[6])   : quantity of rows of 7th table
# len(tables[7][8]): quantity of columns of 9th row of 8th table

import re
import urllib2

#to retrieve the contents of the page
page = urllib2.urlopen("http://example.com/page.html").read().strip()

#to create the tables list
tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]

Pretty simple. Good luck!
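
The script above is Python 2 (`urllib2`). In Python 3 the same call lives in `urllib.request`. Below is a minimal sketch of a Python 3 variant, assuming a UTF-8 page with no nested tables. The names `parse_tables` and `fetch_tables` are introduced here for illustration, and the patterns are deliberately relaxed compared with the original: they ignore case and tolerate attributes inside the tags.

```python
import re
from urllib.request import urlopen

# Relaxed versions of the patterns above: case-insensitive, attributes allowed.
FLAGS = re.S | re.I
TABLE_RE = re.compile(r"<table\b[^>]*>(.*?)</table>", FLAGS)
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", FLAGS)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", FLAGS)

def parse_tables(page):
    """Return a tables[table][row][column] list of raw cell contents."""
    return [[CELL_RE.findall(row) for row in ROW_RE.findall(table)]
            for table in TABLE_RE.findall(page)]

def fetch_tables(url):
    """Download a page and parse its tables (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    return parse_tables(page)

# Offline example with a made-up page.
sample_page = ('<table class="x"><tr><td>1</td><td>2</td></tr>'
               '<TR><TD>3</TD><TD>4</TD></TR></table>')
tables = parse_tables(sample_page)
```

Header cells written as `<th>` are not captured by this sketch, and regular expressions remain fragile on real-world HTML.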

Dave Angel · May 25, 2013

<SNIP>
page = urllib2.urlopen("http://example.com/page.html").read().strip()

#to create the tables list
tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]

Pretty simple. Good luck!

Only if the page is HTML, which the OP's was not. It was an image. Try
parsing that with regex.

Chris Angelico · May 25, 2013

http://i.imgur.com/KgvSKWk.jpg

What this is is a publicly-accessible webpage...

If that's a screenshot of something that we'd be able to access
directly, then why not just post a link to the actual thing? More
likely I'm thinking it's NOT publicly accessible, which is why it's
been censored.

ChrisA

neil.suffield · May 25, 2013

If you are talking about accessing a web page, rather than an image, then you want to do what is known as screen scraping.

One of the best tools for this is called BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/
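
BeautifulSoup is a third-party package, so as a dependency-free illustration of the same idea (letting a real HTML parser find the cells instead of pattern matching), here is a sketch built on the standard library's `html.parser`. This is a swapped-in technique, not BeautifulSoup's API; the `TableExtractor` class and the sample markup are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []    # tables -> rows -> cell strings
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

# Made-up markup standing in for the real page.
sample = """
<table>
  <tr><th>name</th><th># fb</th></tr>
  <tr><td>alice</td><td>3</td></tr>
</table>
"""
parser = TableExtractor()
parser.feed(sample)
parser.close()
tables = parser.tables
```

The result has the same `tables[table][row][column]` shape used elsewhere in this thread, so the later column-extraction examples would apply to it unchanged.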

logan.c.graham · May 26, 2013

Sorry to be unclear -- it's a screenshot of the webpage, which is publicly accessible, but it contains sensitive information. A bad combination, admittedly, and something that'll be soon fixed.

John Ladasky · May 26, 2013

#to create the tables list
tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]

Pretty simple.

Two nested list comprehensions, with regex pattern matching?

Logan did say he was a "total beginner." :^)
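
For anyone new to comprehensions, the one-liner can be unrolled into ordinary loops that do the same work step by step. This is a sketch using the same three patterns on a made-up page; `extract_tables` and `sample_page` are names introduced here for illustration.

```python
import re

def extract_tables(page):
    """Same result as the nested comprehension, written as plain loops."""
    tables = []
    # 1. Find the contents of each <TABLE>...</TABLE> block.
    for table_html in re.findall('<TABLE>(.*?)</TABLE>', page, re.S):
        rows = []
        # 2. Inside that table, find the contents of each <TR>...</TR> row.
        for row_html in re.findall('<TR>(.*?)</TR>', table_html, re.S):
            # 3. Inside that row, collect the contents of each <TD> cell.
            cells = re.findall('<TD>(.*?)</TD>', row_html, re.S)
            rows.append(cells)
        tables.append(rows)
    return tables

# Made-up page: one table, two rows, two cells per row.
sample_page = ("<TABLE><TR><TD>a</TD><TD>b</TD></TR>"
               "<TR><TD>c</TD><TD>d</TD></TR></TABLE>")
tables = extract_tables(sample_page)

# The original one-liner, for comparison.
one_liner = [[re.findall('<TD>(.*?)</TD>', r, re.S)
              for r in re.findall('<TR>(.*?)</TR>', t, re.S)]
             for t in re.findall('<TABLE>(.*?)</TABLE>', sample_page, re.S)]
```

Here `tables[0][1][0]` is the first cell of the second row of the first table.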

logan.c.graham · May 28, 2013

#to create the tables list

tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]
Pretty simple.

Two nested list comprehensions, with regex pattern matching?

Logan did say he was a "total beginner." :^)

Oh goodness, yes, I have no clue.

Carlos Nepomuceno · May 28, 2013

Oh goodness, yes, I have no clue.

For example:

# to retrieve the contents of the whole '# fb' column (11th column in the image you sent)

c11 = [tables[0][r][10] for r in range(len(tables[0]))]
#      ----------------                -------------
#      this is the content             this is the quantity
#      of the 11th cell                of rows in tables[0]
#      of row 'r'
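
Since the goal was a spreadsheet, the selected columns could then be written out as CSV, which Excel opens directly. This is a minimal sketch: `write_columns` is a hypothetical helper, the `tables` data is made up, and apart from index 10 for '# fb' the column positions are assumptions that would have to be checked against the real page.

```python
import csv
import io

def write_columns(tables, column_indexes, out_file):
    """Write the chosen columns of the first table to out_file as CSV."""
    writer = csv.writer(out_file)
    for row in tables[0]:
        # Skip rows too short to hold every requested column
        # (for example, rows whose cells were not captured).
        if len(row) > max(column_indexes):
            writer.writerow([row[i] for i in column_indexes])

# Made-up stand-in for the parsed page: two 12-column rows and a stray short row.
tables = [[[str(c) for c in range(12)],
           [str(c * 2) for c in range(12)],
           ["short row"]]]

buffer = io.StringIO()
write_columns(tables, [10, 11], buffer)   # index 11 is an assumed position
csv_text = buffer.getvalue()
```

To write a real file in Python 3, the buffer could be replaced with something like `open("output.csv", "w", newline="")`, where the file name is again just a placeholder.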

Phil Connell · May 28, 2013

Oh goodness, yes, I have no clue.

For example:

# to retrieve the contents of all column '# fb' (11th column from the image you sent)

c11 = [tables[0][r][10] for r in range(len(tables[0]))]

Or rather:

c11 = [row[10] for row in tables[0]]

In most cases, range(len(x)) is a sign that you're doing it wrong.
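
A small side-by-side sketch of the point, using made-up data in the same `tables[table][row][column]` layout. When the row number is genuinely needed, `enumerate` is the usual alternative to indexing with `range(len(...))`.

```python
# Made-up data: one table with three rows of two cells each.
tables = [[["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]]

# Index-based version: works, but the index r is only used to look the row up again.
by_index = [tables[0][r][1] for r in range(len(tables[0]))]

# Direct iteration: loop over the rows themselves.
direct = [row[1] for row in tables[0]]

# If the row number is needed too, enumerate supplies it alongside the row.
numbered = [(n, row[1]) for n, row in enumerate(tables[0])]
```

Both `by_index` and `direct` hold the second cell of every row; the direct form simply has less to get wrong.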
