Web Links Gatherer (ver 2)

Using Beautiful Soup, we can rewrite the code from the previous post as shown below, and it works much better. Simply replacing the regex matching with Beautiful Soup's parser returns a better result:

# otoy -- http://otoyrood.wordpress.com
# 0x102010

from urllib import urlopen
from BeautifulSoup import BeautifulSoup

text = urlopen('http://otoyrood.wordpress.com').read()
soup = BeautifulSoup(text)

pages = set()
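for header in soup('a'):
    # the loop body is missing from this listing; assumed to store each link's href
    href = header.get('href')
    if href:
        pages.add(href)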

print '\n'.join(sorted(pages))
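
To try the same idea without fetching a live page, here is a minimal sketch, not the code from this post. It assumes Python 3 and swaps Beautiful Soup for the standard library's html.parser, so it runs with no extra packages. The LinkCollector class, the gather_links function and the SAMPLE markup are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: stores the href of every <a> tag in a set."""

    def __init__(self):
        super().__init__()
        self.pages = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                self.pages.add(value)


def gather_links(html):
    """Return the unique link targets found in html, sorted."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.pages)


# Made-up sample markup: one duplicate link and one anchor without an href
SAMPLE = '''
<p><a href="http://example.com/b">B</a>
<a href="http://example.com/a">A</a>
<a href="http://example.com/b">B again</a>
<a name="anchor-only">no href</a></p>
'''

if __name__ == '__main__':
    print('\n'.join(gather_links(SAMPLE)))
```

As in the listing above, a set drops the duplicate link and sorting happens only once, when the result is printed.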

About Hanny Haliwela

try to be or not try to be