Extracting URLs from websites

We need two packages to run this script successfully: requests and beautifulsoup4.
Install Python 2.7.
Open a command prompt, navigate to C:\Python27\Scripts, and type the following:

pip install requests
pip install beautifulsoup4

Now open the Python GUI, choose File - New, and type the code below:

from bs4 import BeautifulSoup
import requests

# Ask for the site to scan; type the address without the leading "http://"
ask_user = raw_input("Please enter the URL you want to harvest: ")
user_ask = requests.get("http://" + ask_user)

momo = user_ask.text
# Name the parser explicitly so BeautifulSoup does not have to guess
soup = BeautifulSoup(momo, "html.parser")
for link in soup.find_all('a'):
    print (link.get('href'))  # This line must be indented: press the space bar 4 times

Now save it as webextractor.py and run it.
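
When you run the script, you may notice that some lines print as None (an <a> tag with no href) and others as partial paths such as /about. Below is a rough sketch of one way to tidy the list. The helper name absolute_links and the example addresses are assumptions made for illustration and are not part of the script above. It uses only the standard library, so you can try it on its own:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7


def absolute_links(base_url, hrefs):
    """Hypothetical helper: turn raw href values into full URLs.

    Skips missing hrefs, in-page anchors, and non-web schemes,
    and drops duplicates while keeping the original order.
    """
    results = []
    for href in hrefs:
        if not href:  # None or empty string: the <a> tag had no usable href
            continue
        if href.startswith(("#", "javascript:", "mailto:")):
            continue
        full = urljoin(base_url, href)  # resolves relative paths against the page address
        if full not in results:
            results.append(full)
    return results


# Example values (made up for illustration)
sample = ["/about", None, "post.html", "#top", "http://other.org/x", "/about"]
for url in absolute_links("http://example.com/blog/", sample):
    print(url)
```

In the script, you could collect each link.get('href') value in a list and pass that list, together with the page address, to absolute_links before printing.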

You can also run it from the command prompt:

Open a command prompt and navigate to the location where you saved webextractor.py
Type:
python webextractor.py
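
When working from the command prompt, it can be handy to type the address on the same line as the command instead of waiting for the prompt. Here is a minimal sketch of that idea. The helper name pick_url is an assumption made for illustration and is not part of the original script. It also adds "http://" only when the address does not already start with a scheme, so typing a full address does not end up with "http://" twice:

```python
import sys


def pick_url(argv):
    """Hypothetical helper: read the address from a command line such as
    ['webextractor.py', 'example.com'].

    Returns None when no address was given, otherwise the address with
    'http://' added in front if it has no scheme yet.
    """
    if len(argv) < 2:
        return None
    address = argv[1].strip()
    if address.startswith("http://") or address.startswith("https://"):
        return address
    return "http://" + address


# Example with a made-up argument list; in the real script pass sys.argv
print(pick_url(["webextractor.py", "example.com"]))
```

With this in place you would run the script as python webextractor.py followed by the address, and pass the result of pick_url(sys.argv) straight to requests.get without adding "http://" again. When pick_url returns None, the script can fall back to asking with raw_input as before.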


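Note: If you see "Warning (from warnings module)", you can leave it as it is. It is only a warning, most likely BeautifulSoup pointing out that no parser was named when the soup was created, and the script still prints the links.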

Happy coding! Please subscribe

