Extracting URLs from websites
We need two packages to successfully run this script:
Install Python 2.7
Open a command prompt, navigate to C:\Python27\Scripts, and type the following:
pip install requests
pip install beautifulsoup4
Now open the Python GUI (IDLE), go to File > New, and type the code below:
from bs4 import BeautifulSoup
import requests

# raw_input is Python 2.7 syntax; in Python 3 use input()
ask_user = raw_input("Please enter the URL you want to harvest: ")
response = requests.get("http://" + ask_user)
html = response.text
# Passing an explicit parser avoids BeautifulSoup's "no parser was
# explicitly specified" warning
soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))  # indent this line with four spaces
Now save it as webextractor.py and run it.
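If installing beautifulsoup4 fails for some reason, the same link extraction can be sketched with nothing but the standard library's HTML parser. This is an alternative, not the script above; it is shown in Python 3 syntax, where the module is html.parser (in Python 2.7 the module is named HTMLParser), and the sample HTML string is made up for illustration:

```python
# Link extraction using only the standard library (Python 3 syntax;
# in Python 2.7 use: from HTMLParser import HTMLParser).
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

# Hypothetical sample HTML standing in for a downloaded page
sample = '<a href="/page1">one</a> <a href="/page2">two</a>'
parser = LinkExtractor()
parser.feed(sample)
print(parser.links)
```

Feeding it the page text downloaded with requests (response.text) would print the same kind of href list as the BeautifulSoup version.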
You can also run it from the command prompt:
Open a command prompt, navigate to the folder where you saved webextractor.py, and type:
python webextractor.py
Note: If you encounter a Warning (from warnings module), you can leave it as it is; it does not stop the script.
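One thing to be aware of: the script prints hrefs exactly as they appear in the page, so relative links like /about come out without the site name. As an optional refinement (not part of the script above), the standard library's urljoin can resolve them against the page URL; in Python 2.7 it lives in the urlparse module, in Python 3 in urllib.parse. The base URL and href values below are made-up examples:

```python
# Resolving relative hrefs against the page URL (optional refinement).
# Python 3 syntax; in Python 2.7 use: from urlparse import urljoin
from urllib.parse import urljoin

# Hypothetical page URL and a few hrefs as they might come out of the scraper
base = "http://example.com/articles/index.html"
hrefs = ["/about", "contact.html", "http://other.com/page"]

for href in hrefs:
    # Root-relative and page-relative links get the site prepended;
    # already-absolute URLs pass through unchanged
    print(urljoin(base, href))
```

Calling urljoin(base, link.get('href')) inside the for loop would print full URLs instead of bare paths.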
Happy coding! Please subscribe