Python Web Scraping Like a Boss
Welcome, champ! To follow this tutorial, you need basic knowledge of:
- Python syntax
- lists, dictionaries
- for loops
Step 1 - Import these libraries
import requests
from bs4 import BeautifulSoup
The first line imports the requests library, which we need to send a GET request and fetch the HTML code of the website we want. The second line imports BeautifulSoup, which is going to help us search through the HTML tags.
Step 2 - Get that freakin' HTML code
r = requests.get("https://pythonhow.com/example.html")
content = r.content
- We sent a GET request to the given URL
- We assigned the response's content to a variable
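The two steps above assume the request always works. If the server is down or the page is missing, you would end up parsing an error page. Here is a minimal sketch of a safer fetch. The helper name fetch_html and the 10 second timeout are my own picks for illustration, not part of the original code.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes at url, or raise if the request failed."""
    # timeout (in seconds, an assumed value) stops the call from hanging forever
    r = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    r.raise_for_status()
    return r.content
```

You would then call content = fetch_html("https://pythonhow.com/example.html") and carry on with Step 3 as usual.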
Step 3 - Parsing that bad boy
soup = BeautifulSoup(content,"html.parser")
elements = soup.find_all("div",{"class":"cities"})
In this code, we created the soup variable. Then we used soup to grab the div tags that have the class cities.
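If you want to try the class filter without hitting the network, here is a small sketch on a made-up HTML string. The sample_html below is only my guess at a page shaped like the example, not the real source. It also shows the class_ keyword form, which does the same thing as the dictionary form.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (an assumption, not the actual source)
sample_html = """
<div class="cities"><h2>London</h2><p>Capital of England.</p></div>
<div class="cities"><h2>Paris</h2><p>Capital of France.</p></div>
<div class="footer"><p>Not a city.</p></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Dictionary form, same as in the tutorial
by_dict = soup.find_all("div", {"class": "cities"})
# Keyword form; class_ has a trailing underscore because class is a Python keyword
by_keyword = soup.find_all("div", class_="cities")

print(len(by_dict))           # 2 (the footer div is skipped)
print(by_dict == by_keyword)  # True
```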
Step 4 - Looping those tags
for element in elements:
    cities = element.find_all("h2")[0].text
    print(cities)
    info = element.find_all("p")[0].text
    print(info)
Please note that the find_all() method returns a list. The find() method returns only the first tag it finds.
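To see the difference between the two methods side by side, here is a quick sketch on a tiny made-up snippet (the HTML string is just for illustration).

```python
from bs4 import BeautifulSoup

snippet = "<div><p>first</p><p>second</p></div>"
soup = BeautifulSoup(snippet, "html.parser")

all_p = soup.find_all("p")   # list-like: every matching tag
first_p = soup.find("p")     # a single tag: the first match

print(len(all_p))            # 2
print(first_p.text)          # first
print(all_p[0].text)         # first

# When nothing matches, the two methods behave differently
print(soup.find_all("h2"))   # []
print(soup.find("h2"))       # None
```

So find_all("h2")[0] raises an IndexError on a page with no h2 tag, while find("h2") hands you None, which you can check before asking for .text.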
Here we loop through the elements list and do the following:
- For each element object, grab the first h2 tag's text (which is the city name) and print it
- For each element object, grab the first p tag's text (which is the city info) and print it
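Printing is fine for a quick look, but you will probably want to keep the data. Here is one possible sketch that runs the same loop and stores each city in a dictionary. The function name extract_cities, the keys city and info, and the sample HTML are all my own assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (an assumption, not the actual source)
sample_html = """
<div class="cities"><h2>London</h2><p>Capital of England.</p></div>
<div class="cities"><h2>Paris</h2><p>Capital of France.</p></div>
"""


def extract_cities(html):
    """Return a list of {"city": ..., "info": ...} dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for element in soup.find_all("div", {"class": "cities"}):
        rows.append({
            "city": element.find_all("h2")[0].text,  # first h2 = city name
            "info": element.find_all("p")[0].text,   # first p = city info
        })
    return rows


for row in extract_cities(sample_html):
    print(row["city"], "-", row["info"])
```

A list of dictionaries is easy to loop over again later or to write out to a file.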
Output:
London
London is the capital of England and it's been a British settlement since 2000 years ago.
Paris
Paris is the capital city of France. It was declared capital since 508.
Tokyo
Tokyo is the capital of Japan and one of the most populated cities in the world.