Python Web Scraping Like a Boss

in #python, 7 years ago


Welcome, champ! To follow this tutorial, you need basic knowledge of:

  • Python syntax
  • lists, dictionaries
  • for loop

Step 1 - Import these libraries

import requests
from bs4 import BeautifulSoup

The first line imports the requests library, which we need to send a GET request and fetch the HTML code of the website we want. The second line imports BeautifulSoup, which is going to help us search through the HTML tags.

Step 2 - Get that freakin' HTML code

r = requests.get("https://pythonhow.com/example.html")
content = r.content

  • We sent a GET request to the given URL
  • We assigned the response body (r.content, the raw HTML as bytes) to a variable
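Real websites are sometimes slow or return an error page, so it can help to wrap the request in a small function. This is only a sketch: the name fetch_html and the 10-second timeout are my own assumptions, not part of the original snippet.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes for url (hypothetical helper)."""
    # timeout stops the call from hanging forever on a slow server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content
```

You would then call it as content = fetch_html("https://pythonhow.com/example.html") and carry on with the next step as usual.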

Step 3 - Parsing that bad boy

soup = BeautifulSoup(content, "html.parser")
elements = soup.find_all("div", {"class": "cities"})

In this code, we created the soup object by parsing the content with Python's built-in html.parser. Then we used soup to grab every div tag that has the class cities.
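If you want to try the parsing step without hitting the network, you can feed BeautifulSoup a string. The sample_html below is made up for illustration (it only imitates the shape of the example page), and the sketch shows three ways of asking for the same div tags.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the structure of the example page
sample_html = """
<div class="cities"><h2>London</h2><p>Capital of England.</p></div>
<div class="cities"><h2>Paris</h2><p>Capital of France.</p></div>
<div class="other"><h2>Not a city</h2></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Attribute dictionary, as used in this tutorial
by_dict = soup.find_all("div", {"class": "cities"})
# Keyword form; class_ has a trailing underscore because class is a Python keyword
by_keyword = soup.find_all("div", class_="cities")
# CSS selector form
by_css = soup.select("div.cities")

print(len(by_dict), len(by_keyword), len(by_css))
```

All three calls should pick up the two cities divs and skip the other one, so use whichever reads best to you.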

Step 4 - Looping those tags

for element in elements:
    cities = element.find_all("h2")[0].text
    print(cities)
    info = element.find_all("p")[0].text
    print(info)

Please note that the find_all() method returns a list. If you use the find() method, it returns only the first tag it finds (or None if nothing matches).
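A quick way to see the difference between the two methods, using a tiny made-up HTML string rather than the real page:

```python
from bs4 import BeautifulSoup

# Tiny made-up document with two p tags and no h2 tag
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")

all_p = soup.find_all("p")        # list-like result with both p tags
first_p = soup.find("p")          # only the first p tag
missing = soup.find("h2")         # None, because there is no h2
missing_all = soup.find_all("h2") # empty list, because there is no h2

print(len(all_p), first_p.text, missing, len(missing_all))
```

This is why find_all("h2")[0] raises an IndexError when a tag is missing, while find("h2") quietly hands you None.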

Here we loop through the elements list and, for each element object:

  • Grab the first h2 tag's text (which is the city name) and print it
  • Grab the first p tag's text (which is the city info) and print it
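If you would rather keep the results than just print them, the same loop can collect them into a list of dictionaries. This is a sketch under my own assumptions: extract_cities is a hypothetical helper, the sample HTML is made up, and skipping incomplete divs is just one reasonable choice.

```python
from bs4 import BeautifulSoup


def extract_cities(html):
    """Return a list of {"city": ..., "info": ...} dicts (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for element in soup.find_all("div", {"class": "cities"}):
        heading = element.find("h2")
        paragraph = element.find("p")
        # find() returns None when a tag is missing, so skip incomplete divs
        if heading is None or paragraph is None:
            continue
        results.append({
            "city": heading.get_text(strip=True),
            "info": paragraph.get_text(strip=True),
        })
    return results


# Made-up HTML; the second div has no p tag and gets skipped
sample_html = """
<div class="cities"><h2>London</h2><p>Capital of England.</p></div>
<div class="cities"><h2>Nowhere</h2></div>
<div class="cities"><h2>Tokyo</h2><p>Capital of Japan.</p></div>
"""

for row in extract_cities(sample_html):
    print(row["city"], "-", row["info"])
```

With the real page you would pass in the content variable from Step 2 instead of sample_html.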

Output:

London
London is the capital of England and it's been a British settlement since 2000 years ago. 
Paris
Paris is the capital city of France. It was declared capital since 508.
Tokyo
Tokyo is the capital of Japan and one of the most populated cities in the world.

You nailed it, champ! Thanks for joining me on this quest!



Nice short little tutorial. You should consider using the tag #SteemSTEM next time you do a post on Science, Technology, Engineering, or Math. They have a curation team to upvote quality STEM content.

Thank you for the information.