Python Web Scraping Like a Boss

odesias (25) in #python • 7 years ago

Welcome, champ! To follow this tutorial, you need basic knowledge of:

Python syntax
lists, dictionaries
for loop

Step 1 - Import these libraries

import requests
from bs4 import BeautifulSoup

The first line imports the requests library, which we need to send a GET request and fetch the HTML code of the website we want. The second line imports BeautifulSoup, which is going to help us search through the HTML tags.

Step 2 - Get that freakin' HTML code

r = requests.get("https://pythonhow.com/example.html")
content = r.content

We sent a GET request to the given URL
We assigned the response's content (the raw HTML, as bytes) to a variable
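
Real requests can fail or hang, so a slightly more careful version of this step may be handy. The sketch below is an addition, not part of the original tutorial: the fetch_html helper name and the 10-second timeout are made up for illustration, while the timeout argument, raise_for_status() and RequestException are standard parts of requests.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes for url (hypothetical helper, not from the tutorial)."""
    # timeout keeps the call from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

Calling fetch_html("https://pythonhow.com/example.html") should give the same bytes as r.content above. Wrapping the call in try / except requests.RequestException lets you handle network errors instead of crashing.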

Step 3 - Parsing that bad boy

soup = BeautifulSoup(content,"html.parser")
elements = soup.find_all("div",{"class":"cities"})

In this code, we created the soup variable by parsing the content. Then we used soup to grab every div tag that has the class cities.
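
If you want to try the parsing step without touching the network, here is a small sketch. The sample_html string is invented for this example and only mimics the div/h2/p layout used in this tutorial. It also shows two other ways BeautifulSoup offers for the same class search: the class_ keyword (class is a reserved word in Python) and a CSS selector.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure the tutorial expects
sample_html = """
<div class="cities"><h2>London</h2><p>Capital of England.</p></div>
<div class="cities"><h2>Paris</h2><p>Capital of France.</p></div>
<div class="other"><h2>Not a city</h2></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Three ways to select the div tags with class "cities"
by_dict = soup.find_all("div", {"class": "cities"})   # style used in the tutorial
by_keyword = soup.find_all("div", class_="cities")    # keyword argument form
by_css = soup.select("div.cities")                    # CSS selector form

print(len(by_dict), len(by_keyword), len(by_css))
```

All three calls skip the div with class other, so pick whichever one reads best to you.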

Step 4 - Looping those tags

for element in elements:
    cities = element.find_all("h2")[0].text
    print(cities)
    info = element.find_all("p")[0].text
    print(info)

Please note that the find_all() method returns a list of every matching tag. The find() method returns only the first tag it finds.
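
To see the difference between the two methods, here is a tiny sketch on a made-up HTML snippet (the snippet is an assumption for illustration). It also shows what each method gives back when nothing matches, which matters because indexing with [0] on an empty list raises an IndexError.

```python
from bs4 import BeautifulSoup

# Invented snippet: two p tags and no h2 tag
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")

all_p = soup.find_all("p")         # list-like result with both p tags
first_p = soup.find("p")           # only the first p tag
missing_one = soup.find("h2")      # None, because there is no h2 tag
missing_all = soup.find_all("h2")  # empty list, because there is no h2 tag

print([p.text for p in all_p])
print(first_p.text)
print(missing_one, len(missing_all))
```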

The loop in Step 4 goes through the elements list and does the following:

For each element object, grab the first h2 tag's text (which is the city name) and print it
For each element object, grab the first p tag's text (which is the city info) and print it

Output:

London
London is the capital of England and it's been a British settlement since 2000 years ago. 
Paris
Paris is the capital city of France. It was declared capital since 508.
Tokyo
Tokyo is the capital of Japan and one of the most populated cities in the world.
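
Printing is fine for a quick look, but you will often want to keep the scraped data. The sketch below is one possible way to do that, using the lists and dictionaries from the prerequisites. The sample_html string is made up and only copies the div/h2/p layout from this tutorial, and the key names name and info are my own choice.

```python
from bs4 import BeautifulSoup

# Made-up HTML with the same div/h2/p layout the tutorial relies on
sample_html = """
<div class="cities"><h2>London</h2><p>London is the capital of England.</p></div>
<div class="cities"><h2>Paris</h2><p>Paris is the capital city of France.</p></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

cities = []
for element in soup.find_all("div", {"class": "cities"}):
    name_tag = element.find("h2")
    info_tag = element.find("p")
    # find() returns None when a tag is missing, so skip incomplete entries
    if name_tag is None or info_tag is None:
        continue
    cities.append({
        "name": name_tag.get_text(strip=True),
        "info": info_tag.get_text(strip=True),
    })

for city in cities:
    print(city["name"], "-", city["info"])
```

With the data in a list of dictionaries, it is easy to sort it, filter it, or write it to a file later.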

You nailed it, champ! Thanks for joining me on this quest!

#programming #web #scraping


geekpowered (69) 7 years ago

Nice short little tutorial. You should consider using the tag #SteemSTEM next time you do a post on Science, Technology, Engineering, or Math. They have a curation team to upvote quality STEM content.


odesias (25) 7 years ago

Thank you for the information.
