Part 2: Interacting with Web Pages using MechanicalSoup

in #utopian-io, 7 years ago (edited)

What Will I Learn?

I intend to cover the following concepts in this part of the tutorial:

  • You will learn how to navigate through the pages of a website.
  • You will learn different functions and methods for interacting with forms.
  • You will learn to build a basic form filling bot.

Requirements

To follow this tutorial comfortably, you should have the following:

  • Basic knowledge of the Python programming language.
  • A PC with Python 3+ installed (for hands-on practice).
  • Read my previous tutorial on MechanicalSoup, since this part continues from it.

Difficulty

  • Basic

Tutorial Contents

So let's continue our journey with MechanicalSoup. As stated above, we will learn how to automate interactions with a webpage using MechanicalSoup, much as we would interact with it through a browser. This comes in handy if you are into creating bots and web scrapers.

1. Navigation

First of all, let's look at how to navigate through the pages of a website. Every page in a website has a unique URL associated with it. In a browser, we click the links on a page to move to other pages of the site.

open() method

We used the open() method in the previous part of this tutorial, where we saw how to open a webpage in a MechanicalSoup browser instance. It involves passing the full absolute URL of the target webpage to browser.open().

For example, to open Steemit.com:

browser.open('https://steemit.com')

You are free to use the open() method every time you need to go to another page in the site or to a different website. But it gets messy if we have to spell out the absolute URL every time we want a specific page on the same site. Luckily, there is a shortcut for this.

follow_link() method

The follow_link() method lets us move to another page without typing its full URL. It looks through the links on the current page, and the string we pass is treated as a regular expression matched against each link's href, so a fragment of the path is usually enough. That is, we can drop the https://website.domain part and just name the page.

For example, if we want to move to the page that lists the latest posts on Steemit, we just have to do this after browser.open('https://steemit.com'):

browser.follow_link('created') # Since the new posts are listed in https://steemit.com/created

Now our browser instance is pointed towards the link https://steemit.com/created and contains the contents of that page.

NOTE: follow_link() only works when the current page actually contains a link whose href matches what you pass. It is meant for moving between pages by following their links. To jump to an arbitrary address, such as a different website, use the open() method instead; open_relative() accepts a path relative to the current page.

I hope surfing through the pages of a website is clear now.

2. Interacting with Forms

Let us see the different methods we need for this:

select_form() method

It is a function for selecting a particular form on a page. It takes a CSS selector, which is really helpful for picking the form we need when a page contains more than one form.

Everyone who has worked with HTML and CSS is familiar with CSS selectors, which are used to apply styling to a single element or a group of elements. Here is a great guide on CSS selectors from w3schools

It is a method of the browser instance and is called like this:

form = browser.select_form('optional_css_selector')

The above code returns a mechanicalsoup.form.Form object that wraps all the input fields in the form. The fields can be set with dictionary-style assignment, and the object also has some handy functions to help us with form filling.

If the page has only one form, calling select_form() without any arguments will do the trick.

get_current_form() method

This method is a member function of the browser instance which will return the currently selected form object.

form = browser.get_current_form()

print_summary() method

It is a method of the Form object returned by select_form(). Calling it prints every input element present inside the form (it does not return a list; its return value is None).
You can print the inputs either like this:

browser.select_form('optional_css_selector').print_summary()

Or like this, by using get_current_form() function:

browser.get_current_form().print_summary()

Assigning values to input fields.

It's very simple to assign values to the form fields in MechanicalSoup. First of all, select the particular form using select_form().

We use the name attribute of each input field to assign its value. If you have some experience in web development, you will know that a submitted form is sent as a set of name/value pairs (URL-encoded by default, not JSON), where each field's name acts as the key and its content as the corresponding value.

The same mechanism applies here. You can assign values to the input fields of the selected form directly through the browser object.

For example, if we have an input named "Name", then to assign a value to the field you just have to:

browser['Name'] = 'Ajmal Noushad'

Simple, isn't it?
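
To make the name/value idea concrete, this sketch uses only the standard library to show what the body of an ordinary URL-encoded form submission looks like. The field names are borrowed from the dummy form used later in this tutorial, and the values are just samples.

```python
from urllib.parse import urlencode

# Field names act as keys; what the user typed acts as the values.
fields = {"name": "My Name", "age": 21, "gender": "1"}

# This is the kind of string that travels in the body of a form POST.
body = urlencode(fields)
print(body)  # name=My+Name&age=21&gender=1
```

MechanicalSoup builds this for us when the form is submitted, so we only ever deal with the names and values.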

launch_browser() method

This launches a real browser showing the current page held by the browser instance. You will notice that the browser doesn't go to the original URL but to a local file stored on your PC, because that copy also contains the values you just filled into the form. So launch_browser() lets you confirm that you did everything right.

Those are all the methods we need, so let's put them to use.

Creating a Basic form filling bot

We will now see how to fill a form in a webpage using the above functions. For that purpose I have made a dummy webpage with a form that consists of some input fields. I used Django to build this. You can find the code here: Github Repo

The form looks like this:
form.png

I hosted it on PythonAnywhere for practice; you can access it here: DummyForm

Let's proceed.

First of all, open a Python console.

If you followed the previous tutorial, you already have a virtualenv with mechanicalsoup installed in it. So you just have to activate the virtualenv and type python in the terminal.

$ source env-name/bin/activate
$ python

Now, in the Python console, import mechanicalsoup:

import mechanicalsoup

Create a new browser instance

browser = mechanicalsoup.StatefulBrowser()

Open the URL of the webpage that contains the form, in our case 'ajmal.pythonanywhere.com'

browser.open('http://ajmal.pythonanywhere.com')

Select the form in the webpage using select_form()

browser.select_form()

Remember: no CSS selector is given, since the page contains a single form.

To list the input fields we use print_summary()

browser.get_current_form().print_summary() # get_current_form() returns the form object pointing to the currently selected form

The above command will give you an output like:

<input name="csrfmiddlewaretoken" type="hidden" value="lIMnuL2olx1GEnGyTms3rDLMEB8lZKqCRWd9qo111631GkSEBNhEjv4IOAHDniym"/>
<input class="form-control" id="id_name" maxlength="20" name="name" required="" type="text"/>
<input class="form-control" id="id_age" name="age" required="" type="number"/>
<select class="form-control" id="id_gender" name="gender">
<option value="1">MALE</option>
<option value="2">FEMALE</option>
</select>
<textarea class="form-control" cols="40" id="id_about_me" name="about_me" required="" rows="10"></textarea>

Now fill the input fields with data using their name attributes.

browser['name'] = 'My Name'

browser['age'] = 21

browser['gender'] = '1' # See that we gave the value attribute of the select options for the gender input field. ie. '1' for 'MALE' and '2' for 'FEMALE'

browser['about_me'] = 'I am learning MechanicalSoup'

Note that for radio inputs and select inputs you should provide the value attribute of the item that needs to be selected.
Checkboxes can be given a list of values, like browser['checkbox'] = ['val1', 'val2']

Now let's take a look at how the form looks in a browser:

browser.launch_browser()

The above command opens a local copy of the original webpage, with the form filled in with the values we gave.

browser.png

Finally, submit the form:

browser.submit_selected()

Now we get a 200 response, indicating that everything went well and the form was submitted successfully.

<Response [200]>

Finally, let's check out the content of the response:

browser.get_current_page()

The above returns the page received after the form was submitted. If you submit the form through a regular browser, you will see 'Success' as the response.

In the console we see that as

<html><body><p>Success</p></body></html>

So that's it: you have now learned how to use MechanicalSoup to interact with webpages. Feel free to ask any questions. Thanks for reading...

Curriculum

My previous tutorial on MechanicalSoup



Posted on Utopian.io - Rewarding Open Source Contributors

Thank you for the contribution. It has been approved.

Very interesting tutorial! I've only ever used BeautifulSoup and didn't even know MechanicalSoup was a thing, it looks really cool though. If I ever find a use for it I will definitely refer to this tutorial!

You can contact us on Discord.
[utopian-moderator]

Thanks for the quick moderation.

Glad you found it helpful.

Very good tutorial. I did not know that python can be used for this.

Python can literally be used for anything, lol. Python is Life...

Articles like this are a great contribution to the knowledge pool, @ajmaln! Congratulations on it being approved by Utopian!

I've upvoted and resteemed this article as one of my daily post promotions for the @mitneb Curation Trail Project. It will be featured in the @mitneb Curation Trail Project Daily Report for 02 FEB 2018.

Cheers!

You're very welcome, @ajmaln!
Cheers!

Thanks for introducing us to MechanicalSoup. I also only knew about Beautiful Soup. I'm planning to scrape some financial data in the future, and MechanicalSoup will be very useful.

Also, I hope you will continue showing us examples using MechanicalSoup and Steemit as you did in the first post. I know there is an API, but sometimes the API changes and sometimes it just doesn't work (I can't get it to work with the combination of venv and Jupyter notebook, for example), so it is always useful to have a backup plan, a second tool for when the first one stops working ;-)
