Python For Data Science (#2)- Natural Language Processing (NLP)
Repository
https://github.com/sloria/TextBlob
What Will I Learn?
- Python for data science
- Textblob library
- Sentiment Analysis
- Machine learning concepts for NLP.
Requirements
- A working computer running macOS, Windows, or a Linux distribution (Ubuntu preferred).
- An installed Python 3 (3.6) distribution, such as the Anaconda Distribution.
- Dependencies (numpy, scipy, tweepy, and textblob).
- Intermediate knowledge of Python programming.
Resources
- https://textblob.readthedocs.io/en/dev/
- https://www.quora.com/How-does-sentiment-analysis-work-generally
- http://cs224d.stanford.edu/syllabus.html
- https://learnpythonthehardway.org/
Difficulty
- Intermediate
Tutorial Contents
Python For Data Science (#2)- Natural Language Processing
Hello guys,
In this tutorial we're going to learn how to write a Python script that uses Twitter to understand how people feel about a topic of our choice, using the natural language library TextBlob. The script will be able to search Twitter for a list of tweets about any topic we want and then analyze each tweet to see how positive or negative its emotion is.
Setup (installing required dependencies):
If you haven't installed textblob yet, you can check out this link; it has all the information about installing the textblob library. First, create a directory in which you can run tests and make predictions with the Python file we are about to write. Next, open your code editor and import the required class from the textblob package by typing from textblob import TextBlob.
Sentiment Analysis
So, how does sentiment analysis work? When we receive some input text (like a tweet or a status), we first split it up into individual words or sentences. This process is called tokenization, because we're creating small tokens from a big text. Once the text is tokenized, we can simply count the number of times each word shows up; this is called the bag-of-words model. Then we look up the sentiment value of each word in a sentiment lexicon that has them all pre-recorded, and combine those values to classify the overall sentiment of the input text.
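The three steps described here (tokenize, count, look up) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the lexicon below is a tiny made-up one, the function names are hypothetical, and averaging the scores of the matched words is one simple way to combine them, not necessarily what TextBlob does internally.

```python
import re
from collections import Counter

# Toy lexicon invented for this example; real lexicons hold thousands of scored words.
TOY_LEXICON = {
    "good": 0.7, "great": 0.8, "love": 0.5,
    "bad": -0.7, "terrible": -1.0, "hate": -0.8,
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens):
    """Count how many times each token appears."""
    return Counter(tokens)


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Average the lexicon values of the words found in the text (0.0 if none match)."""
    counts = bag_of_words(tokenize(text))
    matched = [(lexicon[word], n) for word, n in counts.items() if word in lexicon]
    total = sum(n for _, n in matched)
    if total == 0:
        return 0.0
    return sum(value * n for value, n in matched) / total


print(bag_of_words(tokenize("Good movie, good cast!")))
print(lexicon_score("I love it but the ending was bad"))
```

Words that are missing from the lexicon simply do not contribute, which is why a text with no emotional words scores 0.0 here.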
"The more data we train it on or pass ,the more accurate the Sentiment Analysis output will be."
Sentiment Analysis using twitter with textblob
You might be wondering why we are using Twitter as our data source. To perform sentiment analysis we need text that carries emotion and sentiment, and Twitter is a treasure trove of sentiments. People around the world post thousands of reactions and opinions on every topic under the sun, every second of every day. It's like one big psychological database that's constantly being updated, and we can use it to analyze millions of text snippets in seconds with the power of machine learning.
So our process will be to register for access to Twitter's API, install our dependencies (tweepy), and then write our sentiment analyzer script.
Let's start by signing up to use Twitter's API. An API, or application programming interface, is a gateway that lets you access some server's internal functionality, in our case Twitter's, so we'll be able to read and write tweets right from our app. Let's sign up for it in the browser. First, we want to make sure that we have our own Twitter account set up with a verified phone number attached to it. Once we have that, we can create our first Twitter app.
We'll select the create new app icon and type in a name and description for our app, as well as a website. The website field is kind of like a liberal arts degree: it doesn't really matter. Then we can click the Create button at the bottom, and our app is created.
We can click on our app and go to the keys and access tokens tab. We're going to need these keys in our script to authenticate, that is, to verify our identity with Twitter.
Let's move on to installing our dependencies. Our first dependency is called tweepy, which is our library for accessing the Twitter API. Our other dependency is called textblob (installed earlier), which will help us perform the actual sentiment analysis.
Dependencies :
Name | Command |
---|---|
textblob | pip install textblob |
tweepy | pip install tweepy |
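Before going further, it can help to confirm that both packages are importable. The following is a minimal sketch that uses only the standard library; the helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the names that Python's import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["textblob", "tweepy"])
if missing:
    # Same pip commands as in the table, combined into one line.
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("textblob and tweepy are both installed.")
```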
All right, first things first: we want to create the 4 variables that authenticating with Twitter requires. We will find them in the dashboard, and we can copy and paste each of them in as a string. Next, let's authenticate with Twitter, which basically means logging in via code. To do that, we'll create a variable called auth (for authentication) and use the OAuthHandler method of tweepy. This method takes two arguments: the consumer key and the consumer secret. It was written inside the tweepy library with a bunch of code that is hidden from us, and we don't need to know the details for our use case. We just call the method by name and all of its functionality is in our hands, while the arguments are what the method uses to perform its internal work, whatever that is. We're now halfway through authentication. The other half is to call the set_access_token method on the auth variable, which takes two arguments: the access token and the access token secret. That's it, we've created our authentication variable.
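The authentication steps can be sketched as follows. Two things here are assumptions rather than part of the tutorial: the keys are read from environment variables (the variable names are hypothetical) instead of being pasted in as strings, and tweepy is imported inside the function so the snippet still loads where tweepy is not installed yet. The tweepy calls themselves, OAuthHandler and set_access_token, are the ones named in the text.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a tuple, or raise KeyError if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


def build_auth(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create the tweepy authentication object described in the text."""
    import tweepy  # imported lazily so load_credentials works without tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return auth
```

With the four variables set in the environment, build_auth(*load_credentials()) would return the authentication object. Keeping keys out of the source file also makes it harder to publish them by accident.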
Now we want to create our main variable, from which we'll do all of our Twitter magic. We'll call it apidata and assign it the value returned by the tweepy API method, which takes a single argument: our authentication variable. Now that we have our apidata variable, we can call a whole bunch of different methods on it, but for our use case we want to collect tweets that contain a certain keyword. To do that, let's create a public_tweets variable that will store a list of tweets. To fill it, we'll call the search method on the apidata variable. The search method takes a single argument; let's make it cryptocurrency. This method will retrieve a bunch of tweets that contain the word cryptocurrency.
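As a rough sketch of this step, the helper below calls the search method on an API object. One caveat that is not in the tutorial: tweepy 3.x names this method search, while tweepy 4.x renamed it search_tweets, so the helper tries both. The function names search_tweets and main are hypothetical, and main is only defined here, not run.

```python
def search_tweets(apidata, query):
    """Return tweets matching `query` from a tweepy API object."""
    # tweepy 3.x exposes `search`; tweepy 4.x renamed it to `search_tweets`.
    search = getattr(apidata, "search_tweets", None) or apidata.search
    return search(q=query)


def main(auth):
    """Hypothetical entry point; `auth` is the authentication object built earlier."""
    import tweepy  # imported here so search_tweets loads without tweepy installed

    apidata = tweepy.API(auth)
    public_tweets = search_tweets(apidata, "cryptocurrency")
    return public_tweets
```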
We can print them all out to the terminal. To do that, we'll create a for loop: for tweet in public_tweets. A for loop steps through every value in a list, so the question now is what we want to do with each tweet. Let's print them out: each tweet has a text attribute, which is the string version of it, so we'll print tweet.text.
Now it's time to perform our sentiment analysis.
We'll create an analysis variable that stores our analysis, and call TextBlob with the tweet string as the only argument. Then we can print out the sentiment attribute of the analysis variable. We can now see each tweet related to cryptocurrency as well as its sentiment analysis.
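The loop and the analysis step might look like the sketch below. It assumes public_tweets is a list of objects with a text attribute, as described in the text; the function name analyze_tweets is made up, and textblob is imported inside the function so the snippet loads even before the library is installed.

```python
def analyze_tweets(public_tweets):
    """Print each tweet with its sentiment and return the list of sentiments."""
    from textblob import TextBlob  # lazy import, see the note above

    sentiments = []
    for tweet in public_tweets:
        print(tweet.text)
        analysis = TextBlob(tweet.text)
        # analysis.sentiment is a named tuple with polarity and subjectivity fields.
        print(analysis.sentiment)
        sentiments.append(analysis.sentiment)
    return sentiments
```

Returning the sentiments as well as printing them makes it easy to reuse the numbers later, for example to average them.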
polarity
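Polarity measures how positive or negative some text is.
NOTE:
The polarity scale runs from -1 (most negative) to 1 (most positive).
subjectivity
Subjectivity measures how much of an opinion the text is versus how factual it is. TextBlob reports it on a scale from 0 (very objective) to 1 (very subjective).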
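To make the polarity number easier to read, it can be mapped to a label. This is a minimal sketch: the function name and the width of the neutral band (0.05) are arbitrary choices for illustration and do not come from TextBlob or from the tutorial.

```python
def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity in [-1, 1] to 'positive', 'negative', or 'neutral'."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1 and 1")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


for value in (0.25, 0.0, -0.5):
    print(value, label_polarity(value))
```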
So, to break it down: sentiment analysis is the process of extracting and understanding human feelings from data.
Results for "cryptocurrency":
RT @justinsuntron: #TRON #TRX will be listed on @IDAX11 the first Mongolian licensed Exchange, which already obtained the authorization fro…
Sentiment(polarity=0.25, subjectivity=0.3333333333333333)

#Crypto #Blockchain #ARAW #arawtoken #icosale #ecommerce #payment #ether #ethereum #bitcoin #cryptocurrency #ICO… https://t.co/X0Rts5qjUL
Sentiment(polarity=0.0, subjectivity=0.0)

RT @CoinFirefly: 🌟🌟FFC is excited to announce our first airdrop campaign🌟🌟
♦ GET $10 worth FFC coins for Free!!!
💰Join Now- https://t.co/a…
Sentiment(polarity=0.4265625, subjectivity=0.49583333333333335)

#bitcoin at under $6k will never ever happen again. 🚀📈
#Crypto #cryptocurrency #Blockchain #CryptoTrader
Sentiment(polarity=0.0, subjectivity=0.0)

RT @ROOMDAOICO: ROOMDAO COIN (RDC ) - cryptocurrency for feeless travel.
With RDC you can:
Pay for accomodation
Pay for rent a car, bike s…
Sentiment(polarity=0.0, subjectivity=0.0)

RT @BlockvestGroup: The First Licensed and Regulated Crypto Currency Exchange & Index Fund based in the US.
Read the whitepaper here : ht…
Sentiment(polarity=0.25, subjectivity=0.3333333333333333)

RT @whenhub: Looking to get started with crypto?
We make it easy, and our app is already live. Buy WHEN Tokens and help build the future o…
Sentiment(polarity=0.1898989898989899, subjectivity=0.48611111111111116)

How do you think the date of #EOS Mainnet Launch on June 2? $EOS #cryptocurrency #mainnet #babifinance
Sentiment(polarity=0.0, subjectivity=0.0)

RT @Zero_Token: Airdrop has been increased
500 people gets 5000 Zero Token
first quarter
Retweet
Follow
Comment ERC-20#airdrop #ico #e…
Sentiment(polarity=0.25, subjectivity=0.3333333333333333)

RT @howdooHQ: Be there or be square 😏 #live AMA just for you guys ✌🏾 get those questions ready and private message us to participate 😎 #cry…
Sentiment(polarity=0.11212121212121212, subjectivity=0.4583333333333333)

RT @Hodor7777: Are we at the tipping point for use of #xrp? My answer is in the latest XRP Blog: https://t.co/WKOzHyj7Fx @haydentiff @JoelK…
Sentiment(polarity=0.5, subjectivity=0.9)

RT @murthaburke: Quantor ICO - Investment Algorithms & Educational Solutions For Cryptocurrency & Traditional Financial Markets!
@BIGMONEY…
Sentiment(polarity=0.08333333333333333, subjectivity=0.3333333333333333)

RT @murthaburke: Quantor ICO - Investment Algorithms & Educational Solutions For Cryptocurrency & Traditional Financial Markets!
@BIGMONEY…
Sentiment(polarity=0.08333333333333333, subjectivity=0.3333333333333333)

RT @howdooHQ: Be there or be square 😏 #live AMA just for you guys ✌🏾 get those questions ready and private message us to participate 😎 #cry…
Sentiment(polarity=0.11212121212121212, subjectivity=0.4583333333333333)
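To get one overall reading for the topic, the per-tweet scores can be summarized. The sketch below is not part of the original script: it copies the polarity values from the results listed here and uses a plain average, which is just one simple way to aggregate them.

```python
from statistics import mean

# Polarity values copied from the results listed here, in order.
polarities = [
    0.25, 0.0, 0.4265625, 0.0, 0.0, 0.25, 0.1898989898989899,
    0.0, 0.25, 0.11212121212121212, 0.5, 0.08333333333333333,
    0.08333333333333333, 0.11212121212121212,
]

average = mean(polarities)
positive = sum(1 for p in polarities if p > 0)
negative = sum(1 for p in polarities if p < 0)
neutral = len(polarities) - positive - negative

print(f"tweets: {len(polarities)}, average polarity: {average:.3f}")
print(f"positive: {positive}, neutral: {neutral}, negative: {negative}")
```

For this small sample the average comes out slightly positive and no tweet scores below zero, although several of the tweets are retweets of the same text.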
Thank you for your time. Play and learn the simple way!
Curriculum
Proof of Work Done
https://github.com/skpjr001/ml_Natural-Language-Processing_textblob
Very good information that you share with the users of this great community: a good explanation of the code and tutorials, and very important content for good learning. Greetings from Venezuela, @skpjr001.
The code of your contribution has been found to be plagiarized from the following source here. Plagiarism is a serious offense. Your account has been accordingly banned from receiving utopian reviews.
[utopian-moderator]