Bot that detects spam in comments (using Multinomial Naive Bayes filter)
I created a bot whose purpose is to detect spam comments on the Steem blockchain. It uses the Multinomial Naive Bayes algorithm. It can reply to a spam comment or downvote it. I built it for the #polish community, but it can be adapted to any tag (or all tags); that is just a matter of the training file.
Running
$ POSTING_KEY=<posting_key> spam_detector.py config.json
The private posting key is stored as an environment variable.
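A minimal sketch of how a script could pick up these two inputs, assuming the key is read from `POSTING_KEY` and the config path is the only command-line argument, as in the command above. The helper name `read_startup_args` is hypothetical and not taken from the repository.

```python
import os
import sys


def read_startup_args(argv, environ):
    """Return (posting_key, config_path) or raise a clear error.

    Hypothetical helper: argv and environ are passed in explicitly so the
    function is easy to test without touching the real process state.
    """
    posting_key = environ.get("POSTING_KEY")
    if not posting_key:
        raise RuntimeError("POSTING_KEY environment variable is not set")
    if len(argv) != 2:
        raise RuntimeError("usage: spam_detector.py <config.json>")
    return posting_key, argv[1]
```

In a real script this would be called as `read_startup_args(sys.argv, os.environ)`. Keeping the key out of the config file means the file can be committed without exposing the key.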
Configuration
All parameters are stored in the config.json file.
Key | Value |
---|---|
account | account used by bot |
nodes | list of Steem nodes |
tags | tags which are observed |
probability_threshold | threshold to classify as spam |
training_file | input training file |
reply_mode | 0 - without reply, 1 - with reply |
vote_mode | 0 - without vote, 1 - with vote |
vote_weight | weight of the vote from range [-100.0, 100.0] |
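
To make the table concrete, here is a sketch of what such a file might look like and how the modes could drive the bot's actions. Only the key names come from the table; every value, the helper names `load_config` and `planned_actions`, and the choice to treat a probability equal to the threshold as spam are assumptions for illustration.

```python
import json

# Hypothetical example values; only the key names come from the table.
EXAMPLE_CONFIG = """
{
  "account": "my-bot-account",
  "nodes": ["https://api.steemit.com"],
  "tags": ["polish"],
  "probability_threshold": 0.8,
  "training_file": "training.txt",
  "reply_mode": 1,
  "vote_mode": 0,
  "vote_weight": -50.0
}
"""

REQUIRED_KEYS = {
    "account", "nodes", "tags", "probability_threshold",
    "training_file", "reply_mode", "vote_mode", "vote_weight",
}


def load_config(text):
    """Parse the JSON text and check the documented keys and ranges."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError("missing config keys: " + ", ".join(sorted(missing)))
    if not -100.0 <= config["vote_weight"] <= 100.0:
        raise ValueError("vote_weight must be in [-100.0, 100.0]")
    for mode in ("reply_mode", "vote_mode"):
        if config[mode] not in (0, 1):
            raise ValueError(mode + " must be 0 or 1")
    return config


def planned_actions(config, spam_probability):
    """List the actions to take for one comment (assumed decision rule)."""
    if spam_probability < config["probability_threshold"]:
        return []  # below the threshold: treat the comment as ham
    actions = []
    if config["reply_mode"] == 1:
        actions.append("reply")
    if config["vote_mode"] == 1:
        actions.append("vote")
    return actions
```

With the example values, a comment scored at 0.9 would get a reply but no vote, and a comment scored at 0.5 would be left alone.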
The training file contains rows labeled ham or spam, like below:
ham For years I am interested in the Anglo Boer war and their fight against the Commonwealth. W. Churchill's military career was full of hits and misses but he gained huge popularity in England after the war. Great post once again.
ham The development of a language can be a very boring topic, but sometimes it can also be exciting. While I was never interested in the development of the English language, I often liked to learn some facts about the development of the German language (maybe since it's my native language). It's nice to see that German has influenced such a young language. Maybe I'd even be able to understand some Afrikaans and maybe it's even worth a try to learn it.
ham This country is a cultural boiling pot like none, so many interesting stories ; struggle for independence, new discoveries, many different influences makes it a unique place , so also the language carries all this exciting details in it. I will definitely start learning Afrikaans .. one day ;)
ham I am South African born. I also grew up in an Afrikaanse town. Despite my background and being able to speak the language, I was not aware of the history. Probably should have paid more attention in school, lol. Thanks for the lesson!
ham I thought as much. So, you've been in this game since April 2016. I must be joking to think I've arrived when I haven't even started. I'm throwing away my white collar to settle for this. Should you need a minnow to mentor, please, let me be the number one to be considered.
ham Great interview, guys! We haven't met in person yet, but I'd say Tom's one of the most grounded people I've ever come to know. I love your focus and straightforwardness, combined with your endless generousity. For many reasons you're one of the most successful but especially most admirable people on this platform.
spam Upvote, follow, resteem
spam Followed and resteemed
spam Great photo
spam Follow me I will follow you
spam Hey Beautyfull love your blog... UPVOTED and RESTEEMED
spam Hi there, i RESTEEMED & UPVOTED for you! Have a nice day.
spam Cool pic bro
spam Thank you for sharing i will resteem it
spam good post
spam Super
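The following is a rough, from-scratch sketch of how rows like these can feed a Multinomial Naive Bayes filter. It is not the bot's actual code, which relies on scikit-learn. It assumes the label is separated from the text by the first run of whitespace, and the tokenizer, the add-one smoothing, and the names `parse_training_rows`, `tokenize`, and `TinyMultinomialNB` are all choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def parse_training_rows(lines):
    """Turn 'label text' lines into (label, text) pairs, skipping blank lines."""
    rows = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        label = parts[0]
        if label not in ("ham", "spam"):
            raise ValueError("unexpected label: " + label)
        rows.append((label, parts[1] if len(parts) > 1 else ""))
    return rows


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of rows
        for label, text in rows:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def spam_probability(self, text):
        """Return P(spam | text) by normalising the per-class log scores."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            log_scores[label] = score
        highest = max(log_scores.values())
        exp_scores = {k: math.exp(v - highest) for k, v in log_scores.items()}
        return exp_scores.get("spam", 0.0) / sum(exp_scores.values())
```

The value returned by `spam_probability` is what would be compared against `probability_threshold`. Working in log space avoids underflow when long comments multiply many small word probabilities.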
Technology Stack
- python3.6
- libraries: steem-python, scikit-learn, pandas, textblob, bs4
The repository contains a requirements.txt file.
Roadmap
There is still a lot that can be done:
- enlarging the training set
- adding new algorithms such as Neural Network or Support Vector Machine
- taking into account previous comments, not only the current one
- taking into account user reputation
- adding to blacklist / whitelist
Posted on Utopian.io - Rewarding Open Source Contributors
That's awesome. You might want to take a percentage of the data to use as testing data, specifically not data used to train it. This way you can get an idea how accurate the filter is.
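The hold-out idea from this comment could be sketched roughly as follows, assuming the training rows are available as `(label, text)` pairs. The helper names `split_rows` and `accuracy`, the 20% test fraction, and the fixed seed are illustrative assumptions, not part of the bot.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(test_rows, predict_label):
    """Fraction of held-out rows whose predicted label matches the true one."""
    correct = sum(1 for label, text in test_rows if predict_label(text) == label)
    return correct / len(test_rows)
```

The classifier would be trained on the first list only, and `predict_label` would be any function that maps a comment's text to ham or spam.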
@jacek-w, Upvote for supporting you.
Nice work. This really should be part of the core Condenser platform...
Thank you for the contribution. It has been approved. The problem I see here is that you need to keep updating your training file. Shouldn't it instead check the user's past comments to see whether that user is spamming?
You can contact us on Discord.
[utopian-moderator]
Hey @jacek-w I am @utopian-io. I have just upvoted you!