Original Works Bot: A bot that doesn't work
So I'm sure most of you have come across the @originalworks bot before without thinking much about it. Well, as someone who spends most of his time on Steemit fighting plagiarism, I see this bot far more often than I should. I understand the thinking behind the bot and think its goal is noble, but it is badly broken and really needs to be shut down and reworked.
What is the Original Works bot?
As I mentioned before, I think the bot was started with good intentions. The purpose of the bot is to attempt to verify that the post someone has written is original. The author, or anyone commenting on the post, can simply type @originalworks or !originalworks in a comment, and the bot will check the post and reply with a comment stating whether or not the content is original.
In theory, this would be a valuable tool. Determining whether something is original can be very difficult and time-consuming. You would have to read the article and then do a little digging around by hand to see if there is anything a little too similar already out there. This bot would save a lot of time and serve as a visible badge that someone was actually contributing original content. Not only that, but it's kind enough to give you a small upvote just for calling it to your post. Sound a little too good to be true? That's because it is.
Why it doesn't work
Creating a bot to detect plagiarism is incredibly difficult. The current gold standard in plagiarism detection is the cheetah bot created by @anyx. He put a ton of work into it, and it takes a lot to maintain, as he explains in this FAQ about cheetah:
I have had to sacrifice accuracy for price, as the cost of running Cheetah is actually quite high. At the current rate, Cheetah has a direct cost of about $150-$200 USD per week, with an indirect cost even higher -- and rising. The funding to pay for her currently comes from Steemcleaners log posts. Development is ongoing, and has never stopped! I mostly aim to improve detection and reduce false positives (as I continuously receive negative flak for any mistake). While I don't expect a reward for the continued development, nor do I post updates about development (as the algorithm I have developed for content detection is effectively a trade secret, and thus sharing updates to it would be silly), I consider my role as a witness ( @anyx ) as the direct support. This keeps the project community driven, rather than sponsored. You can vote for witnesses here.
Even with that kind of effort, cheetah isn't perfect. It misses things for any number of reasons, and it will occasionally even produce a false positive. The big difference is that cheetah, besides being more accurate, isn't called to a post to prove authenticity. It does its work in the background, checking every post and commenting when it finds a match. So how much worse is @originalworks? Let's look at a few direct comparisons.
Here is a good example of something I come across often. This is a post from last week: https://steemit.com/warcraft/@cryptopaze/world-of-warcraft-vanilla-pvp-orrim-20171229t93821429z. Scrolling down in the comments, you will see this:
In a head-to-head battle of the bots, cheetah comes out the clear winner. The original post is copied word for word from this post, yet originalworks still certified it as original. Think this is an isolated incident? Here is a post where the author links to a source and cheetah finds a source, yet originalworks still manages to certify it as original. I could find dozens more examples where people call originalworks to blatantly plagiarized posts in an attempt to legitimize them. Like I said, no detection system is perfect. Here is a post that was missed by both cheetah and originalworks. The user took the text from a YouTube video and tried to pass it off as their own. The difference is this: when cheetah misses something, it doesn't validate the post; when originalworks misses something, it asserts that the post is original.
Hell, just in the time it's taking me to write this post, I picked a random article from an originalworks comment, and this is what I saw. https://steemit.com/steemit/@sumansid/why-is-zuckerberg-entering-crypto is a direct ripoff of https://techcrunch.com/2018/01/05/mark-zuckerberg-is-right-to-explore-the-potential-of-the-blockchain-for-facebook/, and originalworks certified it as original content. Feel free to take a gander at the latest originalworks comments; I'm sure you won't have to click through too many before finding something that is clearly not original.
So what should be done?
First, I think that unless originalworks can drastically improve the accuracy of his bot, he should shut it down. In my work with @steemcleaners, I have come across people who believe that this bot's comments actually mean something, that they legitimately certify a post as original. That is a dangerous message to send, considering how often the bot is wrong.
Second, if the bot isn't going to be shut down, at least make it stop upvoting people. That just gives people more incentive to call a broken bot to every post, plagiarized or not.
I really do think this bot was a good idea to begin with, but unfortunately it is far too inaccurate to be worth anything here on Steemit. It not only fails to detect a large number of plagiarized posts but falsely legitimizes them as well. I have no idea whether the bot will actually be shut down or reworked, but I hope this post at least gets the message out that a comment by originalworks in no way means the post is original. When it comes to plagiarism detection, the best method is still good old-fashioned common sense and a little detective work.
Feel free to share/resteem this message so that more people are aware of just how badly broken the originalworks service is. Leave me any comments or criticisms you might have; I'm happy to respond to them all.
Oh man, how I wish that quote from me was still up to date. :)
An up-to-date clarification: the current cost of @cheetah is actually now about $200 per DAY. And the funding has actually moved from @steemcleaners (which I don't take any reward from) and is instead fully funded by the cheetah log and my witness pay.
Otherwise, great post. It's really disappointing to see people put so much trust in that bot when it is far, far worse than cheetah.
@fingolfin - FANTASTIC post. I was already blown away by the original quote. To hear that it's actually $200 a day is crazy. I'd leave the original quote in, since it talks about the process, and then put the update from @anyx right below it in the content, so people know how expensive and time-consuming the job currently is. I think it's important to appreciate the unsung heroes. He's been doing this since I started a year and a half ago. Nice job here. You got a vote and a resteem from me.
Well, I just went with the information that was publicly available :) Yeah, I think even with cheetah's comparatively high accuracy, it would still be risky to use it to try to certify the originality of a given post. There are just too many variables to account for.
Hehe, no problem. I need to make another witness update soon.
Indeed, it's pretty hard to claim something is original. Cheetah might only catch 70% of copy/paste, but she has 99.8% accuracy when she does leave a comment. It's a trade-off: I could have programmed her for the reverse (catch 99% of copy/paste, but with only 70% accuracy, and thus many false positives), but the design was to favor high accuracy over a high catch rate.
But either way you do it, it's not going to be perfect, so it's wrong to claim so. Hence, cheetah's comment is quite simple and relaxed, to hopefully not leave a bad impression on the 0.2% she makes a mistake on.
I think it's better how you programmed it - to have a lower catch rate with higher accuracy. Gives it more authority.
There is no reason to copy from someone else.
Be original and let us see you in your posts,
not someone else; those who love you want to see you.
Your progress may be slower - but it will be of high quality.
I agree, and I've seen people whose posts were false positives leave short, pleasant replies to cheetah. Hopefully anyone reading the thread will easily see why cheetah flagged the post and recognize it as a false positive.
May I ask what the costs are? Are we talking electricity costs to run the machine that runs the bot or labor/development costs to constantly update it? Or... something else because I don't know a ton about what makes magic computer things cost money...?
It's pretty much sickening that people can pay others to upvote them.
Hey, I'm the creator of @originalworks.
Although I disagree with some of the claims you are making, I understand your intentions and concerns. @originalworks is continuously being worked on and improved.
However, to address any potential misunderstanding, I have added a disclaimer to the @originalworks response.
Thanks, I understand the difficulty of trying to create something like the originalworks bot and I think your intentions are good. Hopefully it continues to improve :)
I tagged you in two of my original posts yesterday and got no response.
While I agree that originalworks is not very accurate, I can't imagine that a bot will ever be. Sure, simple copying can be detected with some effort. But with anything beyond that, you are inside a twilight zone already.
Maybe it would be a step in the right direction if the originalworks comment clearly stated that it is not an official seal from Steemit, if that's what some people assume.
I don't think that's good enough. originalworks needs to stop deeming things original. It should rephrase its validation with a disclaimer that says something like, "I'm correct only 60% of the time--and I am not a Steem-supported app" or whatever.
Oh, I agree that no bot can be 100% accurate, but at least the other plagiarism detection bot doesn't claim to certify posts as original. That's why @steemcleaners are actual people and not robots.
I just come across way too many plagiarists who call originalworks to every post they make, and I get tired of seeing people think it means something.
Yes, I see your point. But with cheetah it's the same, only in the other direction. Lots of people got warnings (maybe it's somewhat better now) for no reason. And in some cases it's still bugging people, for example when someone makes a recurring post, like a weekly lotto, that is basically the same text each time.
In such cases, the problem lies exactly in the fact that it gives its opinion without being asked, and without any guarantee of being correct. And it tends to scare people a little.
However, I can't see how this could be solved. There is no way that all posts could be checked by real people, not now and even less in the future, if Steemit keeps growing like it does now.
You make a good point with the "not being called" thing, on repeated posts. I wonder if the two could marry and make something more broad?
Right, but at least cheetah is doing what it is supposed to. It lets a person know if the content posted can be found somewhere else online. Cheetah doesn't flag, make any judgement on the originality, that is left for actual people to review and act on.
Well, I guess originalworks is also doing what it's designed to do: sending a reply after being called. It doesn't seem to check anything at all.
But anyway, it's just one of the many things around here that need to be watched, and not the most critical one either.
Exactly! It USED to flag and castigate in the early days. But then the wording was changed to be friendlier, since there is always the chance that the post IS original and came from another place.
It would really suck to post something original, only to get a message claiming it's been stolen from somewhere. I could see it causing people to turn cranky.
For the most part, when this happens, the poster knows why. It's because it was a contest post, or whatever, that repeated a significant chunk of text from a prior iteration. If folks respond to the bot, users can read the response and judge for themselves.
I had wondered about this bot; I recently asked a Curie curator, who told me not to trust it and to always do my own checking. It would be great if it worked, saving many curators a lot of digging around and helping steemians know when they are being taken for a ride. Many times I have read a post and thought, "This looks great, definitely worth an upvote," only to hesitate, check my plagiarism source and find it's a word-for-word copy.
Thanks for writing this and making the bot's inaccuracy better known. One side effect of writing about it is that, until it is deactivated, many more people now know it will upvote you, plagiarized or not.
Yeah, it is by no means the most pressing issue on Steemit, just something that eventually got on my nerves. You are right, there is no substitute for doing your own checking.
I think you're onto something. In fact, cheater bots might REWARD organic users for legwork. That's a thought.
Excellent post. I have also witnessed the same thing countless times, where cheetah and originalworks have both commented on the same post that was clearly plagiarized.
And a lot of times they come up with the OPPOSITE result.
@OriginalWorks
;)
The @OriginalWorks bot has determined this post by @fingolfin to be original material and upvoted it!
To call @OriginalWorks, simply reply to any post with @originalworks or !originalworks in your message!
LOL. Would have been even more ironic if it deemed the post as plagiarized.
Haha, you're hilarious.
genius ;-)
I have personally witnessed these conflicting situations before, with originalworks and cheetah on the same post.
Originalworks actually is not what it seems to be; it can be maneuvered. It is always working, but it is not reliable enough to be trusted 100%.
I mean I have seen it detect plagiarized posts, but it seems to be wrong far too often.
Yeah, I agree with you, because a whole lot of people in my group have been complaining about cheetah visiting posts they created. Worse, a friend told me that cheetah commented on his introducemyself post.
I still don't believe that a bot should be expected to be 100% accurate, but if it can be more effective, it will be more appreciated.
I am new to Steemit and have been using OriginalWorks. I thought it would help show that the post was indeed original. I have found a lot of plagiarized stuff just in my own browsing; it's actually a pretty big turn-off on Steemit. I suppose I really am grateful to cheetah for being there, or else I would have a lot of doubt about voting on any post. I like the OriginalWorks seal/graphic; it looks nice. I understand you are saying it is flawed, but as a newbie it has given me a little feeling of legitimacy, even just seeing the 1 upvote on my posts.
Yeah, like I said, I think it's a good idea in theory. I suppose it does rule out some plagiarism; it's just that it is far from an authority on whether a post is original.
Oof, I like to use originalworks just to tag my work as [OC], but after reading this I'm not so sure anymore. Gonna have to do some more reading on the "botconomy".
Cheers!
Most of my topics are reposts from other platforms that did not reward my hard effort for almost 2 decades. Don't anyone dare say it's plagiarism to share your past topics on a new platform.
It's not plagiarism to share your own posts from another site; nobody is saying that it is.
@originalworks is saying it is. That's the problem.