Want to spell check? Read the fine print
Posted on May 30 2016
Update: This post has hit Hacker News - you might want to follow the comments over there.
I have been playing around with the very nice Visual Studio Code editor recently. For those who don't know, it's a free source code editor from Microsoft which works cross-platform. The reason I was giving it a look is that it has pretty good support for editing Go, a language I am trying to find the time to learn. I was also keen to see what the New Microsoft could offer a non-microsofty like myself.
I like what I've seen so far, but something interesting came up today. I realised that, unlike Sublime, Visual Studio Code wasn't doing any spell checking out of the box. No problem, I thought; I bet there is an extension for that. And guess what, there is:
Visual Studio Code gives you a nice browsing interface to find likely packages, and even to get a sense of how popular they are. The clear winner here is Sean McBreen's Spelling and Grammar Checker. You can also see in the above screenshot that you get a link to details about the extension. As I was still pretty new to the ecosystem, I was curious about what information you got about extensions, and whether I'd found the spell check plugin for Visual Studio Code or just one of several. Clicking through…
Initially, things look promising. There are a lot of installs (compared to other extensions) and a bunch of 5-star reviews:
But then, at the top of the description, I found this message greeting me:
Notice: This extension uses the teacher node module which calls the After The Deadline service to check for spelling and grammatical errors. Document text is sent to the service over unencrypted HTTP. Do not use this extension with sensitive or private documents.
So to be clear, this is saying that any text opened in Visual Studio Code with this extension loaded would be sent in plain text to a service I had never heard of. The mind boggles at how terrible an idea this is for an editor designed for source code. I wasn't the only person who thought so:
@samnewman @troyhunt yep. nope. the author of that spellchecker might be the nicest person ever but there is no justification for that. ever
— Laura Bell (@lady_nerd) May 30, 2016
Is This Really True?
My first reaction was "Surely I've misunderstood something, right?". So I enabled the extension and opened a non-sensitive file, one of my in-progress blog posts. I then downloaded Wireshark to take a look. It took a while to sift out what I was after. It's been a LONG time since I used Wireshark (so long that back then it was still called Ethereal), and my machine generates a rather large amount of traffic for things like Sonos and Dropbox, but eventually I tracked down what was being sent. Sure enough, I could see all the text being sent, unencrypted, over HTTP to the After The Deadline service. So at least the documentation was accurate: this absolutely insane thing really was happening.
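Why Wireshark could simply show the text comes down to one fact: a plain HTTP request body is just bytes on the wire. Form encoding is not encryption. Below is a minimal sketch in Python. It assumes the document is sent as a URL-encoded form field. The path `/check`, the field name `data` and the host `spellcheck.example` are hypothetical placeholders. They are not taken from the extension, the teacher module, or After The Deadline's API. The sketch shows that anyone who captures the packet can recover the original text exactly.

```python
from urllib.parse import urlencode, parse_qs


def build_plain_http_request(host: str, text: str) -> bytes:
    """Build the raw bytes of a form POST as it would travel over plain HTTP.

    The path "/check" and the field name "data" are hypothetical placeholders.
    """
    body = urlencode({"data": text})  # URL-encoding only; nothing is hidden
    request = (
        "POST /check HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )
    return request.encode("ascii")


def recover_text(raw_request: bytes) -> str:
    """What an eavesdropper does: split off the body and URL-decode it."""
    _headers, _sep, body = raw_request.partition(b"\r\n\r\n")
    return parse_qs(body.decode("ascii"))["data"][0]


if __name__ == "__main__":
    secret = "aws_secret_access_key = not-a-real-key"  # obviously fake example
    captured = build_plain_http_request("spellcheck.example", secret)
    print(captured.decode("ascii"))
    print("Recovered:", recover_text(captured))
```

HTTPS would wrap those same bytes in TLS, so a packet capture would show only ciphertext.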
What's Going On Here?
As per the documentation, the extension is just making use of a Node module called teacher, which is a thin Node API (last updated over 4 years ago) for calling a service called After The Deadline. This piqued my interest - who were the people behind this service? Could they be trusted?
Their website immediately reminded me of some of the pretty bad stock product websites of 2008, but I was surprised to see that the company behind the service is Automattic, the company behind WordPress, amongst other things. This started to make a bit more sense. In the context of a blog post, which is designed to be shared publicly, the fact that the text is being sent to another party isn't much of a problem. But in the context of the sorts of things we open up in a source code editor, this is utterly insane.
It's About Trust, Right?
Partly, the problem here is that this traffic is being sent unencrypted. We'll get to that in a moment. But even if this service did use HTTPS (and I can see no good reason why it can't), by using this plugin I am explicitly trusting not only that the service I send my data to won't do anything silly with the information it receives, but also that it will keep that information properly managed and not leave it lying around to be swept up by some malicious attacker. And again, the amount of sensitive information involved could be staggering. Source code can be bad enough, but think of the number of times you've put sensitive things like AWS keys or passwords into text files. Now think about what happens if they fall into the wrong hands.
The HTTP-only support is also nuts. It means that even if I do trust Automattic to look after my potentially private and sensitive information, anyone on the network path can read that text, and I'm opening myself up to people pretending to be After The Deadline. Given the use of a very old Node module in this tool chain, I did wonder whether After The Deadline actually supported HTTPS, and the extension's reliance on the teacher module had simply stopped it from using a more sensible protocol. Poking around, it becomes pretty clear that After The Deadline is a freely available service, which Automattic have decided to run for you. You are free to run your own copy of the stack if you want, which you could then protect with HTTPS. From the docs it wasn't clear whether Automattic support HTTPS on the version they host, and I couldn't find anything in the FAQ or documentation to suggest that they did.
Update: Thanks to a comment over on Hacker News from zerocrates, it seems that After The Deadline does in fact support HTTPS in addition to HTTP (although I think my point about this not being clear in the docs still stands). Another commenter points out that a recently opened issue against the teacher module asks for HTTPS support.
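Whatever the hosted service supports, a client can enforce the transport rule on its own side. Below is a minimal sketch in Python built around a hypothetical `send_for_checking` helper and a placeholder endpoint. Neither comes from the extension or the teacher module, and the form field name `data` is also an assumption. The helper refuses to transmit document text unless the endpoint uses HTTPS. With HTTPS, Python's standard library verifies the server's certificate and hostname by default, and that check is what protects against someone pretending to be the service.

```python
import urllib.request
from urllib.parse import urlencode, urlparse


class InsecureEndpointError(ValueError):
    """Raised when asked to send document text to a non-HTTPS URL."""


def require_https(endpoint: str) -> str:
    """Return the endpoint unchanged if it uses HTTPS, otherwise raise."""
    scheme = urlparse(endpoint).scheme  # urlparse lowercases the scheme
    if scheme != "https":
        raise InsecureEndpointError(
            f"refusing to send text to {endpoint!r}: "
            f"scheme is {scheme or 'missing'}, not https"
        )
    return endpoint


def send_for_checking(endpoint: str, text: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: POST text to a remote checker, over HTTPS only.

    The scheme check runs before any network access. urlopen's default
    HTTPS handling verifies the server certificate and hostname.
    """
    url = require_https(endpoint)
    data = urlencode({"data": text}).encode("ascii")  # field name is assumed
    with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
        return response.read()
```

The check only settles how the text travels. It says nothing about what the receiving service does with the text afterwards, which is the trust problem described earlier.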
What Should Happen?
Just so we're clear: a sensible spell checker that needs to run over sensitive information should run locally (ideally using a system and user dictionary), with no remote calls needed. It seems odd that I have to spell that out (pardon the pun), but the fact that this extension exists at all suggests it needs saying. And it turns out that popularity doesn't always steer us in the right direction - the second spell checking extension in the list, which has a fraction of the installs (88 vs 20K), is a good old-fashioned checker that uses a local dictionary. Go use that one instead.
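For comparison, the core of a local checker is small. Below is a minimal sketch in Python that makes no network calls at all. It assumes plain word-list files with one word per line. Many Unix systems ship one at `/usr/share/dict/words`, but the path varies, and the user dictionary path below is hypothetical. The sketch is a naive dictionary lookup with no suggestions and no grammar checking. It is not the implementation of the local-dictionary extension recommended above.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Set, Tuple

# Letters with optional internal apostrophes, e.g. "don't"; digits and symbols are skipped.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def load_words(paths: Iterable[Path]) -> Set[str]:
    """Merge word lists (one word per line) into a lowercase set.

    Files that don't exist are silently skipped.
    """
    words: Set[str] = set()
    for path in paths:
        if path.is_file():
            with path.open(encoding="utf-8", errors="ignore") as f:
                words.update(line.strip().lower() for line in f if line.strip())
    return words


def misspellings(text: str, dictionary: Set[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, word) for each word not in the dictionary, all locally."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            if word.lower() not in dictionary:
                yield line_no, word


if __name__ == "__main__":
    # Assumed locations: a system word list plus a hypothetical per-user list.
    dictionary = load_words([
        Path("/usr/share/dict/words"),
        Path.home() / ".config" / "my-spellcheck" / "user-words.txt",
    ])
    sample = "Teh quick brown fox\njumps over the lazy dog"
    for line_no, word in misspellings(sample, dictionary):
        print(f"line {line_no}: {word}")
```

Keeping the user dictionary as a separate file means project-specific words, such as identifiers, can be added without touching the system list.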
Alternatively, of course, we could wait for another party to help us here:
We might see the NSA powered spell checkers soon. It could even tell you how suspicious an email would look like. https://t.co/HTiRvl4o5u
— Thomas Graf (@tgraf__) May 30, 2016
Automattic should give serious thought to supporting their freely available service over HTTPS too. Even though I can't see any situation where I'd want to use this service in this specific context, for the WordPress ecosystem, serving it over a sensible and, in this day and age, easy-to-implement protocol would be a very good idea. Update: As pointed out above, it seems that After The Deadline can support HTTPS.
I'm sure that Sean McBreen, who wrote this plugin, has no malicious intent. He spent his own time creating this extension and gave it back to the community. He did a great job implementing and documenting an extension that I think is unfortunately misguided in this particular context. I'm going to try to track Sean down to see if he has any views on this (the extension marketplace doesn't seem to give you an obvious way to find or contact authors, which seems an oversight). To be fair to him, he has been extremely clear in the documentation about what happens under the hood, but I'm still surprised that anyone would consider using an external (and untrusted) service sensible in the context of code. I'm just as surprised by the number of 5-star reviewers (and the 20K+ installs) listed on the marketplace - perhaps the reviewers have a different view of security to me.
My advice, in the strongest possible terms, is to not use this extension, unless the only things you will ever open with Visual Studio Code are things that you don't mind being viewed by completely unknown people. Oh, and always read the fine print before installing extensions in the future. I know I will…