Newslint: My Project to Supplement Media Literacy

I just released a new site that I built, Newslint.  Check it out!

A while ago I was reading Hacker News and saw that someone had posted a link to the JavaScript source for joblint, an app that takes text pasted from a job listing and reports whether the job sounds fluffy, unrealistic, bro-ish, tech buzzword-y, etc.

I thought it was an awesome idea, and the implementation was such that it was easy to add rules, keywords, and phrases to check against.  Having just gone through a job hunt before being hired by a very professional group of people as a developer at The Barbarian Group, I was sensitive to junk in 90% of job listings.  I was also considering my love for news curation, media literacy, and good journalism, after having been an Army intelligence collector and analyst and then a social media operations analyst for a DHS contractor.  So I started thinking about making a port of joblint so that I could lint news articles in a similar way in an effort to explore media literacy.

Fast-forward a little while and I was preparing to work on a Django project for The Barbarian Group.  I had already put in a lot of time learning Python while at ChatID as an engineering intern, but now I had to learn the Django framework.  Python and JavaScript are similar and flexible enough that I didn’t anticipate many problems, so I decided to try porting the JavaScript joblint to a Python newslint and then incorporate newslint into a Django server and experimental test-bed.

How It Works

During my time doing social media analysis, we hired and trained analysts to quickly assess whether news information was valid, credible, interesting to our client, potentially dangerous, environmentally relevant, etc.  More of an art than a science, this involved knowing which sources tended to put out good info, knowing the current situation and deciphering which new information would most affect the status quo, and knowing where the best sources of information in different spheres of influence could be found on the public internet.  This is actually a pretty difficult skill to acquire, and it alone has the largest influence on the quality of analysis output.  That is, if only 5% of the info out there is actually game-changing, then less time has to be spent on the other 95% so that more direct analysis can be done on the 5%.  At the same time, the 95% of noise is still relevant as an environmental check.

Media literacy is crucial even if it’s not your job.  A lot of my Army friends are more conservative and they’ll post articles from certain biased sources that end up not being true.  And a lot of my liberal NYC friends will post stuff from advocacy blogs about the NSA and eavesdropping which is demonstrably false or short-sighted.  For others who don’t really consume the news, the tangential connections they have with the news are even more important.  Those decontextualized sound bites are all those people will hear about an issue, so they will largely shape their opinion on the matter without more study.  Fox News used to be on every TV all day, and now it’s likely you’ll see CNN instead.  Some people only watch The Colbert Report and The Daily Show.  Others watch the worst hours of cable news television, the afternoon lineups on Fox News and MSNBC.

It is crucial to understand how businesses pay people to write in newspapers, make TV ads, or form political action groups to shape public opinion by blanketing the air with a specific message.  Non-profits, advocacy groups, and different areas of the government do it as well.  Whenever you see a poster advocating for or against a bill, you should always look up the group named in small print at the bottom and see who’s behind it.  It’s probably not a grassroots campaign; it’s probably astro-turfing.

In short, like any good intel, you should be suspicious of any information that finds its way to you because it most likely was intended to reach you, and wasn’t a happy accident or a sign of unstoppable progress towards that position.  Media literacy helps people decipher incoming input for true intent and agenda.

So that’s what newslint can help you do.  It takes raw text and looks up key words and phrases that indicate credibility, non-partisanship, and professionalism.  Do you read solid sources from solid journalists in solid publications?  Are you learning partisan phraseology that slants your opinion?  How objective and experienced are the people you read?

Here are all the rules for newslint.  I would definitely appreciate an email, or even better, a pull request, if you want to add more rules.

import re

swears = [
 'bloody',
 'bugger',
 'cunt',
 'cock',
 'pussy',
 'dick',
 'douche',
 'jackass',
 'asshole',
 re.compile('fuck(?:er|ing)?'),
 re.compile('piss(?:ing)?'),
 'shit'
]
partisan_words = [
 re.compile('obama[ -]?care'),
 'libtard',
 'nobama',
 'death panel',
 'leftist',
 'communist',
 'malkin',
 'coulter',
 'sharpton',
 'sarah palin',
 'pinhead',
 'limbaugh',
 'o\'reilly',
 'krauthammer',
 'bachmann',
 'ron paul',
 'rand paul',
 'john bolton',
 'alex jones',
 'martin bashir',
 'perino',
 'karl rove',
 'stasi',
 re.compile('police[ -]?state'),
 re.compile('hipp(?:ies|y)'),
 re.compile('fly-?over state'),
 'wingnut'
]
pundit_words = [
 'krauthammer',
 re.compile('(?:thomas|tom) friedman'),
 'ayn rand',
 'john galt',
 'jesse jackson',
 'gold standard',
 'rand paul',
 'ron paul',
 'bachmann',
 'limbaugh',
 'o\'reilly',
 'gingrich',
 'slate',
 'glenn beck',
 'msnbc',
 'huffpo',
 re.compile('huffington[ ]*post'),
 re.compile('fox[ ]*news'),
 'coulter',
 'david brooks',
 'john bolton',
 'michael hayden',
 'alex jones',
 'scarborough',
 'chris matthews',
 'sharpton',
 'martin bashir',
 'sarah palin',
 'ezra klein',
 'perino',
 'malkin',
 'evgeny morozov',
 'karl rove',
 'drudge',
 'dowd',
 re.compile('chris(?:topher|) hayes')
]
rag_words = [
 'gawker',
 'tmz',
 'slate',
 'infowars',
 'buzzfeed',
 re.compile('the blaze'),
 re.compile('huffington[ ]?post'),
 re.compile('fox[ ]?news'),
 re.compile('drudge[ ]?report')
]
sensationalism_words = [
 re.compile('tears? apart'),
 'screed',
 'demolish',
 'crush',
 re.compile('brown[- ]?shirt'),
 'hitler',
 'gestapo',
 'snitch',
 'stooge',
 re.compile('game[- ]?chang(?:e|ing)'),
 re.compile('cutting[- ]?edge'),
 re.compile('bleeding[- ]?edge'),
 re.compile('marxis[tm]'),
 re.compile('cron(?:y|ie)'),
 re.compile('fema[ ]?trailer'),
 re.compile('chem[ ]?trails'),
 'delusion',
 'false flag',
 re.compile('racis[tm]'),
 re.compile('meme[- ]?wrangl(?:ing|e)'),
 'flagrant',
 'cult',
 'the establishment',
 'police state',
 re.compile('solutionis[mt]'),
 re.compile('shock(?:ing|er|ed)')
]
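To make the matching step concrete, here is a minimal sketch of how rule lists like the ones above could be checked against raw text.  This is not the actual newslint code: the `RULES` table, `to_pattern`, and `lint` are hypothetical names made up for illustration, and the whole-word, case-insensitive matching is an assumption about how you would want the rules to behave.

```python
import re

# Hypothetical rule table: category name -> plain strings or compiled patterns.
# Only a handful of entries from the lists above are repeated here.
RULES = {
    "swears": ["shit", re.compile(r"piss(?:ing)?")],
    "partisan": [re.compile(r"obama[ -]?care"), "death panel"],
    "sensationalism": [re.compile(r"shock(?:ing|er|ed)"), "false flag"],
}


def to_pattern(rule):
    """Turn a plain string or compiled regex into a whole-word, case-insensitive pattern."""
    # Compiled patterns expose their source via .pattern; plain strings are escaped.
    source = rule.pattern if hasattr(rule, "pattern") else re.escape(rule)
    return re.compile(r"\b(?:" + source + r")\b", re.IGNORECASE)


def lint(text, rules=RULES):
    """Return {category: [matched phrases]} for every category with at least one hit."""
    hits = {}
    for category, entries in rules.items():
        found = []
        for entry in entries:
            found.extend(m.group(0).lower() for m in to_pattern(entry).finditer(text))
        if found:
            hits[category] = found
    return hits


if __name__ == "__main__":
    sample = "Shocking report: Obamacare death panel claims are a false flag."
    for category, phrases in lint(sample).items():
        print(category, phrases)
```

Wrapping every rule in word boundaries keeps a short entry like 'cult' from firing inside "culture" or "difficult", which is the kind of false positive a naive substring check would produce.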

And Now, the Tech Details

I ported over the code (it’s not very large) in a day or two, then debugged it for a while.  It worked — I made some additions, and, like joblint, it can be run independently via the command line.  Then I forked joblint and turned it into newslint in a separate git repo.

Next I figured I could not only use the opportunity to learn Django but also learn other stuff.  I really hadn’t needed to use LESS or SASS up till now because I was working on code I could just throw classes into, but since then at ChatID and at The Barbarian Group I ran into projects where I wouldn’t be able to modify HTML markup and would have to traverse the DOM or find other ways to hook into the code.  So I set up SASS and Compass and installed Grunt so I could have all my different tasks (uglification, concatenation, code linting, copying to static directory, etc.) automated.  Along the way I found autoprefixer, which you can run as a Grunt task to take any CSS3 stylings and automatically add the vendor prefixes that different browsers need.  I also decided to try Django-Bower (based on Bower) to make updating to the latest versions of my JavaScript dependencies easier, and within Django’s environment bubble no less.

Django is incredibly easy to use and you get a lot of control over it, which is something I like about express.js.  But then I’m also coming off learning Drupal (PHP) for a project, which seems like a black box most of the time.

I got a simple version of newslint running on a local Django server and then things snowballed: I fleshed out some JSON endpoints for an API, I enabled form submission for saving news clips, and I wrote some tests using Django’s TestCase and Python’s unittest.  Super-easy, especially after a somewhat problematic time spent figuring out the correct resources and syntax for Angular.js tests with mocha for my project Momentous.
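For a sense of what those tests can look like, here is a small sketch in plain unittest (Django’s TestCase is built on top of unittest.TestCase, so the shape is the same).  The `find_rags` function and `RAG_PATTERN` are hypothetical stand-ins so the file runs on its own; they are not the real newslint API.

```python
import re
import unittest

# Hypothetical stand-in for the real linter, so this test file is self-contained.
RAG_PATTERN = re.compile(r"\b(?:tmz|gawker|fox[ ]?news)\b", re.IGNORECASE)


def find_rags(text):
    """Return the lower-cased outlet names mentioned in the text."""
    return [m.group(0).lower() for m in RAG_PATTERN.finditer(text)]


class FindRagsTests(unittest.TestCase):
    def test_matches_are_case_insensitive(self):
        self.assertEqual(find_rags("As seen on TMZ"), ["tmz"])

    def test_optional_space_is_accepted(self):
        self.assertEqual(find_rags("FoxNews and Fox News"), ["foxnews", "fox news"])

    def test_substrings_inside_words_are_ignored(self):
        # Regression guard: "tmz" buried inside a longer token should not count.
        self.assertEqual(find_rags("atmzone"), [])
```

Saved as a file such as test_rags.py (the name is only an example), this runs with `python -m unittest test_rags`.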

And then I figured I would try deploying this Django app to Amazon’s Elastic Beanstalk, because I’d never tried that before!  I ran into some issues there: my static files directory was split up and not in a standard location, and I had newslint loaded as a git submodule under newslint-server, which automatic deployment services like EB and Heroku don’t like.  I also would have had trouble getting underneath the EB abstraction to make edits directly to server settings.

I decided to tear that app down and just get an EC2 instance (m1.small).  It costs a bit, but not really that much, and I’ll probably take down the instance once there’s no traffic on it.

My small test app turned into a full day deploying the app underneath Varnish and Apache to a new Ubuntu instance.  I plugged in memcached, set up MySQL, and added appropriate Django middleware to help get my pagespeed score up and remove warnings and errors.  The full control of an EC2 instance made this all super easy, whereas I’m not sure how I would’ve managed dealing with the EB thicket.

I had some problems making sure my headers were set up correctly so that stuff would get cached okay, but tweaks to the Apache, Varnish, and Django settings helped to mitigate those problems.  Updates to code were as easy as a git push on my local computer and a git pull on the instance.

I ran some ApacheBench (ab) tests on the server and it seemed okay; one thing I think I ran into was that having a form on the front page slows down the response slightly because it’s not caching the page (CSRF token?).  ab tests to a non-calling API endpoint on the other hand were super fast.  Most of the time, pageloads are under 400ms, which is pretty sweet!  Thank you to the god of page loadability, Ilya Grigorik!

So then it was rather late and I was thinking, hey, how hard would it be to get a domain for this instead of the long EC2 address?  Well, whois’ing newslint.com actually showed that it wasn’t owned!  And Namecheap usually sells domains for $10-13, so I picked it up and pointed it from Namecheap to Route 53, and all of a sudden, very early in the morning, I had a working newslint.com!

I found some more bugs the next day, for which I’m writing regression tests, but otherwise this has been a really successful learning experiment and confidence booster for my developer chops.  Really glad this worked out so well, and thank you to Rowan Manning for his joblint work and to The Barbarian Group for letting me be a developer.