Newslint: My Project to Supplement Media Literacy

I just released a new site that I built, Newslint.  Check it out!

A while ago I was reading Hacker News and saw that someone had posted a link to the JavaScript source for joblint, an app into which you paste the text of a job listing; it tells you whether the job sounds fluffy, unrealistic, bro-ish, tech buzzword-y, etc.

I thought it was an awesome idea, and the implementation made it easy to add rules, keywords, and phrases to check against.  Having just gone through a job hunt before being hired as a developer by a very professional group of people at The Barbarian Group, I was sensitive to the junk in 90% of job listings.  I was also reflecting on my love for news curation, media literacy, and good journalism, having been an Army intelligence collector and analyst and then a social media operations analyst for a DHS contractor.  So I started thinking about porting joblint so that I could lint news articles in a similar way, as a way to explore media literacy.


Fast-forward a little while and I was preparing to work on a Django project for The Barbarian Group.  I had already put in a lot of time learning Python as an engineering intern at ChatID, but now I had to learn the Django framework.  Since Python and JavaScript are similar and flexible enough that I didn’t anticipate many problems, I decided to try porting JavaScript joblint to Python newslint and then incorporating newslint into a Django server and experimental test-bed.

How It Works

During my time doing social media analysis, we hired and trained our analysts to quickly assess whether news information was valid, credible, interesting to our client, potentially dangerous, environmentally relevant, etc.  More of an art than a science, this involved knowing which sources tended to put out good info, knowing the current situation well enough to decipher which new information would most affect the status quo, and knowing where on the public internet to find the best sources of information in different spheres of influence.  This skill is difficult to acquire, and it alone has the largest influence on the quality of the analysis output.  That is, if only 5% of the info out there is actually game-changing, then less time has to be spent on the other 95% so that more direct analysis can be done on the 5%; at the same time, the 95% of noise is still relevant as an environmental check.

Media literacy is crucial even if it’s not your job.  A lot of my Army friends are more conservative, and they’ll post articles from certain biased sources that end up not being true.  And a lot of my liberal NYC friends will post stuff from advocacy blogs about the NSA and eavesdropping that is demonstrably false or short-sighted.  For others who don’t really consume the news, their tangential connections with it matter even more: those decontextualized sound bites are all they will hear about an issue, so without further study, the sound bites will largely shape their opinion on the matter.  FOX News used to be on every TV all day, and now it’s likely you’ll see CNN instead.  Some people only watch The Colbert Report and The Daily Show.  Others watch the worst hours of cable news television, the afternoon lineups on FOX News and MSNBC.

It is crucial to understand how businesses pay people to write in newspapers, make TV ads, or form political action groups to shape public opinion by blanketing the air with a specific message.  Non-profits, advocacy groups, and different parts of the government do it as well.  Whenever you see a poster advocating for or against a bill, you should always look up the group named in small print at the bottom and see who’s behind it.  It’s probably not a grassroots campaign; it’s probably astroturfing.

In short, like any good intel, you should be suspicious of any information that finds its way to you because it most likely was intended to reach you, and wasn’t a happy accident or a sign of unstoppable progress towards that position.  Media literacy helps people decipher incoming input for true intent and agenda.

So that’s what newslint can help you do.  It takes raw text and looks for keywords and phrases that bear on credibility, non-partisanship, and professionalism.  Do you read solid sources from solid journalists in solid publications?  Are you learning partisan phraseology that slants your opinion?  How objective and experienced are the people you read?

Here are all the rules for newslint.  I would definitely appreciate an email, or even better, a pull request, if you want to add more rules.

import re

swears = [
 'bloody',
 'bugger',
 'cunt',
 'cock',
 'pussy',
 'dick',
 'douche',
 'jackass',
 'asshole',
 re.compile('fuck(?:er|ing)?'),
 re.compile('piss(?:ing)?'),
 'shit'
]
partisan_words = [
 re.compile('obama[ -]?care'),
 'libtard',
 'nobama',
 'death panel',
 'leftist',
 'communist',
 'malkin',
 'coulter',
 'sharpton',
 'sarah palin',
 'pinhead',
 'limbaugh',
 'o\'reilly',
 'krauthammer',
 'bachmann',
 'ron paul',
 'rand paul',
 'john bolton',
 'alex jones',
 'martin bashir',
 'perino',
 'karl rove',
 'stasi',
 re.compile('police[ -]?state'),
 re.compile('hipp(?:ies|y)'),
 re.compile('fly-?over state'),
 'wingnut'
]
pundit_words = [
 'krauthammer',
 re.compile('(?:thomas|tom) friedman'),
 'ayn rand',
 'john galt',
 'jesse jackson',
 'gold standard',
 'rand paul',
 'ron paul',
 'bachmann',
 'limbaugh',
 'o\'reilly',
 'gingrich',
 'slate',
 'glenn beck',
 'msnbc',
 'huffpo',
 re.compile('huffington[ ]*post'),
 re.compile('fox[ ]*news'),
 'coulter',
 'david brooks',
 'john bolton',
 'michael hayden',
 'alex jones',
 'scarborough',
 'chris matthews',
 'sharpton',
 'martin bashir',
 'sarah palin',
 'ezra klein',
 'perino',
 'malkin',
 'evgeny morozov',
 'karl rove',
 'drudge',
 'dowd',
 re.compile('chris(?:topher|) hayes')
]
rag_words = [
 'gawker',
 'tmz',
 'slate',
 'infowars',
 'buzzfeed',
 re.compile('the blaze'),
 re.compile('huffington[ ]?post'),
 re.compile('fox[ ]?news'),
 re.compile('drudge[ ]?report')
]
sensationalism_words = [
 re.compile('tears? apart'),
 'screed',
 'demolish',
 'crush',
 re.compile('brown[- ]?shirt'),
 'hitler',
 'gestapo',
 'snitch',
 'stooge',
 re.compile('game[- ]?chang(?:e|ing)'),
 re.compile('cutting[- ]?edge'),
 re.compile('bleeding[- ]?edge'),
 re.compile('marxis[tm]'),
 re.compile('cron(?:y|ie)'),
 re.compile('fema[ ]?trailer'),
 re.compile('chem[ ]?trails'),
 'delusion',
 'false flag',
 re.compile('racis[tm]'),
 re.compile('meme[- ]?wrangl(?:ing|e)'),
 'flagrant',
 'cult',
 'the establishment',
 'police state',
 re.compile('solutionis[mt]'),
 re.compile('shock(?:ing|er|ed)')
]
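The rule lists mix plain strings and compiled regular expressions, so whatever consumes them has to handle both. Here is a minimal sketch of how such lists could be applied to an article. It assumes the input is lowercased first (every pattern above is lowercase) and that plain strings are matched as literal, whole-word phrases; newslint’s real matcher may treat strings differently (joblint, for instance, builds regexes from its trigger strings). `lint`, `to_pattern`, and `RULES` are hypothetical names, and `RULES` holds only a few entries copied from the lists above.

```python
import re
from typing import Dict, List, Union

Trigger = Union[str, re.Pattern]

# Hypothetical sample: a few entries copied from the rule lists above.
RULES: Dict[str, List[Trigger]] = {
    "swears": ["shit", re.compile(r"fuck(?:er|ing)?")],
    "rag": ["gawker", re.compile(r"fox[ ]?news")],
    "sensationalism": [re.compile(r"tears? apart"), "false flag"],
}


def to_pattern(trigger: Trigger) -> re.Pattern:
    """Compile a plain string as a literal whole-word phrase; pass regexes through."""
    if isinstance(trigger, str):
        return re.compile(r"\b" + re.escape(trigger) + r"\b")
    return trigger


def lint(text: str, rules: Dict[str, List[Trigger]]) -> Dict[str, List[str]]:
    """Return the matched phrases for every category with at least one hit."""
    lowered = text.lower()  # all patterns are lowercase, so normalize the input
    report: Dict[str, List[str]] = {}
    for category, triggers in rules.items():
        hits = [m.group(0) for t in triggers for m in to_pattern(t).finditer(lowered)]
        if hits:
            report[category] = hits
    return report


print(lint("FOX News tears apart the bill. Gawker calls it a false flag.", RULES))
# → {'rag': ['gawker', 'fox news'], 'sensationalism': ['tears apart', 'false flag']}
```

Wrapping plain strings in `\b` word boundaries keeps a short trigger like `'cult'` from firing inside “culture”. The hand-written regexes above carry no such boundaries, so they can still match inside longer words.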

And Now, the Tech Details

I ported over the code (it’s not very large) in a day or two, then debugged it for a while.  It worked — I made some additions, and, like joblint, it can be run independently via the command line.  Then I forked joblint and turned it into newslint in a separate git repo.
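Because newslint runs from the command line, like joblint, here is a minimal sketch of what such an entry point could look like. It uses a hypothetical `count_hits` helper and a tiny inline rule set standing in for the real rule engine. It reads an article from a file argument or standard input, prints a count per category, and returns a non-zero status when anything is flagged, which is the usual linter convention.

```python
import argparse
import re
import sys
from typing import Dict, List, Optional, TextIO

# Hypothetical stand-in for newslint's rule lists (all patterns lowercase).
RULES: Dict[str, List[str]] = {
    "rag": [r"gawker", r"fox[ ]?news"],
    "sensationalism": [r"tears? apart", r"false flag"],
}


def count_hits(text: str) -> Dict[str, int]:
    """Count regex matches per category in the lowercased text."""
    lowered = text.lower()
    return {
        category: sum(len(re.findall(p, lowered)) for p in patterns)
        for category, patterns in RULES.items()
    }


def main(argv: Optional[List[str]] = None, stdin: TextIO = sys.stdin) -> int:
    """Parse arguments, lint the article, print a report, return an exit status."""
    parser = argparse.ArgumentParser(description="Lint a news article.")
    parser.add_argument("path", nargs="?", help="article file (default: stdin)")
    args = parser.parse_args(argv)

    if args.path:
        with open(args.path, encoding="utf-8") as f:
            text = f.read()
    else:
        text = stdin.read()

    hits = count_hits(text)
    for category, count in hits.items():
        print(f"{category}: {count}")
    # Status 1 if anything was flagged, so shell scripts can branch on it.
    return 1 if any(hits.values()) else 0
```

To use it as a script, end the file with `if __name__ == "__main__": sys.exit(main())`, then run it with a file path or pipe an article into it.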


Next I figured I could use the opportunity to learn not only Django but other tools as well.  I hadn’t really needed LESS or SASS before because I’d been working on code I could just throw classes into, but at ChatID and at The Barbarian Group I ran into projects where I couldn’t modify the HTML markup and had to traverse the DOM or find other ways to hook into the code.  So I set up SASS and Compass and installed Grunt so I could automate all my different tasks (uglification, concatenation, code linting, copying to the static directory, etc.).  Along the way I found autoprefixer, which you can run as a Grunt task to take your CSS3 styles and automatically add the vendor prefixes that different browsers need.  I also decided to try Django-Bower (built on Bower) to make updating my JavaScript dependencies to their latest versions easier, and within Django’s environment bubble, no less.
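For reference, Django-Bower is wired in through Django settings. Here is a minimal settings sketch. It assumes a conventional `BASE_DIR` project layout and uses example packages (`jquery`, `underscore`) rather than newslint’s actual dependencies. With this in place, `python manage.py bower install` downloads the packages into `BOWER_COMPONENTS_ROOT`, and `collectstatic` picks them up through the Bower finder.

```python
import os

# Project root, assuming settings.py sits one directory below it.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "djangobower",  # adds the `bower` management command and the finder
]

STATIC_URL = "/static/"
STATIC_ROOT = os.path.join(BASE_DIR, "static")

# Let staticfiles find Bower-installed packages alongside app static files.
STATICFILES_FINDERS = [
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    "djangobower.finders.BowerFinder",
]

# Where `manage.py bower install` puts the downloaded components.
BOWER_COMPONENTS_ROOT = os.path.join(BASE_DIR, "components")

# Example packages only (hypothetical choices); Bower version specs also work.
BOWER_INSTALLED_APPS = [
    "jquery",
    "underscore",
]
```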

Django is incredibly easy to use and gives you a lot of control, which is also something I like about Express.js.  That was a welcome change, since I’m also coming off learning Drupal (PHP) for a project, and it seems like a black box most of the time.

I got a simple version of newslint running on a local Django server, and then things snowballed: I fleshed out some JSON endpoints for an API, enabled form submission for saving news clips, and wrote some tests with Django’s TestCase and Python’s unittest.  It was super easy, especially after the trouble I’d had figuring out the right resources and syntax for testing Angular.js with Mocha on my project Momentous.
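Tests for the linting core don’t need Django at all, because plain `unittest` is enough. Here is a minimal sketch built around a hypothetical `flag_terms` function that stands in for newslint’s real entry point. In the Django app, the same assertions could live in a `django.test.TestCase` subclass next to tests that exercise the JSON endpoints.

```python
import re
import unittest
from typing import List

# Hypothetical stand-in terms; the real rule lists are much longer.
TERMS = ["gawker", "false flag", "cult"]


def flag_terms(text: str, terms: List[str] = TERMS) -> List[str]:
    """Return each term that appears as a whole word or phrase, ignoring case."""
    return [
        t for t in terms
        if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
    ]


class FlagTermsTest(unittest.TestCase):
    def test_flags_known_source(self):
        self.assertEqual(flag_terms("Gawker reported it first."), ["gawker"])

    def test_is_case_insensitive(self):
        self.assertEqual(flag_terms("A FALSE FLAG operation"), ["false flag"])

    def test_ignores_substrings(self):
        # Regression guard: "culture" must not trigger the "cult" rule.
        self.assertEqual(flag_terms("Pop culture coverage"), [])
```

Saved in a `test_*.py` file, these tests run with `python -m unittest`. Regression tests for bugs found after launch fit the same pattern: one small method per bug.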


And then I figured I would try deploying this Django app to Amazon’s Elastic Beanstalk, because I’d never tried that before!  I ran into some issues there.  My static files were split up and not in a standard directory, and I had newslint loaded as a git submodule under newslint-server, while automatic deployment services like EB and Heroku don’t like submodules.  I also would have had trouble getting underneath the EB abstraction to edit server settings directly.

I decided to tear that app down and just get an EC2 instance (m1.small).  It costs a bit, but not really that much, and I’ll probably take down the instance once there’s no traffic on it.

My small test app turned into a full day of deploying the app behind Varnish and Apache on a new Ubuntu instance.  I plugged in memcached, set up MySQL, and added the appropriate Django middleware to raise my PageSpeed score and clear its warnings and errors.  The full control of an EC2 instance made all of this super easy, whereas I’m not sure how I would have managed in the EB thicket.
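The post doesn’t say which middleware went in, so what follows is an assumption. One common setup that matches this description uses memcached as the cache backend, Django’s site-wide cache middleware, and GZip compression. The sketch below assumes a memcached daemon on its default port 11211, a recent Django (older releases named the list `MIDDLEWARE_CLASSES` and used `MemcachedCache` rather than `PyMemcacheCache`), and hypothetical database names.

```python
import os

CACHES = {
    "default": {
        # Needs the pymemcache package and a running memcached daemon.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

MIDDLEWARE = [
    # Must be first: it runs last on the way out and stores the final response.
    "django.middleware.cache.UpdateCacheMiddleware",
    # Compress after everything below has produced the response body.
    "django.middleware.gzip.GZipMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    # Must be last: it runs first on the way in and serves cached pages.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

# Lifetime of cached pages; also emitted as Cache-Control max-age,
# which a caching proxy such as Varnish can honor.
CACHE_MIDDLEWARE_SECONDS = 600
CACHE_MIDDLEWARE_KEY_PREFIX = "newslint"  # hypothetical prefix

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # needs a driver such as mysqlclient
        "NAME": "newslint",  # hypothetical database and user names
        "USER": "newslint",
        "PASSWORD": os.environ.get("NEWSLINT_DB_PASSWORD", ""),
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```

The ordering matters because Django applies middleware top-down to requests and bottom-up to responses. Putting the cache middleware at the two ends lets a cached page short-circuit all the work in between.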

I had some trouble getting my headers set up correctly so that pages would be cached, but tweaks to the Apache, Varnish, and Django settings mitigated those problems.  Deploying code updates was as easy as a git push from my local computer and a git pull on the instance.

I ran some ApacheBench (ab) tests on the server, and the results seemed okay.  One thing I think I ran into is that having a form on the front page slows the response slightly, because the page isn’t being cached (the CSRF token?).  On the other hand, ab tests against a non-calling API endpoint were super fast.  Most of the time, page loads are under 400ms, which is pretty sweet!  Thank you to the god of page loadability, Ilya Grigorik!
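`ab` is the right tool for this, but the same idea can be sketched with Python’s standard library for a quick check: fire N requests at a fixed concurrency and look at the latency distribution. The call `bench(url, n_requests=200, concurrency=10)` plays roughly the role of `ab -n 200 -c 10 URL`. The `bench` and `timed_get` names are hypothetical, and the numbers are client-side wall-clock times, not server processing time.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def timed_get(url: str) -> float:
    """Fetch the URL once; return elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()  # include the body transfer in the timing
    return (time.perf_counter() - start) * 1000.0


def bench(url: str, n_requests: int = 100, concurrency: int = 10) -> Dict[str, float]:
    """Send n_requests GETs with `concurrency` worker threads; summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = sorted(pool.map(timed_get, [url] * n_requests))
    # Nearest-rank 95th percentile: integer ceil(0.95 * n), minus 1 for 0-based indexing.
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    return {
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
    }
```

Comparing `bench` against the front page and against an API endpoint makes the kind of gap described above (an uncached page with a form vs. a fast endpoint) visible in the median and p95 numbers.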

So then it was rather late and I was thinking, hey, how hard would it be to get a domain for this instead of the long EC2 address?  Well, whois’ing newslint.com actually showed that it wasn’t owned!  Namecheap usually sells domains for $10-13, so I picked it up and pointed it from Namecheap to Route 53, and all of a sudden, very early in the morning, I had a working newslint.com!


I found some more bugs the next day, for which I’m writing regression tests, but otherwise this has been a really successful learning experiment and confidence booster for my developer chops.  Really glad this worked out so well, and thank you to Rowan Manning for his joblint work and to The Barbarian Group for letting me be a developer.

Building Online Communities

[Before I begin, I just wanted to link to this O’Reilly Radar post that shows how Facebook continues to blow away its competition, with 175 million users worldwide.  Another source gives a conflicting total of 222 million users.  Facebook is posting great growth numbers abroad and in the US.  I say all this because I believe Facebook is taking over the planet in social networking shortly before the personal data jailbreak is to occur.]

Somewhere between researching my final orals exam topic of “individualized identity and reputation for international development” (for my MSFS degree) and studying how to design both a competitive and collaborative ecosystem for my start-up, I came across some very cool pages at Yahoo!.

Yahoo!’s developer network offers tips and examples on how to build competition, reputation, rankings, leaderboards, and other social interaction devices into a web site.

Check some of them out:

YDN (Yahoo! Developer Network) has grouped these and many other categories loosely under “Reputation” in one of its menu hierarchies.

These pages have some interesting linkages.  One post links to:

“The famed #1 book reviewer on Amazon.com (who does claim to be a speed-reader) posts, on average, 7 book reviews a day. So not only does Harriet have time for reading all these books, she can also whip off reviews of them pretty quickly, too.”

Another example:

“Avoid even slightly offensive names for levels (e.g., Music Hotshot! or Photo Flyguy!)

  • These may be learnable with appropriate supporting material, but remember that reputations are also a form of self-expression and odds are good that a sizable portion of your community won’t want to be identified with frivolous, insulting or just goofy-sounding labels.
  • Ambiguous level names like these tested very poorly with some of our users.”

What’s interesting to me about all this is that it provides some basic examples of when to use certain systems and when not to.  Sometimes you may not want people to be competitive, because it may detract from their desire to collaborate.  What I read between the lines is that different cultures will prefer different designs for the self-designed systems that generate the most value and benefit for them.  Such a system might not be of maximum utility to another culture, however.

This implies that systems may need to be designed to flex to different peoples’ values.  It also implies that certain web sites may succeed where they were previously thought not to, just by providing an alternate version specific to that culture or tribe.  The easiest example of this to visualize is a language-localized version of a web site.  Facebook’s recent addition of Arabic and Hebrew versions will bring in many more Arabic and Hebrew speakers through this alone.  But other cultural dimensions beyond language have yet to be addressed.

Not too long ago, I attended the Future of Web Apps conference in Miami.  It amazed me to see just how involved companies like Yahoo! and Facebook are getting in building online communities.  I also picked up some cool Yahoo! schwag, including a foldable map that shows all of Yahoo!’s APIs and services.  Pretty impressive.  Even better, these companies are being extremely open about all of this.  The social networking community looked nothing like this when we first began our research back in August!  Pretty awesome!