Newslint: My Project to Supplement Media Literacy

I just released a new site that I built, Newslint.  Check it out!

A while ago I was reading Hacker News and saw that someone had posted a link to the JavaScript source for joblint, an app into which you paste the text of a job listing.  It then reports on whether the job sounds fluffy, unrealistic, bro-ish, tech buzzword-y, etc.

I thought it was an awesome idea, and the implementation was such that it was easy to add rules, keywords, and phrases to check against.  Having just gone through a job hunt before being hired by a very professional group of people as a developer at The Barbarian Group, I was sensitive to junk in 90% of job listings.  I was also considering my love for news curation, media literacy, and good journalism, after having been an Army intelligence collector and analyst and then a social media operations analyst for a DHS contractor.  So I started thinking about making a port of joblint so that I could lint news articles in a similar way in an effort to explore media literacy.


Fast-forward a little while and I was preparing to work on a Django project for The Barbarian Group.  I had already put in a lot of time learning Python while at ChatID as an engineering intern, but now I had to learn the Django framework.  Python and JavaScript are similar and flexible enough that I didn’t anticipate many problems, so I decided to port JavaScript joblint to Python newslint and then incorporate newslint into a Django server and experimental test-bed.

How It Works

During my time doing social media analysis, we hired and trained our analysts to quickly assess whether news information was valid, credible, interesting to our client, potentially dangerous, environmentally relevant, etc.  More of an art than a science, this involved knowing which sources tended to put out good info, knowing the current situation well enough to decipher which new information would most affect the status quo, and knowing where the best sources of information in different spheres of influence could be found on the public internet.  This is actually a pretty difficult skill to acquire, and it alone has the largest influence on the quality of analysis output.  That is, if only 5% of the info out there is actually game-changing, then less time should be spent on the other 95% so that more direct analysis can be done on the 5%.  At the same time, the 95% of noise is still relevant as an environmental check.

Media literacy is crucial even if it’s not your job.  A lot of my Army friends are more conservative and they’ll post articles from certain biased sources that end up not being true.  And a lot of my liberal NYC friends will post stuff from advocacy blogs about the NSA and eavesdropping that is demonstrably false or short-sighted.  For others who don’t really consume the news, the tangential connections they have with the news are even more important.  Those decontextualized sound bites are all those people will hear about an issue, and so they will largely shape their opinion on the matter without more study.  FOX News used to be on every TV all day, and now it’s likely you’ll see CNN instead.  Some people only watch The Colbert Report and The Daily Show.  Others watch the worst hours of cable news television, the afternoon lineups on FOX News and MSNBC.

It is crucial to understand how businesses buy people to write in newspapers or make TV ads or form political action groups to shape public opinion through blanketing the air with a specific message.  Non-profits, advocacy groups, and different areas of the government do it as well.  Whenever you see a poster advocating for or against a bill, you should always look up the group named in small print at the bottom and see who’s behind it.  It’s probably not a grassroots campaign — it’s probably astro-turfing.

In short, like any good intel, you should be suspicious of any information that finds its way to you because it most likely was intended to reach you, and wasn’t a happy accident or a sign of unstoppable progress towards that position.  Media literacy helps people decipher incoming input for true intent and agenda.

So that’s what newslint can help you do.  It takes raw text and looks up key words and phrases that indicate credibility, non-partisanship, and professionalism.  Do you read solid sources from solid journalists in solid publications?  Are you learning partisan phraseology that slants your opinion?  How objective and experienced are the people you read?

Here are all the rules for newslint.  I would definitely appreciate an email, or even better, a pull request, if you want to add more rules.

import re

swears = [
 'bloody',
 'bugger',
 'cunt',
 'cock',
 'pussy',
 'dick',
 'douche',
 'jackass',
 'asshole',
 re.compile('fuck(?:er|ing)?'),
 re.compile('piss(?:ing)?'),
 'shit'
]
partisan_words = [
 re.compile('obama[ -]?care'),
 'libtard',
 'nobama',
 'death panel',
 'leftist',
 'communist',
 'malkin',
 'coulter',
 'sharpton',
 'sarah palin',
 'pinhead',
 'limbaugh',
 'o\'reilly',
 'krauthammer',
 'bachmann',
 'ron paul',
 'rand paul',
 'john bolton',
 'alex jones',
 'martin bashir',
 'perino',
 'karl rove',
 'stasi',
 re.compile('police[ -]?state'),
 re.compile('hipp(?:ies|y)'),
 re.compile('fly-?over state'),
 'wingnut'
]
pundit_words = [
 'krauthammer',
 re.compile('(?:thomas|tom) friedman'),
 'ayn rand',
 'john galt',
 'jesse jackson',
 'gold standard',
 'rand paul',
 'ron paul',
 'bachmann',
 'limbaugh',
 'o\'reilly',
 'gingrich',
 'slate',
 'glenn beck',
 'msnbc',
 'huffpo',
 re.compile('huffington[ ]*post'),
 re.compile('fox[ ]*news'),
 'coulter',
 'david brooks',
 'john bolton',
 'michael hayden',
 'alex jones',
 'scarborough',
 'chris matthews',
 'sharpton',
 'martin bashir',
 'sarah palin',
 'ezra klein',
 'perino',
 'malkin',
 'evgeny morozov',
 'karl rove',
 'drudge',
 'dowd',
 re.compile('chris(?:topher|) hayes')
]
rag_words = [
 'gawker',
 'tmz',
 'slate',
 'infowars',
 'buzzfeed',
 re.compile('the blaze'),
 re.compile('huffington[ ]?post'),
 re.compile('fox[ ]?news'),
 re.compile('drudge[ ]?report')
]
sensationalism_words = [
 re.compile('tears? apart'),
 'screed',
 'demolish',
 'crush',
 re.compile('brown[- ]?shirt'),
 'hitler',
 'gestapo',
 'snitch',
 'stooge',
 re.compile('game[- ]?chang(?:e|ing)'),
 re.compile('cutting[- ]?edge'),
 re.compile('bleeding[- ]?edge'),
 re.compile('marxis[tm]'),
 re.compile('cron(?:y|ie)'),
 re.compile('fema[ ]?trailer'),
 re.compile('chem[ ]?trails'),
 'delusion',
 'false flag',
 re.compile('racis[tm]'),
 re.compile('meme[- ]?wrangl(?:ing|e)'),
 'flagrant',
 'cult',
 'the establishment',
 'police state',
 re.compile('solutionis[mt]'),
 re.compile('shock(?:ing|er|ed)')
]
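
For a rough idea of how mixed lists like these can be applied to a clip of text, here is a minimal sketch.  It is not newslint’s actual code: the function name find_matches and the trimmed-down sample list are hypothetical, and I’m assuming plain strings are matched as case-insensitive substrings while compiled patterns are searched against the lowercased text.

```python
import re

# Hypothetical, trimmed-down rule list in the same mixed format as above:
# plain strings plus compiled regular expressions.
sample_rag_words = [
    'tmz',
    re.compile('fox[ ]?news'),
]


def find_matches(text, rules):
    """Return the rules (as strings) that occur somewhere in text."""
    lowered = text.lower()
    hits = []
    for rule in rules:
        if isinstance(rule, str):
            # Plain strings are treated as literal substrings.
            if rule in lowered:
                hits.append(rule)
        elif rule.search(lowered):
            # Compiled patterns are searched against the lowercased text.
            hits.append(rule.pattern)
    return hits


print(find_matches('As seen on FoxNews and TMZ last night', sample_rag_words))
```

One caveat with this sketch: plain substring matching would also flag 'cult' inside 'culture' or 'slate' inside 'legislate', so a real implementation would probably want word boundaries around each rule.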

And Now, the Tech Details

I ported over the code (it’s not very large) in a day or two, then debugged it for a while.  It worked — I made some additions, and, like joblint, it can be run independently via the command line.  Then I forked joblint and turned it into newslint in a separate git repo.


Next I figured I could use the opportunity to learn not only Django but other stuff as well.  I really hadn’t needed to use LESS or SASS up till now because I was working on code I could just throw classes into, but since then at ChatID and at The Barbarian Group I ran into projects where I wouldn’t be able to modify HTML markup and would have to traverse the DOM or find other ways to hook into the code.  So I set up SASS and Compass and installed Grunt so I could have all my different tasks (uglification, concatenation, code linting, copying to static directory, etc.) automated.  Along the way I found autoprefixer, which you can run as a Grunt task to take any CSS3 stylings and automatically add the vendor prefixes that different browsers need.  I also decided to try Django-Bower (based on Bower) to make it easier to update my JavaScript dependencies to their latest versions, and within Django’s environment bubble no less.

Django is incredibly easy to use and you get a lot of control over it, which is something I like about express.js.  But then I’m also coming off learning Drupal (PHP) for a project, which seems like a black box most of the time.

I got a simple version of newslint running on a local Django server and then things snowballed; I fleshed out some JSON endpoints for an API, I enabled form submission for saving news clips, and I wrote some tests in Django’s TestCases and Python’s unittests.  Super-easy, especially after dealing with a somewhat problematic time spent figuring out correct resources and syntax for Angular.js tests with mocha for my project Momentous.
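
Tests of this kind can be quite short.  Here is a minimal sketch using Python’s unittest, assuming a hypothetical lint function that returns a dict of counts.  The function, the rule list, and the result format are all stand-ins for illustration, and newslint’s real interface may look different.

```python
import re
import unittest

# Hypothetical stand-in for the linter; not newslint's real interface.
SENSATIONAL = [re.compile('shock(?:ing|er|ed)'), 'false flag']


def lint(text):
    """Count how many of the sample sensationalism rules match the text."""
    lowered = text.lower()
    count = 0
    for rule in SENSATIONAL:
        if isinstance(rule, str):
            matched = rule in lowered
        else:
            matched = rule.search(lowered) is not None
        if matched:
            count += 1
    return {'sensationalism': count}


class LintTests(unittest.TestCase):
    def test_clean_text_has_no_hits(self):
        self.assertEqual(lint('The council met on Tuesday.'),
                         {'sensationalism': 0})

    def test_sensational_text_is_flagged(self):
        self.assertEqual(lint('A SHOCKING false flag!'),
                         {'sensationalism': 2})


# Run the two tests directly instead of through a test runner command.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LintTests)
result = unittest.TextTestRunner().run(suite)
```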


And then I figured I would try deploying this Django app to Amazon’s Elastic Beanstalk, because I’d never tried that before!  I ran into some issues there: my static files directory was split up and not in a standard directory, and I had newslint loaded as a git submodule under newslint-server, which automatic deployment services like EB and Heroku don’t like.  I also would have had trouble getting underneath the EB abstraction to make edits directly to server settings.

I decided to tear that app down and just get an EC2 instance (m1.small).  It costs a bit, but not really that much, and I’ll probably take down the instance once there’s no traffic on it.

My small test app turned into a full day of deploying the app behind Varnish and Apache on a new Ubuntu instance.  I plugged in memcached, set up MySQL, and added appropriate Django middleware to help get my pagespeed score up and remove warnings and errors.  The full control of an EC2 instance made this all super easy, whereas I’m not sure how I would’ve managed dealing with the EB thicket.

I had some problems making sure my headers were set up correctly so that stuff would get cached okay, but tweaks to the Apache, Varnish, and Django settings helped to mitigate those problems.  Updates to code were as easy as a git push on my local computer and a git pull on the instance.

I ran some apache bench tests on the server and it seemed okay; one thing I think I ran into was that having a form on the front page slows down the response slightly because it’s not caching the page (CSRF token?).  ab tests to a non-calling API endpoint on the other hand were super fast.  Most of the time, pageloads are under 400ms, which is pretty sweet!  Thank you to the god of page loadability, Ilya Grigorik!

So then it was rather late and I was thinking, hey, how hard would it be to get a domain for this instead of the long EC2 address?  Well, whois’ing newslint.com actually showed that it wasn’t owned!  And Namecheap usually sells domains for $10-13, so I picked it up and pointed it from Namecheap to Route 53, and all of a sudden, very early in the morning, I had a working newslint.com!


I found some more bugs the next day, for which I’m writing regression tests, but otherwise this has been a really successful learning experiment and confidence booster for my developer chops.  Really glad this worked out so well, and thank you to Rowan Manning for his joblint work and to The Barbarian Group for letting me be a developer.

Facebook Privacy Stats Discussion

My friend Kevin Donovan sent me a link (thanks Kevin) to this post (by Fred Stutzman) criticizing a NYTimes article (by Randall Stross) about how Facebook is affecting privacy boundaries for different age groups.

Personally I think Stutzman’s post (along with Michael Zimmer’s) is a bit too harsh on the NYTimes article, but he provides excellent data points in his criticism.

Stutzman quotes some excellent data (see his post for references):

Stross simply has this one wrong.  Instead of misguided intuition, let’s look at the numbers.  In the Summer/Fall of 2008, Jacob Kramer-Duffield and I ran a survey of undergraduate Facebook users.  We employed a list-based simple random sample, with 494 respondents.  When asked the question Have you changed the default Facebook privacy settings to give yourself enhanced privacy in Facebook?, 72.47% responded “Yes.” To the question Based on your Facebook privacy settings choices, who do you allow to see your Facebook profile?, 50% answered “Only my Facebook friends.” (1)

It’s good to see that Facebook users are beginning to learn how to use the many settings Facebook gives them to control their privacy, such that the percentages have changed dramatically.  It had been weird to see so many Facebook users unresponsive to the privacy tools given to them.

I also liked Stutzman’s final comments:

First, Facebook defaults have changed over the years, so a default now may have been a modification in the past.  Second, Facebook’s audience is increasingly international, so we must remember that norms will vary significantly across nations and cultures.  Third, privacy is not in Facebook’s business interests.  Less privacy = more content, so it may not be in Facebook’s interest to craft a privacy statistic that reflects current norms.

But Stutzman concludes with this:

Young people didn’t simply decide to give up privacy.  Rather, the studies show that social network sites, in their early iterations, created a very meaningful sense of close community.  Young people disclosed not because attitudes about privacy instantly and simultaneously changed, but because they felt very comfortable with their audience.

Hmm.  It seems as though Randall Stross was just saying that older people do not take as freely to sharing their lives publicly as younger people would.  Is that horribly wrong to say?  While there is more resistance among older people, sure, many will eventually adapt (I’ve been getting my dad to share more online).

But generational memory and identity are hard to break; try as we might, there will be many of the older generations who will just never change, and will never want to share online.  They grew up in a different world, and it sticks with them.  I’m not saying Stutzman is wrong — I would just like to see him add generational memory to the study of old vs. young people.  I’d argue that kids these days are being wired to accept a future flesh/digital hybrid world…one where a radical transparency and accountability system exists and there is little privacy except for the most intimate parts of our lives.

Building Online Communities

[Before I begin, I just wanted to link to this O’Reilly Radar post that shows how Facebook continues to blow away its competition, with 175 million users worldwide.  Another conflicting post from another source has a different number of total users, at 222 million.  Facebook is posting great growth numbers abroad and in the US — I say all this because I believe Facebook is taking over the planet in social networking shortly before the personal data jailbreak is to occur.]

Somewhere between researching my final orals exam topic of “individualized identity and reputation for international development” (for my MSFS degree) and studying how to design both a competitive and collaborative ecosystem for my start-up, I came across some very cool pages at Yahoo!.

Yahoo!’s developer network has available some tips and examples of how to build competition, reputation, rankings, leaderboards, and other social interaction devices into a web site.

Check some of them out:

YDN (Yahoo! Developer Network) has grouped these and many other categories loosely under “Reputation” in one of its menu hierarchies.

These pages have some interesting linkages.  From one post it links to:

“The famed #1 book reviewer on Amazon.com (who does claim to be a speed-reader) posts, on average, 7 book reviews a day. So not only does Harriet have time for reading all these books, she can also whip off reviews of them pretty quickly, too.”

Another example:

“Avoid even slightly offensive names for levels (e.g., Music Hotshot! or Photo Flyguy!)

  • These may be learnable with appropriate supporting material, but remember that reputations are also a form of self-expression and odds are good that a sizable portion of your community won’t want to be identified with frivolous, insulting or just goofy-sounding labels.
  • Ambiguous level names like these tested very poorly with some of our users.”

What’s interesting to me about all this is that it provides some basic examples of when to use certain systems and when not to.  Sometimes you may not want people to be competitive, because it may detract from their desires to collaborate.  What I read between the lines is that different cultures will adopt different preferences for how their self-designed systems will create and generate the maximum value and benefit for them.  Such a system might not be of maximum utility to another culture, however.

This implies that systems may need to be designed to be flexible to different peoples’ values.  It also implies that certain web sites may work where they were previously thought not to, just by providing an alternate version specific to that culture or tribe.  The easiest example of this to visualize would be language-localized versions of web sites.  Facebook’s recent addition of Arabic and Hebrew versions will, by itself, bring in many more Arabic and Hebrew speakers.  But other cultural dimensions beyond language have yet to be addressed.

Not too long ago, I attended the Future of Web Apps conference in Miami.  It amazed me to see just how involved companies like Yahoo! and Facebook are getting into building online communities.  I also picked up some cool Yahoo! schwag including a foldable map that shows all of Yahoo!’s APIs and services.  Pretty impressive.  What’s even better, these companies are being extremely open about all of this.  The social networking community looked nothing like this when we first began our research not too long ago in August!  Pretty awesome!

Studying Russia

[To round out my research, I need to study the BRIC countries — however I realize I do not have the time to give them much more than a cursory look in all their dimensions:  demographics, political economy, sociography, history, culture, religion, etc.  So I thought if I were to look at them through the lens of how it might affect the expression of their cultures/countries online, that might be sufficient.

Now, please, I am not a regional expert by any means, so if I overgeneralize or say something blatantly wrong, please correct me in the comments but don’t take what I write personally — I’m only going off what I could find online, mainly through Wikipedia.  Here’s Russia’s Wikipedia page, for example.]

Russia

Government: Parag Khanna argues in “The Second World” that Gazprom, Russia’s state-controlled gas corporation, controls Russia and the government, with Vladimir Putin running a revivalist, nationalist agenda.  It is, as Khanna says, a petrocracy, one that is acutely sensitive to oil prices.  Russia is not politically free, but it is economically free: if you’re rich, you’re living well.  The rest of the country has languished.  Journalists who have attempted to investigate the government have been intimidated or murdered.

International Affairs: Russia continues to be a formidable security presence, exerting its influence on former Soviet satellites and throttling Europe’s supply of natural gas and oil.  However, it seems reliant on Europe for investment, and is being trumped by China on its eastern borders.  Russia’s military has not benefited from oil/gas profits; thus its leverage has become even more concentrated in its ability to control natural resources.  It can be argued that Russia now looks with embarrassment at China as a successful Communist model.

Demographics: According to Khanna, 2/3 of the Russian population lives near the poverty line.  Russia has an aging population that is emigrating from the country if possible.  It is still well-educated.  HIV/AIDS and other health problems have surfaced as health care systems languished.  Russia is in danger of losing its eastern provinces (providing most of its land mass) to China, whose economic success and cultural roots prove far more inviting.  3/4 of Russia’s economy is concentrated in Moscow.

Religion: Russian Orthodox 63%, agnostic 12%, atheist 13%, 6% Muslim.

Telecom: Russia has very low internet penetration, at 14%.  According to comScore, the Russian internet market grew 25% in 2007, making it one of the fastest-growing (and largest) markets in the world.

Social Media Usage:

In Russia, there are two major social networking sites (SNSs):  Odnoklassniki and Vkontakte.  Odnoklassniki is primarily for students to find each other, while Vkontakte is a blatant Facebook rip-off.  Both have the same percentage reach of the overall internet market.  The difference is that Vkontakte users spend an average of 689 minutes on the site per month, whereas Odnoklassniki users spend an average of only 120 minutes on theirs, so Vkontakte users spend nearly six times as long (comScore).  This means that although both have similar reach, Vkontakte usage is richer, and, in the long run, I expect it will grow faster.

One blog post says,

“What’s more, some users try to demonstrate to their friends that they no longer use Odnoklassniki and have moved to Vkontakte by displaying a graphical image as their avatar or one of the photos reading “moved to Vkontakte” to avoid the automatic filters for the text messages – but such photos are quickly deleted by moderators of the network anyway.

“I have to admit this looks like a creative way to avoid migration of your users to your competitor but at the same time I have a feeling it should be frowned on at the very least. For example, I have seen Odnoklassniki buying ad space on Facebook to display to the Russian users and a Facebook advertising team representative told me that their ToS for the advertising program did not prevent competitors from paying to reach the users of the social network.”

Notably, Facebook has almost no exposure in Russia, though it added language localization only in June of 2008.

Questions

On the surface, Odnoklassniki does not seem to have much appeal beyond networking among students.  Facebook started off this way too, however, and then expanded into wider social networking.  Vkontakte is exploiting the success of Facebook, but in an inferior manner, with fewer controls and features.

Furthermore, I disagree with the blog post that suggests the only option for Facebook is to buy its clone Vkontakte to take the users and grab much of the Russian market.  I would predict that if Russia’s integration into the larger internet community grows, Facebook will quickly syphon users away from Vkontakte.

Some Effects of Cultural Context

Over on my reputation research blog, I wrote a long piece, mainly to do with Malcolm Gladwell’s new book, “Outliers”.  I felt the post was also relevant for this blog because Gladwell talks about how cultural history affects modern-day events, design, and culture.

For instance, Gladwell writes that some Asian civilizations, being primarily rice-growers, approach problems the same way they grow rice.  Rice must be nurtured extensively, carefully grown, and constantly improved.  Wheat and corn growers, on the other hand, are not necessarily required to plant seeds perfectly spaced apart, to build perfect soil or mud/clay for the crop, or to spend lots of time maintaining the crops.  What Gladwell says is that people from rice-growing civilizations have been measured to spend more time thinking about a problem before giving up than people from wheat- or corn-growing civilizations.  They have more patience and determination to be good at things like math.

He also talks about how, until training accounted for the problem, Korean Air had a massive problem with communication between its pilots and first officers.  This led to a spate of crashes.  Black box recordings showed a cultural context in which one does not question authority and does not speak directly, instead using hints or suggestions.  That is not good for an industry where, if the crew doesn’t make direct, well-communicated decisions, its plane will end up smashing into the ground.

So check out my post, and read Gladwell’s book.  It’s fascinating.  The premise is sort of what I’m hoping to get out of my research into how international values shape social networking sites within the context of privacy and identity.

Hiatus

Apologies for the interruption in posting regularly.  It’s the end of the semester and I can’t speak for Gaurav and Pav but I’ve had a lot of on-going semester-long projects.  The Mumbai attacks hit close to home for Gaurav and Pav and I kept up with Gaurav’s tweets and posts during the Thanksgiving break while watching TV coverage and reading the spotty journalism online.  Certainly there was a communitas and online awareness during the Mumbai hostage situations that’s unique to our times.

In mid-November, Gaurav gave a presentation during a Georgetown CCT (Communications, Culture, and Technology) breakfast chat. The CCT program, by the way, has a really cool blog called gnovis which covers interdisciplinary issues such as culture, technology, media, politics, and the arts. Add it to your RSS feed!

I assisted in covering a few slides for the presentation.  Our topic was how cultural context affects social media usage in the BRIC countries and in the US.

Gaurav posted the excellent slideshow he presented, so you can check it out:

This presentation was very useful for us because the CCT students are not only well-versed in the subject we covered, but also pointed out areas we had completely overlooked and blind spots in the studies we used, and argued that we should look more carefully at how the different BRIC countries and the US view issues like privacy, openness, and sharing.

So I will be researching these issues for my future posts, particularly how the word “privacy” does not translate well into other languages and is fairly confusing even in English.

I also plan to study the individual countries to see if I can isolate characteristics applicable to my studies on privacy and openness vs. closedness.

It should also be mentioned that discussion within the web developer community regarding identity, sharing data across sites, and privacy vs. advertising is extremely hot right now, so I will try to post more summaries of good stories I see out there on that front.

Happy belated Thanksgiving, and here’s hoping you have a happy holiday season, wherever you are.

Edward Hall’s Context Prism

In search of more prisms through which I can examine the BRIC countries (Gaurav blogged about Geert Hofstede, which gave us some interesting data points), I came across Edward Hall’s high- and low-context analysis.

Other sites already cover Hall’s theory pretty well, but basically he differentiated cultures based on an idea that some had high-context communication and others had low-context communication.

Scandinavians, for example, have low-context communications.  You can walk into any conversation with them and their dialogue will contain very direct messages that are self-encapsulated and contain most of the information you would need to make sense of it.

There are codified norms within the society that make the conversation rules-based and less personal.  It comes off as very direct and to the point.