Occasionally, I blog about things that interest me.

12 Jun 2015
Succeeding in Rutgers Computer Science

I spent 5 years in the Computer Science BS-MS program. I was a peer mentor for CS111 for 3 years.

I've met a wide range of students in the CS department. I've helped students with assignments, and explained concepts to them over and over again. This blog post explains the patterns for success and failure that I have noticed.

Your performance in intro courses is incredibly important. Develop a good mental model of programming early in your coursework.

A good performance in the intro courses builds the foundation for your CS knowledge. Without it, you are as useful as a mathematician who cannot add fractions or solve linear equations.

By the time you get through CS111, you should have a clear mental model of how problem solving is accomplished through code. It's easy to get distracted as a 111 student by the complexities of Java, Eclipse, jEdit, your weekly assignments, and course project. This is especially true if you come to 111 with no programming experience. It might feel like all you're doing is staying afloat week after week, assignment after assignment, milestone after milestone. Someone tells you that using Eclipse is good/bad. Someone tells you that you should put curly braces on the same line as a function, as opposed to putting it on its own line. It can be very overwhelming.

If at the end of 111, you are able to take a problem given to you, think about how it's solved, write code to solve it, and debug your code until it works, you have done well. This seems like a hard thing to measure, so I'm going to give you a couple of litmus tests to gauge your performance.

Consider the problem below:

Write a program that prints the numbers from 1 to 100. But for multiples of three print Lemon instead of the number and for the multiples of five print Juice. For numbers which are multiples of both three and five print LemonJuice.

Here is what the output of the program looks like from 1 to 20.

1
2
Lemon
4
Juice
Lemon
7
8
Lemon
Juice
11
Lemon
13
14
LemonJuice
16
17
Lemon
19
Juice

This problem should be a breeze for you after 111. If you cannot write code to solve this in 30 minutes, without someone's help, you are in trouble.
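If you want something to compare your own attempt against afterwards, here is one possible solution. It is a sketch in Python rather than the Java used in 111, and the function name lemon_juice is just a name picked for this example.

```python
def lemon_juice(n):
    """Return the LemonJuice output for the numbers 1..n as a list of strings."""
    lines = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            # Multiples of both three and five must be checked first.
            lines.append("LemonJuice")
        elif i % 3 == 0:
            lines.append("Lemon")
        elif i % 5 == 0:
            lines.append("Juice")
        else:
            lines.append(str(i))
    return lines


if __name__ == "__main__":
    print("\n".join(lemon_juice(100)))
```

The only subtle part is the order of the checks: if the test for three came first, 15 would print Lemon and never reach the combined case.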

Here is another litmus test. If you can solve the first 5 problems on Project Euler without any help, you have learned problem solving. You can start worrying about other things. If you can't do these problems, you need to spend more time learning to problem solve, writing code, and debugging code. You will not get very far in a CS degree before you do this.

You can write code to solve a Project Euler problem, and submit the solution by creating an account. This is also a great way to learn a new programming language (This is how I learned the basics of Python).

Data Structures is incredibly important.

Data Structures develops on your understanding of problem solving from 111 and teaches you about clever ways in which programmers manipulate data, and algorithms.

  1. You need to know Data Structures to write C code (CS211 - Computer Architecture, and CS214 - Systems Programming).
  2. You need to know Data Structures to do well in Algorithms (CS344).
  3. You need to know Data Structures to get through a tech interview.

You can ask any upperclassman what they think the most important CS course is, and Data Structures will almost certainly be one of their top 3.

The Story of 50% of the students in CS112

Now, let me tell you the story of Alice. Alice is a metaphorical student that represents 50% of the CS112 roster.

Alice took CS111, did fairly well in it (got better than a B). In 111, Alice often found it hard to listen to the professor in lecture. Every once in a while, she zoned out, and couldn't figure out what Tjang or Sesh was saying. She went home, looked at the weekly assignment, and finished it (perhaps with some help from a friend, TA, or someone at the iLabs). Since every concept in lecture was reinforced in a weekly assignment, she understood most of the material that was covered in 111.

When Alice took 112, she followed a similar path. Tjang/Sesh was lecturing about Heaps, Linked Lists, Graphs, Tail Recursion, Efficiency Analysis of Insertion Sort, etc. She went on Facebook on her laptop or got a text from a roommate, subsequently zoned out in class, and quickly lost track of what the professor was saying. Alice went home; there was a project due in two weeks. Alice worked very hard on the project. She definitely had some issues along the way, but she was able to ask Sesh/TAs/peers/people at the iLabs for help and get it done. She got an 85+ on all 5 projects in CS112. Alice got around a 50 on both Data Structures exams, ended Data Structures with a C/C+, and moved on.

Alice is the canonical example of a student that does poorly in the Rutgers CS degree, and here's why.

In 111, Alice had projects every week that forced her to learn material. The only thing in 112 that forces you to learn material is the exam, and that happens twice a semester. Sometimes, the first 112 exam is so early in the semester that there's very little material covered on it. This means that the final covers a ton of material, and because Alice has been zoning out of 112 lecture on a regular basis, she does not know why Heapify is O(n), or why you would want to use a min heap as a frontier when implementing Dijkstra's shortest path algorithm.
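To make the min-heap-as-frontier idea concrete, here is a minimal sketch of Dijkstra's algorithm in Python using the standard heapq module. The graph format (a dict mapping each node to a list of (neighbor, weight) pairs) and the sample graph are assumptions made for this example, not anything taken from the 112 course material.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs with
    non-negative weights.
    """
    dist = {source: 0}
    frontier = [(0, source)]  # min heap ordered by tentative distance
    while frontier:
        d, node = heapq.heappop(frontier)  # closest unexplored node
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(frontier, (new_d, neighbor))
    return dist


# A small hypothetical graph for illustration.
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}
print(dijkstra(graph, "A"))
```

The min heap matters because the algorithm repeatedly needs the closest unexplored node, and a heap hands that over in O(log n) instead of scanning the whole frontier. As a side note, Python's heapq.heapify turns an existing list into a heap in linear time, which is the same O(n) result mentioned above.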

In order to avoid Alice's pitfalls, you need to make sure that you understand every concept in Data Structures. Practically every sentence that Sesh or Tjang says in 112 is important. I understand that you are human, and cannot keep perfect concentration in an 80 minute lecture. But you need to be proactive about understanding all the concepts that are covered. This means that if you zoned out in the lecture about AVL Trees, you still need to make sure you learn it. You can go read the textbook after class. You can look up the material online. There are some very good online courses that cover data structures. I strongly recommend the lectures from Stanford's Programming Abstractions course. You can ask upperclassmen to explain the concept to you.

You need to do this before the day of your data structures exam, because the course covers a lot of important material, and you cannot learn all of it in a day.

Pick classes carefully, especially after the intro sequence.

  1. Pay close attention to the professor that's teaching the class.
    • Don't take a class just because the professor is easy. By doing this, you are learning nothing, and wasting both time and tuition dollars.
    • Understand that a lot of the professors teaching you haven't written a line of code in 15+ years. They cannot and will not teach you web or mobile app development. If they attempt to teach you this stuff, their material will likely be outdated and poorly researched.
    • The theoretical classes are challenging and will require mental discipline. I highly recommend everything being taught by Martin Farach-Colton, William Steiger, or Eric Allender. These were my favorite professors. Their classes were engaging, memorable, and had challenging material.
  2. Prerequisites don't matter.

    • If you've done well in 111, 112, 211 and 205, you can take any other class in the department - even a graduate class.
    • Do you see Uli Kremer (an excellent compilers professor) teaching next semester? Take his class even though you haven't taken 314.
    • But V! I'm not smart. Everyone in this class is going to have an edge on me. I'm going to get a bad grade, lose my scholarship, and end up on the streets. Here's what I have to say to this - 50% of the people taking a class don't learn a thing. At the end of CS112, I want you to ask 10 of your classmates if they can implement a hash table. At the beginning of 211 and 214, I want you to look around you as they struggle to implement one in C. They're not learning in 112, and they're not learning in the prerequisite courses that you want to skip. They will have no edge over you.
    • You will feel more challenged in a course if you take it without a prerequisite, but as long as you're proactive about brushing up on material you don't know, you'll do just fine.
    • Logistics - you can't register for a course on webreg if you haven't taken the prereqs. Send the prof an email to get a prereq override - tell them you're interested in the material, and you'll put extra time and learn stuff you don't know.
  3. Stop focusing on grades. Focus on the concepts you're learning.

    • The classes I loved the most were the ones in which I was engaged by the professor's lectures and assignments.
    • I took Operating Systems with a smart but boring professor. I was not engaged in lecture, but his assignments and exams were easy, so I got an A. I regret wasting my time, because my grade did not reflect my knowledge of OS.
    • I have friends that took OS with Ricardo Bianchini. Their assignments and exams were much harder, and they spent much more time on coursework. Most of them did not get As - but they learned far more about OS than I did.
    • If your goal is to learn OS, don't waste your time in a course that doesn't teach you OS.
    • If your goal is to get a piece of paper with your name on it, and "Bachelors in Computer Science" under it, this advice isn't for you. I loved learning about the hard problems in computer science from incredibly smart people that were experts in these fields.

If you are not branching out from the material you learn in class, you will have a hard time getting hired as a developer.

Your professors haven't written code in 15+ years. They're not going to teach you how to develop software - don't waste your time taking CS431.

Start writing code in your free time to solve any problem that you have. I wrote RUBUS when I was in 112. I used to maintain static web pages for a board game my roommate played with his high school friends. Everyone that's good at programming has put in lots of time into it, and you will need to do the same.

Talk to upperclassmen about ideas you have, and the things that they've built. Hang out in the CAVE, attend USACS events, and go to hackathons. All of this experience will add up, and help you land your first paid programming gig.

Get paid to write code as early as possible.

When I started taking CS classes, I didn't understand why people got paid to sit at a computer and write for-loops. I thought the mysterious, legendary developers getting internships and part time jobs must know so much more about the web/mobile apps/algorithms. This is a classic case of impostor syndrome, and it creates a mental roadblock until you get paid to write code.

Use your summers for internships or part time work (not for folding clothes). For your first development job, I recommend Student System Administration at OSS, System Administration at LCSR (under Doug Motto), an internship at Too Much Media, or HackNY if you are lucky. You'll want to do a more traditional tech internship at companies like Microsoft, Google or Etsy, but you'll have an easier time getting these once you have some experience. If you hang around CS folks, you'll hear about such opportunities frequently. Apply early and often - you'll have an easier time in October than in April.

Participate in the community

The Rutgers CS community, while not perfect, has grown a great deal over the last few years. You've got access to events like HackRU, HackNY, PennApps, and HackTCNJ. You've got access to the CAVE, the Hackerspace, and the Makerspace. USACS, RuMAD and the Rutgers Hackathon Club are active, and full of smart people. People taking the same classes as you have gone on to land amazing jobs, start companies, and sell companies. Start reading Hacker News and /r/programming.

Don't forget to give back to the community. Teach CS111 recitation. Join the USACS board, and help plan events. Hang out at the CAVE and help underclassmen understand difficult concepts. Help the noobs out at hackathons.

Make friends out of your peers. Impossible looking homework assignments will become easier. You'll spend a silly amount of time working on a CTF challenge, or writing a game. You'll get one-letter GitHub usernames together. After college, they'll help you find jobs and offer you their couches.

Rutgers is a great place to study Computer Science, and I hope your time there will be as memorable as mine was.

10 Jul 2014
Easy HTTPS Setup with StartSSL

If you're like me, you've written loads of web apps, but you rarely set up SSL on them. SSL is a must for any production-grade web application, especially if you're authenticating users or taking personal information from them. Otherwise all the contents of your HTTP requests are being sent in plaintext - user login info / passwords, cookies etc.

Usually, SSL certificates can cost lots of money (Verisign charges over $100 / month), and be annoying to set up. After paying for domains and hosting, this is the last thing you want to shell out money for. Thus, StartSSL Free is a very appealing product because it gives you a free SSL certificate valid for 1 year, that's accepted in all major browsers. I'm using it to serve flipdclass.com over SSL.

Getting your Certificates from StartSSL

  • Start by Signing up for an Account at StartSSL.
  • They'll send you a verification code and install a client certificate into your browser.
  • Backup this certificate to cloud storage and Sign in to StartSSL.
  • Now, you'll want to validate your domain. Go to the StartSSL Free Product Page > Validations Wizard > Domain Name Validation. Type in your domain name (vverma.net for example).
  • StartSSL will give you a list of email addresses for verifying your domain. You can usually access one of these emails through your domain registrar if you aren't already set up to receive emails at your domain.
  • Verify your domain (sometimes it takes a few minutes), and click on Certificates Wizard to start creating your very first cert.
  • Pick Web Server SSL/TLS Certificate > Enter a Password. Note the password that you enter - you will need it later to encrypt your private key.
  • This will give you a private key in a text box. Copy this into a text editor and save the file as domain_key.enc.
  • Click Continue > Continue > Add a Subdomain to your key. If you're making this for the top level domain (http://vverma.net), I would recommend adding www as the sub domain to the certificate. You'll need to create a certificate for each subdomain that you want to access over HTTPS, unless you get a wildcard SSL cert (not available through StartSSL Free).
  • On the next page, you'll see a certificate in a text box. Copy this into a text editor and save the file as domain.crt. I would also recommend saving the intermediate and root CA certs, because you'll need them for your webserver setup.

Setting up your Web Server

I'll walk you through setting up nginx or apache for SSL.

  • At this point you'll want to copy all the certificate files onto your server (domain_key.enc, domain.crt, ca.pem, sub.class1.server.ca.pem ), probably via scp.
  • I'd recommend moving all certificate files to a directory like /etc/nginx/ssl/
  • Now, you'll want to decrypt your private key domain_key.enc, so that it can be read by your web server. Without this, your webserver will prompt you for a password every time it is restarted.
$ openssl rsa -in domain_key.enc -out domain.key 
$ chmod 400 domain.key # only root should be able to read this.
  • Next, you'll want to configure your webserver to respond to HTTPS requests with the certificate.

Apache Virtual Host

<VirtualHost _default_:443>
    ServerName domain.com

    SSLEngine On

    SSLCertificateFile /path/to/certs/domain.crt
    SSLCertificateKeyFile /path/to/certs/domain.key
    SSLCertificateChainFile /path/to/certs/sub.class1.server.ca.pem

    ...
</VirtualHost>

Nginx Virtual Host

Nginx does not have a directive for SSL Certificate Chains, so you will need to concatenate your certificate with the intermediate and root CA certs.

$ cat domain.crt sub.class1.server.ca.pem ca.pem > domain.chained.crt

Then you can configure your virtual host as follows.

server {
    listen 443 default_server ssl;
    ssl_certificate /path/to/certs/domain.chained.crt;
    ssl_certificate_key /path/to/certs/domain.key;
    ...
}

Now, you can reload your webserver, and if you did everything correctly, you should get a successful HTTPS connection to your web app. Make sure that you test your site on a few different browsers, because not all browsers will behave the same way with SSL certificates.
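Besides checking in browsers, you can also check the certificate from a script. The following is a rough sketch using Python's standard ssl and socket modules. The function names fetch_certificate and days_until_expiry are made up for this example, and the hostname in the usage comment is only a placeholder.

```python
import socket
import ssl
import time


def fetch_certificate(hostname, port=443, timeout=10):
    """Connect over TLS and return the server certificate as a dict.

    The default context verifies the certificate chain and the hostname,
    so a missing intermediate cert shows up as an ssl.SSLError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def days_until_expiry(cert, now=None):
    """Return the number of days before the certificate's notAfter date."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    now = time.time() if now is None else now
    return (expires - now) / 86400


# Example usage (needs network access; replace the placeholder hostname):
#   cert = fetch_certificate("domain.com")
#   print(cert["subject"], days_until_expiry(cert))
```

Since the free certificate is only valid for a year, a check like this is a cheap way to get a reminder before it expires.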

You should also consider configuring your webserver to redirect all traffic to HTTPS, in order to prevent users from leaking their sensitive data by mistake.

26 Jan 2014
Use PDSH to shell into multiple hosts via SSH

SSH is a powerful protocol that lets you access machines remotely and run commands on them. Rutgers has a cluster of Linux machines for CS students, and I often run programs on them. Sometimes, I leave a program running for a while, and forget which machine it was on. In this situation, PDSH comes in handy. It lets me run ps aux | grep -i <username> quickly across all the machines.

PDSH - Run SSH in Parallel

PDSH lets you run a command in parallel across a bunch of machines. I start by creating a text file with a list of machines I want to shell into:

cd.cs.rutgers.edu
cp.cs.rutgers.edu
grep.cs.rutgers.edu
kill.cs.rutgers.edu
less.cs.rutgers.edu
ls.cs.rutgers.edu
man.cs.rutgers.edu
pwd.cs.rutgers.edu
rm.cs.rutgers.edu
top.cs.rutgers.edu
vi.cs.rutgers.edu
cpp.cs.rutgers.edu
java.cs.rutgers.edu
perl.cs.rutgers.edu
basic.cs.rutgers.edu
assembly.cs.rutgers.edu
pascal.cs.rutgers.edu
php.cs.rutgers.edu
lisp.cs.rutgers.edu
prolog.cs.rutgers.edu
adapter.cs.rutgers.edu
command.cs.rutgers.edu
decorator.cs.rutgers.edu
facade.cs.rutgers.edu
flyweight.cs.rutgers.edu
mediator.cs.rutgers.edu
patterns.cs.rutgers.edu
singleton.cs.rutgers.edu
state.cs.rutgers.edu
template.cs.rutgers.edu
visitor.cs.rutgers.edu
builder.cs.rutgers.edu
composite.cs.rutgers.edu
design.cs.rutgers.edu
factory.cs.rutgers.edu
interpreter.cs.rutgers.edu
null.cs.rutgers.edu
prototype.cs.rutgers.edu
specification.cs.rutgers.edu
strategy.cs.rutgers.edu
utility.cs.rutgers.edu

Let's say I save this file as machines (the name after ^ in the command below must match the path of the file). I can then run a command in parallel across all these machines:

$ pdsh -R ssh -w ^machines "<command>"

Here are some things you can do with PDSH that you might find useful

Find all python processes running on these machines.

$ pdsh -R ssh -w ^machines "ps aux | grep -i python"

Kill any processes being run by my user. (Super useful if you forget to log out of a lab machine.)

$ pdsh -R ssh -w ^machines "killall -u `whoami`"

Check a specific log file for errors.

$ pdsh -R ssh -w ^machines "grep -i error /path/to/log"

It's a handy UNIX tool to have in your arsenal when working with lots of machines. Clearly, I am only showing the usage of pdsh in the most basic way. Check out PDSH on Google Code for a more detailed description of everything PDSH can do.
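If you are curious what a tool like this does conceptually, here is a rough Python sketch of the same idea: read a list of hosts and run one command on each over ssh in parallel. This is not how pdsh itself is implemented. The function names and the injectable runner parameter are assumptions made for this example, and it assumes key-based SSH login is already set up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_hosts(path):
    """Read one hostname per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def run_ssh(host, command, timeout=30):
    """Run a command on one host via the ssh client and return its stdout."""
    result = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def run_on_hosts(hosts, command, runner=run_ssh, max_workers=16):
    """Run command on every host in parallel; return {host: output}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, outputs))


# Example usage (needs SSH access to the listed machines):
#   for host, out in run_on_hosts(read_hosts("machines"), "uptime").items():
#       print(host, out.strip())
```

Threads are enough here because each worker spends nearly all of its time waiting on the network rather than using the CPU.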

05 Jan 2014
Scrape the web using CSS Selectors in Python

Web Scraping is a super useful technique that lets you get data out of web pages that don't have an API. I often scrape web pages to get structured data out of unstructured web pages, and Python is my language of choice for quick scripts.

BeautifulSoup - Why I don't use it anymore

In the past, I used Beautiful Soup almost exclusively to do this kind of scraping. BeautifulSoup is a great library for web scraping - it has great docs, and it gets the job done most of the time. I've used it on lots of projects. However, I find that it doesn't fit my workflow.

Let's say I wanted to scrape some data off a web page. I usually inspect the element in the Chrome Dev Console, and guess at a selector that might give me the data I want. Perhaps I guess div.foo li a. I quickly check to see if this works by running this selector in the console $('div.foo li a'), and modify it if it doesn't.

Even after using BeautifulSoup for a while, I find that I have to go back and read the docs to write code that scrapes this selector. I always forget how to select classes in BeautifulSoup's find_all method. I don't remember how to express a CSS attribute selector such as a[href*="foo"]. It doesn't let me write code at the speed of thought.

lxml.cssselect

LXML is a robust library for parsing XML and HTML in Python, and BeautifulSoup can even use it as its underlying parser. I don't know much about lxml, except that I can use CSS Selectors with it very easily, thanks to lxml.cssselect. Look at the example code below to see how easy this is.

import lxml.html
import requests
from lxml.cssselect import CSSSelector

# get some html
r = requests.get('http://url.to.website/')

# build the DOM Tree
tree = lxml.html.fromstring(r.text)

# print the parsed DOM Tree (encoding='unicode' returns str instead of bytes)
print(lxml.html.tostring(tree, encoding='unicode'))

# construct a CSS Selector
sel = CSSSelector('div.foo li a')

# Apply the selector to the DOM tree.
results = sel(tree)
print(results)

# print the HTML for the first result (assumes the selector matched something).
match = results[0]
print(lxml.html.tostring(match, encoding='unicode'))

# get the href attribute of the first result
print(match.get('href'))

# print the text of the first result.
print(match.text)

# get the text out of all the results
data = [result.text for result in results]

As you can see, it's really easy to use CSS Selectors with Python and lxml. Instead of spending time reading BeautifulSoup docs, spend time writing your application.

Installation of lxml and lxml.cssselect

LXML and CSSSelect are both Python packages that you can install easily via pip. In order to install lxml via pip you will need libxml2 and libxslt. On a standard Ubuntu installation, you can simply do

sudo apt-get install libxml2-dev libxslt1-dev
pip install lxml cssselect

Check out the lxml installation page and lxml.cssselect for more information.

04 Oct 2013
Keyboard Focused Development Workflow on Macs

Having used Linux almost exclusively for the last four years, I miss efficient window management on Macs. Coming from the awesome window manager, I find that OS X does not have good support for a two monitor multiple workspace workflow out of the box. After tinkering with third party software, I believe I've found a good solution for most of my complaints, and have a workflow that I feel productive with. In my experience, this works best with multiple monitors, a standard keyboard (think Dell not Apple), and a three button mouse (I'm not a fan of touchpads or Apple mice).

Setting up Multiple Desktops

I really like the use of multiple desktops in my workflow. I usually set up four desktops. I keep Spotify open on the very last one. The middle ones are my "work" desktops that I use for terminals, browsers, IDEs, and documentation. The first one is usually a "distraction workspace" that will have my email, and Adium open. This helps me keep my windows organized, and keep focus when I need to.

In order to set this up, I add additional desktops (up to 4). The easiest way to do this is to open up Mission Control (usually Control-Up), hover over the Desktops, and click the Plus button on the top right.

Once this is done, I would recommend setting up easy keybindings to switch between desktops. To do this, you go to System Preferences > Keyboard > Keyboard Shortcuts > Mission Control. Then you can set up keybindings for Move left a space, Move right a space, ... Switch to Desktop 1-4. I use Ctrl-Alt-Left/Right to move between desktops, and use Command-1/2/3/4 to jump to a desktop.

Using Slate for Window Management

On OS X, it's sometimes pretty cumbersome to perform window management tasks like moving windows between monitors, and maximizing windows efficiently. This is where Slate comes in. Slate is a configurable third-party window management application, that makes these window management tasks super easy. I will explain how I use Slate day-to-day.

Setting it up

You can install Slate pretty simply by downloading the Slate dmg. After installing and starting Slate, you will want to make sure it's properly configured.

Here is my ~/.slate.js configuration file that describes the keybindings I use with it. Right click on the Slate icon in the topbar > Relaunch and Load Config, to apply configuration changes.

//Save this in ~/.slate.js
//This configuration file came from http://vverma.net. 
//Credit to Gerard O'Neill, http://goneill.net for introducing me to Slate.

var left = {
    'x': 'screenOriginX',
    'y': 'screenOriginY',
    'width': 'screenSizeX/2',
    'height': 'screenSizeY',
};

var right = {
    'x': 'screenOriginX + screenSizeX/2',
    'y': 'screenOriginY',
    'width': 'screenSizeX/2',
    'height': 'screenSizeY',
};


//half screen.
slate.bind('right:ctrl,cmd', function(win) {
    var screen = slate.screen().rect();
    var win_rect = win.rect();

    // if we are at the edge of a screen on the right.
    if(Math.abs(screen.x + screen.width - win_rect.x - win_rect.width) < 5) {
        var curr_screen = slate.screen().id();
        slate.log(curr_screen);
        if(curr_screen < slate.screenCount() - 1) {
            var shift_screen = _.clone(left);
            shift_screen['screen'] = curr_screen + 1;

            win.doOperation(slate.operation('move', shift_screen));
        }
    } else {
        win.doOperation(slate.operation('move', right));
    }
});


//half screen.
slate.bind('left:ctrl,cmd', function(win) {
    var screen = slate.screen().rect();
    var win_rect = win.rect();

    // if we are at the edge of a screen on the left.
    if(screen.x == win_rect.x) {
        var curr_screen = slate.screen().id();
        if(curr_screen > 0) {
            var shift_screen = _.clone(right);
            shift_screen['screen'] = curr_screen - 1;

            win.doOperation(slate.operation('move', shift_screen));
        }
    } else {
        win.doOperation(slate.operation('move', left));
    }
});


//maximize
slate.bind('up:ctrl,cmd', function(win) {
    win.doOperation(slate.operation('move', {
        'x': 'screenOriginX',
        'y': 'screenOriginY',
        'width': 'screenSizeX',
        'height': 'screenSizeY',
    }));
});

//center
slate.bind('down:ctrl,cmd', function(win) {
    win.doOperation(slate.operation('move', {
        'x': 'screenOriginX + screenSizeX/4',
        'y': 'screenOriginY + screenSizeY/4',
        'width': 'screenSizeX/2',
        'height': 'screenSizeY/2',
    }));
});

Here is the mapping of keybindings

Cmd-Ctrl-Left - Resize window to half the screen width and move it to the left. (Moves to the previous screen if you are already at the left edge.)
Cmd-Ctrl-Right - Resize window to half the screen width and move it to the right. (Moves to the next screen if you are already at the right edge.)

Cmd-Ctrl-Up - Maximize window.
Cmd-Ctrl-Down - Center window in its current screen.

Feel free to modify this slate configuration to suit your needs. You might find the Slate documentation helpful.

Miscellaneous

Adium

If you use Adium as a chat client on your machine, I recommend setting up a Global Keyboard Shortcut. This allows you to switch the focus to Adium anytime on your machine by pressing the key sequence. It's super handy to instantly switch to Adium when you get an Adium notification.

I set my global keyboard shortcut to Cmd-Shift-/. To use this, you'll have to get rid of the keybinding for the Help Center first. Do this by removing the keybinding in System Preferences > Keyboard > Keyboard Shortcuts > Help Center.

To set the global keyboard shortcut in Adium, go to Preferences > General > Global Shortcut.

Now, you can press Cmd-Shift-/ to switch to Adium, and press Cmd-/ to show/hide your buddy list.

Spotlight/Alfred

You should already be using Alfred as your primary application launcher. It lets you launch applications with your keyboard really easily, and do much more. It's also way faster than Spotlight.

If you use multiple desktops, sometimes you'll want to create multiple windows of the same application. Alfred will get in your way here, because if you try to launch an application that already has a window open, it will take you to the existing window instead of opening a new one. This happens to me all the time when I want to create a Chrome window on one desktop and I already have one open on another desktop.

The easiest way to get a new window onto the desktop you want is to go to an existing Chrome window, press Cmd-N to create another window, drag the newly created window, and while dragging the window, press Cmd-1/2/3/4 to take the window to another desktop.

Spaces

Like I mentioned before, I use Spaces in my development workflow. One thing that I really dislike about Spaces is that when you move between Spaces using Ctrl-Alt-Left/Right, it takes a second to animate the movement. I don't like this because it feels clunky.

You can run this in a terminal to make this animation a lot faster.

defaults write com.apple.dock expose-animation-duration -int 0; killall Dock

Credit to Gerard O'Neill for showing me a lot of this workflow.

26 Sep 2013
Adding Rutgers Ubuntu Mirrors

Rutgers Open Systems Solutions mirrors a bunch of Linux distributions, and you can use these to download packages quickly when you're on campus. When downloading on the Rutgers campus, your bandwidth will also not be throttled, which significantly improves your download speeds.

Adding a Mirror to Ubuntu

To add a mirror, open up your /etc/apt/sources.list file.

sudo [editor] /etc/apt/sources.list

It should look something like this:

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] main restricted

## Major bug fix updates produced after the final release of the
## distribution.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates main restricted

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] universe
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] universe
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to 
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] multiverse
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates multiverse

## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-backports main restricted universe multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-backports main restricted universe multiverse
...

[distrib] here is the name of your distribution. This is one of natty, oneiric, precise, quantal, raring, or saucy. You can find a full list on the Ubuntu Wikipedia Page.

You are going to add the following lines at the top of the file.

deb http://mirrors.rutgers.edu/ubuntu [distrib] main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-updates main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-backports main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-security main restricted universe multiverse

Be sure to replace [distrib] with the name of your distribution. Now, save the file, quit the editor, and run:

sudo apt-get update

You should now be downloading packages from the Rutgers mirror. Try to install a package, and make sure that you're making requests to http://mirrors.rutgers.edu.

09 Sep 2013
Fetch any JSON using JSONP and YQL

When building web applications, you sometimes want to retrieve JSON data from APIs and domains that are external to your service. Because of the same-origin policy in browsers, you normally cannot retrieve data from other domains via AJAX.

JSON-P to the rescue

Usually to get around this, APIs will have endpoints that support JSON-P. JSON-P is a nifty technique that loads JSON data via <script> tags instead of loading the data via XMLHttpRequests (AJAX). To understand this, let's look at an example.

Let's say you have a service on http://myservice.com/data.json that returns the following JSON.

{
  "from": "myservice",
  "status": 200,
  "data": ["foo", "bar", "baz"]
}

An application that lives on http://anotherapplication.com cannot access data.json in client-side JS via AJAX because anotherapplication.com and myservice.com are not the same domain.

As the author of myservice.com, you can solve this problem by turning your JSON endpoint into a JSON-P endpoint. To do this, you write myservice.com in such a way that hitting http://myservice.com/data.json?callback=procedureName returns the following:

procedureName({
  "from": "myservice",
  "status": 200,
  "data": ["foo", "bar", "baz"]
})
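
On the server side, producing that response mostly comes down to string concatenation. Here is a minimal sketch of the idea, not tied to any particular web framework. The helper name toJsonp is my own invention for illustration, and I'm assuming the callback name has already been read from the callback query parameter.

```javascript
// Hypothetical helper: wrap a JSON-serializable value in a JSON-P response body.
// `callbackName` is assumed to come from the ?callback= query parameter.
function toJsonp(callbackName, data) {
  // Only accept plain identifiers, so a caller cannot inject arbitrary script.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error('Invalid callback name: ' + callbackName);
  }
  return callbackName + '(' + JSON.stringify(data) + ');';
}

var body = toJsonp('procedureName', {
  from: 'myservice',
  status: 200,
  data: ['foo', 'bar', 'baz']
});
console.log(body);
// procedureName({"from":"myservice","status":200,"data":["foo","bar","baz"]});
```

The identifier check is a design choice on my part: since the callback name is echoed straight back into executable JavaScript, it seems safer to reject anything that isn't a simple function name.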

Now, the author of anotherapplication.com can load data.json by adding the following script tag dynamically to the client side DOM.

<script type="text/javascript" src="http://myservice.com/data.json?callback=procedureName"></script>

Now, the function procedureName, which must be defined as a global function on your page, will get called with the data from data.json. Using this trick does mean that you have to trust http://myservice.com, because any content it returns gets executed as JavaScript on your page.
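
To make the client side concrete, here is a rough sketch of adding that script tag dynamically. The function loadJsonp and its parameters are hypothetical names of mine, not part of any library. The doc parameter exists only so that the function can be tried outside a browser; in a page you would leave it out.

```javascript
// Hypothetical helper: load a JSON-P endpoint by injecting a <script> tag.
// `doc` defaults to the browser's document; it is a parameter only so the
// function can be exercised outside a browser.
function loadJsonp(url, callbackName, onData, doc) {
  doc = doc || document;
  var root = typeof window !== 'undefined' ? window : globalThis;
  var script = doc.createElement('script');

  // The server's response calls this global function with the data.
  root[callbackName] = function (data) {
    delete root[callbackName]; // clean up the temporary global
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onData(data);
  };

  var separator = url.indexOf('?') === -1 ? '?' : '&';
  script.src = url + separator + 'callback=' + encodeURIComponent(callbackName);
  doc.head.appendChild(script);
  return script;
}

// In a browser, usage would look something like:
// loadJsonp('http://myservice.com/data.json', 'procedureName', function (data) {
//   console.log(data.status);
// });
```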

Where YQL Comes in

Most web services will support JSON-P if they expect you to retrieve their data on the client side, but some do not.

For services on the public internet that do not support JSON-P, you can use YQL to proxy the request through Yahoo's servers, and retrieve data in the same way.

Here is a snippet of jQuery code that would normally hit http://myservice.com/data.json:


$.ajax({
  'url': 'http://myservice.com/data.json',
  'dataType': 'json',
  'success': function(response) {
    console.log(response);
  },
});

Here is how you modify it to proxy via Yahoo's servers.


var yql_url = 'https://query.yahooapis.com/v1/public/yql';
var url = 'http://myservice.com/data.json';

$.ajax({
  'url': yql_url,
  'data': {
    'q': 'SELECT * FROM json WHERE url="'+url+'"',
    'format': 'json',
    'jsonCompat': 'new',
  },
  'dataType': 'jsonp',
  'success': function(response) {
    console.log(response);
  },
});

The snippet above will send a request to Yahoo and get back data from myservice.com as a response. This does mean that http://myservice.com needs to live on the open web (not on an internal server), so that Yahoo servers can hit it.

jQuery will automatically add a callback parameter to the request and create a global function with that name that passes the response to your success function, so that it gets called appropriately.
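
If you are curious what the resulting request looks like, here is a small sketch that builds an equivalent URL by hand. The function name buildYqlUrl and the callback name handleData are made up for illustration. jQuery generates its own callback name and may encode the query string slightly differently, so treat this as an approximation of what goes over the wire.

```javascript
// Hypothetical helper: build the URL that asks YQL to fetch `targetUrl`
// and wrap the result in a call to `callbackName`.
function buildYqlUrl(targetUrl, callbackName) {
  var params = {
    q: 'SELECT * FROM json WHERE url="' + targetUrl + '"',
    format: 'json',
    jsonCompat: 'new',
    callback: callbackName
  };
  // Percent-encode each key and value, then join them into a query string.
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return 'https://query.yahooapis.com/v1/public/yql?' + query;
}

console.log(buildYqlUrl('http://myservice.com/data.json', 'handleData'));
```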

09 Aug 2013
Using git hooks to deploy your web application

When building web applications, I used to spend a lot of time deploying them via ssh and scp. Then I used Heroku for a few projects, and I really liked that deploying to Heroku was as easy as it could be:

git push heroku master

I wanted to have a similar deployment scheme on my own projects that aren't deployed on Heroku.

How it works

Since git is a distributed version control system, you can push the code that lives on your machine to another machine very easily via ssh. Your first instinct might be to set up a repo in the location where your code needs to be deployed, and push to it via git. This is a good instinct, but by default git refuses pushes to the branch that is checked out in a repository with a working copy. To resolve this, you will create a bare repository on your server, and push to it. You will also set up a git hook to automatically deploy your application when code gets pushed to the bare repository.

Setting it up

Before you start, your codebase needs to be in a git repository. This could be a GitHub repository that you use for version control. I will assume that your codebase lives in one directory called project on your development machine, which I will refer to as develop.

This codebase will be deployed to your server. I will refer to your server as deploy.

Now, you are going to create a bare git repository on deploy, and you will be able to push to it from develop.

username@deploy:~$ mkdir repos # this is the dir where all your repos will be stored.
username@deploy:~$ cd repos
username@deploy:~/repos$ mkdir project.git 
username@deploy:~/repos$ cd project.git # You can replace this with the name of your project.
username@deploy:~/repos/project.git$ git init --bare
# Initialized empty Git repository in /home/username/repos/project.git

You will now set up your codebase on develop to push to the repos/project.git directory on deploy.

username@develop:~$ cd ~/code/project # or wherever your project lives
username@develop:~/code/project$ git status 
# This must be a git repo.
username@develop:~/code/project$ git remote add deploy username@deploy:~/repos/project.git # This is the path to your bare repo.
username@develop:~/code/project$ git push deploy master 

This will push your codebase to the bare repository you just created on deploy. You can verify this by cloning the bare repository if you'd like.

username@develop:~$ cd /tmp
username@develop:/tmp$ git clone username@deploy:~/repos/project.git
# Cloning into 'project'...
# remote: Counting objects: 666, done.
# remote: Compressing objects: 100% (417/417), done.
# remote: Total 666 (delta 255), reused 632 (delta 221)
# Receiving objects: 100% (666/666), 621.96 KiB | 462 KiB/s, done.
# Resolving deltas: 100% (255/255), done.
username@develop:/tmp$ cd project
username@develop:/tmp/project$ ls
# make sure your files are here.

Now that we are pushing to the repos/project.git directory on deploy, let's set up our repository to actually deploy its code. I'll assume that your application gets deployed to /var/www/myproject.com.

username@deploy:~$ cd repos/project.git
username@deploy:~/repos/project.git$ ls
# HEAD  branches  config  description  hooks  info  objects  refs
username@deploy:~/repos/project.git$ cd hooks
username@deploy:~/repos/project.git/hooks$ [editor] post-receive

The post-receive hook gets called by git right after code gets pushed to a repository (right after git push deploy master). We will make this hook deploy your application to /var/www/myproject.com. Using an editor of your choice, place the following in the post-receive file.

#!/bin/bash

### This file gets run when code is pushed to the project.git directory.

GIT_WORK_TREE=/var/www/myproject.com git checkout -f

Make the hook executable.

username@deploy:~/repos/project.git/hooks$ chmod +x post-receive

Make sure that your user has permission to write to /var/www/myproject.com. That's it! You can now deploy your code anytime you want by running:

username@develop:~/code/project$ git push deploy master

Verify that your code is deployed when you push, and you should never need to use scp to deploy ever again.