The Rise of the Server-less Web Stack

  • Wednesday, March 18th, 2015 11:22 am GMT -5

Javascript has lots of cool stuff built on top of it now. These days, there are tons of well-worn frameworks that bring all sorts of powerful programming paradigms into the browser.

Want easy object-orientation? Use backbone. More of a functional programmer? There’s underscore, lodash and many others. And I can’t keep up with the latest template rendering libraries, but there are dozens.

Plus ECMAscript 6 is rolling out quickly and with it, some long-awaited language features, syntactic sugar and new APIs.

Additionally, there are a lot of JS SDKs and simple integrations for things like accepting payments (stripe), analytics (mixpanel,, etc) if you don’t want to write the code or support the infrastructure to do those things yourself.

With all of these features, one can build an entire, bonafide web application in pure javascript. This certainly isn’t a new idea — single-page javascript applications have been around for years.

serverless-web-stackBut what if we take the power of javascript to its logical conclusion — making the entire app live in the user’s browser.

Do we even need to deal with setting up servers and maintaing a separate codebase for a server-side backend at all?


Minimum Viable Git Best Practices for Small Teams

  • Tuesday, January 20th, 2015 08:01 pm GMT -5

logomark-orange@2xWhen I started as the first employee at Burstworks, the cofounders and I could easily hold the information about who was working on what at any given moment in our brains.

But as we worked on new projects and the scope and size of the engineering team grew, all of our code mostly stayed organized in one central repository:

  • Our high-performance ad server
  • Data Pipeline
  • One-off scripts
  • Nightly jobs
  • Everything…

While we generally weren’t working on the exact same files at the same time, there was still lots of stepping on toes. Having your git push rejected was a common occurrence.

Inevitably we had issues with merge conflicts, which lead me to send this tweet from our company account:

And so I decided to take a step back and think about how we managed our version control system at Burstworks.

I definitely didn’t want to come up with something heavy handed or overly-proscriptive. The goal was to come up with just enough process to grease the wheels, and not slow things down.

I did some reading, came up with some initial ideas and pitched them to the team. We iterated a bit and here’s what we came up with.

I should start out by saying that it’s nothing revolutionary or new. It’s what I would consider the Minimum Viable Git Best Practices™ for a small engineering organization.


Lightning Fast Data Serialization in Python

  • Monday, December 8th, 2014 10:55 pm GMT -5

A few months ago, I got a chance to dive into some Python code that was performing slower than expected.

lightning-fast-serialization-pythonThe code in question was taking tiny bits of data off of a queue, translating some values from strings to primary keys, and then saving the data back to another queue for another worker to process.

The translation step should have been fast. We were loading the data into memory from a MySQL database during initialization, and had organized the data structure so that the id -> string lookups were constant time.

Finding the Problem

In order to figure out where the bottleneck(s) were, I used Python’s builtin CProfile package, and combed through the results using the awesome CProfileV package, written by a former Quora intern.

After letting the script run for awhile, the bottleneck jumped out right away — the workers were spending about 40% of their time serializing and deserializing data.

In order to keep messages on the queue for other workers to pick up, we were translating the Python dicts into JSON objects using the standard library’s json package.

Our worker was reading the text data from the queue, deserializing it into a Python dict, changing a few values and then serializing it back into text data to save onto a new queue.

The translation steps were taking up about 40% of the total runtime.

So I set out to see if there was a faster way to serialize a Python dict.


Preventing Web Scraping: Best Practices for Keeping Your Content Safe

  • Monday, August 11th, 2014 08:17 pm GMT -5

Many content producers or site owners get understandably anxious about the thought of a web scraper culling all of their data, and wonder if there’s any technical means for stopping automated harvesting.

Unfortunately, if your website presents information in a way that a browser can access and render for the average visitor, then that same content can be scraped by a script or application.

Any content that can be viewed on a webpage can be scraped. Period.

A content thiefYou can try checking the headers of the requests — like User-Agent or Cookie — but those are so easily spoofed that it’s not even worth doing.

You can see if the client executes Javascript, but bots can run that as well. Any behavior that a browser makes can be copied by a determined and skilled web scraper.

But while it may be impossible to completely prevent your content from being lifted, there are still many things you can do to make the life of a web scraper difficult enough that they’ll give up or not event attempt your site at all.

Having written a book on web scraping and spent a lot of time thinking about these things, here are a few things I’ve found that a site owner can do to throw major obstacles in the way of a scraper.


The Full Time Employee's Guide to Generating Freelance Clients on the Side

  • Monday, July 7th, 2014 06:26 pm GMT -5

There are many reasons to begin freelancing while still maintaining a full time job. Whether you want to work on new projects outside the scope of your current role or make some extra money each month — or maybe you’re hoping to eventually jump into freelancing full time — getting your first few clients can be really rewarding.

But how should you get started? I’ll cover that in this article.

We’ll go over ways to build your expertise and generate demand for your time. Then we’ll talk about generating potential leads and how to convert them into paying clients. Finally, we’ll talk about some tips for your pricing conversations.

In a future article, I’ll talk about different styles of project management, easy ways to exceed your clients’ expectations, how to handle some of the legal and administrative issues you’ll face, and how to feel comfortable raising your rates. Make sure to subscribe for updates!

Freelance tips

Personally, I’ve worked with dozens of clients over the past few years across several different business problem domains. Some of my clients are one-person operations while others are large organizations that have run Super Bowl ads.

I learned a bunch from both my successes and my mistakes along the way as I was getting started, so I figured I’d put together a guide for other people who might want to follow a similar path.


Moving a Static Site to S3 Before My Girlfriend Got Out of the Shower

  • Friday, June 6th, 2014 04:42 pm GMT -5

I’ve got an old Rackspace instance that I’ve been running a bunch of small sites on over the past 4 years. Lately it’s been causing me problems and sites will sporadically go down from time to time.

I have been meaning to move several of the static sites onto a more appropriate static-file hosting service like Amazon’s Simple Storage Service, also known as simply “S3″.

I’m on a trip in Denver with my girlfriend right now, so when I woke up to an email that one of my sites was down again, the last thing I wanted to do was waste precious vacation time doing server ops.

Woman in the shower

Fortunately, moving the static sites to s3 was so easy, I was able to get it done before my girlfriend even got out of the shower. No vacation time wasted!


Becoming a Cold Weather Adventurer: Notes from MIT Outing Club's Winter School

  • Saturday, February 15th, 2014 03:15 pm GMT -5

Growing up, I was always an outdoorsy person, but cold New England winters kept me cooped up inside for a big chunk of the year. Last winter, I decided to take my first winter mountaineering and ice climbing lessons to start building the skills to become a year-round adventurer.

Preparing for a hike up Mt. Washington

Earlier this winter, a friend told me about MIT Outing Club’s annual Winter School in January. It was 16 hours of lectures, demonstrations and stories from trip leaders and outside speakers. The course was a great introduction for anyone looking to get outside more in the winter.

Guest lecturer at MIT Outing Club's Winter School

I’ve compiled some of my notes from the course here, and added my own anecdotes that I’ve picked up over the past year. While reading about this stuff is a great way to whet your appetite, some of the skills and more technical aspects should really be practiced before you go out and try using them.

I’d highly recommend Northeast Mountaineering’s Introduction to Mountaineering course if you’re in New England.


Peeling Back the ORM: Demystifying Relational Databases For New Web Developers

  • Tuesday, November 19th, 2013 10:48 pm GMT -5

Most web developers building dynamic websites interact with databases every day. Relational databases like MySQL or Postgres are usually the first tool people reach for when their application needs to store data.

database-iconBut with the recent proliferation of web frameworks like Rails and Django, many web developers rely totally on Object-Relational Mappers (ORMs) for interacting with their database.

In fact, many new web developers see “writing raw SQL” or interacting directly with the database as something scary that should be avoided at all costs.

The reality is that relational databases are actually fairly easy to tame, and are built on top of lots of great ideas. Understanding the relational database that your application runs on will give you a much richer understanding of your web stack and make you a more powerful, proficient developer.

This article is a version of some notes I wrote for the new web developers who just started at Burstworks. At the end I link to the major resources I used, in case you want to learn more about this stuff.


The "Ultimate Guide to Web Scraping" is Now Available

  • Sunday, August 4th, 2013 09:45 pm GMT -5

web-scraping-ebook-coverI wrote an article on web scraping last winter that has since been viewed almost 100,000 times. Clearly there are people who want to learn about this stuff, so I decided I’d write a book.

A few months later, I’m happy to announce: The Ultimate Guide to Web Scraping.

No prior knowledge of web scraping is necessary to follow along — the book is designed to walk you from beginner to expert, honing your skills and helping you become a master craftsman in the art of web scraping.

The book talks about the reasons why web scraping is a valid way to harvest information — despite common complaints. It also examines various ways that information is sent from a website to your computer, and how you can intercept and parse it. We’ll also look at common traps and anti-scraping tactics and how you might be able to thwart them.

There are code samples in both Ruby and Python — I had to learn Ruby just so I could write the code samples! If anyone’s willing to translate the sample code into PHP or Javascript, I’ll give you a free copy of the book. Get in touch.


How HTTPS Secures Connections: What Every Web Dev Should Know

  • Wednesday, July 24th, 2013 09:40 pm GMT -5

https certificate iconHow does HTTPS actually work? That was the question I set out to solve a few days ago for a project at work.

As a web developer, I knew that using HTTPS to protect users’ sensitive data was A Very Good Idea, but I didn’t have much understanding about how it actually worked.

How was data protected? How can a client and server create a secure connection if someone was already listening in on the wire? What is a security certificate and why do I need to pay someone to get one?

A Series of Tubes

Before we dive into how it all works, let’s talk briefly about why it’s important to secure connections in the first place, and what sorts of things HTTPS guards against.

When you make a request to visit your favorite website, that request must pass through many different networks — any of which could be used to potentially eavesdrop or tamper with your connection.

series of tubes

From your own computer to other machines on your local network, to the access point itself, through routers and switches all the way to the ISP and through the backbone providers, there are a lot of different organizations who ferry a request along. If a malicious user got into any one of those systems, then they have the potential to see what’s traveling through the wire.

Normally, web requests are sent over regular ol’ HTTP, where a client’s request and the server’s response are both sent as plain text. There are lots of good reasons why HTTP doesn’t use secure encryption by default:

  • Security requires more computation power
  • Security requires more bandwidth
  • Security breaks caching

But sometimes, as the developer of a web application, you know that sensitive information like passwords or credit card data will be going over the connection, so it’s necessary to take extra precautions against snooping on those pages.