Hello, world!

Every project has to begin with something. Here's how this site is built.

Minutes to read → 11
Map of the ARPAnet circa July 1974

Welcome! I hadn't updated my personal site in years, and there was a lingering itch in the back of my mind not only to refresh the site, but also to share some knowledge back with the community. Technology is changing at a breakneck and still accelerating pace, and when you look at web and app development through the lens of having built some pretty high-traffic websites over nearly 25 years, you get the perspective that software really is eating the world.

So this site will act less as a personal blog (remember those?) and more as a series of articles and thoughts about 21st century web development and where we're headed. To do that, I want to make sure that my own site follows the same rules and parameters that I would expect a 21st century business to follow. (If you like the term dogfood, there you go.)

This first article is going to be about how this marketing website-equivalent is built and deployed. Let's jump in.

First, let's define the problem space.

What is a marketing website? What is a company blog? From a business perspective it's pretty straightforward to define those two - you want to put your best face forward and get customers engaged enough to want to purchase / interact / sign up / etc. From a technical perspective, I like to think of a marketing website as a website that is 99.999% reads, and 0.001% writes.

When you think about it, if your marketing message is changing any more often than that, you have some serious messaging problems. It's one thing to A/B test conversions by tweaking language, layout, or even functionality, but if you're kanban'ing through changes to a marketing website without a clear plan of "let's see how this performs" then there's something fundamentally wrong.

So, assuming that I'm not planning on updating my website hourly, or even daily, that very quickly informs the technical solution: hosting a bunch of pages that don't change very often.

  • We don't need any kind of dynamic OLAP solution here.
  • We don't need any authentication.
  • There's really no business logic to enforce.
  • Everyone can get the same exact content.
  • And on a real marketing website, our goal is to move the customer to a signup/signin flow that can happen within the context of a separate web app.

So here's what we technically need.

  • The ability to serve up pages via HTTP. (Really, this means HTTPS: HTTP over TLS.)
  • Let's fire up a CDN to get those pages to people quickly.
  • Fast DNS service so our time-to-first-byte is quick. (As a side note: DNS response times are surprisingly overlooked by people when optimizing their response times! You would be surprised at how many tens-to-hundreds of milliseconds you can shave by using a good DNS provider.)

...and that's it.
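
If you want to see what your own DNS is costing you, here's a minimal Node sketch for timing a lookup. The function name `timeDnsLookup` and the `example.com` hostname are my own placeholders, and the number you get depends heavily on resolver caching, so treat it as a rough indicator rather than a benchmark.

```javascript
// Sketch: time a single DNS lookup from Node.
const dns = require("dns").promises;
const { performance } = require("perf_hooks");

// Resolve `hostname` to IPv4 addresses and report how long it took (ms).
// `resolve` is injectable so the timing logic can be tried without a network.
async function timeDnsLookup(hostname, resolve = (h) => dns.resolve4(h)) {
  const start = performance.now();
  const addresses = await resolve(hostname);
  const ms = performance.now() - start;
  return { hostname, addresses, ms };
}

// Example (needs network access; hostname is a placeholder):
// timeDnsLookup("example.com").then(({ ms }) => console.log(`${ms.toFixed(1)} ms`));
```

Run it a few times: the first lookup is usually the slow one, and later ones tend to be answered from a cache somewhere along the way.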

Now, let's go back in time 20 years and pretend that the year is 1999. Given those parameters, you're going and spec'ing out at least two boxes to act as web servers, another two boxes to act as software load balancers (or purchasing a pair of F5 hardware load balancers), you're purchasing a backup system to store offsite backups, you're purchasing hardware to act as website monitoring, and that's just to host production. You could make the argument that you're looking at spec'ing out a build environment and all the fun infrastructure that comes with that as well.

Either you're hosting this within your own datacenter, or you're renting out half a rack at your nearby provider. There were other relatively straightforward static site hosting services in 1999, but they were flaky at best, and so DIY was the way to go. You would be looking at a total cost of probably upward of $40k up front. To host a static marketing website.

Geocities Racks
The racks at Geocities circa 1999, by Michael Stevens

If you fast forward five years or so, you could potentially get yourself set up on hosted Wordpress, but then it's a matter of time before yet-another-PHP-security-hole rears its head and you find your site overwritten, hosting malware, or worse. I don't consider this to be a real solution (and hindsight is 20/20).

So let's fast forward back to today.

The 21st century marketing website.

Here's what I'm using:

  • GatsbyJS to power the site.
  • An S3 bucket to host pages.
  • Cloudflare for my CDN and DNS.
  • Gitlab to host my source code and provide CI/CD to build and deploy the site.

...and that's it. I'll go over those each in detail, but the overall monthly cost of hosting this site is pennies. It isn't truly a scale-from-zero cost, but it's close enough. The largest cost for hosting my personal site is the domain registration fee. If you could go back in time 20 years and tell me that would be the largest cost of hosting a site, I wouldn't have believed you.

This is the democratizing power of the cloud. It doesn't take a large amount of infrastructure, time, and money to spin up a project, see if you can get traction, and scale to an internet powerhouse. The largest cost is your time, and modern tools are shrinking that time investment every day. I went above and beyond in terms of customizing Gatsby for my purposes, and start to finish getting this site running took me about 8 points of scrum complexity, or rather a little under a week's worth of work.

One of the recurring themes of these articles is going to be that this can be done very quickly and inexpensively. Even though I'm just building and hosting some HTML in an S3 bucket, I have a pretty impressive system for writing, building, and deploying that site, all of which is free to use, and the resulting site is extremely performant for end users.

Let's go over the tools and workflow.

GatsbyJS for site generation.

Gatsby is one of a new-ish set of tools called static site generators; they take a collection of data (in my case, a bunch of markdown files stored in my repository), run it through a templating system, and output some extremely-optimized HTML, Javascript, CSS, and static assets. Gatsby will resize images for different devices; phone users get smaller filesizes than desktop users, for example. Gatsby uses relatively straightforward React code to render pages - while React has a pretty big learning curve, the subset that you need to create a fully-functional Gatsby site is extremely small. If you know how to write Javascript, you can create a Gatsby-powered site.

The idea behind static site generators is simple: for sites that change infrequently (which is to say, marketing sites or personal sites), instead of having these dynamic applications, databases, memory caches, and all of that, you can just pre-build the assets for the site. You don't need to worry about SQL injection attacks because you aren't accessing a database. You have far less exposure to XSS because there's no user-submitted content being rendered back into pages. The main way the site could be compromised is if you lose your cloud provider's credentials and someone is able to upload different content. There are no plugin security holes to worry about on the server, and there's very little maintenance required. Static site generators just work.
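
To make the idea concrete, here's a toy version of what a static site generator does, stripped down to plain JavaScript. This is not how Gatsby is implemented; the function names (`escapeHtml`, `renderArticle`, `buildSite`) and the `slug` / `bodyHtml` fields are made up for illustration.

```javascript
// Escape text before dropping it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Template: one article in, one complete HTML document out.
// `bodyHtml` is assumed to be trusted HTML already rendered from Markdown.
function renderArticle({ title, bodyHtml }) {
  return [
    "<!doctype html>",
    `<html><head><title>${escapeHtml(title)}</title></head>`,
    `<body><article><h1>${escapeHtml(title)}</h1>${bodyHtml}</article></body></html>`,
  ].join("\n");
}

// "Build": produce a path -> HTML mapping once, ahead of time.
// Nothing here runs when a visitor requests a page.
function buildSite(articles) {
  const pages = {};
  for (const article of articles) {
    pages[`/${article.slug}/index.html`] = renderArticle(article);
  }
  return pages;
}
```

Everything that would normally happen per request happens once at build time, and the output is just files.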

Here's the React code for most of this page:

const ArticlePage = ({ data }) => (
  <div className="page-article">
    <SEO title={data.markdownRemark.frontmatter.title} keywords={[`mark`, `beeson`, `engineering`]}
         description={data.markdownRemark.excerpt}/>
    <ArticleNav title={data.markdownRemark.frontmatter.title}/>

    <article>
      <h1 className={"article-title"}>{data.markdownRemark.frontmatter.title}</h1>
      <h3 className={"article-subhead"}>{data.markdownRemark.frontmatter.subhead}</h3>
      <div className={"article-timetoread"}>Minutes to read &#x2192; {data.markdownRemark.timeToRead}</div>
      <Img fluid={data.markdownRemark.frontmatter.cover.childImageSharp.fluid}/>
      <div className={"covercaption"}>{data.markdownRemark.frontmatter.covercaption}</div>
      <div dangerouslySetInnerHTML={{ __html: data.markdownRemark.html }}/>
    </article>

    <section className={"other-articles"}>
      {/* rest of the page omitted from this excerpt */}
    </section>
  </div>
)

The one gotcha that you'll run into along the way → Gatsby accesses data via GraphQL, which is still relatively new and can be tricky to pick up. There are prebuilt templates for creating a blog, data-driven pages, and the like; however, it may take a bit of research to figure out how to query your data store (in my case, the markdown pages) and pass data to the presentation layer.

For example, to do the "newer" / "older" links at the bottom of each article required a custom GraphQL query:

olderpost: allMarkdownRemark(
  filter: { frontmatter: { date: { lt: $date } } },
  sort: { order: DESC, fields: [frontmatter___date] },
  limit: 1
) {
  # fields to display
}

This is essentially similar to querying DynamoDB, if you're familiar with that. There's a bit of ramp-up time with GraphQL and then you can do quite a lot with it because you can embed the queries you need either within the pages themselves, or even at the React component level.
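
The part that took me the longest to understand was where `$date` comes from. Here's a minimal sketch of one way to wire it up in `gatsby-node.js`, using Gatsby's `createPages` hook and the `context` option of `createPage`. It is not a copy of my actual file: the `fields.slug` field and the template path are assumptions.

```javascript
// gatsby-node.js (sketch). Assumes every Markdown node has a `fields.slug`
// and a `frontmatter.date`; the template path below is hypothetical.
const path = require("path");

async function createPages({ graphql, actions }) {
  const result = await graphql(`
    {
      allMarkdownRemark {
        edges {
          node {
            fields { slug }
            frontmatter { date }
          }
        }
      }
    }
  `);
  if (result.errors) {
    throw result.errors;
  }

  const template = path.resolve("./src/templates/article.js");
  result.data.allMarkdownRemark.edges.forEach(({ node }) => {
    actions.createPage({
      path: node.fields.slug,
      component: template,
      // Values in `context` are available as GraphQL variables in the page
      // query, which is how `$date` in the olderpost query gets its value.
      context: { slug: node.fields.slug, date: node.frontmatter.date },
    });
  });
}

exports.createPages = createPages;
```

The page query in the template then has to declare the variable in its signature (for example `query($date: Date!)`) before it can use it in a filter.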

I develop the site on my laptop - Gatsby has a developer mode where you can do local development and the application will do hot-reloading, making iteration fast. As with pretty much any marketing site, styling made up the majority of my time spent heads down. Gatsby is great at this; it just gets out of your way and you can have a browser running and see live changes to your styling.

Writing articles is just writing Markdown pages and getting them ready for publishing. When it comes time to deploy, I'll push to my Gitlab repo and let the built-in Gitlab CI/CD pipeline create and deploy the pages to...

An S3 bucket for hosting the site.

You already know about S3. Maybe you prefer Google Cloud Storage, maybe you prefer Azure Blob Storage; whatever you prefer, cheap, redundant, effectively infinitely scalable disk space that speaks HTTP natively is one of the internet's greatest technical inventions. S3 can be complex when it comes to permissions, IAM roles, pre-signed uploads and downloads, but it remains an absolute necessity for any modern application.

It's staggering to think about how much infrastructure time and effort can be saved by just implementing S3 into your application: disk hashing algorithms, backup rotation strategies, the actual process of purchasing disk space, racking it, and connecting it via fibre channel to your network. There are multi-billion dollar companies who exist solely because people haven't yet understood the power of S3.

I turned on static website hosting for my S3 bucket with a single trackpad tap. S3 itself can host the entire site and will be more than fast enough for your purposes, but to really accelerate things I'm using...

Cloudflare for CDN, WAF, and DNS.

I'm going to write an entire article about the benefits of Cloudflare, but even the free tier of Cloudflare is enough to front your personal or marketing website. Go there and sign up now if you haven't already. Cloudflare's network is second to none and they have put an enormous amount of time and thought into creating the best content delivery network and edge infrastructure available.

Hosting your DNS on Cloudflare can noticeably decrease your time to first byte, since DNS resolution has to finish before the browser can even open a connection. Even the usability of their DNS control panel is better than that of any other DNS provider I've used - certainly better than writing zone files yourself - and would be worth the $20 pro-tier account all by itself.

Cloudflare DNS
The Cloudflare DNS control panel

Combine that with an amazing web application firewall, free easy-to-deploy SSL, an easy-to-use content delivery network cache, and now the ability to deploy functions to their edge nodes via their Workers (essentially a simplified Lambda running on the Cloudflare network with near-zero cold start times), and Cloudflare is a tool that should be in front of every single web application.
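
How well any CDN cache works depends on the `Cache-Control` headers your origin sends. Here's a rough sketch of how you might choose that header per file when uploading Gatsby's `public/` output. The rules loosely follow Gatsby's general caching guidance as I understand it, and the function name `cacheControlFor` is my own, so check it against your own build output before relying on it.

```javascript
// Fingerprinted assets can be cached for a year; anything that keeps the
// same URL across deploys should be revalidated on every request.
const IMMUTABLE = "public, max-age=31536000, immutable";
const REVALIDATE = "public, max-age=0, must-revalidate";

// Pick a Cache-Control value for a path relative to Gatsby's `public/` dir.
function cacheControlFor(filePath) {
  // Normalize Windows separators and strip a leading "./" or "/".
  const p = filePath.replace(/\\/g, "/").replace(/^\.?\//, "");
  if (p.endsWith("sw.js")) return REVALIDATE;     // service worker keeps a stable URL
  if (p.startsWith("static/")) return IMMUTABLE;  // hashed images, fonts, and so on
  if (/\.(js|css)$/.test(p)) return IMMUTABLE;    // bundles carry a content hash in the name
  return REVALIDATE;                              // HTML, page-data JSON, everything else
}
```

The value would be set as object metadata at upload time, so that S3 returns it as a response header and the CDN in front of it can honor it.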

Gitlab for source repository and CI/CD.

I'm also going to write an entire article about Gitlab as an essential tool for engineering teams. The reason that I tend to choose Gitlab over Github is Gitlab's built-in CI/CD. Normally, Github would require a Travis or Jenkins integration in order to pull off the same level of continuous delivery that Gitlab has out of the box. Github Actions are in beta, and will come close to Gitlab's CI/CD, but in terms of creating a delivery pipeline, Gitlab makes testing and deploying your code very easy.

To make your Gitlab pipeline run faster, set up a dedicated Docker image with pre-built tools that you need for building and testing your application. I'm using an image with the latest stable version of Node and the AWS tools installed, so I can npm install gatsby, build, and then copy the files to the S3 bucket.

Gitlab Pipeline
A sample Gitlab pipeline

For my pipeline, I have one step that attempts to build the site, and saves the resulting build as artifacts. If the build is successful, I move to a deployment step that does the actual copying to S3. The pipeline can be configured to run on any branch, or git tag, or even if specific files/directories change. I run the build step on any branch that gets checked in, so I can verify the build on a feature branch, and then the deployment job only runs on the master branch. This is a simplified version of Gitflow without the release branches or back-porting to a development branch.
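
In a real pipeline these rules live in `.gitlab-ci.yml`, but the logic is simple enough to spell out in a few lines of JavaScript. This is only an illustration: the function names and the bucket name are placeholders, and using `aws s3 sync` with `--delete` is one reasonable way to do the copy, not necessarily the exact command I run.

```javascript
// Every branch gets a build; only `master` also gets a deploy.
function jobsForBranch(branch) {
  return branch === "master" ? ["build", "deploy"] : ["build"];
}

// Build the AWS CLI invocation that copies Gatsby's output to a bucket.
// `--delete` removes objects from the bucket that no longer exist locally.
function deployCommand(bucket, outputDir = "public") {
  return ["aws", "s3", "sync", `${outputDir}/`, `s3://${bucket}`, "--delete"];
}

// In a Gitlab job the branch name is available as CI_COMMIT_REF_NAME:
// jobsForBranch(process.env.CI_COMMIT_REF_NAME);
// deployCommand("my-site-bucket").join(" ");  // bucket name is a placeholder
```

Keeping the build and deploy steps separate means a broken build on a feature branch never gets anywhere near the bucket.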

The development workflow.

I've gone slow to go fast here - once I've written my article in Markdown, attached any images that I want, and gone through an editing cycle or two, the act of publishing is as simple as committing the correct files and pushing to either a new branch for building, or pushing to master for building and deploying.

I use Webstorm as an IDE for its semi-WYSIWYG Markdown editor; you could run something like Atom or Sublime or even VSCode and get the same functional editor for articles.

While this isn't the one-button publish that you get with Wordpress or Squarespace or Blogger or Livejournal (remember that?) it's a very developer-friendly workflow, and one that even a non-technical person could learn without much trouble. This workflow gives an extreme amount of control without being a morass of code - just create a new directory, add a markdown file and any images to it, then do a git commit and push.
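
For a sense of what that markdown file looks like, here's a small sketch that generates the skeleton of a new article. The frontmatter keys (`title`, `subhead`, `date`, `cover`, `covercaption`) are the ones the page template and query shown earlier read; the function name and the sample values are made up.

```javascript
// Build the contents of a new article's Markdown file: YAML frontmatter
// followed by a placeholder body.
function newArticleMarkdown({ title, subhead, date, cover, covercaption }) {
  const frontmatter = { title, subhead, date, cover, covercaption };
  // JSON.stringify produces a double-quoted string, which is also valid YAML.
  const lines = Object.entries(frontmatter).map(
    ([key, value]) => `${key}: ${JSON.stringify(String(value))}`
  );
  return ["---", ...lines, "---", "", "Write the article here.", ""].join("\n");
}

// Example with placeholder values:
// newArticleMarkdown({
//   title: "Hello, world!",
//   subhead: "Every project has to begin with something.",
//   date: "2019-02-01",
//   cover: "./cover.jpg",
//   covercaption: "A caption for the cover image",
// });
```

Save the result as the markdown file in the new article's directory, drop the cover image next to it, and the commit-and-push step does the rest.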

Wrapping up, and conclusions.

What do you use for your marketing website or blog? I feel like I've created a sustainable, maintainable system that allows for fast publishing when I need to, and for the 99.999% reads and 0.001% writes that this site will receive, the architecture fits extremely well. The data is in a relatively portable format, the site combines simplicity with a level of technical control that a fully-managed service wouldn't be able to pull off, and the price of hosting these articles is effectively nothing.

Thanks for reading! Hopefully this is a good introduction to creating modern applications and how web technology has changed over the years. I'm excited to be writing these articles and sharing my experiences of helping create and evolve the web. At the rate of change we're going through, it won't surprise me to have a later article completely supersede earlier ones, and it will be fun to see how quickly technology and processes change.