The Stochastic Game

Ramblings of General Geekery

Using Mercurial to publish a PieCrust website

These days, all the cool hipster kids want to deploy stuff by pushing a Git or Mercurial repository up to their server.

And that’s pretty cool indeed, because you basically update your website by doing something like:

hg push myserver

So here’s how you can do it with PieCrust (although 90% of this article has nothing to do with PieCrust):

  1. Installing Git/Mercurial/whatever on your server
  2. Setting up your SSH keys
  3. Pushing your repository
  4. Defining hooks/triggers

Keep reading for the meaty details…

I’m going to use Dreamhost as the hosting provider in this pseudo-tutorial, but most of the information should apply to other providers as well.

Installing the DVCS on your server

As far as I can tell, Git is installed on most servers at Dreamhost, so that’s already taken care of.

For Mercurial, they seem to have a super old version so you may want to install a more recent one yourself. It’s actually pretty easy since the steps are described on their community wiki. In my case, it boiled down to:

  1. mkdir -p ~/srcs
  2. cd ~/srcs
  3. wget http://mercurial.selenic.com/release/mercurial-1.7.5.tar.gz
  4. tar xvzf mercurial-1.7.5.tar.gz
  5. cd mercurial-1.7.5
  6. make local
  7. make install-home-bin

And then adding the new paths to my ~/.bash_profile and ~/.bashrc:

export PYTHONPATH=~/lib/python
export PATH=~/bin:$PATH

Setting up your SSH keys

If you’re using BitBucket or GitHub, you probably already have some SSH keys lying around somewhere. If not, then, well, create an account at either one (depending on whether you use Mercurial or Git). Not only are those websites super useful, but they can also help you (somewhat) with setting up SSH access.

The Git help pages are way better than the Mercurial ones, so even if you don’t like Git you may want to check them out if you’re lost.

If your SSH access works with Git/Mercurial, enable password-less login on your Dreamhost account: you basically just need to copy/paste your public key into an ~/.ssh/authorized_keys file (this should work on any other Unix-based host). This will make things super smooth in the future.
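In case you’re rusty, the copy/paste part could look something like this once logged into the server (a sketch: the key below is a placeholder, and sshd will ignore the file if the permissions are too lax):

```shell
# Create ~/.ssh if needed, append the public key, and lock down permissions.
# SSH_DIR is just a stand-in for ~/.ssh so you can try this elsewhere first.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
echo "ssh-rsa AAAA...your-public-key... you@yourmachine" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

If your local machine has ssh-copy-id, `ssh-copy-id yourlogin@yourdomain.ext` does all of this in one go.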

Pushing your repository to Dreamhost

Create a directory in your Dreamhost home to store your repository. For example, with Mercurial:

  1. mkdir -p ~/hg/myproject
  2. cd ~/hg/myproject
  3. hg init .

Now back on your local machine, in your local repository, edit the .hg/hgrc file with:

[paths]
dreamhost = ssh://yourlogin@yourdomain.ext/hg/myproject

If you type hg push dreamhost, it should all work! If not… well… go back to the beginning and start again.

Things would look very similar with Git, and you should be able to do it yourself since you’re already a Git user!
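For the record, the Git version of the server-side setup would be something along these lines (a sketch: the ~/git/myproject.git path and the dreamhost remote name are just examples):

```shell
# On the server: create a bare repository that you can push to.
mkdir -p ~/git/myproject.git
cd ~/git/myproject.git
git init --bare

# Then, on your local machine, inside your working repository:
#   git remote add dreamhost ssh://yourlogin@yourdomain.ext/~/git/myproject.git
#   git push dreamhost master
```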

Setting up hooks/triggers

Now’s the time to do actual work. The idea is to run a custom script on your server when you push updates to the repository you have up there.

For example, on the Dreamhost server and using Mercurial, you can edit the ~/hg/myproject/.hg/hgrc file and add:

[hooks]
changegroup = ~/scripts/bake_site.sh

Now you only need to write the actual script bake_site.sh! You probably want something that does:

  1. Export the repository into a temporary folder.
  2. Bake your site in that temporary folder.
  3. If all went well, copy the baked site into your home.
  4. Clean up.

This would look something like:

#!/bin/sh

set -e

HG_REPO_DIR=~/hg/myproject
ARCHIVE_DIR=~/tmp/hg_archival
PUBLIC_DIR=~/yourdomain.com

# Archive (export) the repo.
echo "Archiving repository to ${ARCHIVE_DIR}"
if [ -d "${ARCHIVE_DIR}" ]; then
    rm -fr "${ARCHIVE_DIR}"
fi
mkdir -p "${ARCHIVE_DIR}"
hg -R "${HG_REPO_DIR}" archive -r tip "${ARCHIVE_DIR}"

# Bake your website.
mkdir -p "${ARCHIVE_DIR}/baked"
"${ARCHIVE_DIR}/_piecrust/chef" bake -r http://yourdomain.com/ -o "${ARCHIVE_DIR}/baked" "${ARCHIVE_DIR}/mywebsite"

# Copy the baked website into the public directory.
echo "Copying website to ${PUBLIC_DIR}"
cp -R "${ARCHIVE_DIR}/baked/"* "${PUBLIC_DIR}"

# Clean up.
echo "Cleaning up"
rm -fr "${ARCHIVE_DIR}"

But don’t use that script!

Obviously it won’t work for you unless you change a lot of directory names, tweak the options for chef, etc. But it should be a good starting point.

  • I recommend you test your script by first running it manually while logged in via SSH, and after changing the PUBLIC_DIR to some temporary directory. You’ll probably get it wrong a couple times at first, especially if you’re quite rusty with Bash syntax like me.
  • When it’s working as expected, do a repository push and check that the script is baking everything correctly in the temporary directory.
  • If that’s all good, then you can revert PUBLIC_DIR back to its intended path.

Now you can enjoy your new coolness: hg push dreamhost!


Don’t brick your ReadyNAS

I have a ReadyNAS NV+ at home to store most of my data and I’ve been pretty happy with it so far… except for one thing: although it’s running a flavor of Linux that you can access as root user (if you installed the EnableRootSSH add-on), you can’t do everything you would normally do with a Linux box.

File Server

First, like most pre-2010 consumer-grade NASes, the NV+ runs on a SPARC CPU, so there are a lot of packages you don’t have access to unless you recompile them yourself. And that’s fine, if you know you’re going to waste your whole evening figuring out weird broken dependencies and compile errors. But, second, there’s some custom stuff in there (I don’t know what it is) that basically prevents you from even upgrading to newer versions of some of the packages you do have access to. This means: don’t run apt-get upgrade on an NV+.

Let me repeat that: don’t run apt-get upgrade on an NV+. Ever.

What happens when you do is this: you lose SSH access, the web administration interface stops working, some of your network shares become inaccessible, and half of your left socks magically disappear. I know, I did it twice in the past (yes, I’m stupid like that).

In both cases, I was lucky enough to recover from my mistake by performing an OS reinstall. It keeps all the packages, add-ons and configuration settings you had before, and only resets the admin password to netgear1 or infrant1 (depending on the version of RAIDiator you had installed), so it almost works again right away afterwards. The downside is that if what fucked up your NAS was one of those add-ons or packages, you wouldn’t have any other option than to do a factory reset and recover your data from a backup (you at least have daily automated backups, right?). But in my case, I think it was one of the OS libraries (like glibc or something) that was causing the issue so that’s where I got lucky. Twice.

Those are the only problems I ever had with that box, so overall I’m still happy to own it. The X-RAID that comes with it makes life a lot easier (you can hot-swap disks, and you can mix different disk sizes), and the machine is small and pretty quiet (my external backup disks are louder). Unlike my media center PC, I wouldn’t have much fun trying to build my own NAS, I think.

…but DON’T RUN APT-GET UPGRADE!


IEnumerable is awesome

I’ve always thought that one of the most underrated features of C# is the yield statement and its companion, IEnumerable<T>. It may be because I’m working a lot with people coming from an exclusively C++ background – it takes some time to adapt to a new language with new paradigms, especially when that language can look a lot like “C++ without delete!” at first. But there are so many wonderful constructs in C# (especially in the past couple versions) that it’s a shame when people keep writing code “the old way”…

That’s why I’m going to write a few “101” articles I can refer my co-workers to (hi co-worker!).

It starts (after the jump) with how yield+IEnumerable is awesome.

The yield statement

The yield statement allows you to write a pseudo[1] coroutine specifically for generating enumerable objects (which is why it is sometimes called a “generator”). The first beauty of it is that you don’t have to create the enumerable object itself – it is created for you by the compiler. The second beauty is that this generator will return each item one by one without storing the whole collection, effectively “lazily” generating the sequence of items.

To illustrate this, look at how the following piece of code generates the all-time favorite Fibonacci sequence.

public static IEnumerable<int> GetFibonacciSequence()
{
    yield return 0;
    yield return 1;

    int previous = 0;
    int current = 1;

    while (true)
    {
        int next = previous + current;
        previous = current;
        current = next;
        yield return next;
    }
}

The awesome thing is that it’s an infinite loop! It would obviously never return if the function didn’t behave like a coroutine: it returns the next number to the caller and only resumes execution when the caller asks for another number. It’s like you’re “streaming” the Fibonacci sequence!
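If you want to see this deferred execution with your own eyes, a contrived little program (not from the article, just a sketch) makes the interleaving obvious:

```csharp
using System;
using System.Collections.Generic;

class LazyDemo
{
    static IEnumerable<int> Numbers()
    {
        Console.WriteLine("generating 1");
        yield return 1;
        Console.WriteLine("generating 2");
        yield return 2;
    }

    static void Main()
    {
        // Nothing is printed yet: calling Numbers() only creates the enumerator.
        IEnumerable<int> numbers = Numbers();
        Console.WriteLine("enumerator created");

        // The generator's body only runs as each item is requested.
        foreach (int n in numbers)
            Console.WriteLine("got " + n);
    }
}
```

“enumerator created” gets printed before “generating 1”: the generator’s body doesn’t start executing until the foreach asks for the first item.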

You can stop any time you want. For instance:

int count = 20;
foreach (int i in GetFibonacciSequence())
{
    Console.WriteLine(i);
    if (--count == 0)
        return;
}

Or, even better, using some of LINQ’s extension methods:

using System.Linq;
foreach (int i in GetFibonacciSequence().Take(20))
{
    Console.WriteLine(i);
}

Performance Gains

There are many advantages to using the yield statement, but most C++ programmers are not really swayed by arguments of coding terseness and expressivity, especially when it involves “black magic” going on inside the compiler[2]: they usually, err, yield to performance-related arguments (see what I did, here?).

So let’s write a simple program that generates many “widgets”, each 256 bytes in size, and print the peak memory usage:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Widget
{
    private byte[] mBuffer;

    public Widget(int size)
    {
        mBuffer = new byte[size];
    }
}

class Producer
{
    // The old, classic way: return a big array. Booooorriiiiing.
    public IEnumerable<Widget> ProduceArray(int count, int widgetSize)
    {
        var widgets = new Widget[count];
        for (int i = 0; i < count; i++)
        {
            widgets[i] = new Widget(widgetSize);
        }
        return widgets;
    }
    
    // The new funky, trendy, hipstery way! Yieldy yay!
    public IEnumerable<Widget> ProduceEnumerable(int count, int widgetSize)
    {
        for (int i = 0; i < count; i++)
        {
            yield return new Widget(widgetSize);
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        int size = 256;
        int count = 1000000;
        ProduceAndPrint(false, count, size);    // LINE 7
        Console.WriteLine("Generated {0} widgets of size {1} (total size = {2})", count, size, count * size);
        Console.WriteLine("Memory Peaks:");
        var process = Process.GetCurrentProcess();
        Console.WriteLine("Virtual Memory\t\tPaged Memory\t\tWorking Set");
        Console.WriteLine("{0} Kb\t\t{1} Kb\t\t{2} Kb", process.PeakVirtualMemorySize64 / 1024, process.PeakPagedMemorySize64 / 1024, process.PeakWorkingSet64 / 1024);
    }

    static void ProduceAndPrint(bool useEnumerable, int count, int widgetSize)
    {
        var producer = new Producer();
        if (useEnumerable)
        {
            int i = 0;
            foreach (var w in producer.ProduceEnumerable(count, widgetSize))
            {
                ++i;
            }
        }
        else
        {
            int i = 0;
            foreach (var w in producer.ProduceArray(count, widgetSize))
            {
                ++i;
            }
        }
    }
}

This prints, on average:

Generated 1000000 widgets of size 256 (total size = 256000000)
Memory Peaks:
Virtual Memory          Paged Memory            Working Set
488572 Kb               299544 Kb               293100 Kb

Now, on the line marked // LINE 7, change false into true. This will make the Producer return a yield-enumerated bunch of widgets, instead of returning a big array. This prints:

Generated 1000000 widgets of size 256 (total size = 256000000)
Memory Peaks:
Virtual Memory          Paged Memory            Working Set
133564 Kb               14156 Kb                12984 Kb

Woohoo! That’s almost 4 times less virtual memory used, and a working set more than 20 times smaller! All of this because the garbage collector can now kick in during the enumeration and discard the widgets we’re not using anymore. You can actually print the number of garbage collections triggered using GC.CollectionCount(0) (all those widgets will be in generation 0 because they’re short-lived). In the first case (returning the full array of widgets) I usually get 49 collections. In the second case (returning widgets one by one) I usually get 66 collections.

Of course, you may get frightened by all those garbage collections that would slow down your loop (C++ programmers are easily scared by garbage collectors), and that’s a legitimate concern if you’re writing a real-time, ultra-sensitive application (although you’d have to check first that it would indeed be a problem). But for other kinds of applications, it can be a real life-saver – like when you’re fetching huge amounts of data from a database of some sort, where each piece of data fits in memory but the whole batch doesn’t.[3]

More!

If you want more meaty stuff on all this, I recommend reading Raymond Chen’s series: part 1, part 2, part 3 and part 4. You can also have a look at the fun traps that await you, as told by Eric Lippert: part 1 and part 2.


  1. “pseudo” because the actual compiler-generated implementation uses a simple state machine. ↩︎

  2. It’s not even all that black, since you can easily look at the generated code, but it does feel magical! ↩︎

  3. This is obviously supposing you don’t have referencing between each item, but even then you could use additional lazy-loading and caching to still only have a subset of your whole dataset in memory. ↩︎


XBMCFlicks in Canada

I recently cut the cord, as they say, giving up cable TV in favor of Netflix and generally doing more productive things otherwise (which is easy since Netflix up here in Canada doesn’t have nearly as much content as in its home country). The problem was then to figure out how to access Netflix’s streams at home.

The web interface works well enough but is obviously not suited for use from the couch (even with that very handy Rii remote). The other official options are devices that I don’t have, except for the Xbox360 and the Wii. I haven’t tried it on the Wii yet, but I know I certainly don’t want to use my Xbox360 since it’s loud as hell and I’d need to wait for it to boot up before I could watch anything… so I went the unofficial route, as is often the case with guys like me.

I started with Boxee, since it’s a lot better than XBMC when it comes to apps and its Netflix app looks very nice. The problem was that it’s only supported in the US at the moment (except on the Boxee Box itself). I tried tweaking the app’s code to make it work in Canada and eventually got that somewhat working, but it was a lot more effort than what I expected (maybe I’ll post about that later if I get it in a state I’m happy with, unless the Boxee guys actually release an update that fixes the problem). Still, it was a nice way to keep Python programming fresh in my mind.

I then reverted back to XBMC, which I’m a big fan of, and its XBMC-Flicks add-on, which lacks style but is at least functional and supports Canada. Still, there were a few blocking issues so I had to go into the code and change a few little things. I uploaded my fork on GitHub, so you can grab it from there and try it yourself if you’re a Canadian XBMC/Netflix user. Those fixes should be integrated into the official add-on soon anyway.


Little wireless keyboard

A couple months ago I made a new addition to my home entertainment setup: a little wireless keyboard named “Rii”:

[Photo: the Rii wireless keyboard]

It’s a lot more practical to use than my previous Logitech wireless full size mouse and keyboard combo because it’s so easy to grab, do something quick, and toss away (e.g. restart a program, copy or move a few files, type a search query, etc…). Obviously it’s not as good if you want to do anything that takes more than a minute, so in that case what I do is use my living room laptop (a Macbook Pro that’s always sitting on the coffee table) to either remote desktop into the HTPC or control it with Synergy.

Thanks to Eddie for showing me this little gem!


Enabling Disqus’ developer mode

This is the second post of a series of things I learned while using PieCrust.

From the Disqus documentation:

[You can tell] the Disqus service that you are testing the system on an inaccessible website, e.g. secured staging server or a local environment. If disqus_developer is off or undefined, Disqus’ default behavior will be to attempt to read the location of your page and validate the URL. If unsuccessful, Disqus will not load. Use this variable to get around this restriction while you are testing on an inaccessible website.

Assuming you will be baking your final website, you basically want this disqus_developer variable to be enabled if you’re not baking (i.e. you let PieCrust serve each page dynamically), or if you’re baking from the chef server. And because both the baker and the server define variables you can access from your templates and pages, you can write this:

{% if not baker.is_baking or server.is_hosting %}{{ disqus.dev() }}{% endif %}

That’s assuming you’re using the default template engine (Twig), and that you have a disqus.dev macro defined somewhere in your template paths with the following code:

{% macro dev() %}
<script type="text/javascript">
    var disqus_developer = 1;
</script>
{% endmacro %}

Previewing a baked site in IIS/Apache

This is the first post of a series of things I learned while using PieCrust.

If you’re using PieCrust as a static website generator, you know you can use the built-in development server to preview your site as you’re modifying it. This is all pretty nice but there are plenty of good reasons to not go that way, the top ones being:

  1. The development server doesn’t run any custom scripts you may have.
  2. IIS, Apache and all the other well-known web-servers are faster and more robust.
  3. You can’t use special features like .htaccess or web.config settings.

If you went with the recommended directory structure for your site, you will have all your content in a _kitchen directory ready to be baked into your final website. But this _kitchen directory really looks a lot like a regular PieCrust website when PieCrust is used as a lightweight CMS. The only difference is that there’s no main index.php file, and that you may be using some file processors.

If you create an index.php file to bootstrap the _kitchen website, it would get deployed to your export directory when you bake your site, so you need to do one of the following:

  • Write a PHP wrapper around the chef baking utility to exclude that file from baking (you can pass a “skip_pattern” parameter to the baker class). I’ll try to make it easier to add exclude rules to chef in the near future, but at the moment you would need to write some code. Not ideal.
  • Rename the bootstrap file _index.php, since all files or directories with a leading underscore in their names are excluded. Now, however, you will need to edit your .htaccess or web.config to add _index.php as a default document, but that’s really just one line of text or a couple clicks. That’s the solution I use.
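For the record, the Apache side of that “one line of text” is just a DirectoryIndex directive in your .htaccess (on IIS, the equivalent is an `<add>` entry under `<defaultDocument>` in web.config):

```apache
# Serve _index.php as the default document, falling back to index.php.
DirectoryIndex _index.php index.php
```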

If you use file processors, you may have to work around them, which may not always be possible. I myself only use the LESS CSS processor, and it’s easy to avoid because LESS comes with a client-side JavaScript processor. The trick is then to switch between the client and server side processors, which can be done with a few Twig macros:

{% macro stylesheet(baker, path, media) %}
{% if baker.is_baking %}
<link rel="stylesheet" href="{{ path }}.css" type="text/css" media="{{ media|default('all') }}" />
{% else %}
<link rel="stylesheet/less" href="{{ path }}.less" type="text/css" media="{{ media|default('all') }}" />
{% endif %}
{% endmacro %}

This macro emits a reference to a stylesheet that points to the standard CSS file if the website is currently being baked (which means the server-side processor will run to generate that CSS file), or points to the LESS file itself otherwise (which means the processor will run on the client).

{% macro less_js(baker, site_root) %}
{% if not baker.is_baking %}
<script src="{{ site_root }}js/less-1.0.41.min.js" type="text/javascript"></script>
{% endif %}
{% endmacro %}

This second macro adds the client-side processor to your page if the baker is not running.

Now you just need to call those macros from your template file(s):

{{ less.stylesheet(baker, site.root ~ 'styles/my-awesome-stylesheet', 'screen, projection') }}
{{ less.less_js(baker, site.root) }}

Update (PieCrust >= 0.0.3)

Recent versions of PieCrust have, at last, the ability to define which files should be excluded from baking. This means you can rename the bootstrap file back to index.php if you wish, and exclude it from the bake along with any .htaccess or web.config:

baker:
    skip_patterns:
        - /^_/
        - index.php
        - .htaccess
        - web.config

This is much better!


Working with Disqus

Since I’ve migrated this blog to PieCrust and all its comments to Disqus I’ve run into a few problems that took me some time to figure out (although I got help from the nice Disqus guys). Those problems come down to the following information which was not quite clear to me even after reading their documentation several times:

  • The disqus_identifier value gives a unique name to a thread.
  • The disqus_location value indicates the URL(s) at which a given thread can be found.
  • A thread can be used by several URLs, but no URL can be used by more than one thread.

The first two points are pretty straightforward, but that last one was the tricky one. Basically (at the time of this writing, obviously) if you have 2 threads with 2 correctly unique identifiers that were both created with the same URL, your page at that URL will show all kinds of weird behaviours, from never loading any comments (you get the “Disqus loading” icon animation) to loading one of the threads regardless of what disqus_identifier you specified on that page. You would think that identifier would, well, identify exactly what thread you wanted to display, but no, the URL from which it is loaded apparently also plays an important role.
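In the embed code itself, those two values correspond to the configuration variables you set before loading Disqus’ script (the names below are the ones I believe Disqus’ embed snippets use; the values are made-up examples):

```javascript
// Set these before loading Disqus' embed script.
// Variable names per Disqus' embed snippets; the values are placeholders.
var disqus_identifier = "my-unique-post-slug";
var disqus_url = "http://example.com/my-unique-post/";
```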

This means, for example, that you can’t display more than one thread on a single page, although the Disqus guys told me this limitation will be removed in a few months.

Another little thing to look for is the subtle distinction between http://example.com/some/url and http://example.com/some/url/ (note the trailing slash character at the end of the second URL). Those URLs are effectively different, as far as Disqus is concerned. This is especially important for statically generated websites since the actual URL of a page or post is http://example.com/some/url/index.html, so there could be at least 3 correct ways to access the same page. Hopefully, as long as you don’t have some weird address collision, you won’t run into the problem mentioned above (which I ran into because I was messing around with the very handy Disqus Python bindings to batch rename and re-identify my threads).

Hopefully this will save somebody some head-scratching.


Announcing PieCrust

You may have noticed that this blog has changed its look, and has migrated its comments to Disqus. You may also see at the bottom of the page something about some pie crust baking… here’s what’s happening.

I have a few websites around and most of them don’t have much in them (e.g. I use the domain name for other things). There’s clearly not enough content to use a proper CMS, but there’s a bit too much repetition for my tastes if I write HTML files by hand. Also, I wanted to take advantage of libraries like Markdown or SmartyPants to make my text look nice with no effort. Basically, I needed some micro-CMS that would handle some basic layout and formatting.

Then there was the issue of this blog. It was running with WordPress, which I’m very happy with usually, but it wasn’t geeky enough. Also, syntax highlighting for code snippets felt dirty and over-complicated. I stumbled upon the whole “static site generation” underground scene and figured I could find something in between: a micro-CMS (for my small sites) that could also bake its own contents into static HTML files (for this blog).

Enter PieCrust.

There was already a shitload of static website generators, but none that could also work as a “dynamic” micro-CMS. Also, I’m a programmer geek, so it’s kind of my duty to not be happy with existing solutions and write one myself (“this one uses curly braces, but I want to use brackets instead! Surely now I have to write my own!”). Anyway, isn’t the whole point of home projects to write stuff for yourself, to learn something new or just have fun with a new subject, regardless of whether it’s productive? I was bored, too.

I ended up writing PieCrust in PHP not because I like PHP (it mostly sucks), but because it’s still the most widely used web application language out there. And also because I’ve already got other home projects aimed at having fun with Python and Ruby.

So there you have it, go check out PieCrust. The code is on BitBucket, and mirrored on GitHub.