Ramblings of General Geekery

New PieCrust features

After several release candidates1, I finally updated PieCrust to version 0.0.4.


There are quite a few nice new things in it:

  • pattern-based ability to skip files from baking
  • pagination filtering
  • multi-tag pages
  • multi-blog support
  • various bug fixes and optimizations

More details after the break.

“Skip patterns” for the baker

As part of the ongoing PieCrust cookbook, I blogged about previewing a baked site in IIS or Apache. Back then I mentioned how just one piece was missing to make it simpler: the ability to specify a pattern for files to exclude from baking. Well, I did just that.

Now, in your site configuration, you can specify an array of skip_patterns in the baker category. See the documentation for more info.
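For example (just a sketch; the patterns and file names below are placeholders, and the exact pattern syntax is covered in the documentation), the site configuration could contain something like:

baker:
    skip_patterns:
        - \.psd$
        - ^_drafts/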

Pagination filtering

If you have a page that displays a list of posts, you can now do some custom filtering for those posts. In that page’s configuration header, add a section called posts_filters, like so:

posts_filters:
    has_tags: piecrust
    has_tags: cookbook

This will only list blog posts that have both the “piecrust” and “cookbook” tags. This is useful, for example, if you only want to display important announcements on your homepage and have a separate “news” page with all the posts.

For more info, check out the documentation on pagination.

Multi-tag pages

Edit: in PieCrust 1.0RC and above, the multi-tags syntax changed!

PieCrust can generate tag pages, like for example the page listing all my posts related to PieCrust. However, if you want to generate a page that shows all the posts related to the PieCrust cookbook, you would have to specify two tags (“piecrust” and “cookbook”)… that is, assuming you really didn’t want to create a specific “piecrust-cookbook” tag.

This is now possible by separating tags with a / like so: cookbook/piecrust. Try it!

If you’re using Twig (the default template engine) or Dwoo (one of the optional ones), you can use the pctagurl function like so:

{{ pctagurl('cookbook/piecrust') }}

The only limitation is when you bake your site: only tag combinations that you insert with the Twig or Dwoo pctagurl functions will be baked. Since you should always insert links with the pc* functions, it should be all right, but you may want to link to a tag combination from outside your website. If that’s the case, you can specify custom tag combinations to be baked. Check the documentation for more details.

Multi-blog support

PieCrust now also supports multiple blogs in the same website. I debated with myself for quite a long time before adding this feature. On one hand, it felt a bit too complicated for a system that’s supposed to feel simple and natural, but on the other hand, well, I needed it. And after much pondering, I figured out a way to make it rather simple to the user. I think. Maybe. Hopefully.

Anyway, if you have a single-blog site, you don’t have to change anything (yay!). If you want more than one blog… go check out the documentation on multi-blog sites!
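As a rough sketch (the blog names are made up, and the documentation describes the exact configuration layout), declaring two blogs boils down to listing them in the site configuration:

site:
    blogs: [code, food]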

Other changes

Other changes include a more clever cache validation mechanism, some new debugging features, and many bug fixes.


  1. a.k.a: Ludovic doesn’t test his shit enough before applying a version tag. ↩︎


Using Mercurial to publish a PieCrust website

These days, all the cool hipster kids want to deploy stuff by pushing a Git or Mercurial repository up to their server.

And that’s pretty cool indeed, because you basically update your website by doing something like:

hg push myserver

So here’s how you can do it with PieCrust (although 90% of this article has nothing to do with PieCrust):

  1. Installing Git/Mercurial/whatever on your server
  2. Setting up your SSH keys
  3. Pushing your repository
  4. Defining hooks/triggers

Keep reading for the meaty details…

I’m going to use Dreamhost as the hosting provider in this pseudo-tutorial, but most of the information would be valid for another provider.

Installing the DSCM on your server

As far as I can tell, Git is installed on most servers at Dreamhost, so that’s already taken care of.

For Mercurial, they seem to have a super old version so you may want to install a more recent one yourself. It’s actually pretty easy since the steps are described on their community wiki. In my case, it boiled down to:

  1. mkdir -p ~/srcs
  2. cd ~/srcs
  3. wget http://mercurial.selenic.com/release/mercurial-1.7.5.tar.gz
  4. tar xvzf mercurial-1.7.5.tar.gz
  5. cd mercurial-1.7.5
  6. make local
  7. make install-home-bin

And then adding the new paths to my ~/.bash_profile and ~/.bashrc:

export PYTHONPATH=~/lib/python
export PATH=~/bin:$PATH

Setting up your SSH keys

If you’re using BitBucket or GitHub, you probably already have some SSH keys lying around somewhere. If not, then, well, create an account at either one (depending on whether you use Mercurial or Git). Not only are those websites super useful, but they can also help you (somewhat) with setting up SSH access.

The Git help pages are way better than the Mercurial ones, so even if you don’t like Git you may want to check them out if you’re lost.

If your SSH access works with Git/Mercurial, then enable password-less login on your Dreamhost account (you basically just need to copy/paste your public key into the ~/.ssh/authorized_keys file… this should work on any other Unix-based host). This will make things super smooth in the future.
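If you’ve never done that before, it boils down to something like this, run from your local machine (the key type and file names are just the defaults; adjust them to your setup):

# Generate a key pair if you don't already have one.
ssh-keygen -t rsa

# Append your public key to the server's authorized keys.
cat ~/.ssh/id_rsa.pub | ssh yourlogin@yourdomain.ext 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'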

Pushing your repository to Dreamhost

Create a directory in your Dreamhost home to store your repository. For example, with Mercurial:

  1. mkdir -p ~/hg/myproject
  2. cd ~/hg/myproject
  3. hg init .

Now back on your local machine, in your local repository, edit the .hg/hgrc file with:

[paths]
dreamhost = ssh://yourlogin@yourdomain.ext/hg/myproject

If you type hg push dreamhost, it should all work! If not… well… go back to the beginning and start again.

Things would look very similar with Git, and you should be able to do it yourself since you’re already a Git user!
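For reference, and with the same placeholder names as above, the Git setup would look roughly like this: create a bare repository on the server, then add it as a remote on your local machine.

# On the Dreamhost server:
mkdir -p ~/git/myproject.git
cd ~/git/myproject.git
git init --bare

# On your local machine, inside your repository:
git remote add dreamhost ssh://yourlogin@yourdomain.ext/~/git/myproject.git
git push dreamhost master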

Setting up hooks/triggers

Now’s the time to do actual work. The idea is to run a custom script on your server when you push updates to the repository you have up there.

For example, on the Dreamhost server and using Mercurial, you can edit the ~/hg/myproject/.hg/hgrc file and add:

[hooks]
changegroup = ~/scripts/bake_site.sh
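If you went the Git route instead, the rough equivalent (a sketch, assuming the bare repository from earlier) is an executable post-receive hook inside that repository:

#!/bin/sh
# This file goes in ~/git/myproject.git/hooks/post-receive
# and must be executable (chmod +x).
~/scripts/bake_site.sh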

Now you only need to write the actual script bake_site.sh! You probably want something that does:

  1. Export the repository into a temporary folder.
  2. Bake your site in that temporary folder.
  3. If all went well, copy the baked site into your home.
  4. Clean up.

This would look something like:

#!/bin/sh

set -e

HG_REPO_DIR=~/hg/myproject
ARCHIVE_DIR=~/tmp/hg_archival
PUBLIC_DIR=~/yourdomain.com

# Archive (export) the repo.
echo "Archiving repository to ${ARCHIVE_DIR}"
if [ -d ${ARCHIVE_DIR} ]; then
    rm -fr ${ARCHIVE_DIR}
fi
mkdir -p ${ARCHIVE_DIR}
# Run from the repository so the script also works when launched by hand.
cd ${HG_REPO_DIR}
hg archive -r tip ${ARCHIVE_DIR}

# Bake your website
mkdir -p ${ARCHIVE_DIR}/baked
${ARCHIVE_DIR}/_piecrust/chef bake -r http://yourdomain.com/ -o ${ARCHIVE_DIR}/baked ${ARCHIVE_DIR}/mywebsite

# Move baked website into the public directory.
echo "Copying website to ${PUBLIC_DIR}"
cp -R ${ARCHIVE_DIR}/baked/* ${PUBLIC_DIR}

# Clean up.
echo "Cleaning up"
rm -fr ${ARCHIVE_DIR}

But don’t use that script!

Obviously it won’t work for you unless you change a lot of directory names, tweak the options for chef, etc. But it should be a good starting point.

  • I recommend you test your script by first running it manually while logged in via SSH, after changing PUBLIC_DIR to some temporary directory. You’ll probably get it wrong a couple of times at first, especially if, like me, you’re quite rusty with Bash syntax.
  • When it’s working as expected, do a repository push and check that the script is baking everything correctly in the temporary directory.
  • If that’s all good, then you can revert PUBLIC_DIR back to its intended path.

Now you can enjoy your new coolness: hg push dreamhost!


Don’t brick your ReadyNAS

I have a ReadyNAS NV+ at home to store most of my data and I’ve been pretty happy with it so far… except for one thing: although it’s running a flavor of Linux that you can access as root user (if you installed the EnableRootSSH add-on), you can’t do everything you would normally do with a Linux box.


First, like most pre-2010 consumer-grade NASes, the NV+ runs on a SPARC CPU, so there are a lot of packages you don’t have access to unless you recompile them yourself. And that’s fine, if you know you’re going to waste your whole evening figuring out weird broken dependencies and compile errors. But, second, there’s some custom stuff in there (I don’t know what it is) that basically prevents you from even upgrading to newer versions of some of the packages you do have access to. This means: don’t run apt-get upgrade on an NV+.

Let me repeat that: don’t run apt-get upgrade on an NV+. Ever.

What happens when you do it is that you lose SSH access, the web administration interface stops working, some of your network shares become inaccessible, and half of your left socks magically disappear. I know, I did it twice in the past (yes, I’m stupid like that).

In both cases, I was lucky enough to recover from my mistake by performing an OS reinstall. It keeps all the packages, add-ons and configuration settings you had before, and only resets the admin password to netgear1 or infrant1 (depending on the version of RAIDiator you had installed), so it almost works again right away afterwards. The downside is that if what fucked up your NAS was one of those add-ons or packages, you wouldn’t have any other option than to do a factory reset and recover your data from a backup (you at least have daily automated backups, right?). But in my case, I think it was one of the OS libraries (like glibc or something) that was causing the issue so that’s where I got lucky. Twice.

Those are the only problems I ever had with that box, so overall I’m still happy to own it. The X-RAID that comes with it makes life a lot easier (you can hot-swap disks, and you can mix different disk sizes), and the machine is small and pretty quiet (my external backup disks are louder). Unlike with my media center PC, I don’t think I would have much fun trying to build my own NAS.

…but DON’T RUN APT-GET UPGRADE!


IEnumerable is awesome

I’ve always thought that one of the most underrated features of C# is the yield statement and its companion, IEnumerable<T>. It may be because I’m working a lot with people coming from an exclusively C++ background – it takes some time to adapt to a new language with new paradigms, especially when that language can look a lot like “C++ without delete!” at first. But there are so many wonderful constructs in C# (especially in the past couple versions) that it’s a shame when people keep writing code “the old way”…

That’s why I’m going to write a few “101” articles I can refer my co-workers to (hi co-worker!).

It starts (after the jump) with how yield+IEnumerable is awesome.

The yield statement

The yield statement allows you to write a pseudo1 coroutine specifically for generating enumerable objects (which is why it is sometimes called a “generator”). The first beauty of it is that you don’t have to create the enumerable object itself – it is created for you by the compiler. The second beauty is that this generator will return each item one by one without storing the whole collection, effectively “lazily” generating the sequence of items.

To illustrate this, look at how the following piece of code generates the all-time favorite Fibonacci sequence.

public static IEnumerable<int> GetFibonacciSequence()
{
    yield return 0;
    yield return 1;

    int previous = 0;
    int current = 1;

    while (true)
    {
        int next = previous + current;
        previous = current;
        current = next;
        yield return next;
    }
}

The awesome thing is that it’s an infinite loop! It would obviously never return if the function didn’t behave like a coroutine: it returns the next number to the caller and only resumes execution when the caller asks for another number. It’s like you’re “streaming” the Fibonacci sequence!

You can stop any time you want. For instance:

int count = 20;
foreach (int i in GetFibonacciSequence())
{
    Console.WriteLine(i);
    if (--count == 0)
        return;
}

Or, even better, using some of LINQ’s extension methods:

// Requires a "using System.Linq;" directive at the top of the file.
foreach (int i in GetFibonacciSequence().Take(20))
{
    Console.WriteLine(i);
}

Performance Gains

There are many advantages to using the yield statement, but most C++ programmers are not really swayed by arguments of coding terseness and expressivity, especially when it involves “black magic” going on inside the compiler2: they usually, err, yield to performance-related arguments (see what I did there?).

So let’s write a simple program that generates many “widgets”, each 256 bytes in size, and print the peak memory usage:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Widget
{
    private byte[] mBuffer;

    public Widget(int size)
    {
        mBuffer = new byte[size];
    }
}

class Producer
{
    // The old, classic way: return a big array. Booooorriiiiing.
    public IEnumerable<Widget> ProduceArray(int count, int widgetSize)
    {
        var widgets = new Widget[count];
        for (int i = 0; i < count; i++)
        {
            widgets[i] = new Widget(widgetSize);
        }
        return widgets;
    }
    
    // The new funky, trendy, hipstery way! Yieldy yay!
    public IEnumerable<Widget> ProduceEnumerable(int count, int widgetSize)
    {
        for (int i = 0; i < count; i++)
        {
            yield return new Widget(widgetSize);
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        int size = 256;
        int count = 1000000;
        ProduceAndPrint(false, count, size);    // LINE 7
        Console.WriteLine("Generated {0} widgets of size {1} (total size = {2})", count, size, count * size);
        Console.WriteLine("Memory Peaks:");
        var process = Process.GetCurrentProcess();
        Console.WriteLine("Virtual MemoryttPaged MemoryttWorking Set");
        Console.WriteLine("{0} Mbtt{1} Mbtt{2} Mb", process.PeakVirtualMemorySize64 / 1024, process.PeakPagedMemorySize64 / 1024, process.PeakWorkingSet64 / 1024);
    }

    static void ProduceAndPrint(bool useEnumerable, int count, int widgetSize)
    {
        var producer = new Producer();
        if (useEnumerable)
        {
            int i = 0;
            foreach (var w in producer.ProduceEnumerable(count, widgetSize))
            {
                ++i;
            }
        }
        else
        {
            int i = 0;
            foreach (var w in producer.ProduceArray(count, widgetSize))
            {
                ++i;
            }
        }
    }
}

This prints, on average:

Generated 1000000 widgets of size 256 (total size = 256000000)
Memory Peaks:
Virtual Memory          Paged Memory            Working Set
488572 KB               299544 KB               293100 KB

Now, on the line marked LINE 7, change false into true. This will make the Producer return a yield-enumerated bunch of widgets instead of a big array. This prints:

Generated 1000000 widgets of size 256 (total size = 256000000)
Memory Peaks:
Virtual Memory          Paged Memory            Working Set
133564 KB               14156 KB                12984 KB

Woohoo! That’s almost 4 times less virtual memory used, and a 25 times smaller working set! All of this because the garbage collector can now kick in during the enumeration and discard some of the widgets we’re not using anymore. You can actually print the number of garbage collections triggered using GC.CollectionCount(0) (all those widgets will be in generation 0 because they’re short lived). In the first case (returning the full array of widgets) I usually get 49 collections. For the second case (returning widgets one by one) I usually get 66 collections.
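If you want to check that on your own machine, a quick way is to sample the collection count around the call (a sketch wrapped around the existing ProduceAndPrint call):

int collectionsBefore = GC.CollectionCount(0);
ProduceAndPrint(true, count, size);
int collectionsAfter = GC.CollectionCount(0);
Console.WriteLine("Gen 0 collections: {0}", collectionsAfter - collectionsBefore);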

Of course, you may get frightened by all those garbage collections that would slow down your loop (C++ programmers are easily scared by garbage collectors), and that’s a legitimate concern if you’re writing a real-time, ultra-sensitive application (although you’d have to check first that it would indeed be a problem). But for other kinds of applications, it can be a real life-saver – like when you’re fetching huge amounts of data from a database of some sort, and although each piece of data fits in memory, the whole batch doesn’t.3
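To make that last point more concrete, here is the kind of shape such code takes (a sketch only: the Record type, the query, and the connection handling are made up for the example). Each row is yielded as soon as it is read, so the caller never needs to hold the whole result set in memory:

public class Record
{
    public int Id;
    public string Name;
}

// Assumes an already open System.Data connection (IDbConnection).
public static IEnumerable<Record> ReadRecords(IDbConnection connection)
{
    using (IDbCommand command = connection.CreateCommand())
    {
        command.CommandText = "SELECT Id, Name FROM Records";
        using (IDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Rows the caller is done with can be garbage collected
                // right away, just like the widgets above.
                yield return new Record { Id = reader.GetInt32(0), Name = reader.GetString(1) };
            }
        }
    }
}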

More!

If you want more meaty stuff on all this, I recommend reading Raymond Chen’s series: part 1, part 2, part 3 and part 4. You can also have a look at the fun traps that await you as told by Eric Lippert: part 1 and part 2.


  1. “pseudo” because the actual compiler-generated implementation uses a simple state machine. ↩︎

  2. It’s not even all that black, since you can easily look at the generated code, but it does feel magical! ↩︎

  3. This is obviously supposing you don’t have referencing between each item, but even then you could use additional lazy-loading and caching to still only have a subset of your whole dataset in memory. ↩︎