to_lang 0.3.0: featuring batch translations and a command line utility

to_lang is a gem I wrote for doing translations with the Google Translate API. It adds magic translation methods directly to strings, so you can run things like "How's it going?".to_spanish and "I hope everyone is okay!".from_english_to_japanese.
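Setting the gem up is a one-time configuration step. A minimal sketch (the `ToLang.start` call matches the gem's README; you need a valid Google Translate API key, so this won't run as-is):

```ruby
require "to_lang"

# Configure the gem with your Google Translate API key.
ToLang.start("YOUR_API_KEY")

# After that, the magic translation methods are available on strings:
"How's it going?".to_spanish
"I hope everyone is okay!".from_english_to_japanese
```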

I just released version 0.3.0, which adds two great new features: batch translations using arrays and a command line utility. All the methods that were previously available to strings are now available to arrays as well, so you can do this:

["Uno", "Dos", "Tres"].to_english
# => ["One", "Two", "Three"]

As you can see, this is much simpler than looping through a collection of strings and calling a translation method on each one. It's also much more efficient because it only makes one HTTP request to the API.

The command line utility gives you a quick and dirty way to run a translation directly from the shell. You run it like this:

$ to_lang --key YOUR_API_KEY --to es "hello world"
hola mundo

You can translate multiple strings at once by simply passing more parameters. If your API key is available in the environment variable GOOGLE_TRANSLATE_API_KEY, you can leave out the --key option. You can specify the source language with the --from option as well.

$ to_lang --from en --to es one two three
uno
dos
tres

Give it a try! I hope you find it useful and fun!

Contributing to guard-rspec

As is probably apparent from my last post, I'm interested in improving the front end parts of my Rails apps and smoothing out the rough edges of working with Heroku. Some big changes are coming in Rails 3.1, but in the meantime I've been looking at ways to make the process of asset compilation and packaging easier.

The recent rewrite of this site is the first time I used Sass, and it's great, but it adds another step to getting CSS ready for deployment. The Sass gem has a command line tool to compile a Sass file into CSS, and it can also watch a directory for changes and compile them automatically. Sadly, if it doesn't have access to FSEvents through RubyCocoa, it constantly polls your disk for changes, and since I use Ruby 1.9.2 with RVM, this is the case I found myself in. As it turns out, wanting to improve the process for Sass led me to something entirely different.
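For reference, the Sass CLI invocations in question (the paths here are placeholders for your own project layout):

```shell
# Compile a single file once:
$ sass stylesheets/site.sass public/stylesheets/site.css

# Or watch a source directory and recompile on change --
# this is the mode that falls back to polling without FSEvents:
$ sass --watch stylesheets:public/stylesheets
```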

I looked around for a better way to listen for file system events with Ruby 1.9.2 and discovered Watchr, which, when coupled with ruby-fsevent, provides a generic mechanism for running scripts in response to file system events. While reading about Watchr, I started to notice people mentioning a slightly newer alternative called Guard, which uses the confusingly not-identical rb-fsevent. The nice thing about Guard is that it runs gem extensions that encapsulate the logic for a particular task. The "Guardfile" simply invokes such a gem and tells it which paths to watch.
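A sketch of what such a Guardfile looks like, based on the conventions of the guard-rspec README at the time (the watch patterns and paths here are illustrative, not canonical):

```ruby
# Guardfile: invoke the guard-rspec extension and tell it what to watch.
guard 'rspec' do
  # A spec file changed: run that spec.
  watch('^spec/(.*)_spec\.rb')
  # A lib file changed: run its corresponding spec.
  watch('^lib/(.*)\.rb')         { |m| "spec/lib/#{m[1]}_spec.rb" }
  # The spec helper changed: run the whole suite.
  watch('^spec/spec_helper\.rb') { "spec" }
end
```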

Guard has quite a few of these gems already, and the first to catch my eye was guard-rspec, a flexible alternative to ZenTest/autotest. I tried it out and discovered that it was a bit restrictive in the command line options you could pass to RSpec. Instead of allowing arbitrary arguments, it exposed only a few of RSpec's options through its own API and then passed them along to RSpec. I prefer my RSpec output in the nested format, and because of this restriction guard-rspec wasn't ideal for me. I added a feature request on the GitHub issue tracker, but then I remembered Zach Holman's recent tweet: "Marginal programmers +1 an issue. Real programmers patch the issue." So that's exactly what I did.

I'd made a few minor contributions to other projects before, but this time I spent a good amount of time working on a revised API for guard-rspec that allows you to pass arbitrary arguments to RSpec. It took a few days of discussing my changes with the two authors, and ultimately they refactored a lot of my code to their liking, but it felt great to be part of a significant improvement to a project that lots of other people use. And as always, my Ruby skills are that much more deadly after sharpening them on the open source grindstone.

My changes are included in the just-released guard-rspec 0.2.0. Check it out!

The challenge of asset packaging on Heroku

Page load time is an important consideration in web application development. Users expect navigating a website to be fast, and many will simply leave if a page takes too long to load. Two ways to improve load time are to minimize the number of HTTP requests and to minimize the amount of data transferred. Both can be addressed by concatenating, minifying, and caching CSS and JavaScript files.

Rails has a handy feature that helps with part of this: the stylesheet_link_tag and javascript_include_tag helper methods accept a cache option, which takes all the files passed to them and concatenates them into a single file (and single HTTP request) in the production environment. This is a big improvement, but it could be better. In addition to combining the files, we want to reduce the data transferred by running them through a so-called minifier, which removes whitespace and comments and makes various optimizations like variable name substitution and function inlining. Lastly, the big challenge: we want to be able to do this on platforms like Heroku, where our ability to write to disk is highly restricted.
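As a sketch of the cache option in a layout (the stylesheet, script, and cache names here are placeholders):

```erb
<%# In production, Rails concatenates each group into a single
    all.css / all.js file and serves one request for each. %>
<%= stylesheet_link_tag "reset", "typography", "layout", :cache => "all" %>
<%= javascript_include_tag "jquery", "application", :cache => "all" %>
```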

Read-only file systems

The biggest issue for asset packaging when deploying a Rails app to Heroku is that, with the exception of the tmp folder, we only have read access to the disk. This means that the cache option for the asset helper methods will not work, because the concatenated files are written to disk the first time they're needed. The Rails helpers also don't offer the ability to minify the output file, so we'll need to look into a plugin-based solution for asset packaging.

There are quite a few asset packaging plugins out there, including asset_packager, heroku_asset_packager, heroku_asset_cacher, and Jammit. If you Google around on the subject, you'll also find a multitude of blog posts and discussions where people have written Rake and Capistrano tasks to jury-rig a solution to this problem. Clearly there is no ideal approach yet. I think Jammit has come pretty close, but it still comes up against a brick wall on Heroku's read-only file system.

Precaching

The most common suggestion I've seen is to precache the asset files, i.e., to generate them all on the local machine and commit them to the repository before deploying. With this approach, nothing needs to be written to disk in the production environment. The downside is that we now have artifacts from our build process in our repository's history, which is far less than ideal. Still, some find this to be an acceptable compromise, and all the Rake and Capistrano based solutions you'll see automate the committing of assets before deployment to make it a little less painful. If having your history dirtied doesn't bother you, you can probably stop there. Personally, I'm not satisfied yet.

Caching or precaching to tmp

Unlike the built-in Rails helper methods, Jammit writes the cached asset files to a special directory at public/assets. Using Jammit's helper include_javascripts :some_package, for example, will create a script tag linking to example.com/assets/some_package.js. The first request to this address is routed to a special Jammit controller that figures out which raw files need to be packaged. It runs them through either the YUI Compressor or the Google Closure Compiler (with options we specify in configuration), serves the response to the client directly, and caches the output by writing it to public/assets/some_package.js. The next time the address is requested, Rack will see that the cached file exists and serve it instead of routing to the Jammit controller.
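A sketch of the Jammit configuration this implies (package and file names here are hypothetical; `javascript_compressor` is Jammit's option for choosing a minifier):

```yaml
# config/assets.yml
javascript_compressor: closure

javascripts:
  some_package:
    - public/javascripts/jquery.js
    - public/javascripts/application.js
```

With this in place, `include_javascripts :some_package` in a view produces the script tag for /assets/some_package.js described above.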

We face two problems with this process on Heroku. The first is that we can only write to tmp. The second is that Heroku lacks a JVM, which both the YUI Compressor and the Google Closure Compiler require. Currently, Jammit doesn't offer a workaround for either issue. Supporting Heroku would require a configuration option to change the full file path for the cached assets, as well as an alternative minifier that works without a JVM. One possible solution is UglifyJS, which runs on Node.js and is already being used by projects like jQuery. An interface to UglifyJS and Node might be provided by therubyracer-heroku and Uglifier.

Even if Jammit could write the cached assets to tmp, it's still not the best approach. tmp is not really intended for this purpose, as Heroku states in their documentation:

If you wish to drop a file temporarily for the duration of the request, you can write to a filename like #{RAILS_ROOT}/tmp/myfile_#{Process.pid}. There is no guarantee that this file will be there on subsequent requests (although it might be), so this should not be used for any kind of permanent storage.

The good news is that Heroku provides Varnish as an HTTP cache, so we should be able to use that instead of writing to disk at all. The first request for an asset package will hit the Jammit controller, which will add HTTP caching headers to the response. The next user who requests the packaged asset file will be served directly from Varnish, completely bypassing the application stack. And when the same user loads another page that includes the same asset package, the browser won't even request the file from the server, because of the HTTP caching headers that have been set. Now that's efficient.
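As a sketch of the headers such a controller could set (the helper name and the one-year max-age are my own choices, not anything Jammit does today):

```ruby
# A hypothetical helper producing the response headers an asset-packaging
# controller could set so Varnish and browsers cache the packaged output.
def caching_headers(max_age_seconds)
  { "Cache-Control" => "public, max-age=#{max_age_seconds}" }
end

# One year is a common max-age for assets with cache busting strings.
puts caching_headers(31_536_000)["Cache-Control"]
# => public, max-age=31536000
```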

Busting the cache

Okay, we've got a good plan for caching asset files, but what happens when we update the content in those files? Without some intervention, the user will be served outdated content from the cache. The Rails and Jammit helpers solve this by adding a timestamp to the query string, created from the mtime of the file. After deployment, the old cached files are removed, and new ones are generated with a new cache busting string. The user's browser and Varnish will both see this as a new file, and request the new content. This is a pretty good solution, but still not totally ideal.

Because the cached assets are recreated on every deployment, the mtime (and therefore the cache busting string) changes even if the contents of the files don't. Users are forced to redownload every asset on the site after each deploy, even if only one of them has changed. A better approach would be to use an MD5 hash of each file's contents as the cache busting string, so the query string only changes when the contents change and the asset files can stay cached across deployments. We'd probably also want some mechanism for remembering the MD5 for a particular asset file, or we'd have to compute it every time a script tag was generated with one of the helper methods.
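A minimal sketch of content-based fingerprinting (`content_fingerprint` is a hypothetical helper, not part of Rails or Jammit):

```ruby
require "digest/md5"
require "tempfile"

# Fingerprint a file by its contents rather than its mtime, so the
# cache busting string only changes when the content actually changes.
def content_fingerprint(path)
  Digest::MD5.hexdigest(File.read(path))
end

# Two files with identical contents written at different times get the
# same fingerprint -- unlike an mtime-based timestamp.
one = Tempfile.new("a")
two = Tempfile.new("b")
one.write("body { color: red; }"); one.flush
sleep 0.1
two.write("body { color: red; }"); two.flush

puts content_fingerprint(one.path) == content_fingerprint(two.path)
# => true
```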

It's a tough problem

As evidenced by the multitude of plugins and scripts which attempt to solve this problem, it's a tough nut to crack. I think the current tools are good, but still not quite up to par. I will continue to investigate this myself, and will hopefully be able to whip up some code to contribute, but I hope the Rails and Heroku communities can really work together to find a solution for asset packaging and caching on Heroku that makes things as efficient and painless as possible.
