Thursday, August 2, 2007

The Right Tool For The Job: Scripting


Though it's barely planned
The kludgiest of Perl scripts
Is one day maintained


I've been learning Perl lately, after having used Python wherever possible for a couple of years. It's gut-wrenching. So today's pedantry is on the topic of scripting languages -- interpreted, batteries-included, "#!/usr/bin/env"-ready languages for getting a simple job done with a minimum of hassle, as I'm defining it.

Google for "little $LANG script", in quotes, replacing $LANG with each of the most well-known scripting languages. My results:

Table 1:

$LANG @Hits
===== =====
Perl 32,300
shell 24,400 * what does this mean, exactly?
PHP 15,500
Python 12,000
VB 1,080 * Skewed, because "vb script" is also a language
bash 808
batch 624
Ruby 511
Tcl 411
js 271
sh 266
vim 76
C++ 7
scheme 7
lisp 6
emacs 3
haskell 3


To further abuse Internet statistics, let's search for each language on Google Code:

Table 2:

$LANG @HITS
===== =====
C++ 6,000,000
Perl 1,420,000
Python 1,050,000
PHP 1,590,000
shell 879,000
Ruby 304,000
Lisp 238,000 * includes elisp
Javascript 212,000
Basic 202,000
Tcl 186,000
bat 183,000
Scheme 103,000
Haskell 67,700


Now, combine these two tables to get a ratio representing the "scriptability" of each language. Or rather, divide the Google Code hits by "little script" hits to get a "Script Factor" inversely proportional to the fraction of existing code that qualifies as little scripts. This is hard science.

Table 3:

$LANG       Google Code  "Little"  Script Factor  Notes
==========  ===========  ========  =============  =====
shell           879,000    24,400             36  * "Shell" is vague
Perl          1,420,000    32,300             44
Python        1,050,000    12,000             88
PHP           1,590,000    15,500            103
Basic           202,000     1,080            187  * Includes non-Visual basics
Batch           183,000       624            293
Tcl             186,000       411            453
Ruby            304,000       511            595
Javascript      212,000       271            782
Scheme          103,000         7         14,714
Haskell          67,700         3         22,567
Lisp            238,000         9         26,444  * Includes elisp and common lisp
C++           6,000,000         7        857,143
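
For anyone who wants to check the arithmetic, the Script Factor column can be reproduced in a few lines of Python. This is only a sketch: the counts are copied straight from Table 3, and the names counts, script_factor, factors and ranking are mine, invented for illustration.

```python
# (Google Code hits, "little $LANG script" hits), copied from Table 3.
counts = {
    "shell": (879_000, 24_400),
    "Perl": (1_420_000, 32_300),
    "Python": (1_050_000, 12_000),
    "PHP": (1_590_000, 15_500),
    "Basic": (202_000, 1_080),
    "Batch": (183_000, 624),
    "Tcl": (186_000, 411),
    "Ruby": (304_000, 511),
    "Javascript": (212_000, 271),
    "Scheme": (103_000, 7),
    "Haskell": (67_700, 3),
    "Lisp": (238_000, 9),
    "C++": (6_000_000, 7),
}


def script_factor(code_hits, little_hits):
    """Google Code hits per "little script" hit, rounded half up."""
    # Integer arithmetic keeps the rounding explicit (Python's 87.5 becomes 88).
    return (2 * code_hits + little_hits) // (2 * little_hits)


factors = {lang: script_factor(*hits) for lang, hits in counts.items()}

# Lowest factor first, i.e. most "scriptable" first.
ranking = sorted(factors, key=factors.get)

for lang in ranking:
    print(f"{lang:<12}{factors[lang]:>9,}")
```

Sorting by the factor gives the same order as the table, which is all the "hard science" there is to it.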


Interestingly, this is close to the first list of "little script" languages, with the three P's right up top. The functional languages I threw in for fun are ranked by absurdly small denominators, so I wouldn't say the results are meaningful beyond indicating that even the hardcore people using these languages for real projects are using P-languages and the shell for simple scripts.

What does this all mean?

  • Scripters are using the right tool for the job. Good scripting languages float to the top.

  • Even the most hardcore Lisp and Haskell programmers use something else for scripting. In other words, they know multiple languages, and they, too, use the right tool for the job.

  • There are seven idiots in the world writing scripts in C++. One would only do this if unaware of any other scriptable language, and therefore capable of using only one tool for any job.

  • Emacs users call their customizations "packages" or "modes," not "scripts." Foiled.

Now let's get specific.

Shell

"Shell" came in first in scriptability, second-place in "little $lang script," but merely fifth in Google Code usage. So shell scripting is a popular way to get things done, but not so much for writing full-on applications.

What language is shell scripting, exactly? I'm assuming the search hits refer to bash, ksh, csh, zsh, and the rest of the Unix shells, mostly because that's how it showed up on Google Code, and because bash seems to be the default on the major Linux distros. Plus, Windows programmers don't talk about the "shell"; if they wade into the muck of cmd.exe at all, they call it batch, DOS, or occasionally command-line scripting. And they don't talk about it online as much as Unix/Linux gurus, outside a few Microsoft-specific websites, from what I've seen.

The strengths of the shell are (1) everything is a string; (2) courtesy of Unix design, the sources and recipients of character streams consistently look like filenames; (3) complex programs can be used like functions and filters, directly adding to the shell's abilities (the ultimate FFI, in a way); (4) since code can be data and commands can be piped and redirected around, flow control can be pretty concise. The flaws, as I see them, are (1) everything is a string, meaning nontrivial structures must be serialized and parsed at every step; (2) there are few guarantees about what's actually available to the shell on a given system -- paths, environment variables, program versions -- so sharing scripts between systems is wildly unreliable. Still, I've never seen a GUI tool as broadly useful as the shell is for getting computer tasks done.
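
To make the filter idea concrete without leaving Python, here is a rough imitation of a pipeline along the lines of grep | cut | sort | uniq -c. It is a sketch under loud assumptions: the three helper functions are hypothetical stand-ins I wrote for the real tools, not wrappers around them, and the log lines are made up.

```python
from collections import Counter


def grep(pattern, lines):
    """Keep lines containing pattern, like a bare-bones `grep -F`."""
    return (line for line in lines if pattern in line)


def cut(field, lines):
    """Pick one whitespace-separated field per line (0-based)."""
    return (line.split()[field] for line in lines)


def count_desc(lines):
    """Roughly `sort | uniq -c | sort -rn`: tally, most frequent first."""
    tally = Counter(lines)
    ordered = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{n} {item}" for item, n in ordered]


# Made-up log lines, purely for illustration.
log = [
    "alice GET /index.html",
    "bob GET /about.html",
    "alice POST /login",
    "alice GET /about.html",
    "carol GET /index.html",
    "bob GET /index.html",
]

# Each stage consumes and produces plain strings, so stages compose freely.
report = count_desc(cut(0, grep("GET", log)))
print("\n".join(report))

# The flip side: the counts are text again, so using them means re-parsing.
total = sum(int(row.split()[0]) for row in report)
print(total)
```

The last two lines are flaw (1) in miniature: the tally went in as numbers, came out as strings, and has to be parsed back before anything else can be done with it.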

Perl

Legend has it that Larry Wall designed Perl to pull together all of the various Unix sysadmin tools into one effective package, with the plan for it to be especially useful for text manipulation (Reporting and Extraction). So C, bash, awk, sed, grep, and friends are all in there -- in short, it keeps the shell's advantages and does its best to eliminate the disadvantages. (Best of all, it finally got regular expressions right.) And then there's CPAN. I'm not surprised that Perl is #1 for "little scripts" that are just complex enough to be worth saving.

What is Perl the right tool for?
  • One-liners that bash doesn't have an equivalent for -- Perl is installed almost everywhere bash is
  • Straightforward text-processing scripts (Python's immutable strings are a weakness here, and Ruby installations still aren't a universal default)
  • It was a great server-side scripting language during the first dotcom boom (though Java managed to cast itself as the more legit (enterprisey) big brother here). Since Perl coders weren't afraid to get things done "right now," mod_perl made the combination of Apache and Perl effective, scalable, and most importantly, available just when it was needed.

Python

Python fixes Perl, says the next legend. But its strength as a scripting language is that it fixes Java, too -- and as it turns out, Python's "scriptability" is exactly half that of Perl's. Spooky, no?

I like Python. It makes sense to C programmers and Unix hermits. And, thanks to Guido's diligent attention to aesthetics, ugly Python code almost always means you're doing something awkward, slow or wrong. The language rewards good behavior with readable, concise code. You know that whitespace issue where if you copy code from a forum and paste it into your own code, the interpreter will crap out on the indentation? It's punishing you for blind copy-and-paste. Doesn't that creep you out a little? Guido is basically handing out candy if you read the documentation on generator expressions, and slapping you on the wrist if you don't read your own code before running it.

There doesn't seem to be a single theoretical approach that guarantees a language will work that way, but for Python, it seemed to work.

What is Python the wrong tool for?
  • One-liners -- remember that thing about whitespace?
  • Unix tasks that have already been thoroughly solved with existing command-line tools (see Bash).
  • Number crunching (by itself, but see SciPy and Parallel Python). Python 3.0 borrows most of Scheme's numerical tower, so that may improve the situation.
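
The one-liner complaint is easy to demonstrate from inside Python itself. In this small sketch, the helper name compiles is my own; it just asks the built-in compile() whether a string parses as a program.

```python
def compiles(source):
    """Return True if source parses as a Python program."""
    try:
        compile(source, "<one-liner>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A loop with a nested `if` cannot be squeezed onto one line.
nested = "for i in range(5): if i % 2: print(i)"

# Rewriting it as a generator expression gets it back onto one line.
flattened = "print(*(i for i in range(5) if i % 2))"

# Forum-style paste where the second line kept its old indentation.
pasted = "x = 1\n    y = 2\n"

print(compiles(nested))
print(compiles(flattened))
print(compiles(pasted))
```

Only the generator-expression version compiles, which is more or less the candy-and-wrist-slap arrangement described above.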

Wednesday, August 1, 2007

Upgrading Ubuntu

Upgrading Ubuntu between major versions is a game. Your machine was running smoothly before you ran update-manager and plunged into a massive set of repository changes and software upgrades, so clearly, your machine is also capable of running the next major version of Ubuntu. The game is when something breaks during the transition. It's randomized to make it more of a challenge, so you can't just look up a walkthrough on the Ubuntu forums or Gamefaqs. Somewhere, a config file was mangled, or the flags on a low-level program changed and a caller failed to compensate for the new configuration. Now, Edgy is counting on you to use your command-line skills to track down the culprit and make him pay.

New installations from a CD are a breeze these days (as of Breezy...), assuming you've already tried the live CD on your system and it seemed to work. Before Dapper Drake, nobody (meaning, not myself) expected dist-upgrade to work without a hitch. Ubuntu was a fresh new operating system, cranking out new major versions every 6 months or so, and if you're into trying out new OSes, you've probably quickly learned, the hard way or the easy way, to use a fresh CD if you need your computer up and running again today. If the game of fixing a broken system is too frustrating, your live CD will save you: Death from above to the existing installation, and reinstall from scratch. If you have a separate /home partition, you're in good shape. If you need to rescue some files before the great annihilation, the live CD helps you there, too.

But as of Dapper Drake, the game is winnable for most users. Since Dapper is designed for long-term support, update-manager normally doesn't offer the option to do a dist-upgrade through the GUI. As I recall, dist-upgrade didn't really work in October 2006 when Edgy was released, either. Dapper took 7 1/2 months to finish instead of the usual 6, and Canonical compensated by pulling together Edgy in 4 1/2 months. So, there was some revolting hack posted on the Ubuntu wiki for getting 'er done. Perform the listed incantations, and you end up with a system that should be pure Edgy, but in practice dies horribly. For me and others, X died, and the command-line interface mostly died, too. There was a prompt with strange terminal font rendering, and the shift key had inscrutable behavior, so if your password involved that advanced functionality... well, shoot. I think I used the recovery mode and ran variations on apt-get to finish the upgrade and get Ubuntu back on its feet again.

I've been involved in two Dapper-to-Feisty upgrades this week, and the game is prettier to look at now. Better graphics, flashier bad guys, improved gaming experience overall. On my veteran P3 desktop, a handcrafted relic from the 20th century, I rediscovered the power cord and brought it back into service. Yep, still a computer. Standard Ubuntu Dapper. Before getting my current Toshiba POS vintage 2001 laptop, I used this box for trying out exciting new Linux distros, and had kind of a stormy relationship with Synaptic. Some things are installed that shouldn't be, in strange ways, with hand-mangled config files. Sounds like a good candidate for the newly streamlined Dapper-to-Edgy upgrade process:

gksu "update-manager -c"

This launches update-manager with a graphical prompt for superuser privileges, and tells it to check for distro upgrades. Type a password, click the shiny buttons, let chill for 2 hours before rebooting.

On ye olde Pentium the Third, it worked pretty well. Some packages were broken, but X launched and Gnome loaded -- with a couple of angry message boxes letting me know that gnome-panel was disgruntled. Helpfully, Bug Buddy popped right up to tell me that it couldn't do anything useful for me or the developers, and allowed me to close it. The Gnome panel then restarted and crashed again, launching another tragic Bug Buddy, ad infinitum. At first I was fooled into thinking Bug Buddy was a modal dialog on top of all of Gnome, locking me out from doing anything else, but in fact it was just the panel that was broken, and Bug Buddy could safely be ignored. The desktop and other applications still worked.

Having failed to set a memorable key binding for xterm, and lacking the initiative to look up the built-in way to do it (didn't know about Ctrl-Alt-F2 and virtual terminals yet), I cobbled together a desktop icon to launch xterm, and used that to run update-manager:

sudo update-manager

This works in Edgy because here, update-manager checks for distro upgrades by default. If I understand correctly. Anyway, it worked, I followed the graphical upgrade sequence again to end up with a fully functioning Feisty Fawn, and I didn't even have to get Edgy working properly. Most of the relevant configuration files get updated in the upgrade process, so whatever had previously been mangled was corrected in the wonderfully silky-smooth Feisty upgrade process.

Back in 10/2006, I recounted my upgrade woes to my sister, probably finishing the story with "it wasn't really that bad," and she opted to stick with Dapper until a better upgrade path came along. Well, that path came along. She was dismayed to find that you can't go straight from Dapper to Feisty, but decided that Feisty was worth it. The specific motivation: Her laptop (newer than either of my machines) uses certain lame components that don't have particularly good driver support in Dapper, and she believes her wireless and ethernet connections will behave better in Feisty.

I wasn't there for the prologue, but we teamed up for the big game this morning. This time, it really wasn't that bad at all. Gnome came up correctly, and since the only program we need access to in Edgy is update-manager, that was the immediate next step.

The GUI option to upgrade to Feisty didn't come up. Strange. Something's wrong. On updating the archive list, we saw a stream of errors connecting to the repositories -- OK, now we know what the game is. The Internet died. More specifically, since this is a wired connection and she doesn't use NetworkManager, the mangled config file is /etc/network/interfaces.

For newer Ubuntu users, this is how you find out what happened to your internet access:
1. If you're using wireless, type iwconfig to see what wireless devices are active. Or, type ifconfig to see what all your networking devices are doing.
2. If you see warning messages, pay attention. If you see just a list of devices and no sign of activity, and you're not using NetworkManager, your network interfaces config file is wrong.
3. sudo gedit /etc/network/interfaces (replacing gedit with your text editor of choice if you care)
4. Fix the config file. Use the Ubuntu wiki or an online search if necessary. (Obviously, from another computer.) If you're lost, you can erase whatever you don't understand and go through the Networking GUI to rebuild it.
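
If eyeballing the file doesn't turn up the problem, a few lines of Python can at least point out one common kind of damage: an interface that is marked "auto" but has no "iface" stanza to configure it. This is a rough sketch, not a full parser. The function name check_interfaces is mine, the sample text is invented (it is not what was actually wrong on my sister's laptop), and it only understands the "auto" and "iface" keywords.

```python
def check_interfaces(text):
    """Return (configured stanzas, auto interfaces lacking a stanza)."""
    auto = []
    ifaces = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and whole-line comments carry no configuration.
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "auto":
            auto.extend(words[1:])  # "auto" may name several interfaces
        elif words[0] == "iface" and len(words) >= 4:
            # iface <name> <address family> <method>
            ifaces[words[1]] = (words[2], words[3])
    missing = [name for name in auto if name not in ifaces]
    return ifaces, missing


# Invented example: the eth0 stanza has been commented out.
sample = """\
auto lo
iface lo inet loopback

auto eth0
# iface eth0 inet dhcp
"""

ifaces, missing = check_interfaces(sample)
print(ifaces)
print(missing)
```

To try it on a real system, pass it the contents of /etc/network/interfaces instead of the sample string. Anything that shows up in the second list is a good place to start reading.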

As usual, the next upgrade from Edgy to Feisty was clean and uneventful. The only complaint I heard later was that the wireless situation was about the same level of fussiness as before (I still don't know the details). However, NetworkManager seems to be alive by default in Feisty -- or else I didn't notice when it was installed -- so there are more options for fussing with the wireless connection right off the bat. Fuss, fuss, dhclient, fuss, restart, gold. So, I think we're in a good place now.