Sunday, March 8, 2009
Mnemosyne: Getting Things Memorized
That brought to mind a Wired article about Piotr Wozniak and his spaced-repetition memorization program, SuperMemo. When I originally read the article I wasn't in grad school and didn't have an urge to memorize any particular list of things. Anyway, SuperMemo appeared to be Windows-only software, and an algorithm like this would be more fun to code from scratch anyway. Enough fun, really, that there had to be one or two open-source implementations floating around.
Mnemosyne popped up as the closest match in an Ubuntu package search, so I'm running with that. Putting the flash cards together was pretty simple; I was able to do it in a few minutes from inside the program and export it in the standard XML format. I zipped it up with a quick plain-text README and uploaded it to the project home page as the Amino Acids card set.
The content came from a slide in a lecture, and I did a quick sanity check on Wikipedia before uploading. The notation for the 20 standard amino acids is complete, and that was the main goal of this. The assignment of amino acid "groups" seems to be a little arbitrary, depending on the source (by structure, functional groups, chemical properties, etc.), and I tried to make the categories complete without too much overlap -- there's a small deviation from my slide here. I also added another category for "side chain properties", pH and polarity. Another enhancement might be the standard codons for each amino acid, though I'm not sure I want to deal with that yet.
Saturday, February 7, 2009
Carrots and sticks
What's old is new again:
Professor makes his mark, but it costs him his job
In Zen and the Art of Motorcycle Maintenance, Robert Pirsig mentions his own experiment in withholding grades at a university. He didn't just announce on the first day that everyone would get an A+ (that seems gimmicky), but since it was a class on rhetoric, he spent the course developing an argument for eliminating the grades-and-degrees system and discussing it with his students. Most students were unenthusiastic or opposed -- grades and degrees are what they came for.
He assigned, collected and graded papers but returned them to students with only the comments, not the grade.
At first:
- A-students were annoyed, but did the work anyway
- B-C students blew off some assignments
- C-D students usually skipped class
He observed this and changed nothing. If students acted up, he let it slide.
Around 3-4 weeks into it:
- A-students got nervous and pushed themselves harder, in class and in papers
- B-C students saw what the try-hards were doing and returned to the usual level of effort
- C-D students who had committed to ditching would occasionally show up out of curiousity
And finally:
- A-students relaxed and began enjoying the class as active participants. In a final essay, still not knowing what their grades were, these students favored eliminating grades by 2-1.
- B-C students saw this and freaked, putting an unusual amount of effort into their work. Eventually, they joined the A students in engaging class discussions. These were evenly divided over the issue of eliminating grades.
- C-D students, or those who attended, also saw this and began trying to hand in reasonable work. Those who couldn't hack it freaked even more, and remained in a state of Kafkaesque terror until the quarter mercifully ended. Naturally, in the final essay these students were unanimously opposed to eliminating grades.
Interesting as this result was, he reverted to the regular grading system the next quarter because he couldn't provide any alternate goal for students -- those who can recognize quality in their own work don't need the university; those who can't need something to work toward, or they don't progress.
Saturday, October 25, 2008
Psychology research with Mechanical Turk
Elevator pitch: There's a missing gear in the machine of psychology research. Every significant human study requires weeks or months of data collection, and more time coding that data in a form that can be analyzed statistically. This makes it infeasible to do the sort of fast, iterative refinement of models that biology has seen in recent years.
Amazon's Mechanical Turk provides the missing piece. It provides an accessible interface for building a survey, interactive test, or other psychological measure, pushing it out to thousands of participants, quickly returning the results to the researcher in electronic form, and screening out unusable data. It's flexible enough to allow screening and debriefing, and gives access to a vastly larger pool of participants than Experimetrix. And it's cheap.
Background
First, take a look at this: Ten Thousand Cents.
When I first heard about bioinformatics I was under the impression that it was the exponentially increasing power of computers that made it irresistible to start using them for biological research. But actually, it was pretty much the reverse -- high-throughput experimental methods like gene sequencing, mass spectrometry and X-ray crystallography generated too much data for humans to process manually. Computers were only barely able to handle this workload in 1986, when the human genome project started -- scientists just did what was needed to move around the mountains of data coming out of their experiments. Similarly, new computational research is coming out of the Large Hadron Collider project now.
Psychology researchers (especially in social psychology) currently spend semesters at a time gathering data for their studies and converting it into data that can be quantitatively analyzed. High-throughput experimental methods are scarce and expensive, so there's no "data glut" driving the development of better information-management methods. Progress in the field is slow and lossy -- since there's not much demand for the raw data, conclusions are described qualitatively, which makes it hard to use prior results as a solid foundation for future work.
With Mechanical Turk, it's possible to do in one shot a study that would otherwise require a meta-analysis of several studies across particular locations or demographics. With more consistent data and larger populations, data can be reusable.
How it could work
If it fits behind a web interface, or can be described and completed with plain language or pictures, it can be done with Mechanical Turk. Necessarily, a form of consent can precede the main task, and a blurb of debriefing can finish.
To get a feel for how it's done, read this article: The Faces of Mechanical Turk
Naturally, the first study done this way should be something to determine how the population of Turkers corresponds to the general population and the student populations that have already been characterized in previous studies. A public-domain measure of the Big Five or something like the Narcissistic Personality Inventory would be good candidates. Then, let slip the hounds of statistics. Are Turkers as representative of the general population as psychology undergrads? More so?
Some research along these lines has already been blogged here: Mechanical Turk Demographics
Now, let's try some examples.
Surveys: You craft a survey, Turkers take it, and you retrieve and filter the results through the Mechanical Turk interface. Pretty straightforward, no?
Interactive tasks: This is what Mechanical Turk is designed for; only, the focus was expected to be the task, not the Turker. Anyway, the data's yours. An example of a task like this would be a simple, unbounded game (Flash or JavaScript) that the participant can quit any time (possibly paired with another stimulus). The returned data would be the play duration alongside any personal or demographic information requested.
Coding visual or audio data: Following the original intent of Mechanical Turk more closely, this application of the service distributes a repetitive task normally performed over several weeks or months by the researcher or a group of grad students. Rather collect new data about a participant, this simply boils down a vast quantity of data that's already been generated -- this is a problem we want to have. A two-step example: (1) run a Mechanical Turk task in which participants draw or assemble an arbitrary image; (2) run a second task with a different set of participants who look at these images and code (type or select) the relevant traits they see in the images.
Measure development: One of the more uncomfortable questions in social psychology research is the validity of personality measures. Devise a series of questions and a method for tabulating the results; run it on some participants; analyze the results to get some answers. But, what's really being tested here -- the population, or the measure? Tragically, there's no time to refine the measure very much; if the results are useful, you run with it. But! With Mechanical Turk, collecting survey results is cheap and quick; and since the general format of the survey isn't changing between revisions, the same set of statistical transformations can be applied programatically to each iteration of the survey.
This is a great way to build a psychological measure that you can be confident in: Push an initial draft of the measure out to Turkers, receive some results, perform a statistical analysis and save the operations as an R or SPSS script. Then, manually refine the measure, put it back on Turk, filter the new results through your analysis script, and repeat until it looks good. This can get as advanced as you'd like -- start with several times as many questions as you'd like to see in the final survey, then automatically dispatch random subsets of the question list to Turk, filter through your automatic analysis to get some scores indicating quality, and use a Bayesian classifier to narrow down the best possible subset of questions.
Update: Here's a conference paper on the same topic.
http://www-users.cs.umn.edu/~echi/papers/2008-CHI2008/2008-02-mech-turk-online-experiments-chi1049-kittur.pdf
Friday, March 7, 2008
Vimming your way to the top
http://www.vim.org/scripts/script.php?script_id=2599
Benefits:
- Different colors for lines ending in ':' or starting with '*' or '{'
- Assign keywords to be automatically highlighted, like important locations, coworkers' names, customers, taquerias, etc.
- Start sections with a line of underscores and a heading beginning with the '{' character. The heading stands out (red with GVim's "desert" color scheme), and you can jump between sections just like C blocks using ]] and [[ keystrokes.
- Ordinary text (i.e. not specifically formatted for this syntax) looks sane.
Update (4/2/09): I uploaded the script to vim.org, where it will be easier to track and update.
Saturday, January 19, 2008
On Blub
http://weblog.raganwald.com/2006/10/are-we-blub-programmers.html
Blub Theory evolves over the course of the comments. First off, the requirements for Blubbiness are defined as:
- There's at least one language that's worse for the task at hand, and the programmer realizes (validly) that it's less suited for the task than Blub.
- There's at least one language that's better for the task, and the programmer doesn't realize that it's better for the task than Blub.
Another commenter glances on the management perspective of Blub:
I misread "Blub" to be "bulb". As in when a programmer burns out his employer just throws him away and screws in a new one.'Nuff said.
When we're not trying to pin it down too carefully, we know exactly what Blub is. The first Blub was Fortran — and in some circles, it's still Blub. Guy Steele Jr. (who was involved in the design of Scheme, Common Lisp, and Java in previous efforts to take down previous Blubs) is currently working on Fortress with the same goal. Fortress looks nothing like Fortran, but the name's close enough to get the point across, with the point being that it's intended to be much better suited for tasks where scientific programmers instinctively reach for Fortran. C++ was Blub for the '90s, and since real Blubs never die, it's still the Blub of choice for most performance-critical stuff. Java out-Blubbed C++, and now Java and C# are splitting the Blub market. However, examples of Blub code are still generally a simplified C++, since the equivalent in Java would take too much boilerplate to be worth the column space.
(Sidebar: Modern Fortran looks almost nothing like the original Fortran. C++, especially variety found in Visual Studio now, and with the 200X extensions, is also a mutant. And it's not a superset of C99, either, defeating the original purpose of the language. Visual Basic, VBA and VB.NET are not compatible, despite the naming. How can a language take its users for such a ride over the years when another language with a different name might be the more logical next step? Javascript and C# surely got a bit more mileage on name recognition. All the evidence points to human psychology still working on programmers.)
But the argument breaks down when we try to explain why another language is better suited for the task than Blub. There are C++ programmers out there who can bust out a better program than you can write with any other language. Out of the last 10 years of the ICFP Programming Contest, 3 winners used Haskell, 3 used OCaml, and 2 used C++ — and this is a contest arranged by functional-programming gurus. Python and Ruby have never received any prizes, though Perl was recognized by the second-place team last year.
I see two axes to evaluate languages on: something like the front end and the back end. Semantics and implementation. Both are labeled "power", but for the front end that implies what the language does for the programmer, and for the back end it's what the program does for the machine.
Example 1: Ruby has an excellent front end. It's one of the most expressive languages available; that's probably why 37signals picked it for Rails. The best semantic ideas from Smalltalk, Common Lisp and Perl are all in there; most of the famous Design Patterns are built in either implicitly or explicitly. (It's not that the language makes them all obsolete; the language designer just had the foresight to implement the tricky ones for you.)
But the back end has been playing catch-up. Performance lagged well behind even Python and Perl until the very recent v1.9, and there's no native-code compiler. I could be misinformed, but I've also heard that: threading suffers the same issue as Python of limiting the interpreter process to a single processor (or so I've heard); there's no built-in foreign-function interface a la Haskell or Python's ctypes module; Unicode support has rough spots; and large-scale parallelism and concurrency basically mean running a bunch of separate Ruby processes.
There's a lot for a Blub programmer to pick on.
Example 2: Delphi, a.k.a. Object Pascal, has an excellent native-code compiler, with support for cross-platform compilation and single-file (DLL-free) executables, and can also run on .NET, with all these options available through the same IDE. It's competitive with C on benchmarks, often faster. Integration with databases and other external components is solid. Refactoring tools are included with the IDE, lots of fun with static analysis. Object Pascal itself was originally designed at Apple some years before Borland picked it for their offering, then abandoned, but there seems to be something inherent in the language that enables highly optimized compilation. The Free Pascal implementation, for instance, comes well ahead of every other language in the Computer Language Benchmarks Game when memory and speed are weighted equally. On the combined benchmarks, Free Pascal uses only half as much memory as C (gcc)!
The catch is, Object Pascal is a cheesy-looking language. On the same benchmarks set, comparing the size of the code in gzipped bytes (emphasizing tokens instead of characters), Object Pascal comes in 24th out of 33 languages, just behind C. It beats Fortran, Java and C++, but not C#. I think I'd just buy more RAM rather than rewrite a Blub program in Object Pascal.
The benchmarks tend to be heavy mathematical algorithms, rather than general-use applications, so certain things like I/O, libraries and support for bottom-up programming and meta-programming are discounted. Regardless, Python, Perl and Ruby are the top 3 languages for code size on these benchmarks — I think Lisp was hurt more by this aspect of the benchmarks, since syntactic sugar isn't built in; there's no room for code reduction via mini-language. Haskell was probably helped by the absence of I/O. In general the benchmarks show that Blub languages perform well but are somewhat verbose, while scripting and Web-friendly languages are concise but have poor performance; Prolog ranks badly in every way, while OCaml and Haskell do well in every way; this fits reality fairly well for number-crunching but not for the Web or AI. Let's acknowledge once again that benchmarks aren't perfect, and forge ahead.
Fact: A language and a compiler are not the same thing.
But the Object Pascal example should show that a language can baby the compiler to give better results. The three arguments go:
- Declaring static types and generally programming close to the metal gives the compiler the information it needs to generate an optimally efficient program. That's why C can be fast — it's a close fit to the hardware. Same goes for Java and the JVM.
- Using the right abstractions and strong type inferencing lets the compiler get a high-level view of what your algorithm is doing, allowing it to do more optimizations itself. That's why OCaml and Haskell can be fast — they're a close fit to the pure algorithm.
- While the expressiveness of new languages like Ruby and Python is appealing, the race to incorporate imperative, object-oriented and functional programming styles into every major language is actually resulting in weaker languages. Borrowing features doesn't bring a language any closer to providing a new model of computation, and it certainly doesn't give a better angle of attack at the whole point of all of this — making the computer hardware do what we want.
"Programming languages appear to be in trouble. Each successive language incorporates, with a little cleaning up, all the features of its predecessors plus a few more. [...] Each new language claims new and fashionable features... but the plain fact is that few languages make programming sufficiently cheaper or more reliable to justify the cost of producing and learning to use them."
The talk that began with this argument went on to introduce function-level programming. At the time, everyone thought Backus was talking about functional programming, so it unintentionally gave a boost to Lisp and later the ML family, of which Haskell and OCaml are derived. But no: it was actually about a new language called FP, somewhat based on APL. FP begat FL, which went nowhere, but Morgan Stanley created an ASCII-friendly variant of APL called A+ (which is now free, GPL'd software), and the proprietary J and K have carried the torch since then. The use of these languages now seems to be mostly in the financial world. (Perhaps because it's really very well-suited to financial tasks, and perhaps because that's where APL made its splash — who knows, it may have even Blubbed its way in.)
The main idea is point-free programming: rather than pushing values around (as even functional programming languages do), compose functions together to create an algorithm that only references functions, not values. Then create a basic set of operators that can be composed together to create higher-level functions. This is an excellent way to do manipulate arrays and matrices. Haskell touches on this idea but doesn't emphasize it.
Benchmarks for any of these languages are hard to find, but I see one cryptic statement about K here:
[k is much faster than c on strings and memory management.]
And another startling statement on Wikipedia:
The performance of modern CPUs is improving at a much faster rate than their memory subsystems. The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor. Vector processing makes efficient use of the cache row fetching mechanism and posted writes without introducing bubbles into the pipeline by creating a dependency between consecutive instructions.
It looks pretty convincing to me. Finally, a fresh look at how programming languages make a machine do work.
This seems to be the argument missing from every language war: by removing the non-orthogonal parts of a language, it becomes more powerful. K doesn't have objects or continuations, and it doesn't need them. Likewise, Haskell restricts the ability to modify state to monads, Erlang's flow-control constructs throw out traditional iteration entirely, and Lisp virtually strips out syntax itself.
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.
The corollary to (and irony of) the Blub paradox is that since these optimized languages are missing constructs found in Blub — by design — a Blub programmer will always have plenty to pick on.