“Methodological Irregularities in Programming-Language Research”

The title of this article caught my eye, not only because of the implied scandalous “irregularities”, but because I didn’t know we actually have “methodology” for programming language research, regular or not.

So what are Stefik and Hanenberg worked up about [1]?

“Given the substantial economic investment in software and its importance to all aspects of society, one would expect the industry to use rigorous empirical methodologies to ascertain whether the benefits of eliminating or modifying programming languages or introducing new ones outweigh the disadvantages.” ([1], p. 60)

Hmm.  I’m not aware that anyone generally considers cost-benefit analysis for software, let alone for programming languages.  In my own experience, many programming languages are created as personal projects, or to support a particular system.  Stefik and Hanenberg’s report on the literature of empirical studies of software engineering tools certainly supports this intuition.

They also point out that many aspects of programming language design, including which features to add or remove, seem to have no empirical foundation at all. I’d say that stuff that gets done reflects the personal preferences of the engineers and funders.

Stefik and Hanenberg note that much of the argument for and against language features is mathematical.  This, too, reflects the interests of the designers, but isn’t especially relevant to users.  Personally, I don’t really care how pure the formal semantics of my language is, if I can’t grok how to get stuff done.

Tellingly, conferences such as OOPSLA define acceptable evidence to include “proofs, implemented systems, experimental results, statistical analyses, case studies, and anecdotes.”   Long ago when I was a Psychology major, anecdotes were not considered evidence.   And, S&H  note that many reviewers have little understanding of how to evaluate empirical work—so there is a need to upgrade training.

Of course, this issue extends far beyond the design and development of tools.  It is well known that academic journals and conferences focused on human-computer interaction have warped notions of evidence.  As everyone knows, the CHI conference requires most papers to have “empirical” data, whether that makes sense or not.  And much of the conference is filled with pointless (and shabby) “user studies”, done only to check a box.  Other papers, with outstanding ideas but no such “empirical” data, are rejected.  Sigh.  No wonder user interfaces suck.

(And, of course, a lot of performance evaluation, AKA benchmarking, is funded out of corporate marketing budgets.  You know that’s gonna be open-minded, let-the-chips-fall-where-they-may empirical research.)

So what is to be done?  The authors call for adopting standards for methodology akin to medical testing. Control groups!  What a concept!
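To make that concrete, here is a minimal sketch (in Python, with made-up numbers, not anything from the article) of the kind of randomized, controlled comparison they have in mind: assign participants at random to the existing language or to a variant with the new feature, measure something like task-completion time, and ask whether the difference is bigger than chance.

    import random
    from scipy import stats

    random.seed(42)

    # Hypothetical task-completion times (seconds) for two randomly assigned groups.
    control   = [random.gauss(300, 40) for _ in range(30)]  # existing language
    treatment = [random.gauss(270, 40) for _ in range(30)]  # language with the new feature

    # Two-sample t-test: is the treatment group reliably faster than the control group?
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"mean control:   {sum(control) / len(control):.1f} s")
    print(f"mean treatment: {sum(treatment) / len(treatment):.1f} s")
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

Whether a toy setup like this tells you anything about real programmers is, of course, exactly the stakeholder question that comes next.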

I’d add that part of the problem here is a lack of clear thinking about who the stakeholders in this software are.  A study of the developer and a couple of friends isn’t a serious consideration of which populations should be tested, or why.  For that matter, decisions made for strategic business reasons (e.g., to promote and lock in a platform, or to cut back-end costs) obviously ignore all kinds of other stakeholders (customers, to start with).

This problem becomes very difficult very fast.  Just exactly who is going to be affected by changes to software?  That’s not easy to say.  And even when you know part of the answer, just how can you get representative data?

Imagine that you wanted to empirically demonstrate that a change to some internet service is beneficial, or at least not harmful.  Who should the relevant population be?  The whole internet?  And what would a relevant control condition be?  The internet “as it is”?

I honestly don’t know how far Stefik and Hanenberg’s suggestions can really get us.  But it would certainly be a very good idea to force ourselves to think about how to do it, even if we can’t really do it very well.


  1. Andreas Stefik and Stefan Hanenberg, “Methodological Irregularities in Programming-Language Research,” Computer, 50(8):60–63, 2017. https://ieeexplore.ieee.org/document/7999115
