Submitted by egdaylight on


Before discussing several "category mistakes" in computer science in follow-up posts, I prefer to first introduce a few categorical distinctions and definitions, and to subsequently test these concepts on the writings of computer science's greats (i.e., the writings of the people I and many others admire the most). I shall take as my starting point a 2012 letter which John Reynolds wrote in connection with my book *The Dawn of Software Engineering*, and in which he provided an extensive amount of valuable feedback about the history of programming languages.

Specifically, and in line with the philosophy of technology, a distinction can be made between at least three separate categories:

- computers (including laptops and iPads) are *concrete physical objects*,
- computer programs and also computer programming languages are *technical artefacts*, and
- Turing machines, finite state machines, and prime numbers are *abstract objects*.

Note that stating the existence of abstract objects does not commit one to believing in a platonic realm. I thank the philosopher of computer science Giuseppe Primiero for bringing this observation to my attention. Note moreover that I am not merely distinguishing between "concrete systems" and their "formal models" but between three separate categories.

Further clarifications are in order:

- A *computer program* refers to the class of programs in standard, often commercial, computer programming languages that can actually be compiled and run on a specific electronic device; i.e., a computer.
- A *mathematical program* is a mathematical model (containing a formal syntax and a formal semantics) of a computer program.
- A *technical artefact* is a physical structure with functional properties [1].

Computer programs and corresponding computer programming languages are technical artefacts in that they only fulfill their intended function because of their actual physical structure. The physical manifestations alone, however, are not technical artefacts [2,3]. Computer scientists build technical artefacts (e.g., computer programs, data types, and the like) which enable them to “reflect and reason about them independently of any physical manifestation” [3]. Reflecting and reasoning are often done by resorting to mathematics; that is, by using one or *more(!)* mathematical programs for *the* computer program under scrutiny.

Now, is it true that computer scientists oftentimes refer to a computer program while they are actually referring to a mathematical program? To be more precise: are they actually referring to a text-on-paper representation (called a *paper program*) or a text-on-screen representation (called a *screen program*) of the mathematical program under scrutiny, instead of the *computer program* itself (which is something that resides electronically in a computer)? Or am I just being pedantic and none of this really matters or all of this is well known?

When computer scientists refer to their mathematical model, is it true that they oftentimes incorrectly think that they are also *directly* referring to the actual computer program? Do they distinguish between different representations of the mathematical program, such as a paper program, a screen program, and a computer program? Each representation is, once again, a technical artefact and *not* a mathematical object.
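The gap between a representation and the mathematical object it represents can be illustrated with a small Python sketch (an illustration of mine, not part of the original argument): two visibly different "screen programs" are mapped by a parser to one and the same abstract syntax tree, which is a step toward the mathematical program behind the text.

```python
import ast

# Two distinct textual representations (think: "screen programs")
# of what we intuitively regard as one and the same program.
text_a = "x = 1\nprint(x + 2)\n"
text_b = "x=1 ;  print( x + 2 )"

# Parsing maps each concrete representation to an abstract syntax
# tree; the trees abstract away spacing, line breaks, and the like.
tree_a = ast.dump(ast.parse(text_a))
tree_b = ast.dump(ast.parse(text_b))

print(tree_a == tree_b)  # prints True: the representations coincide abstractly
```

The equality holds because `ast.dump` records only the abstract structure, not the layout of the source text; the two character strings remain distinct technical artefacts throughout.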

I'd like to know whether Raymond Turner is the only computer scientist (or one of few computer scientists) who thinks along these lines [3] or whether the vast majority of, say, programming language specialists (a) knows all of the above and (b) doesn't consider it all too important. (I would be surprised if all of the above is common knowledge because the work of Turner and his colleagues, referenced below, has only been published recently.)

I'm also trying to find out whether the best of the best in academia really understand the categorical distinction between a "mathematical program" and a "computer program". So, as a first attempt, I take an excerpt from John Reynolds's 2012 letter. I had difficulty comprehending this part of Reynolds's letter because I had by then already studied Donald MacKenzie's 2004 book *Mechanizing Proof* in great detail. Here are Reynolds's words:

As an example, it is likely that, in perhaps ten years time, when you download a program from the Internet, you will also download a formal proof that the program will respect safety conditions (e.g. no buffer overflow or dereferencing of pointers into the wilderness) that will ensure that it cannot disrupt the behavior of other programs running simultaneously. And your computer will check these proofs before running the program.

Rewritten in line with the categorical distinctions introduced above, Reynolds's prediction reads:

As an example, it is likely that, in perhaps ten years time, when you download a COMPUTER program from the Internet, you will also download a REPRESENTATION of a formal proof that the MATHEMATICAL program (which is *a* MODEL of the COMPUTER program under scrutiny) will respect safety conditions (e.g. no buffer overflow or dereferencing of pointers into the wilderness). This will NOT ensure but will definitely increase our CONFIDENCE that the COMPUTER program cannot disrupt the behavior of other COMPUTER programs running simultaneously. And your computer will check REPRESENTATIONS of these proofs before running the COMPUTER program.

In short, two key amendments:

- "will respect safety conditions" --> "shall respect safety conditions"
- "ensure" --> "increase our confidence"

In requirements engineering, the modal verb "shall" is preferred over "will": even with a formal proof in hand, we do *not* have any guarantee that the safety conditions *will* be met at runtime by an engineered *system*. Likewise, Reynolds's verb "ensure" is also frowned upon in the large world of software engineers. This has nothing to do with brevity. In retrospect, I find it extremely ironic that non-formal-methods people have to persuade formal-methods advocates, like Reynolds and myself, to be more precise in the way we formulate our mathematical findings.
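Reynolds's scenario is, in effect, what is known as proof-carrying code. The following Python sketch is a deliberate caricature of it, under loudly simplified assumptions: every name is hypothetical, and the "checker" merely re-derives a claimed buffer bound from the program text, standing in for a real proof checker that would verify a machine-checkable proof object against a formal semantics (a mathematical model of the program, not the program itself).

```python
import re

def check_certificate(program_text: str, certificate: dict) -> bool:
    """Toy stand-in for a proof checker: confirm that every literal
    buffer index in the text stays within the certified bound."""
    claimed = certificate.get("max_buffer_index")
    indices = [int(m) for m in re.findall(r"buf\[(\d+)\]", program_text)]
    return claimed is not None and all(i <= claimed for i in indices)

# A downloaded "computer program" (here: just a string) plus a
# downloaded *representation* of a certificate about its model.
downloaded_program = "buf[0] = 1; buf[7] = 2;"
downloaded_certificate = {"max_buffer_index": 7}

if check_certificate(downloaded_program, downloaded_certificate):
    # Passing the check increases confidence; it does not ensure
    # that the running, physical system will behave safely.
    print("certificate accepted")
else:
    print("certificate rejected: refuse to run the program")
```

Note what is checked: a text and a certificate, i.e., two representations, never the electronic computer program itself, let alone the engineered system at runtime.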

Many formal semanticists would like to attach *precisely one meaning* to each computer programming language. But from an engineering perspective that's not the right thing to do. Just like a computer can have different (complementary!) models of computation, a computer programming language can be modeled in different (complementary!) ways. The richness lies in the multitude of models. People like Edsger Dijkstra and Christopher Strachey each attached a different semantic model to the same programming language, and similar things are happening today around the world with regard to C and other industrial programming languages. Like it or not, C is informal, it is *made* (by humans), and we can mathematically model it in *many* ways.
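The point about a multitude of models can be made concrete with a toy expression language (again an illustration of mine, not Dijkstra's or Strachey's actual semantics): one and the same language is given two complementary mathematical models, a valuation that says *what* an expression denotes, and a cost model that says *how much work* evaluating it takes.

```python
# A toy expression language: nested tuples such as ("num", n)
# and ("add", e1, e2).

def value(expr):
    """Model 1: map each expression to the number it denotes."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    if tag == "add":
        return value(expr[1]) + value(expr[2])
    raise ValueError(f"unknown construct: {tag}")

def cost(expr):
    """Model 2: map each expression to a count of addition steps,
    ignoring the denoted value entirely."""
    tag = expr[0]
    if tag == "num":
        return 0
    if tag == "add":
        return 1 + cost(expr[1]) + cost(expr[2])
    raise ValueError(f"unknown construct: {tag}")

program = ("add", ("num", 1), ("add", ("num", 2), ("num", 3)))
print(value(program), cost(program))  # prints: 6 2
```

Neither model is "the" meaning of the language; each answers a different question about the same syntax, which is precisely the engineering richness at issue.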

This post served to illustrate that categorical distinctions are in order. I hope to eventually convince the reader that they should be made for the sake of clarity, for the sake of making sense to both outsiders *and* ourselves.

**References**

[1] P. Kroes. *Engineering and the dual nature of technical artefacts*. Cambridge Journal of Economics, 34:51–62, 2010.

[2] N. Irmak. *Software is an abstract artifact*. Grazer Philosophische Studien, 86:55–72, 2012.

[3] R. Turner. *Programming languages as technical artefacts*. Philosophy and Technology, 27(3):377–397, 2014. First online: 13 February 2013.