the refsmmat report - All posts

Further adventures with ICP accelerometers

Alex Reinhart — Sat, 14 Oct 2023 00:00:00 UT

In a previous edition, we discussed industrial accelerometers: sensitive instruments for measuring vibration inside structures, but also fun ways to listen inside watches, cameras, and watermelons. Unfortunately accelerometers are designed to connect to laboratory data acquisition systems that cost more than a hobbyist is likely to pay, so to use them, we need some kind of adapter to connect to ordinary audio recorders.

My previous project was a simple battery-powered adapter. It provides the 4 mA current source needed by ICP or IEPE accelerometers, at up to 24 V, from a single 9 V battery. A BNC connection supplies the accelerometer with power and receives its signal, and a 1/4" TS jack can be connected to an ordinary field recorder.

But as I tried using my accelerometers in the field, I found the setup too clumsy. A quick recording requires connecting the accelerometer to the adapter, connecting a short audio cable to the jack, connecting the other end to the recorder, and turning on the recorder. And besides the recorder’s battery, I have to worry about the 9 V battery, which is only accessible by unscrewing the adapter’s front panel entirely.

Most recorders can already provide power to microphones over XLR cables. Can we steal that power to supply an accelerometer, cutting out one battery and making a simpler, easier adapter?

XLR, phantom power, and ICP

Microphones for recording voice and music almost universally use 3-pin XLR connectors. One pin grounds the microphone; the other two are for a balanced signal. Their impedances to ground are balanced, so any electrical noise (from nearby electronics, cell phones, etc.) generates a nearly identical voltage on both signal lines. But the microphone outputs its signal as the difference between the two pins’ voltages, so noise subtracts right out.¹
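To make the subtraction concrete, here is a minimal sketch in Python. The signal and interference values are made up for illustration, and the function name `balanced_receive` is hypothetical. It models a line where the signal appears on pin 2 only and the interference couples equally onto both pins, which is the idealized case of perfectly matched impedances.

```python
import math

def balanced_receive(hot, cold):
    """Differential receiver: output is the difference between the two signal pins."""
    return [h - c for h, c in zip(hot, cold)]

# Hypothetical test signal and interference (illustrative values only).
n = 8
signal = [math.sin(2 * math.pi * k / n) for k in range(n)]
noise = [0.5 * math.cos(2 * math.pi * 3 * k / n) for k in range(n)]

# Matched impedances: the same interference voltage lands on both pins.
pin2 = [s + x for s, x in zip(signal, noise)]  # signal plus interference
pin3 = list(noise)                             # interference only

recovered = balanced_receive(pin2, pin3)
max_error = max(abs(r - s) for r, s in zip(recovered, signal))
print(f"largest residual after subtraction: {max_error:.2e}")
```

In a real cable the impedances never match perfectly, so the cancellation is only approximate; the sketch shows the ideal case.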

It is a clever design, and it allows power to be supplied to microphones with active circuitry. In the P48 phantom power system, the recorder or mixer applies 48 V to both signal pins through 6.81 kΩ resistors. Again, the microphone signal is the difference between the pins, so applying this voltage is fine; the microphone can tap current off with similar matched resistors and power whatever preamplifier it needs. Under the specification, P48 devices should be able to supply up to 10 mA of current this way.

We need a way to use this current for ICP devices. ICP (Integrated Circuit Piezoelectric), IEPE (Integrated Electronics Piezoelectric), CCLD (Constant Current Line Drive), DeltaTron, and IsoTron are all brand names for a very simple power system that only requires two conductors, not three. The recorder is connected to the sensor with a simple coaxial cable: just one conductor and the shield. The recorder is a current source, not a voltage source, set to roughly 4 mA and capable of supplying at least 18 V (up to 30 V at most). The sensor consumes this current. With no signal, the cable typically carries 4 mA at about 12 V; when the sensor outputs a signal, the voltage fluctuates proportionally, with the current remaining constant.

For those of you who have DIYed microphones and are wondering how a sensor can be powered off a current source, check out ESP’s Project 134 figure 4 for an example; it is surprisingly simple. But for the rest of us, we just need to know: How do we turn P48 phantom power into ICP and return the signal back to the XLR connection?

The adapter schematic

Our adapter will rely on a current-regulating diode, a cheap device that supplies a constant current almost regardless of the input voltage. I’m using the Semitec E-352, which supplies 3.5 mA. I’ll use one pin of the P48 power to supply the current, while the other returns the signal. Here’s the plan:

On left, the XLR connection to the recorder; on right, the BNC connector leading to the accelerometer.

We draw 48V from pin 3. D1 is the E-352, supplying a constant 3.5 mA to the BNC connector. The varying voltage on the BNC center pin is AC coupled to pin 2 of the XLR connector through C1. Pin 3 is kept stable by C2 acting as a low-pass filter. (Thanks to kennjava and Richard Lee on MicBuilders for pointing out the need for C2.) The capacitors must be rated to at least 50 V.

R1 and D2 are for safety. If the phantom power were turned on with no device connected via BNC, C1 could charge to 48 V; D2, a 30V Zener, ensures it can only charge to about 30 V, the maximum rating of most ICP devices. R1 discharges C1 and C2 after power is disconnected.
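A back-of-the-envelope check of the voltage budget, as a sketch only. It assumes the recorder delivers a full 48 V, that all of the DC current flows through the pin 3 feed resistor (pin 2 is only AC coupled), and that any current through R1 is small enough to ignore. The variable names are illustrative, and the 12 V bias is the typical figure quoted earlier, not a measurement.

```python
# Values stated in the text above
PHANTOM_VOLTS = 48.0          # P48 supply voltage
FEED_RESISTOR_OHMS = 6810.0   # P48 feed resistor on each signal pin
DIODE_CURRENT_AMPS = 3.5e-3   # current regulated by the E-352
SENSOR_BIAS_VOLTS = 12.0      # typical ICP bias voltage (assumed for this sensor)

# Ohm's law: voltage lost across the feed resistor on pin 3
resistor_drop = DIODE_CURRENT_AMPS * FEED_RESISTOR_OHMS

# Voltage remaining at pin 3 while the diode draws its regulated current
pin3_volts = PHANTOM_VOLTS - resistor_drop

# What is left over for the diode's own drop plus upward signal swing
headroom = pin3_volts - SENSOR_BIAS_VOLTS

print(f"drop across feed resistor: {resistor_drop:.1f} V")
print(f"voltage at pin 3:          {pin3_volts:.1f} V")
print(f"headroom above the bias:   {headroom:.1f} V")
```

Under those assumptions pin 3 sits near 24 V, which leaves roughly 12 V to share between the diode's own voltage drop and the sensor's upward signal swing.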

Those of you who have worked with XLR cables or DIY microphones will notice that this circuit is not balanced: pins 2 and 3 do not have identical impedance to ground. In principle, this makes the XLR output more susceptible to noise. But as we’ll see below, it will be plugged directly into the recorder with no cable. The XLR contacts will be shielded by the metal enclosure, and there’s very little opportunity for interference.

Building a real device

Next, I needed a convenient enclosure to fit the electronics. I discovered Neutrik’s NA-Housing, a simple extruded aluminum enclosure that accepts their D Shape connectors, including the NM3MD-B male XLR connector and the NBB75DSGB BNC connector.² The XLR connector can be plugged right into a recorder without a cable, making this a convenient recorder-mounted adapter.

For the wiring, I decided to go ahead and make a PCB. I’ve never made one before, and this is a simple enough starting point. The NA-Housing accepts 28mm wide PCBs in its slots, so I made a 28 by 35mm PCB in KiCad. No, my layout is certainly not optimal, but it almost certainly does not matter here.

The refsmmat XLR to ICP adapter PCB layout. The orange region is a ground plane on the back side of the PCB.

$7.60 to OSH Park later and I had three beautiful PCBs. Then it was simply assembly. Height of the parts is critical to fit inside the NA-Housing; fortunately, Nichicon UFG1J220MPM electrolytic capacitors fit as long as they’re not near the edges of the board.

And here are the results:

The guts of the adapter, ready to be sealed up inside its case.

The assembled adapter with spare PCBs, an Endevco 7251HT accelerometer, and a coaxial cable connected to a Bruel & Kjaer 4188 microphone on a 2695 CCLD preamplifier.

But how does it sound?

Noisy—if there’s no accelerometer connected. I think this is the noise of the 30 V Zener, though thankfully this disappears immediately when a sensor is connected, since the Zener no longer conducts.

With the accelerometer connected, the noise disappears and we get a clear signal.

¹ Many balanced systems achieve this by sending equal and opposite signals down each pin, so the difference between the two is two times the signal. This is common enough that many people treat it as the definition of a balanced signal; but for rejecting noise, all that matters is that the impedances to ground are equal.
² In principle this BNC connector is incorrect: accelerometer manufacturers usually use cable with 50Ω characteristic impedance, but this connector is 75Ω impedance. Neutrik makes 75Ω connectors because they are widely used for digital video (SDI) and audio (AES3). But at typical audio frequencies, the impedance mismatch does not matter; the cable is not a transmission line and there will be negligible effect on the signal.

Powering accelerometers for fun, profit, and field recording

Alex Reinhart — Sat, 12 Feb 2022 00:00:00 UT

It’s a common problem that I’m sure many of you have experienced: You find yourself with a PCB 352C68 accelerometer but no ICP signal conditioner to connect it to your recorder.

No?

Let me start over. Suppose you want to know how something sounds on the inside, by listening to the vibrations transmitted through its structure. Like this:

Be sure to watch the video with the sound on. That’s not a microphone; it’s an industrial accelerometer that measures extremely fine vibrations, like a contact microphone.

Unfortunately, industrial accelerometers are designed to be connected to industrial data acquisition systems that cost many thousands of dollars, not to cheap audio recorders, and so I needed an adapter. Why? Partly to listen to watermelons, but more on that later.

A word on recording vibration

Recording vibration is common enough. Many acoustic guitars come with a “piezo pickup” that converts vibrations in the guitar’s body into signals for your amplifier or recorder, so you don’t need to use a microphone that might pick up background noises. You can use the same principle for many acoustic instruments, and professional sound effects designers have used this trick for years. If you want to record an unusual sound without any background noise, why not record the vibrations directly from the source?

In this application, sound designers and field recordists use what they call “contact microphones”. Contact microphones are made from a piezoelectric crystal sandwiched between pieces of metal:

A piezoelectric disk buzzer, such as you might turn into a DIY contact microphone. Photo by Gophi, CC BY-SA 3.0, via Wikimedia Commons.

A piezoelectric crystal produces a voltage when it is compressed. If you place it on an object that vibrates, the vibrations push on the crystal, and the crystal produces a voltage you can record just like the voltage from a microphone. You can make a DIY contact microphone using a cheap piezoelectric buzzer, or order premade versions, such as from Jez riley French or Cortado. Creative field recordists place their contact microphones on just about anything, discovering strange noises from things as mundane as chain link fences, suspension bridges, dead trees, and ice cracking on a frozen lake.

When a PCB 352C68 meets a watermelon

Vibration recording is also very common in science and industry. Consider some examples:

Your job is to ensure new car engines meet legal noise requirements and don’t rattle, shake, or vibrate in ways that wear out parts or irritate drivers. You need to measure how they vibrate.
You work in a facility full of heavy machinery with motors, gears, and pumps. If a machine unexpectedly fails, the company loses lots of money. You want to measure vibration in the motors and bearings so you can tell if one is wearing out and needs to be replaced.
You’re designing a lightweight structure for a vehicle. You want to know how that structure will flex and vibrate when it hits bumps and potholes.

These scenarios happen every day in engineering and industrial firms, and so specialized tools exist to measure all kinds of vibration. These are generally called “accelerometers.” Unlike the accelerometer in your phone that measures which direction is up, they are designed to measure vibration, often up to tens of thousands of Hertz, and they do so very precisely. Most accelerometers use piezoelectric crystals, just like in contact microphones, but in enclosed metal housings. A good accelerometer comes with a calibration certificate that states exactly how many millivolts of signal it produces for every g of acceleration it experiences, and a chart of how that changes with vibration frequency. A good accelerometer will withstand large vibrations in tough environments filled with heat, oil, and solvents, and will cost you hundreds or thousands of dollars.

Coincidentally, my father falls in category #1 above. For many years, he was a Noise, Vibration, and Harshness (NVH) engineer in the automotive industry. He used accelerometers and measurement microphones routinely in his work, and he still has an uncanny ability to identify engine problems by their sound.

Naturally, when I was a kid – and I’m sure you’re thinking the same thing already – all we wanted him to do was to use his technical abilities to tell us which watermelons are ripe.

Folk wisdom holds that you can knock on watermelons to judge their ripeness. A good watermelon sounds hollow and resonant, not dead or muted. Presumably this has something to do with the stiffness of the flesh, the amount of water it contains, and maybe some chemical changes in its structure as it matures. And if your dad has expensive measurement equipment, why not use it to measure the ideal watermelon?

So that’s what we did. One summer in high school, I was part of the Young Engineers and Scientists program at the Southwest Research Institute, and part of the program asked us to develop our own science project to run over the summer. I chose to study watermelons, and asked Dad for some technical support. He turned to PCB Piezotronics and asked if they might have an accelerometer that could be used for a science project – maybe one that was slightly out of spec or had been returned. To our surprise, we soon received a spotless model 352C68, and off we went.

The experiment was simple. Dad brought home a high-end laboratory data acquisition system and hooked it to his laptop, while Mom selected eight watermelons for test. We strapped the accelerometer to one side of the watermelon and gently tapped the other side with a small wrench. The result was eight recordings with surprising variety. Just listen to how different Melon A is from Melon C:

Melon A:

Melon C:

Then Dad invited over some coworkers for a party, and everyone tasted the watermelons and gave them ratings. Using professional acoustics software, we extracted various features from the signals, such as resonant frequencies and reverberation times, and I plugged the numbers into Excel in the hopes of producing a prediction system.

Unfortunately, with only eight melons it’s hard to build a sophisticated model. (Also I was in high school, had never taken a statistics class, and had only a fuzzy understanding of what regression does.) We did note that the mushy and overripe melons were easy to distinguish from the rest, because the mushy ones had no reverb and the knocks damped very quickly. But that’s as far as we got. We considered collecting more data, but eventually I left for college and the accelerometer sat in the closet.¹
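As a rough illustration of the "knocks damped very quickly" observation (this is not the acoustics software used in the experiment), here is a minimal Python sketch that measures how long a knock takes to fall to a fraction of its peak. The synthetic knocks, the 200 Hz frequency, the damping rates, and the function names are all made-up stand-ins for real recordings.

```python
import math

def decay_time(samples, sample_rate, fraction=0.1):
    """Seconds from the loudest sample to the last sample at or above fraction * peak."""
    magnitudes = [abs(s) for s in samples]
    peak = max(magnitudes)
    peak_index = magnitudes.index(peak)
    threshold = fraction * peak
    last_loud = max(i for i, m in enumerate(magnitudes) if m >= threshold)
    return (last_loud - peak_index) / sample_rate

def synthetic_knock(freq_hz, damping_per_s, sample_rate=8000, duration_s=0.5):
    """Exponentially damped sine wave standing in for a recorded knock."""
    n = int(sample_rate * duration_s)
    return [
        math.exp(-damping_per_s * k / sample_rate)
        * math.sin(2 * math.pi * freq_hz * k / sample_rate)
        for k in range(n)
    ]

# Hypothetical melons: one that rings, one that goes dead almost at once.
ringing = synthetic_knock(200.0, damping_per_s=10.0)
dead = synthetic_knock(200.0, damping_per_s=80.0)

print(f"ringing knock decay: {decay_time(ringing, 8000):.3f} s")
print(f"dead knock decay:    {decay_time(dead, 8000):.3f} s")
```

A single threshold like this is crude compared with a proper reverberation-time estimate, but it is enough to separate a knock that rings from one that dies immediately.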

The basics of ICP signal conditioning

Recently I found the accelerometer again and wanted to try it out. But I hit a snag: Industrial accelerometers are not like microphones or headphones and don’t plug into ordinary recorders or amplifiers you might use for podcasting or music. They use a system variously called ICP, IEPE, DeltaTron, CCLD, or IsoTron by different manufacturers. The short version is this: Signals from piezoelectric crystals have high impedance, so when they are plugged into a typical amplifier or audio recorder, they sound quite tinny. You need to use a specific type of preamplifier that can convert the output to a low-impedance signal. ICP accelerometers have these preamplifiers built in, but to make those electronics work, they require a power source. To provide that power and also transmit the signal, they use a coaxial cable.

The center conductor of the cable carries the power, from a constant-current supply usually set at 4 mA and 18-30 V. The shield acts as ground. The accelerometer’s integrated electronics consume the 4 mA supply, and the voltage across the accelerometer varies in proportion to the amount of acceleration it experiences – record the voltage and you have the signal. In the case of the PCB 352C68, measurements of ±50 gs of acceleration become ±5 volts. So whatever device you plug the accelerometer into must have the ability to supply the right power and then extract out the signal.
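The conversion from recorded voltage back to acceleration is just a division by the sensitivity. A minimal sketch, assuming the 100 mV/g figure implied by ±50 g mapping to ±5 V; the function names are hypothetical, and subtracting the mean is only a software stand-in for removing the DC bias that rides on the cable.

```python
SENSITIVITY_MV_PER_G = 100.0  # implied by ±50 g -> ±5 V for the 352C68

def volts_to_g(volts, sensitivity_mv_per_g=SENSITIVITY_MV_PER_G):
    """Convert a bias-free signal voltage to acceleration in g."""
    return volts * 1000.0 / sensitivity_mv_per_g

def remove_bias(samples):
    """Subtract the average level, leaving only the varying part of the signal."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

# Hypothetical raw voltages on the cable: a small wiggle around a 12 V bias.
raw_volts = [11.9, 12.0, 12.1, 12.0]
signal_volts = remove_bias(raw_volts)
accel_g = [volts_to_g(v) for v in signal_volts]

print([round(a, 3) for a in accel_g])
print(volts_to_g(5.0))  # full-scale output in g
```

For field recording the absolute calibration rarely matters, but the same arithmetic tells you how hard a sensor can be shaken before its output runs out of range.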

(Professional condenser microphones also have integrated electronics, and receive 48 volt “phantom power” over the same cable that conducts the microphone signal. But the details are different, and you can’t just hook an ICP accelerometer to phantom power.²)

Because industrial accelerometers output a precisely calibrated voltage signal, they are typically plugged into precisely calibrated measurement equipment. Much of this equipment can provide the necessary power. If not, you can always buy a “signal conditioner”: plug it between your accelerometer and your recording device and everything will work great.

Except there are two problems:

PCB’s cheapest battery-powered signal conditioner is still $250.
Signal conditioners are designed to be plugged into specialized test equipment, not commodity recording equipment. They come with BNC connectors for coaxial cables to be plugged into industrial data acquisition hardware, not with standard audio connectors (like phone jacks or XLR cables).

I’m just a guy who wants to record funny noises. I don’t care about calibration, and I don’t want to pay hundreds of dollars for a signal conditioner and thousands more for a special data acquisition system.

Back in high school I built a simple signal conditioner, in anticipation of further watermelon experiments. It was based on a circuit design from an expert in the field, shoehorned into a small plastic box from Radio Shack. It required three nine-volt batteries so it could provide 27 volts, but that made it bulky; instead of a constant-current supply, it used a resistor to limit the total current; and the only output was a 3.5mm audio jack. I wanted to create a new version using a better circuit design, a shielded metal enclosure, and more convenient output jacks for connection to recording equipment.

Fortunately the pandemic has supplied limitless boredom. So how can we make this problem more complicated while solving it?

Electronics for a DIY ICP supply

The first problem was understanding the circuit I needed to build so I could buy the right parts. Unfortunately, it turns out that very few people have made their own ICP power supplies, or if they have, they haven’t posted it on the Internet. (I’m sure many technicians and engineers have built or repaired them in the course of their work.) The only detailed example I found was Project 134 from Rod Elliott of Elliott Sound Products, who follows the “Why buy a part when I can build it myself?” design philosophy. His version requires wall power, a transformer, and electronics to produce a 24 V DC supply, but I wanted something battery powered and simpler.

After stumbling around, I discovered a PCB manual that provides a handy diagram of their ideal circuit:

From PCB’s General Signal Conditioning Guide.

The description states:

The signal conditioner consists of a well-regulated 18 to 30 VDC source (battery or line-powered), a current-regulating diode (or equivalent constant current circuit), and a capacitor for decoupling (removing the bias voltage) the signal…. The current-regulating diode is used instead of a resistor for several reasons. The very high dynamic resistance of the diode yields a source follower gain which is extremely close to unity and independent of input voltage. Also, the diode can be changed to supply higher currents for driving long cable lengths. Constant current diodes, as shown in Figure 8, are used in all of PCB’s battery powered signal conditioners.

Current-regulating diodes turn about six different components from Project 134 into four parts you can order for 97 cents each from Mouser. (Four because each current-regulating diode supplies 1 mA.) My previous rat’s nest of electronics was the resistor-based design PCB warns against, so I started sketching out a new design:

The Accelerominator 3000 ICP power supply.

Some notes on the components used:

The power passes through a Traco TRN 1-0515, a regulated DC-DC converter that boosts 9 volts to 24 volts. It’s a switch-mode power supply rated for a minimum 100 kHz switching frequency and 45 milliamps of output. It’s nice because it’s tiny and it’s rated for a suitably low power; I tried using a Pololu adjustable supply that was rated for multiple amps, but I was drawing so little power that the switching supply would skip pulses periodically. Unfortunately the frequency at which it skipped pulses was in the audible range, so this manifested as high-frequency noise in the audio signal.
R3 is not strictly necessary, but allows smartphones to recognize input from the accelerometer as a microphone. Smartphones use TRRS connectors, which are 3.5mm jacks that support stereo output and mono microphone input; to detect whether you’ve plugged in ordinary headphones or a headset with microphone, your phone senses the resistance between one of the rings and the sleeve. Manufacturers don’t supply specifications for the exact resistance needed, but I read that 1.6 kΩ worked, and I had a 1.5kΩ resistor handy. I’m not using this feature in the current circuit as I haven’t built a TRRS jack into the case, but I could easily add the jack if I find it useful.
Capacitor C1 provides AC coupling between the input and the output, ensuring that the 24 volts supplied to the accelerometer is not supplied to the phone or computer recording the signal. Together with R3, it forms a high-pass filter with a shoulder at about 10 Hz.
The diagram shows a power switch, but I didn’t want a switch poking out of the enclosure that I could bump accidentally. Instead, I used a Switchcraft 113X panel-mount 1/4" TS jack for the audio output, which has a normally open isolated terminal that closes when a plug is inserted. The circuit is hence powered whenever something is plugged into the output jack.
Diodes D6 and D7 are 7.5 V zeners. When the BNC jack is disconnected or the power is turned off, the positive side of C1 can suddenly drop from somewhere between 12 and 24 V down to zero, causing a voltage spike on the output. The diodes clamp this spike to roughly 7.5 V. Most ICP devices can only provide ±7 V of output, so this should never clip a real signal.
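The high-pass corner formed by C1 and R3 follows the usual first-order RC formula, f = 1 / (2πRC). A minimal sketch: the 1.5 kΩ value for R3 comes from the notes above, but the value of C1 is not stated, so the 10 µF used here is purely an assumption chosen because it lands near the quoted 10 Hz. The function names are hypothetical, and the input impedance of whatever is plugged into the output is ignored.

```python
import math

def cutoff_hz(resistance_ohms, capacitance_farads):
    """Corner frequency of a first-order RC high-pass filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)

def capacitance_for_cutoff(resistance_ohms, target_hz):
    """Capacitance needed to put the corner at target_hz for a given resistance."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * target_hz)

R3_OHMS = 1500.0            # from the component notes
C1_FARADS_ASSUMED = 10e-6   # ASSUMPTION: C1's value is not given in the text

print(f"corner frequency: {cutoff_hz(R3_OHMS, C1_FARADS_ASSUMED):.1f} Hz")
print(f"C1 for exactly 10 Hz: {capacitance_for_cutoff(R3_OHMS, 10.0) * 1e6:.1f} uF")
```

A recorder input in parallel with R3 lowers the effective resistance and so raises the corner frequency; the sketch leaves that out.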

I assembled the components on a Perma-Proto breadboard and crammed it, with a 9V battery, into a small extruded aluminum enclosure from Hammond. The prototype looks great:³

The Accelerominator 3000 in its enclosure.

I use a short TS cable to connect the output to a Tascam DR-40X audio recorder, which is able to treat TS inputs as either mic-level or line-level. (I would have used XLR output instead, but I couldn’t find a cheap panel-mount XLR jack that would fit in my enclosure easily.)

Field recording with an accelerometer

To record interesting sounds, I needed reliable ways to mount the accelerometer onto surfaces. Its end has an integral threaded stud, but the threads are #5-40 UNC, and it turns out nobody uses that thread size in products – by convention, it seems everyone just uses the even-numbered sizes like #4-40 and #6-32. For instance, many vendors sell threaded magnets that could be useful to mount the accelerometer onto metal surfaces, but none sold them with #5-40 threads. Instead I built my own using a #5-40 washer, a magnet, and Gorilla Glue.

The accelerometer also came with a small box of what PCB calls “petro wax”, which I assume is simply paraffin. This is meant to be used to stick the accelerometer onto a surface temporarily. Because the wax is soft, it attenuates some of the higher vibration frequencies, but it can work for non-metallic surfaces.

(In industrial settings, one usually attaches accelerometers more robustly: either drill and tap a hole into the item being tested, so the accelerometer can be screwed right in, or glue the accelerometer to the surface with cyanoacrylate glue or epoxy. Once the test is done, the glue can be removed with a suitable solvent and the accelerometer carefully pried off the surface. Because the glue is extremely hard, it dampens vibrations very little.)

In any case, how does it sound?

First, here’s the winding and shutter mechanism of my Canonet G-III QL17 film camera (dating from the early 1970s):

Canonet QL17 shutter, via accelerometer:

The Canonet is all-mechanical – no batteries necessary for basic operation. You can hear the winding ratchet mechanism, the shutter firing, and the clockwork mechanism that times the shutter. (In between the two shots, I adjusted the shutter speed to 1/4 second.) It’s like being inside the camera.

For this second test, I wax-mounted the accelerometer on my Unicomp mechanical keyboard, successor to the famous (and famously loud) IBM Model M. Here’s the result of a quick typing test:

Model M typing, via accelerometer:

Bonus round: Measurement microphones

Accelerometers are not the only sensors that use the ICP system. Measurement microphones, such as those made by PCB or by Brüel & Kjær, often use ICP because their users already have ICP data acquisition equipment.

Measurement microphones are an interesting category. They’re designed for engineering and scientific applications: they are robustly built, have extremely flat frequency responses over their useful range, and are designed to be extremely stable, meaning they output a very consistent voltage for a certain sound pressure level. This allows them to be used for precise measurements, and because they can be calibrated to measure sound pressure level accurately, they can be used in noise measurements in applications where there’s a legal noise limit on a product or workplace.

Measurement microphones are almost invariably small-diaphragm condenser microphones, just like the pencil microphones you might use in music recording. (But unlike the large-diaphragm condensers you often see used for vocals, which usually do not have flat frequency responses – a recording engineer chooses the microphone whose tone best suits the singer’s voice.) Just like good-quality studio microphones, they are expensive to buy new. And by “expensive”, I mean they don’t even have a price listed, only a form to request a quote.

But through the magic of eBay, I found a Brüel & Kjær Type 4188 microphone capsule mated to a Type 2695 preamplifier for just $240, including shipping. That’s much lower than you’d typically find either listed for individually. (The same seller lists another Type 2695 for $800!) This deal was slightly undermined by needing to spend $50 on a 12-foot cable with the right connectors (BNC and 10-32 microdot) to connect to my ICP power supply, but I still think I came out ahead. The microphone is astonishingly tiny, just 1/2" (12mm) in diameter and 1.5" (40mm) long, but more sensitive and accurate than all but the most expensive studio microphones.⁴

Here’s what the same Model M keyboard sounds like, with the microphone perched a few inches above the keyboard on my water bottle, because I don’t own a microphone stand:

Model M typing, via microphone:

Concluding thoughts

Now that I have the equipment to use them, I’m not sure what to use the microphone or accelerometer for. Maybe there’s a niche for ASMR for mechanical keyboard enthusiasts.

I do think it’s interesting that the field recording community has found uses for piezoelectric contact microphones, and even more esoteric instruments like geophones, but I’m not aware of anyone using an industrial accelerometer for any similar recording. Possibly this is because nobody knows they exist; also their cost is likely prohibitive for anyone who just wants to tinker with sound. But used ICP accelerometers are available on eBay regularly for not-outrageous prices, and the ICP power supply above requires about the same skill to build as a good contact microphone preamplifier. If you’re feeling adventurous and don’t mind building your own electronics, maybe it’s worth a try.⁵

I hope to explore more interesting noises this year when the weather warms up. I’m sure Pittsburgh’s many metal structures and bridges make interesting vibrations, though I’d prefer they not get too interesting while I’m standing on the bridge.

¹ There has been real scientific research about acoustic watermelon ripeness detection, though some of it seems… dubious. See this summary for references. I told this story, including the dubious research results, in the second statistics class I ever taught – an intro course during the summer 2014 session. At the end of the semester, the students brought me a gift: several miniature watermelons. It was touching, except I couldn’t figure out how to carry them all back to my office before the next instructor arrived in the classroom.
² ACO Pacific does make the IEPE 1248 phantom power adapter system, but I’m afraid to ask how expensive it is.
³ Special thanks to my friend Alex, who owns one of every power tool and let me use his drill press and step bits to make holes for the audio jacks.
⁴ That’s not to say it’s better than most studio microphones. Studio microphones are usually directional, so they can record a singer or instrument without picking up other noises from the room, echoes from the studio walls, or crowd noise on-stage. Measurement microphones are omnidirectional, so isolating your subject is challenging unless you have a soundproof anechoic chamber. Sadly I could not find one of those on eBay for $240.
⁵ Be sure to get an ICP accelerometer, not a charge-mode or charge-output accelerometer, which requires a different preamplifier. You’ll see many triaxial models that measure outputs in three directions, and hence have three outputs and need three signal conditioners; but a single-axis model should be plenty. Mine has a sensitivity of 100 mV/g, and some are available with even higher sensitivities. Be aware that some have very low sensitivity, such as 2.5 mV/g; these won’t be useful for recording small and quiet vibrations, but since they can withstand extremely high accelerations, they might be useful if you want to record something extremely violent.

Using statistics and cognitive science to understand how students learn statistics

Alex Reinhart — Thu, 06 Feb 2020 00:00:00 UT

On November 18, 2019, I gave a department “internal seminar” on the TeachStat Research Group’s work on statistics education research. The internal seminars are intended for faculty in Statistics & Data Science here at CMU to get to know each other’s work better, and to interact with graduate students and faculty from other departments who may also be interested. This is not a transcript, but, as Sir Humphrey would say, “a version which represents my views as I would, on reflection, have liked them to emerge”.

For the past two years, I’ve been working with a group of Ph.D. students and faculty here at CMU on new projects in statistics education research. There have been quite a few people involved:

The TeachStat Research Group team. Top, from left: me, Philipp Burckhardt, Peter Elliott, Ciaran Evans, Amanda Luby, Mikaela Meyer, and Josue Orellana. Bottom: Ron Yurko, Gordon Weinberg, Jerzy Wieczorek, and Rebecca Nugent. Credit also to Sangwon Hyun, Kevin Lin, and Christopher Peter Makris for their support.

So what’s the problem we’re trying to solve? It’s a big, long-term problem. It’s one you’ve probably all seen when you teach your classes. There is simply a massive gap between how we think, as statisticians with experience analyzing data, and how our students think, as novices seeing data for the first time. And that gap doesn’t shrink nearly as much as we would like it to during a course.

This manifests in several ways. You might have had the experience of teaching an upper-level class and referring to some concept that the students should already know—only to find that they don’t seem to understand it. You might have taught an introductory class and seen that no matter how carefully you explain the meaning of hypothesis tests, the students still misinterpret them. You might have supervised students conducting a course project and seen that they lack intuition about data or the judgment to decide what to try or how to interpret their results.

But it’s not just students. Practicing scientists are notoriously bad at the same things: they misinterpret hypothesis tests, misunderstand regression assumptions, overinterpret small effects, read signal into noise… There’s a lot of research on how scientists understand statistical evidence, and there’s also a lot of research on the prevalence of misinterpreted statistical evidence in science. So much research, in fact, that one can even write a book about it all:

A shameless plug.

We’ve been pursuing two main projects to address this problem.

First, we’ve been examining and measuring student thinking in introductory statistics courses, using think-aloud interviews and assessments to explore their learning.

Second, we’ve been comparing how students and experts think about problems in a sophomore-level mathematical statistics course. Using think-aloud interviews and cognitive task analysis, we’re exploring the skills involved in solving these problems and exploring which skills are preventing students from succeeding.

We’re planning to use this research to build teaching experiments for our courses: our data will show what concepts students struggle with most, and our assessments will let us measure the effects of any new teaching strategy we try.

Student learning in introductory courses

This section summarizes work we presented in a recent preprint. Read it for more detail!

Let’s start with introductory statistics courses, where students seem to graduate our courses still holding many misconceptions about p values, sampling distributions, experiments and surveys, and all the other core concepts we aim to teach. (For a survey of the literature, check out my notebook on statistical misconceptions.)

Statistics education researchers have tried to quantify the size of the problem. They have developed many concept inventories: multiple-choice standardized assessments that you can give to students at the beginning and end of an introductory statistics course, to measure how much they learn. While we can quibble with the details of these inventories and how they’re constructed, one thing is consistent: pre-test scores are low, and post-test scores aren’t much better. There’s just not much evidence that students are learning the concepts we intend them to in the intro class.

For example, the Comprehensive Assessment of Outcomes in Statistics is the most popular and most thoroughly vetted standardized assessment of student statistical understanding. Large national datasets find average pre-test scores of about 45%, showing that students entering statistics courses have much to learn—but they also find average post-test scores of about 55%, showing that we have much to learn about teaching more effectively.¹

There’s a lot of ongoing work on new teaching strategies to improve this. For example, simulation-based inference is a hot topic right now: rather than teaching statistics through formulas and mathematics, simulation-based inference courses teach the concepts of variation and testing through computer simulations. Instead of learning t tests and using tables of distributions, students simulate permutation tests or do a bootstrap. This seems conceptually simpler: they can see directly what they’re doing, they can see that permutation really is making the null hypothesis true by making the two groups equivalent… but the effects are small. At best we’re getting a few more percentage points.²

Now, it’s worth asking why this is not a solved problem. After all, we’ve been teaching statistics for a century. Statistics education has gone through many revolutions, as computers enter classrooms and students get to practice hands-on with real data. And, of course, we all learn from experience. We adjust our courses based on what students got wrong last year. We introduce new demonstrations and simulations for important concepts. We write new homework and new projects on the topics we think students are missing. And yet here we are.

In fact, during my reading of the statistics education research literature, I have not encountered any experiments showing dramatically improved student learning in an entire course. I just read a Journal of Statistics Education article titled “Do Hands-on Activities Increase Student Understanding?,” and the results indeed confirmed Betteridge’s Law of Headlines.³ The same has been happening with simulation-based inference research; the new course designs and new textbooks seem to have only small effects on student learning, and the research is focused on finding exactly which students see a small benefit and which do not.

So why aren’t we making progress? In my view, a key obstacle is our approach. Tinkering with our courses based on intuition, fashion, or frustration is just going to make small steps toward a local optimum. Instead, we need a sound understanding of how our students really think and how they really learn, so we can build new teaching strategies based on systematic knowledge about student learning.

And, of course, we are statisticians. Why do we not teach like statisticians? We need to find ways to model how students learn, so our interventions can be designed based on a formal model of learning. We need to find ways to measure what students learn, so our proposed models can be calibrated and our interventions can be tested empirically. And since we’re statisticians, we need to base our models on real data, and ensure that real data means what we say it does and suits the purposes we want to use it for.

Measuring learning in our introductory course

So let’s start with the foundation: measuring student learning. Later I’ll talk about the steps we’re taking to model it, but first we need a foundation of real data.

Our research project began back in summer 2017. In the spring we held a reading group reviewing statistics education research, and we were interested in pursuing education research in the fall. My email over the summer started with somewhat modest goals:

Any ideas on what our goals should be? We had talked about developing an assessment to replace CAOS and judge whether the new statistical reasoning course is working; other potential topics would be more on psychology and pedagogy, different theories of learning, or writing fan fiction about Galton and Gandhi teaming up. (That was Rebecca’s suggestion.)

Our reasoning, however, was somewhat immodest. There are, of course, pre-existing assessments designed to measure if students are learning concepts in introductory statistics. One is the Comprehensive Assessment of Outcomes in Statistics, as I mentioned above. We weren’t fans of these assessments, which we thought focused on the wrong topics and emphasized memorization of definitions and results over understanding of concepts. We wanted something to measure conceptual understanding of the specific concepts we wanted to teach in our introductory course.

We figured that as statisticians, it can’t be too hard to write some conceptual questions and collect data on student answers. How naive we were.

We soon discovered that writing questions is hard. There’s a reason CAOS questions seemed to focus on remembering definitions: it is very hard to write a question that measures conceptual understanding of basic concepts. It is very hard to get students to reason about statistical questions and reach conclusions without relying on them remembering specific terms and specific definitions.

Think-aloud interviews to the rescue

Fortunately, we can borrow ideas from other areas of education research. In other STEM fields, particularly physics, there is a long history of developing deeply conceptual concept inventories to get at how students think and reason about new situations. This work has been going on since at least the 1990s, and has led to some remarkable results.

The method we adopted was described best by Wendy Adams and Carl Wieman in a paper describing how to build assessments to “measure learning of expert-like thinking”.⁴ They describe an iterative method:

Experts write questions they believe will measure the intended concepts.
In private interviews, volunteer students answer the draft questions while thinking aloud, so the experts understand how the students approach the questions.
The experts use this data to write new distractor answer choices, improve confusing question wording, write new questions about unexpected misconceptions, and so on.
Even more think-aloud interviews with students validate that the final question versions measure the thinking they are intended to measure.

The method that Adams and Wieman advocated for was actually developed here at CMU, by Herb Simon and Anders Ericsson, back in the late 1970s and early 1980s. As part of their work in artificial intelligence and expert systems, they were interested in learning how human experts reason about complex problems. They developed what they called verbal protocol analysis, and what we now call “think-aloud interviews”, and used these interviews to figure out how an AI system should be designed to think like an expert. A think-aloud interview gets us the closest thing to actual student thinking that we can get without full brain-computer interfaces.⁵

In a think-aloud interview, a student—just one student, in a private setting like a conference room—is given one conceptual question to answer. They read it aloud, then say everything they are thinking as they try to find the answer. The interviewer just watches. The interviewer doesn’t prompt the student, offer help, ask questions, or anything else. The goal is to get the student’s authentic thinking, as they would have thought it while answering a question on their own, without the influence from an experimenter making them reflect on decisions or suggesting different strategies. (For practical details and further literature, see my notebook on think-alouds.)

We’ve now completed 47 hour-long think-aloud interviews with students at CMU and at Colby College, covering more than 50 conceptual questions. The results of these interviews are quite dramatic. We found that students frequently misinterpreted questions that we thought were perfectly clear, because students understood wording differently than we did. We found some questions that students could solve entirely through elimination or by using heuristics like “it’s never the answer with ‘always’ in it.” We found that some questions measured completely different thinking than we expected to measure.

As a quick example, we had a question about sampling distributions that we thought was quite good, until think-aloud interviews revealed that most students got it wrong—and got it wrong because they did not understand how to read histograms. Only after we rephrased the question to eliminate these misunderstandings could we trust that students get it wrong because of reasons related to their understanding of sampling distributions, and not of histograms. (For the full story, check out our eCOTS 2018 video poster.)

This process gave us confidence in our questions. Having seen real students answer the questions, and having heard everything they said while answering them, we now know why they choose the answers they do and how they are interpreting the questions. We eliminated or fixed many questions that turned out to be confusing or elicited student reasoning unrelated to the concepts we intended to measure.

Assessment data

Along with the interviews, we’ve been collecting quantitative data from students in introductory courses at three different institutions. We started collecting data at Carnegie Mellon and at Colby College in spring 2019, extending to the College of Wooster in fall 2019. We’ve also collected data from students in 36-202, the second-semester statistics class in our department, to see what they retained after the first course. We hope to expand to more institutions in subsequent semesters.

Students complete the pre- and post-test online, through ISLE, the Integrated Statistics Learning Environment. ISLE presents them with 30 questions in random order and records the data; students get course credit for completing the assessment as part of their first homework and as part of their last homework. They can, of course, opt out of having their data used for research purposes.

Correlation and causation

The data is useful, and helps us measure changes in the course. But we also found that the access to student thinking we get in think-aloud interviews gave us much deeper insights into student learning.

Consider this question:

A survey of Californians found a statistically significant positive correlation between number of books read and nearsightedness.

Which of the following can we conclude about Californians?

A. Reading books causes an increased risk of being nearsighted.

B. Being nearsighted causes people to read more books.

C. We cannot determine which factor causes the other, because correlation does not imply causation.

D. We cannot draw any conclusions because Californians aren’t a random sample of people.

We wanted to test how students understand correlation and causation. We knew, from teaching experience, that introductory students struggle to understand the purpose of randomized experiments and how that differs from random sampling. We also know that people tend to be too willing to draw causal conclusions. So this question was intended to test that understanding. In interviews, and in later data we collected in class, students did quite well on this question. Great, you might think—they get it! But slow down.

Here’s another question we tested in think-aloud interviews.

A clinical trial randomly assigned subjects to receive either vitamin C or a placebo as a treatment for a cold. The trial found a statistically significant negative correlation between vitamin C dose and the duration of cold symptoms.

Which of the following can we conclude?

Here, correlation is causation. Because the subjects were randomly assigned, there can be no systematic confounding. Students in interviews answered correctly, and you might assume this means they understand the purpose of randomization—but it doesn’t!

Instead, we observed several students who strongly believed that correlation does not equal causation, and made statements to that effect while answering the question, but then said that this specific case “makes sense” and chose the answer that a causal conclusion can be drawn. One student specifically said that you “usually can’t assume causation,” then said it’s just correlation, then chose the causal answer. Perhaps, we thought, our students were getting the right answer simply because they already believed that vitamin C would help a cold, not because they were applying statistical reasoning.

To test this hypothesis, we tried a new version of the question. We replaced vitamin C, which makes sense as something that might cure colds,⁶ with mindfulness meditation. We thought this would avoid the reasoning that the treatment “makes sense,” and would better test whether students really understand the purpose of random assignment:

A clinical trial randomly assigned subjects to either practice mindfulness meditation or a placebo relaxation exercise as a treatment for a cold. The trial found that subjects who practiced mindfulness meditation had a shorter time to recovery than students assigned to the relaxation exercise, and the result was statistically significant.

Which conclusion does this support?

The results were dramatically different. Students now insisted that “causes” is too strong a word, and voiced doubts about the possibility of ever proving a causal claim. A couple of choice quotes:

“I think the word ‘causes’ is too strong… my friend who’s a stats major always tells me you can’t say this causes that—there’s always other factors”
“Usually [you] can’t assume causation”

7 of the 11 students who answered this version of the question in think-aloud interviews got it wrong, answering that correlation is not causation.

We also administered these questions to most students in the class, at the end of the semester. They seem to consistently prefer to deny causal relationships: most students correctly said that in the books question, correlation is not causation, but also said that in the meditation question, correlation is not causation. A smaller proportion recognized that in the meditation question, correlation really is causation.

	books wrong	books right
meditation wrong	10	67
meditation right	9	50
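To put numbers on the table, here is a minimal sketch in Python. It computes the share of students answering each question correctly and compares the two questions with an exact McNemar test on the discordant cells. The McNemar test is my assumption about a reasonable paired comparison, not an analysis reported in this talk, and the helper name `exact_mcnemar_p` is hypothetical.

```python
from math import comb

# Post-test counts copied from the table above.
# Keys are (meditation question result, books question result).
counts = {
    ("meditation wrong", "books wrong"): 10,
    ("meditation wrong", "books right"): 67,
    ("meditation right", "books wrong"): 9,
    ("meditation right", "books right"): 50,
}

total = sum(counts.values())
books_right = sum(n for (_, b), n in counts.items() if b == "books right")
meditation_right = sum(n for (m, _), n in counts.items() if m == "meditation right")


def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis that both questions are equally hard, each
    discordant student is equally likely to fall in either cell, so the
    smaller count follows a Binomial(b + c, 1/2) distribution.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant cells: students who got exactly one of the two questions right.
only_books = counts[("meditation wrong", "books right")]
only_meditation = counts[("meditation right", "books wrong")]
p_value = exact_mcnemar_p(only_books, only_meditation)

print(f"students: {total}")
print(f"books question correct: {books_right / total:.1%}")
print(f"meditation question correct: {meditation_right / total:.1%}")
print(f"exact McNemar p-value: {p_value:.2g}")
```

The comparison uses only the 76 students who got exactly one question right, since students who got both right or both wrong say nothing about which question is harder.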

That’s exactly the kind of insight we gain by doing interviews instead of just handing out questions. We learned that students were using different reasoning than we expected (they were picking answers because they already believed in the use of vitamin C) and were able to write new questions to give us a deeper understanding of their reasoning. We now have a small piece of the puzzle in figuring out how to better teach these important concepts.

The results above were from the post test, but the pre-test results were similar, suggesting something else very interesting: that students enter the course already possessing these misconceptions.

Indeed it did not. (source)

So where does that leave us? It appears that students are a bit like Cueball in this xkcd cartoon: they don’t think correlation implies causation, but that doesn’t mean the class helped. Crucially, they don’t understand when correlation does imply causation. Students seem to enter the class with misconceptions, so the problem may not be that we are teaching them bad habits, but that we are failing to break them.

Persistent misconceptions

This gets to a key reason why think-aloud interviews are so important.

Instructors often subscribe to what I call the “empty vessel theory” of teaching. Students begin the semester as empty vessels; as instructors, our job is to carefully distill all the important concepts of the course and carefully pour them into the empty vessel. As long as we use only high-quality ingredients and mix the explanations well, the students will leave the course filled with knowledge.

But, of course, that is not how it works. Students enter our courses with many beliefs about the world and how it works. When they hear our explanations, they do not simply accept them at face value; they already know things, or think they know things, and try to fit our explanations into their existing mental models.

There has been a lot of research on this problem in other fields. Notably, in physics, researchers since the 1980s have explored the numerous misconceptions and prior beliefs students hold about basic Newtonian mechanics. Students enter introductory physics courses having seen that objects in motion do not stay in motion, and with colloquial understandings of terms like “force,” “energy,” “momentum,” and so on.⁷

And because students “know” these things, they misinterpret our explanations of those same concepts in class. Our explanations do not work because they understand our words differently than we do. Our explanations do not produce a clear understanding in the students because they fit the explanations together with their prior “knowledge” and produce a disjointed, inconsistent understanding.

We have every reason to believe something similar is happening in statistics education. Our students in interviews applied their prior knowledge to override whatever we might have taught about randomized experiments and causality. We have also seen students in interviews appearing to hold multiple mutually incompatible ideas on how to approach problems, switching from one to the other whenever the first doesn’t seem to work—without reflection on how they may be inconsistent.

Fortunately, the education research literature also suggests that this problem can be tackled. It can be tackled when instructors are aware of the misconceptions and approach them head-on, and particularly when instructors use “interactive engagement” teaching strategies that ask students to answer questions and make predictions in class, discussing their predictions with their peers and getting immediate in-class feedback when they are wrong. Extensive data shows these strategies can double or triple student learning.⁸ I suspect much of these gains come because students are forced to integrate what they learn in class with their prior beliefs immediately, as they sit in the classroom, instead of passively nodding and accepting a lecture without serious thought.

We hope that think-aloud interviews, combined with our assessment data, can make it possible to craft introductory statistics courses that achieve similar success. But first it will take much more research to fully understand how students think.

Preliminary aggregate results

From spring, summer, and fall 2019, we have some preliminary results. These are from giving our assessment questions to our existing classes, taught with their usual teaching strategies. You can see here that the mean scores from pre- to post-test do not change dramatically, although there is a sampling bias—some students are not motivated to get a few points at the end of the semester, apparently, so response rates are lower at the end of the semester—that makes strong conclusions hard to draw. Fortunately we are working on addressing that bias in future semesters, and we can match students between pre- and post-test to conduct matched pairs analyses.

Section	Pre-test mean	Post-test mean
CMU 200	46.5%	53.0%
Colby SC 212	51.1%	59.9%
CMU 202		57.0%
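As a rough illustration of how small these changes are, the sketch below computes the raw gain in percentage points for each section with both a pre- and post-test mean. It also computes the normalized gain used in the physics education literature cited above (Hake): the fraction of the possible improvement that was actually achieved. Using normalized gain here is my assumption, not a summary this talk reports. Because these are unmatched section means, the response-rate bias described above applies to these numbers too.

```python
# Section means (percent correct) copied from the table above.
# CMU 202 has no pre-test mean, so it is left out of the gain calculation.
section_means = {
    "CMU 200": (46.5, 53.0),
    "Colby SC 212": (51.1, 59.9),
}


def raw_gain(pre, post):
    """Change in mean score, in percentage points."""
    return post - pre


def normalized_gain(pre, post):
    """Hake-style normalized gain: achieved gain divided by possible gain."""
    return (post - pre) / (100.0 - pre)


gains = {
    section: (raw_gain(pre, post), normalized_gain(pre, post))
    for section, (pre, post) in section_means.items()
}

for section, (raw, normalized) in gains.items():
    print(f"{section}: raw gain {raw:.1f} points, normalized gain {normalized:.2f}")
```

A matched pairs analysis would instead compute `post - pre` for each student who completed both tests and summarize those differences, which requires the student-level data rather than these section means.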

The small learning gains don’t exactly fill us with hope. But they are not anomalous: prior research using CAOS and other conceptual assessments, flawed though they may be, has found similar results. These results show us just how far we have to go.

(Our questions and our full assessment results are available as part of the supplemental materials to our recent preprint describing this work.)

Beyond the introductory course

But while this gives us 50-odd questions that we are confident in, it’s only the surface. We teach more than just introductory courses. Many of you have taught or TAed more advanced classes, and you see students struggle there too. I’ve heard lots of anecdotes about students in advanced classes who still don’t seem to understand that small p values lead us to reject null hypotheses, or students who seem to have forgotten everything you swear they must have been taught in prerequisite courses. And we complain that some students seem to want to learn by rote, memorizing procedures without ever developing the statistical thinking they need to succeed. I’ve heard plenty of stories about students who just want to know if their analysis is “right” instead of thinking about what it achieves and how it compares to other methods.

My thesis is that something is wrong with our understanding of how students develop statistical expertise. We’re expecting them to integrate many facts and skills together, then to develop the expertise to know which facts and skills are appropriate to apply to a given problem. How can we explore how students are doing this?

Cognitive Task Analysis

One tool is Cognitive Task Analysis. This sounds like a buzzword, but it’s actually just describing something fairly straightforward.

Cognitive Task Analysis models problem-solving as a series of small, discrete steps. Students go from the problem statement to the solution using many specific steps, and we can use interviews with expert problem-solvers to identify the steps experts take. Then, in think-aloud interviews with students, we can learn which steps they take and establish which steps most often lead to errors and mistakes.

Cognitive task analysis has been widely applied for many purposes. Research suggests explicit cognitive task analysis can help better structure teaching, because experts are usually quite bad at identifying the steps they take to solve problems: if asked how they solved a problem, experts often give explanations that are incomplete or outright contradict the result they actually obtained. The evidence suggests that if instructors use explicit cognitive task analyses to guide the skills they teach in a course, student learning can be significantly improved.⁹ (For further literature, and ways to assess cognitive tasks quantitatively, check out my notebook.)

Also, cognitive task analysis has already been applied to statistics. Marsha Lovett used it to explore how students analyze data in the introductory course—actually, in our introductory course, back around 2000.¹⁰ Think-aloud interviews allowed her to see the cognitive tasks being applied by students as they analyzed data, and led to a simple teaching intervention targeting one specific cognitive task that then improved their performance on another data analysis problem. Ken Koedinger and Elizabeth McLaughlin later showed that these targeted interventions can work better than just providing extra practice solving problems.¹¹

Exploring cognitive tasks in mathematical statistics

Now, ideally we could build on that work to develop a full cognitive task model of how students analyze data, break down the pieces in great detail, and then examine how all our courses cover the constituent skills as we lead students to graduation. But this is a fairly intimidating ensemble of skills, and data analysis is a very open-ended outcome.

Instead, TeachStat group members Josue Orellana and Mikaela Meyer started with a different problem. They started by considering how students approach simple mathematical statistics problems on topics like conditional and marginal distributions, maximum likelihood, and expectations and variances. They hypothesized there are at least four separate cognitive tasks involved: after reading the question, the student must

identify the relevant variables in the problem, such as which variable we want to maximize or which variables are constant,
select appropriate rules, such as Bayes’ theorem or rules about expectation and variance, that would solve the problem,
match the rules to the variables they must be applied to,
carry out the actual algebra and calculus needed to manipulate the rules and obtain the answer, and finally
report an answer.

To explore if this framework was correct, and to see which specific skills students struggle with, we conducted think-aloud interviews. We recruited volunteers from 36-226, our undergraduate introductory mathematical statistics course. Students in this class have already taken a probability course, and calculus is listed as a prerequisite. We also recruited statistics and data science Ph.D. students as experts, so we could contrast student approaches to problem-solving with expert approaches. And we drafted 25 questions on the statistics topics, supplementing these with some questions about basic calculus, so we could locate gaps in mathematical understanding. All questions were checked for consistency with the notation and terminology used in the course.

Eventually we conducted 16 interviews with novices and 8 with experts. All were 60 minutes long, and we recorded all of the answers so they could later be transcribed and coded.

An example: Finding maximum likelihood estimators

Here’s one example. We asked the participants to find the maximum likelihood estimate for the mean of a normal distribution, given n samples:

The log-likelihood for x_1, x_2, \dots, x_n iid samples from a univariate normal distribution is

\log L(\mu, \sigma) = - \frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2.

Find \hat \mu_\text{MLE}.

The ideal solution would involve taking the derivative with respect to \mu, setting it to zero, and finding the maximum in terms of x_i. You can probably imagine several ways students could mess this up, based on the cognitive tasks we described. Students might not know which variables are relevant, and take the derivative with respect to the wrong thing. Students might not know the relevant rule, and hence not know they need to take a derivative to maximize. Students might mess up the algebra or calculus as they implement the rule. Our interviews let us isolate these reasons.
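For reference, the ideal solution can be written out step by step. This derivation is a sketch of the standard argument, using only the log-likelihood given in the question; the second-derivative check is included to confirm that the stationary point is a maximum. The first term of the log-likelihood does not depend on \mu, and the derivative of the sum is the sum of the derivatives of its terms.

```latex
\begin{align*}
\frac{\partial}{\partial \mu} \log L(\mu, \sigma)
  &= -\frac{1}{2\sigma^2} \sum_{i=1}^n \frac{\partial}{\partial \mu} (x_i - \mu)^2
   = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu), \\
\frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \hat \mu_\text{MLE}) = 0
  &\implies \sum_{i=1}^n x_i - n \hat \mu_\text{MLE} = 0
   \implies \hat \mu_\text{MLE} = \frac{1}{n} \sum_{i=1}^n x_i, \\
\frac{\partial^2}{\partial \mu^2} \log L(\mu, \sigma)
  &= -\frac{n}{\sigma^2} < 0.
\end{align*}
```

Each line corresponds to one of the cognitive tasks: choosing \mu as the variable to differentiate, applying the rule that a maximum occurs where the derivative is zero, and carrying out the algebra to reach the sample mean.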

In the interviews, we found that as you’d expect, experts immediately knew the procedure and the answer. They commented that the answer must be the sample mean, but they’d take the derivative to be sure.

For students, things were a bit different. One student reported that “I always get weirded out when I have to do the derivative of a sum, like I don’t really know if there’s rules…” We saw this confusion several times across multiple questions and students. Students seem confused by the summation symbol and don’t know how to take the derivative of a sum.

But more importantly, several students showed confusion about the relevant variable. Here’s a representative quote from a student who didn’t know what variable to differentiate:

So we just take the derivative of this with respect to… what do you call it, sigma, right? … Or is [it] with respect to sigma, or with respect to mu?

Another student wanted to take the derivative with respect to x.

Only 4 of the 11 students who answered this question in interviews got it right, despite it being essentially the bread and butter of a mathematical statistics course. And it seems several cognitive tasks are implicated. It’s possible, following Koedinger’s experiment that I mentioned earlier, that targeted practice on the specific skill of identifying which variable is relevant might improve student performance substantially. We’re hoping to find out soon.

Unfortunately we have not yet transcribed and coded many of the interviews, so I can’t go into much more depth about what we found. We will be transcribing and coding more questions soon to summarize the cognitive tasks in more detail and, ideally, find an intervention to experiment with.

Oh—and as an aside, one problem we noticed consistently in our interviews was the Greek alphabet. Statistics often uses symbols like \mu, \sigma, \theta, and so on, but we repeatedly saw that students cannot recognize or name these symbols. Can your lectures be successful if students do not know what you’re referring to when you mention \mu or \sigma and point at a derivation on the board?

Conclusions

Let’s review the story so far.

Over the past two years, our group has conducted more than 60 think-aloud interviews with students at the introductory and sophomore level. In the introductory course, this has helped us develop dozens of assessment questions and isolate many misconceptions. We’ve also collected pre- and post-test data for hundreds of students at three separate institutions, leading to a preprint summarizing our results (and the data is available too). Separately, in the sophomore mathematical statistics course we have begun to break down the cognitive tasks involved in doing mathematical statistics problems, so we can see which tasks are most difficult for students.

So what do we plan next?

First, we’d like to use think-aloud interviews and cognitive task analysis to explore some specific topics more deeply. We’re currently working on questions and interviews to explore how students understand correlation and causation, for example, and we’d also like to explore their understanding of populations and sampling distributions. (There is also a surprising amount of confusion about the interpretation of histograms, which should be explored.)

Second, we’d like to use the interviews and psychometric analysis (such as item response theory) to further refine our assessment questions and produce a set of questions that reliably measure specific introductory concepts. We’re not sure that a single validated instrument, like CAOS or other concept inventories, would be as useful as a collection of questions on various topics that we can choose from as needed.

But having the questions will enable our third goal: designing and evaluating new teaching strategies based on our results. Once we thoroughly understand how students misunderstand a statistical concept, and we have assessment questions that reliably measure their understanding, we can revise our teaching and measure whether the new strategy works. And we can base our new teaching strategy on our research and prior results about student learning, rather than on our intuition about what explanations might work best.

We can then work gradually upwards through the statistics curriculum. Many years from now, I would love to be exploring the cognitive tasks involved in open-ended data analysis tasks such as regression reports and consulting projects, since open-ended analysis is the quintessential expert skill requiring “statistical thinking”, practice, and experience. Perhaps, with the aid of technological tools, think-aloud interviews, and careful experiments, we can begin to see how experts approach data analysis and help guide our students to think more like experts.

Acknowledgments

Carnegie Mellon University’s GSA/Provost GuSH Grant funding was used to support portions of this project. Many thanks to all the students who participated in interviews and assessments, making this research possible. Thanks also to Joel Greenhouse and Nynke Niezink for sharing their experiences teaching mathematical statistics, and to David Gerritsen for a tutorial on conducting think-aloud interviews.

R. DelMas, J. Garfield, A. Ooms, and B. Chance, “Assessing students’ conceptual understanding after a first course in statistics,” Statistics Education Research Journal, vol. 6, pp. 28–58, November 2007.↩
B. Chance, J. Wong, and N. Tintle, “Student performance in curricula centered on simulation-based inference: A preliminary report,” Journal of Statistics Education, vol. 24, no. 3, pp. 114–126, 2016.↩
T. J. Pfaff and A. Weinberg, “Do Hands-on Activities Increase Student Understanding?: A Case Study,” Journal of Statistics Education, vol. 17, no. 3, 2009.↩
W. K. Adams and C. E. Wieman, “Development and validation of instruments to measure learning of expert-like thinking,” International Journal of Science Education, vol. 33, no. 9, pp. 1289–1312, 2011.↩
K. A. Ericsson and H. Simon, “Verbal reports as data,” Psychological Review, vol. 87, no. 3, pp. 215–251, 1980.

K. A. Ericsson and H. Simon (1993). Protocol Analysis: Verbal Reports as Data (2nd ed.). Cambridge, MA: MIT Press.↩
The evidence is not yet conclusive on whether vitamin C can prevent or treat the common cold, surprisingly enough: Hemilä H, Chalker E. “Vitamin C for preventing and treating the common cold”. Cochrane Database of Systematic Reviews 2013, Issue 1. Art. No.: CD000980.↩
I. A. Halloun and D. Hestenes, “The initial knowledge state of college physics students,” American Journal of Physics vol. 53, no. 11, pp. 1043-1048, 1985.

J. Clement, “Students’ preconceptions in introductory mechanics,” American Journal of Physics vol. 50, no. 1, pp. 66-71, 1982.↩
R. R. Hake, “Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses,” American Journal of Physics, vol. 66, pp. 64–74, 1998.

C. Crouch, A. P. Fagen, J. P. Callan, and E. Mazur, “Classroom demonstrations: Learning tools or entertainment?,” American Journal of Physics, vol. 72, pp. 835–838, 2004.

C. Crouch and E. Mazur, “Peer instruction: Ten years of experience and results,” American Journal of Physics, vol. 69, pp. 970–977, 2001.↩
D. F. Feldon, “The implications of research on expertise for curriculum and pedagogy,” Educational Psychology Review, vol. 19, pp. 91–110, 2007.

C. Tofel-Grehl and D. F. Feldon, “Cognitive task analysis–based training,” Journal of Cognitive Engineering and Decision Making, vol. 7, pp. 293–304, 2013.↩
M. Lovett, “A collaborative convergence on studying reasoning processes: A case study in statistics,” in Cognition and Instruction: Twenty-five Years of Progress (S. M. Carver and D. Klahr, eds.), ch. 11, Psychology Press, 2001.↩
Koedinger, K. R., & McLaughlin, E. A. (2016). “Closing the loop with quantitative cognitive task analysis”. In Proceedings of the 9th International Conference on Educational Data Mining (pp. 412–417).↩

Books of 2017

Alex Reinhart — Fri, 02 Feb 2018 00:00:00 UT

Picking up from last year, here are a few brief reviews of particularly interesting books I read in 2017, selected from a total of 45.

The Age of American Unreason, by Susan Jacoby

Tries to be the intellectual successor of Hofstadter’s Anti-Intellectualism in American Life, but doesn’t really succeed. Jacoby moves roughly chronologically, jumping from early America to the late 1800s (and the pseudoscience of social Darwinism), through the days of intellectual Marxism and communism, to the middlebrow culture of the 50s and the rebellion of the 60s, and on to a series of chapters on recent decades.

I found chapter 5, on “middlebrow culture”, particularly interesting. Not having lived through the 50s, I can’t judge the representativeness of Jacoby’s examples, but she portrays a nation of citizens aspiring to learn and approach higher culture, manifested through wide sales of encyclopedias, book-of-the-month clubs, public lecture series, thoroughly researched historical novels (as opposed to anything by Dan Brown), and various popular publishing enterprises reprinting classics and outlines of history. Apparently – and I wish good statistics were available – middlebrow culture was popular and people widely aspired to reading more “sophisticated” works, even if they did so via book clubs featuring works a few rungs down from the top of literary sophistication.

I see almost no analog of this now. Typical reading for self-improvement involves pop science or psychology works which take a few core ideas and pad them out to three hundred pages, rather than aspirational reading of “serious” (whatever that means) novels. I don’t know anyone (apart from my father) who reads history for the pleasure of understanding. There are, of course, people who take pleasure in the feeling of superiority derived from highbrow culture, but they’re the kind of people featured on /r/iamverysmart; most people don’t see the point of middlebrow aspirations now. Popular respect goes to those with great DIY or technical skills.

I giggled a bit in the final chapter, “Public Life: Defining Dumbness Downward”, as Jacoby commented on Bush’s clear non-intellectualism and expressed hope that the next crop of politicians – the book was published during the 2008 election season, apparently – would be more willing to read detailed briefings and background material. Alas, Donald J. Trump has arrived.

Some of the pleasure in reading this book derives from looking down on the parts of culture deemed inferior. Jacoby is on the shakiest foundation here, generally making implicit assumptions about what kinds of culture are superior and what people should aspire to read and watch, assuming her audience agrees with her choice of culture instead of justifying it in any depth. I particularly enjoyed one part where she quoted a music critic writing

Lately I’ve been wondering why, as a more than casual Beatles fan, I’m not interested in note-perfect covers by Beatles tribute bands, even though, as a classical music critic, I happily spend my nights listening to re-creations—covers, in a way—of Beethoven symphonies and Haydn string quartets. What, when it comes down to it, is the difference?

The critic goes on to discuss the difference between covers and classical performances. Jacoby, on the other hand, retorts:

What, when it comes down to it, is the difference? It is the difference between Beethoven’s Ninth Symphony (or any Beethoven symphony) and any song or collection of songs by the Beatles. The difference is the infinitely greater emotional richness, technical complexity, and beauty of Beethoven. I too am a Beatles fan, but, let’s face it, if you’ve heard one version of “Sgt. Pepper’s Lonely Hearts Club Band,” you’ve pretty much heard them all.

Way to miss the point! Perhaps there’s a genuine retort to the critic in there somewhere – pop songs aren’t complex enough for covers to have room to differentiate, which any listener to Postmodern Jukebox knows is not true – but mostly Jacoby wants to complain about how inferior modern music is, and assumes we share her feelings. It’s fun snootiness if you agree, but since I’m listening to Steve Martin playing bluegrass banjo right now, maybe I don’t quite agree with her.

Now, maybe there is a good argument she could have made. In classical music, with the score already written, the focus is on the musicianship and interpretation, and we enjoy a performance which demonstrates great skill or puts a piece in a new light. In popular music, the musicianship is often not important at all – the band members aren’t even named, just the celebrity doing the singing. The focus is on the overall sound, the lyrics, and the celebrity, and on marketing them. Covers are interesting when a new celebrity does something different with the lyrics.

That would fit with Jacoby’s thesis about the commercialization of music, but she’s too focused on scorn to think of the point. This distinguishes her from Hofstadter, who tried to trace the history of anti-intellectualism instead of rendering value judgments. I may agree with Jacoby’s judgment that intellectual curiosity and education are necessary for good citizenship, but she could have tempered her criticism of specific parts of culture (or buttressed it with argument instead of ridicule).

The Debate on the Constitution, Part One

A fascinating compilation by the Library of America of pamphlets, letters, editorials, and speeches made in the months after the Constitutional Convention’s proposal of a new constitution for the United States.

The opposition to the Constitution was louder than I had realized, sometimes bordering on hysterical: claims that ratification of the Constitution would amount to slavery, monarchy, or aristocracy; that amendment would be impossible and end only in civil war or rebellion; that state governments would be subsumed by the federal government and soon wither to empty shells; or that a federal government would be so ruinously expensive as to be impossible.

Some of the objections seemed to be founded on very little. The insistence that Congress would quickly become an aristocracy, defeating the checks and balances inherent in its separation into two Houses by colluding to deprive the people of their liberties, is easily answered by noting that the House of Representatives can be entirely voted out in just two years. A fear that Congress’s authority to regulate the election of representatives would lead to elections being held, say, at the least convenient place in the state, so as to stack the elections, was never realized as far as I am aware. The claim that federal taxes would subsume state taxes and render the states incapable of raising revenue seems hard to reconcile with the role of the Senate as representatives of the state legislatures.

A lot of time was spent on the vastness of the United States and the difficulty of travel across them: anyone seeking relief in a federal court may have to travel hundreds of miles, making justice impossible to find, and some doubted the possibility of electing a President when finding a man known to everyone in the States would be so difficult. (George Washington excepted, of course.) Two hundred years later, with air travel and interstate highways, these concerns seem quaint.

Other objections make clear how much of our system depends on norms of behavior rather than law. Many writers insisted the Congress would never approve Constitutional amendments which limit its power, no matter how necessary; the subsequent passage of the Bill of Rights showed this to be false. Others insisted that nobody would obey the federal government without threat of force, making the internal use of a vast standing army inevitable, eliminating liberty; legal action proved to be sufficient instead. Supporters of the Constitution argued that the personal honor and integrity of the great men selected to the Senate or voted to the House would restrain them from actions which would impugn their character; perhaps too optimistic today.

Several writers seemed concerned by the prospect of bribery – of representatives, or of those who elect them. The electoral college meets on the same day in every state to make it difficult to coordinate simultaneous bribery of every elector; some defended the cap on the number of representatives by arguing that very small districts would make it easy for representatives to re-elect themselves through bribery of the voters. Modern electoral campaigns, and their enormous expenses, don’t seem to have been anticipated, and the influence of campaign money on politics is not recognized at all – I would at least consider it a possible reason to make terms longer, to reduce the time spent on reelection, but many argued terms were already too long and preferred annual reelection so tyrannical representatives could be immediately voted out.

It’s also interesting how little of the modern role of the federal government was anticipated by the founders. Healthcare is an obvious example given current political debates, but also telecommunications regulation (obviously), agriculture, scientific laboratories and grants, conservation, weather, emergency response (beyond a simple militia), massive infrastructure and transportation projects, international agencies, workplace regulations, environmental regulation, intelligence agencies… the role of the federal government is vastly larger than the simple commerce and international affairs imagined by the founders, and yet the states continue to have independent existences and maintain clear sovereignty.

The language of the time is also fascinating to read. I admit that when I first glanced at the Federalist Papers years ago, I decided not to read them because of their long, winding sentences and archaic vocabulary. With a bit of practice that is not an obstacle. But standards of spelling, punctuation, and style have clearly changed a great deal: dashes, colons, and semicolons are often used in different roles than we now accept, paragraphs often stretch for a page or more and contain multiple independent ideas, sentences frequently feature many complicated clauses stitched together with copious commas, and variant spellings like “shew” and “chuse” are in abundance.

The Attention Merchants, by Tim Wu

A breezy history of “attention merchants”: those who sell attention as a business. Starts with the earliest advertisers – patent medicine hawkers – and works through early radio, television, the Internet, and finally mobile devices, showing how our attention has been sliced and diced into ever-smaller fragments to be sold.

Early patent medicine advertisements were a new concept and, of course, appeared mostly in print. Radio surprised its early developers when people willingly set aside time each day to listen to their favorite radio shows. Television began to command the power to have millions of Americans watch shows simultaneously. And at each step, attention was sliced into smaller bits to be sold: first entire programs were sponsored on radio and TV, then came the “magazine” format with ads splitting up programs, and finally the banner ads of the Internet and intrusive banners on phones, which only grab us for a few seconds at a time.

The history was fast and not greatly detailed, though enjoyable. This was more interesting just for the idea of being aware of our own attention and what demands are made of it, and not allowing outside forces to exert so much control over it without consent.

Weapons of Math Destruction, by Cathy O’Neil

I found this to be a frustrating book to read. O’Neil has a very important thesis: that unaccountable algorithms used to make decisions about people, whether for employment screening, bail decisions, or insurance quotes, are often wildly inaccurate and perpetuate bias against the poor, disabled, or historically disadvantaged. By turning a formerly human decision into one made by an algorithm, companies can appear more objective even when they make it impossible for those rejected by the algorithm to seek an explanation or correct faulty data.

But the development of this thesis is poor. O’Neil goes through several fields, quoting examples from each: recidivism prediction software, teacher evaluations from standardized tests, credit scores, online ad targeting, and many others. But examples are only examined in superficial detail, and in ways that do not suggest O’Neil is terribly familiar with statistics or machine learning. In one segment, for example, on a company developing ad targeting models using anonymous cell phone data, she mentions how the algorithm picked out patterns without any guidance, perhaps noticing how much time people spend on roads starting with the letter J. I know of no algorithm or feature selection method that would be so open-ended, or so stupid.

Or take this discussion of ad effectiveness:

The data scientists start off with a Bayesian approach, which in statistics is pretty close to plain vanilla. The point of Bayesian analysis is to rank the variables with the most impact on the desired outcome. Search advertising, TV, billboards, and other promotions would be measured as a function of their effectiveness per dollar. Each develops a different probability, which is expressed as a value, or a weight. (page 74)

This paragraph is deeply confused. What does Bayesian analysis have to do with ranking variables? Isn’t this just a classification problem? Are we talking about naive Bayes? How is effectiveness per dollar a probability? Either O’Neil does not understand the methods, or she has simplified the discussion beyond all recognition.

I do not mean to argue that she should have quoted mathematics or given a cogent and accessible description of support vector machines. But simplification should clarify, not obscure.

In other examples I left frustrated because O’Neil glossed over interesting details. One section talked about a project at St George’s Hospital Medical School in the 1970s, using a computer to screen applicants to the medical school to reduce the workload of the humans reviewing the paperwork. O’Neil says the computer was trained to use information from prior applications and ended up perpetuating human biases, like biases against foreign-born applicants or against women. But all I wanted to know was how a computer in the 70s was doing machine learning, and what kind of method it used. Did it really “learn” to perpetuate bias, or was computing sufficiently limited that it had to be directly programmed to do so? I can’t imagine anything more complicated than regression being practical then, and certainly it was not doing deep learning on applicant files.

And then there were the small errors:

In previous generations, those in the know were careful to organize the résumé items clearly and consistently, type them on a quality computer, like an IBM Selectric, and print them on paper with a high rag content. (page 114)

Uh. A quality IBM Selectric computer?

I realize I am not the intended audience of this book, since I have the mathematical and statistical training to understand many of the “weapons of math destruction” discussed in the book. I would have liked more depth, more clarity, and more thorough analysis of the issues (and related topics, like the Fourth Amendment connections to predictive policing and Europe’s approach to privacy law), but perhaps O’Neil judged that additional depth would have been harmful. Maybe I just want a different book than she wanted to write. But I think there’s room for a better account of the issues than O’Neil provides.

Do Guns Make Us Free?, by Firmin DeBrabander

An interesting argument, attempting to show that rather than serving as our last line of defense against tyranny, guns actually make us less free.

First, one inevitable complaint. DeBrabander criticizes gun owners who make fun of gun critics who don’t know much about guns, as though the only people qualified to participate in the gun debate are those who own guns (and hence only people on one side of the debate). But later he shows why gun knowledge would be useful for critics. In one passage, he notes that after open carry laws passed in many states, purchases of larger handguns increased. Why, he asks, did gun owners want bigger handguns? Why were the smaller, more easily concealed handguns not good enough for open carry as well? The implication is that gun owners wanted to wave around bigger and more powerful weapons for no good reason; had DeBrabander talked to gun owners, he’d know that small concealed handguns can be hard to control and unpleasant to shoot (their light weight makes the recoil more potent), while full-size handguns are more ergonomic.

But that is a minor point. DeBrabander’s main arguments, neatly separated by chapter:

The NRA, and gun advocates more generally, have encouraged a “Culture of Fear”. The NRA talks about “lunatics” roving the streets, madmen who will burst into your house to kill you for no reason, and even riots and looting. Mental illness is out of control and untreated psychotics rove the streets. You should be afraid, and you should defend yourself.

But the narrative is false. Violent crime has fallen dramatically, and we live in one of the most peaceful times in modern history. (And history in general, if you believe Steven Pinker’s The Better Angels of Our Nature.) The mentally ill are not more likely to commit violent crime than the general population. Most acts of violence are not committed by random unhinged strangers. And most acts of violence occur between poor, urban minorities, not the kinds of people who own lots of legal guns: middle-class suburban whites.

This fear makes gun owners more likely to perceive ordinary situations as lethal threats to themselves, leading to episodes such as the guy who shot someone in a movie theater during an argument and claimed self-defense. (On gun forums you tend to see stories of how, say, the vigilant narrator was at a gas station and noticed an undesirable acting suspiciously while walking toward his car, and successfully deterred the obvious impending violence by showing his weapon.)

Guns, it is claimed, are also protections against encroaching tyranny. We grant power to the government as part of the social contract; should it violate our trust and abuse that power, we must have a remedy. But, according to DeBrabander, this conflicts with the justifications for Stand Your Ground laws, which allow gun owners to take justice into their own hands. If guns are meant to uphold the social contract which grants a government exclusive use of force to ensure order, taking the power away from warring individuals, why do Stand Your Ground laws implicitly deny the government’s exclusive use of force and assume we are back in the state of nature, when every man must fend for himself?

DeBrabander also makes a silly suggestion here about Stand Your Ground laws being misused by criminals, who could, say, murder someone on the street to steal his wallet, then claim they felt threatened and had to stand their ground. Until this actually happens, I’ll put it in the category of “weird hypothetical things that no criminal would actually bother with”.

A better point follows. Would gun owners really be able to fend off a tyrannical United States government? Some point to Vietnam, noting that guerrilla warfare worked very well, but (a) the US military is phenomenally overpowered and (b) Vietnam paid an incredible human cost for its successful guerrilla campaign, which was by no means easy.

Here’s where the argument gets more interesting. Why, DeBrabander asks, do politicians allow guns when they are clearly a threat to their authority? If politicians are as corrupt and power-hungry as the NRA would have you believe, surely they’d scramble to push gun regulations, NRA or not.

DeBrabander makes a detour through Machiavelli to answer this, but the point is actually simple. Tyranny no longer needs violence. Politicians can allow us to have guns because politicians can exert power in more effective, more subtle ways. DeBrabander, though without citing Solove as far as I can tell, portrays mass surveillance and Fourth Amendment violations as greater threats to liberty than gun regulation; the erosion of privacy allows subtle exertion of power that makes dissent less likely, for fear that someone may be watching. But the people advocating for civil liberties are most often liberal groups like the ACLU, while gun advocates are all for the government being granted sweeping powers when they’re supposed to be used against terrorists.

Guns are a threat to democracy. Their threat of violence silences discussion and undermines the rule of law. If we are to argue that guns are necessary to prevent tyranny, we are saying that unaccountable unelected gun owners should overthrow democratically elected governments when they overstep some ill-defined line of tyranny.

To quote DeBrabander, “the gun rights movement has already abandoned the democratic project: it has decided that portions of society cannot be negotiated with, and government can be negotiated with only by threat of force.”

Finally, a somewhat unfocused chapter on the meaning of power. Pro-gun arguments undermine the rule of law by arguing that the government is ineffective and untrustworthy, ironically creating the dangerous lawless world that gun owners fear. Threats of violence against government are more likely to harden the government and make it retaliate in kind; organized nonviolent protests are a more powerful mechanism to effect political change. And finally, a suggestion that the same demographic that buys guns has also been the biggest political loser in recent decades: blue-collar white men, with no wage growth, disappearing job opportunities, vanishing pensions, and uncaring politicians. Perhaps they think guns will give them the power they are missing.

Overall an interesting read, though I think it could have been more powerfully and clearly written. Some of the silly speculation, such as criminals taking advantage of Stand Your Ground laws, invites easy rebuttal by gun rights advocates, and a lot of the book quotes extensively from philosophers and journalists instead of developing its own coherent narrative. DeBrabander needed a stronger editor, more knowledge of the gun world, and more time to simplify and streamline his arguments.

Books of 2016

Alex Reinhart — Sun, 26 Feb 2017 00:00:00 UT

As part of my obsession with tracking things I’ve read, I keep a file with every book I’ve read since 2012(ish). I write myself brief reviews or commentaries for particularly interesting books, so I remember what they’re about in a few years. Here are some of the best of 2016, selected from a total of 32.

The Day of the Jackal, by Frederick Forsyth

This is what I needed after Tom Clancy: a sharp, well-written thriller with plenty of plausible detail. I’ll have to start reading Forsyth’s other work to see if he fell into Clancy’s trap of modernizing his work until it sucks.

I do wonder – the book is set during de Gaulle’s presidency in France in the 60s, so all the spycraft is based on disguises, falsified passports, shadowy meetings in semi-public places, etc. A modern version would involve the OAS exchanging encrypted text messages and emails, hiding their plans with steganography, and ditching burner phones, while the police use big electronic surveillance systems instead of canvassing hotels for occupancy records. Would the modern version come across as a nerd novel? Too much tech fanciness to enjoy the plot? The original spycraft just makes the assassin seem sophisticated and smart, not nerdy or technologically advanced – it’s a battle of wits between assassin and police, not just who has the biggest electronic surveillance budget and the fanciest apps.

The Space Merchants, by Frederik Pohl and C. M. Kornbluth

An interesting world-building exercise: it’s The Future, and the world is controlled by large corporations and advertising agencies, with governments largely subservient to them. (The House of Representatives consists of representatives to specific companies, not districts.) The world is vastly overpopulated, with everyone living in cramped apartments with fold-up beds and saltwater taps for bathing. Consumption is the primary good: the more you consume, the more you boost the economy, allowing large companies to thrive and pay for advertising to make you consume even more. Addictive substances are added to coffee and sodas. Many workers start in indentured servitude in systems engineered to create more debts to their employers, like the old mining towns and company stores. Companies settle disputes with physical violence and private security contractors.

For a book written in 1953, it’s surprisingly prescient. Growth and consumption are still our highest priorities. Instead of adding addictive substances to our coffee, ad companies try to learn everything about us, using megabytes of JavaScript tracking code on every webpage to target advertisements directly to our preferences. They aren’t nearly as good as the advertisers in the novel yet, who apparently have incredible power over the subconscious minds of consumers, but they would happily use the power if they had it.

The plot itself didn’t grip me much – Conservationists don’t want the planned colonization of Venus to continue under the control of an ad agency – and the writing was good but not amazing. I just enjoyed the parallels with today’s economy, where questioning advertising is edgy, questioning consumerism is only acceptable if you continue buying useless gadgets anyway, and presidential candidates routinely advocate for the reckless use of natural resources for the benefit of our economy.

Makes you want to spend the rest of the evening with another lovely hardcover book, instead of Twitter, Facebook, and YouTube.

Double Star, by Robert Heinlein

Fun and interesting. An actor is hired to double for the solar system’s most prominent politician to avoid a political crisis, and ends up far deeper into the job than he anticipated. Turns out he’s a better Bonforte than Bonforte himself.

Doesn’t attempt to be nearly as deep or poetic as Sturgeon’s More than Human, but it’s a fun story, blemished only by some poor characterization (the only woman, Penny, is an emotional wreck who has to be repeatedly told to pull herself together) and his tendency to string along the plot by introducing one more crisis for the actor to manage. There’s no real ending either. But it was fun while it lasted.

It’s always fun to see our limited conception of the future. Double Star is set in a future where nuclear-powered spacecraft can travel at 2 g acceleration to Mars and back, but government records are kept on microfilm in a giant archive on the Moon. More subtly, they mention “scramblers” for video and audio calls which can undoubtedly be broken – actual secure cryptography wasn’t foreseen either.

The Public Domain, by James Boyle

See my full review.

The Closing of the Western Mind, by Charles Freeman

Despite the title, this is not a Sam Harris- or Richard Dawkins-style polemic against Christianity. (Not that I have read books by either of those people.) Instead, this is a detailed accounting of the early development of Christian theology and its roots in (and separation from) Greek philosophy. Freeman details how the Greeks developed a rich tradition of rational thought and empirical investigation, exemplified by Aristotle, but early Christianity leaned more Platonic, favoring the idea that there are pure truths about God which cannot be discovered from empirical evidence alone.

Once the Roman Empire adopted Christianity and made it the state religion, with patronage, tax benefits, and all the related perks, the definition of “orthodox” Christianity quickly became a serious issue, and the many disparate theologies that had arisen from the contradictory and incomplete mass of scriptures, Gospels, and epistles had to be formed into a single creed. This led to centuries of church infighting, imperial interventions and meddling, and political intrigues – bishops were scheming to depose each other, riots broke out over obscure details of the theology of the Trinity, and heretics were exiled outright.

Gradually the Church (at least the Western church) settled on many of the details, but rational thought and disagreement were stifled to protect the consensus. Church leaders openly denigrated pagan philosophers as foolish for attempting to understand the world God created. It became more important to be orthodox than to think freely, for fear of sanction, and as the Church became more Latin-speaking, the works of the Greek philosophers and theologians faded away.

Freeman spends most of his time on a straightforward account of the theological disputes rending the church, and the various imperial interventions to resolve them, rather than hammering home the thesis again and again. This means the book is interesting as a history as well – it covers the basic history of the early church without boring me like Christianity: The First Three Thousand Years did, though I admit to losing track of all the sects and their opinions on the Trinity and Christ’s divinity.

The Rise and Fall of American Growth, by Robert J. Gordon

I have mixed feelings about this book. The first part of the book, detailing the enormous growth in the American standard of living beginning in 1870, is fascinating: we often talk about how life was so simple and calm in the old days, but don’t mention needing to carry in buckets of water and firewood just to have a hot bath, rampant infectious disease, monotonous diets, homemade clothing, slow and dangerous travel to anywhere more than a few miles away, and everything else that comes with a “simple” life. I learned quite a lot about life in America around 1900.

On the other hand, much of this knowledge came through Gordon’s detailed descriptions of charts and graphs, rather than through evocative prose. Much time is spent discussing inflation-adjusted indices and productivity ratios and quality-adjusted life expectancies, even when the core points would be well-supported with much less detail.

But anyway. The second portion of the book describes a claimed slowdown in quality-of-life improvements after 1970: a frequent refrain, after each pre-1970 improvement, is that it “could have only happened once”. Once you’ve provided everyone with clean running water, what can you do that would bring that magnitude of benefit again? You can’t re-eliminate cholera.

Gordon reviews the areas of American life that have dramatically improved since 1970, mostly focusing on communications and computers. Yes, we have the Internet, smartphones, a Jeopardy-playing computer program, and plenty else. But these haven’t dramatically improved our lives, argues Gordon, and can’t provide the same benefits and growth as, say, not dying from typhoid fever. And computers and automation are eliminating much of the core of middle-class jobs: jobs requiring moderate but routine intellectual or manual labor, like assembly-line manufacturing or routine bookkeeping. Instead we’re left with high-end non-automated jobs (famous musician, professional lawyer or doctor, engineer, etc.) and low-end manual labor that can’t be entirely automated.

I can buy this argument. Short of a Singularity event which eliminates death, we can’t replicate the same health gains we made from 1870-1970; short of teleportation or hypersonic airplanes, we can’t beat the 20th century for transportation improvement; short of ubiquitous high-speed Amazon drones, we won’t replicate the gain in convenience and variety brought by supermarkets instead of small general stores. But I suspect we are not very good at predicting the impact of modern inventions – 3D printing, ubiquitous machine learning, networked everything – on the quality of life.¹ There is less room for gains, because so many have been made, but hindered by our limited imaginations, we do not see where gains can be made.

Gordon then offers an analysis of what can be done to improve growth. Some of this – reduce government regulation, eliminate some patent and copyright barriers, reduce inequality – seems straightforward, but I am in no way qualified to understand the economic implications. I will reserve judgment. I was somewhat disappointed that the recommendations were not more detailed or accompanied by evidence that they’d work – say, comparisons with European countries with different regulatory regimes, or between states introducing different policies. But fixing American growth is surely an entire book on its own, so I can’t blame Gordon for not spending more time on the topic.

Tinker Tailor Soldier Spy, by John le Carré

This is my second reading of this book, after reading it last in 2012 and seeing the movie in 2011 and again earlier this year. It finally all made sense. I finally could keep track of all the characters, by remembering their faces from the movie, and I could finally make sense of the plot and Smiley’s schemes. A great novel but one that requires careful reading. I should move on to The Honourable Schoolboy and Smiley’s People.

As with The Day of the Jackal, this book makes me wonder if it’s possible to set an engaging spy thriller in modern times. It makes me think of Mark Kac’s old explanation of genius:

There are two kinds of geniuses: the ‘ordinary’ and the ‘magicians.’ An ordinary genius is a fellow whom you and I would be just as good as, if we were only many times better. There is no mystery as to how his mind works. Once we understand what they’ve done, we feel certain that we, too, could have done it. It is different with the magicians.

(He then went on to say that Feynman was a “magician of the highest caliber”.)

Good mysteries and thrillers often involve the ‘ordinary’ genius, Smiley being an exemplar. Inspector Lebel of The Day of the Jackal was another example, as is Jack Ryan of the Clancyverse. But any modern thriller I see – usually in movie form, because I haven’t found an author I like – tends to lean heavily on technology (see recent Bond movies, for example), and technology ends up being a technobabble deus ex machina. Technicians are magicians. (Possibly because many authors are not technical experts, as evidenced by Dan Brown’s Digital Fortress. I wonder if a digital native would be able to craft a mystery story centered on technology and accessible to other digital natives. A nerd’s novel, say.)

On Paper, by Nicholas Basbanes

This is an interesting book, for its form as much as its content. The modern trend seems to be for nonfiction books to be framed around the author’s journey to find out about the subject of the book—read any Mary Roach book, for example, and you’ll hear zany stories of the scientists and colorful characters she meets as she does her research for the book. Some books take this far enough that there’s very little in the way of actual content.

Basbanes, on the other hand, is immensely knowledgeable, and fits a few paragraphs of narrative around a wealth of interesting material on all manner of topics related to paper: Chinese and Japanese papermaking, the use of paper in intelligence work, the paper that fell from the World Trade Center during its destruction, paper records of early American history, and much more in between. This is a book that will make you want to buy some nice cotton bond paper for no good reason.

Ten or twenty years ago, we probably didn’t expect that the greatest application of “big data” and machine learning would be in advertising. I guess that says something about the kind of world we live in.↩

R's Lists and its Detestable Dearth of Data-Structures

Alex Reinhart — Mon, 12 Sep 2016 00:00:00 UT

Update, September 2018: The R Journal recently published an interesting review of the state of data structures and collections in R: Timothy Barry (2018), “Collections in R: Review and Proposal”, vol 10 no. 1, pages 455-471.

R is the lingua franca of academic statistics. Many papers introducing new statistical methods are accompanied by a package posted on CRAN, R’s repository of useful packages and tools. Undergraduate programs often teach R and use it throughout their courses, as do most graduate programs—most PhD students I know are implementing their work in R as they do their research. R’s popularity is exploding, and even Microsoft has their own R distribution these days.

But R is an unusual language. It was designed by statisticians for statisticians, and has a number of convenience features—for example, there are no scalars. The number 17 is just a vector of length one, and the + operator can add arbitrary vectors elementwise by dispatching to fast C code. On the one hand, this means I can write foo + bar for two large vectors and get an efficient sum, but on the other hand, it means 1 + 2 has to go through the same loop instead of being converted to a fast addition.

There are other curious features of R, like Ross Ihaka’s famous¹ example of a function whose return value x is randomly local or global:

f <- function() {
    if (runif(1) > .5) {
        x <- 10
    }
    x
}

Ihaka says “No sensible language would allow this,” and misfeatures like this seriously limit R’s performance by making it difficult to efficiently run R code. (How can an optimizer deal with a variable that is randomly local or global?)

But the performance of a program depends on more than just how quickly the interpreter can execute the instructions. For a program doing vectorized operations on large arrays of numbers, sure, but what about a program that needs different data structures, like sets, graphs, or trees? What about a hash table? Do we have the tools to efficiently store and search data in appropriate data structures?

Lists lists lists

Besides vectors, and their multidimensional generalizations in matrices and arrays, the other core type in R is the list. If we want to build data structures, we’ll have to build them from lists. A list has named fields which can contain arbitrary data—vectors, other lists, strings, functions, and anything else. For example,

foo <- list(walrus="walrus", func=function(x) { -4 * x + 3 },
            walden=c(1, 2, 3, 4, 5))

foo$func(2)
# -5

foo$walrus
# "walrus"

Lists are the structures upon which much of R’s other features are built. For example, R’s famed data frames are just lists with one named field per column, and S3 classes (such as the lm objects returned from a linear model fit) are usually just lists with their class attribute set to their class name. Generic functions like predict are implemented by checking the class attribute and passing the list to the appropriate function.

Lists also support numbered entries, and some R functions return lists without any names. We can access named and unnamed entries by their numbers:

foo[[3]]
# [1] 1 2 3 4 5

Lists are the only key-value mapping type provided in base R: there are no dictionaries or associative arrays.² Nor are there structure or record types. So you’re encouraged to use lists for everything. Returning a complicated set of results from a model fit? Use a list. Looking up items by their name? Use a list.

It’s great to have such a flexible data structure, but it turns out that lists are a poor foundation to build a language upon. Let’s start with the basic problem: finding elements in a list.

What about the lookups?

Lists support some unusual features. For example, we can access their elements with shortened names, and we’ll get the element with the best match:

foo$wald
# [1] 1 2 3 4 5

Hang on—how does R do that? In a standard hash table, we hash the key (turn the string “wald” into a numeric code), map that into an array, and look up the entry. But if we look up “wald” instead of “walden”, we’d end up in the wrong place and fail to find the entry. How does R do it?

Simple: it’s not a hash table. It’s a named array. Lookup means an O(n) linear search through the array, checking the names as we go, not an O(1) hash table lookup.

We can see how lists are constructed in src/main/builtin.c, inside do_makelist. There are two steps. First, R iterates through the arguments to list (which are provided as a SEXP; yes, in reference to S-expressions, since R code is internally kept as linked lists) and checks if any of them have names. Second, it allocates the list itself, which is internally stored as a vector (an array); if any arguments had names, it also allocates a names vector, fills it with the names of the list elements, and attaches it to the list as an attribute.

So lists are a pair of arrays: one array of pointers to the list elements, one array of their corresponding names.

Subsetting (the $ operator) is implemented in src/main/subset.c, inside R_subset3_dflt, where R iterates through the list to find the element with a matching name:

R_xlen_t i, n, imatch = -1;
int havematch;
nlist = getAttrib(x, R_NamesSymbol);

n = xlength(nlist);
havematch = 0;
for (i = 0 ; i < n ; i = i + 1) {
    switch(pstrmatch(STRING_ELT(nlist, i), input, slen)) {
    case EXACT_MATCH:
        y = VECTOR_ELT(x, i);
        if (NAMED(x) > NAMED(y))
            SET_NAMED(y, NAMED(x));
        UNPROTECT(2); /* input, x */
        return y;
    case PARTIAL_MATCH:
        havematch++;
        if (havematch == 1) {
            /* partial matches can cause aliasing in eval.c:evalseq
               This is overkill, but alternative ways to prevent
               the aliasing appear to be even worse */
            y = VECTOR_ELT(x,i);
            SET_NAMED(y,2);
            SET_VECTOR_ELT(x,i,y);
        }
        imatch = i;
        break;
    case NO_MATCH:
        break;
    }
}

So we iterate through the elements of the list, checking their names. Exact matches are returned immediately, but unique partial matches require extra processing later (R optionally throws warnings on partial matches, since they’re easy ways to screw up). If there are multiple matches, no match is returned, since picking one arbitrarily would surprise the user.
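The matching rules in that loop can be modelled in a few lines. This is only a rough sketch in Python, not R's implementation: the function name dollar_lookup is hypothetical, the list is represented as the two parallel arrays described above, and the bookkeeping around NAMED and warnings is left out.

```python
from typing import Any, Optional, Sequence


def dollar_lookup(names: Sequence[str], values: Sequence[Any], key: str) -> Optional[Any]:
    """Rough model of R's `$` on a list: linear scan with partial matching."""
    partial_index = -1
    partial_count = 0
    for i, name in enumerate(names):   # O(n): every name may be inspected
        if name == key:                # an exact match wins immediately
            return values[i]
        if name.startswith(key):       # the key is a prefix of this name
            partial_count += 1
            partial_index = i
    if partial_count == 1:             # a unique partial match is accepted
        return values[partial_index]
    return None                        # no match, or an ambiguous prefix


# The same fields as the list `foo` earlier in the post.
names = ["walrus", "func", "walden"]
values = ["walrus", "a function", [1, 2, 3, 4, 5]]

print(dollar_lookup(names, values, "wald"))    # [1, 2, 3, 4, 5]
print(dollar_lookup(names, values, "walrus"))  # walrus
print(dollar_lookup(names, values, "wal"))     # None: two names start with "wal"
```

Note that even an unsuccessful lookup has to walk the entire names array before giving up.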

Now, I mentioned this means it’s an O(n) linear search, not an O(1) hash table lookup. What implications does this have for practical programs? Unfortunately, it makes lists unusable for many tasks you’d want a hash table for.

For example, in our statistical computing course, we give students a simple challenge: given a file of dictionary words, find sets of words which are anagrams of each other, and print out the sets of grouped anagrams. (For example, “iceman” and “cinema” would be printed together.) The best way to do this is with a hash table, keyed cleverly so that anagrams have the same key.³

As an example, I read in 91,000 English words from a text file and made them keys of a list, with the values just always 1. I then selected a random subset of 200 of the words.

words <- read.csv("english-words.txt", stringsAsFactors=FALSE,
                  header=FALSE)

wordlist <- as.list(rep(1, length(words$V1)))
names(wordlist) <- words$V1

keys <- sample(words$V1, size=200)

I then wrote a simple function to time how long it takes to look up those keys in lists of varying sizes:

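time.lookup <- function(n) {
    # Draw a random sublist of n entries from the full word list
    sublist <- sample(wordlist, n)

    # Time 200 lookups by name; keys absent from the sublist just return NULL
    t <- system.time({
        for (key in keys) { sublist[[key]] }
    })

    t[["elapsed"]]
}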

Some quick plotting and we obtain:

svg("list-lookup.svg", width=5, height=4)
sizes <- seq(from=1000, to=90000, by=500)
times <- sapply(sizes, time.lookup)
plot(times ~ sizes, xlab="List size", ylab="Lookup time (seconds)",
     pch=20, las=1)
dev.off()

Yes, it’s roughly linear, so list lookups are O(n). For the anagrams problem, this becomes a prohibitive time cost, making the program orders of magnitude slower than the equivalent in Python or another language with built-in O(1) hash tables.

Obviously, for small lists (like a data frame with five columns), a linear search through such a small array is hardly a cost at all—it may even be faster than calculating a hash function. But once we start storing large datasets in lists, we run into problems.
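To see where the linear growth comes from without relying on noisy timings, one can simply count name comparisons. The sketch below is in Python rather than R, and the helper names (linear_lookup, worst_case_comparisons) and the made-up keys are assumptions for illustration only.

```python
def linear_lookup(names, values, key):
    """Scan parallel arrays; return the value and how many names were compared."""
    comparisons = 0
    for name, value in zip(names, values):
        comparisons += 1
        if name == key:
            return value, comparisons
    return None, comparisons


def worst_case_comparisons(n):
    """Look up a key that is absent, which forces a scan of all n names."""
    names = [f"word{i}" for i in range(n)]
    values = [1] * n
    _, comparisons = linear_lookup(names, values, "missing")
    return comparisons


for n in (1_000, 10_000, 100_000):
    print(n, worst_case_comparisons(n))   # the count grows in step with n

# A hash table hashes the key once and jumps to its slot, so the expected
# cost of a lookup does not depend on how many entries are stored.
table = {f"word{i}": 1 for i in range(100_000)}
print("word99999" in table)               # True
```

For a five-element list the scan costs at most five comparisons, which is why small lists are unaffected.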

Lists: powerful but inflexible

Lists are flexible and widely used in R. They’re the basis of S3 classes, they can store heterogeneous and deeply nested data, and the default vectorization machinery (like apply and lapply) produces lists. There’s extended list subsetting syntax with the [ and [[ operators, along with $, and useful features like record types (or structs) can be emulated by simply using a list.

But the next problem with R’s approach to lists is the names: they may only be strings. In Python (or Racket, or many other languages), on the other hand, I could use any immutable type as a key:

foo = {(1, 2): 7, "bar": "baz"}

foo[(1, 2)]
# 7

Any immutable type can be hashed, so there’s no reason it can’t be used as a key in a dictionary. But R only supports strings as list keys. This may seem like a minor niggle, but it turns out that arbitrary keys can be amazingly useful for O(1) lookups of things other than strings. They’re also useful for storing sets, collections of items for which every item appears only once and which support efficient unions and intersections. I often make use of sets in my own code:

Finding duplicates in sets of data other than strings.
Storing sets; for example, my Conway’s Game of Life code (in Racket) keeps track of which cells are live by storing the current set of live cells, in terms of their (x, y) coordinates. (This way, I don’t need to store a matrix of all the grid cells—I can just store the live ones, and hence support an infinitely large grid. This trick came from Chris Genovese.) But in R, I couldn’t use this trick: there’s no way to look up if a cell is live without an O(n) search, and I’d have to convert the coordinates to “x,y” strings instead of using more natural c(x,y) vectors.
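As an illustration of the live-cell trick, here is a minimal sketch in Python (not the Racket code mentioned above, which is not shown in this post). It assumes the standard Game of Life rules and uses tuples of coordinates directly as set elements; the function name step is hypothetical.

```python
from collections import Counter
from typing import Set, Tuple

Cell = Tuple[int, int]


def step(live: Set[Cell]) -> Set[Cell]:
    """Advance one generation, touching only live cells and their neighbours."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live)  # `in` is a hash lookup
    }


blinker = {(0, -1), (0, 0), (0, 1)}     # a vertical bar of three cells
print(sorted(step(blinker)))            # [(-1, 0), (0, 0), (1, 0)]
print(step(step(blinker)) == blinker)   # True: the blinker has period 2
```

No grid is ever allocated, so coordinates can be arbitrarily large or negative; the cost of a step depends only on the number of live cells.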

Code that tries to get around these restrictions has to use convoluted workarounds. The sets package, for example, offers native set data structures in R, but stores them as sorted lists. (Sorting is necessary to make it easy to check if two sets are equal.) For sets containing non-numeric elements, the elements are sorted by their string representation, so every object has to be converted to a string. To perform set intersection operations, or others that would require O(1) access, the list is converted to a hash table by converting all its elements to strings and using an R environment.

All these convolutions cost efficiency and add complexity. Of course, users of the sets package need not worry about the details, but if they care about performance, they’ll notice the cost.

The problem with absent data structures

R users have various superstitions about writing fast code: vectorize everything, avoid for loops, use built-in functions when you can, and write hot functions in C++ (through Rcpp) whenever necessary. And that’s about it.

And yes, vectorization can gain you a large constant factor in performance over a for loop, since the loop is done in fast C code instead of a slow bytecode interpreter. But if you want to go from O(n²) to O(n), you need to use the right data structure instead of munging together lists of lists using vapply and merge and whatever other functions you can piece together.
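A small example of that kind of change, sketched in Python since it has hash sets built in: checking a collection for duplicates. The function names are hypothetical, and the O(n) figure for the second version is the expected cost under the usual hashing assumptions.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: about n*(n-1)/2 comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """One pass with a hash set: expected O(n) for hashable items."""
    seen = set()
    for item in items:
        if item in seen:      # expected O(1) membership test
            return True
        seen.add(item)
    return False


points = [(1, 2), (3, 4), (5, 6), (3, 4)]
print(has_duplicates_quadratic(points), has_duplicates_linear(points))  # True True
print(has_duplicates_linear([(1, 2), (2, 1)]))                          # False
```

Making the first version faster by a constant factor, which is all vectorization offers, still leaves it quadratic; only the data structure changes the growth rate.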

I’d love to see basic sets, hash tables, and trees built into R. I don’t know if it can be done while keeping compatibility, but it’s necessary if R is to become a more general-purpose programming language. I don’t know if that will happen or if Julia and Python will eat its lunch.

For certain values of “famous”.↩
I realize there are environments, but they are a curious beast of their own. The hash package uses them to implement hash tables with string keys.↩
Stat computing students: it’s cheating to read the rest of this footnote. But, in short, sort the characters in the words. “iceman” and “cinema” both sort to “aceimn”, so they’d end up at the same key in the hash table.↩

Simpson's Paradox and Statistical Urban Legends: Gender Bias at Berkeley

Alex Reinhart — Sun, 08 May 2016 00:00:00 UT

Introductory statistics textbooks usually point out Simpson’s paradox, an interesting phenomenon that’s usually illustrated with a story from the University of California, Berkeley. The story goes something like this:

In 1973, UC Berkeley was sued for gender bias, because their graduate school admission figures showed obvious bias against women:¹

	Applicants	Admitted
Men	8442	44%
Women	4321	35%

Men were much more successful in admissions than women, leading Berkeley to be “one of the first universities to be sued for sexual discrimination”. (The difference is statistically significant with p ≈ 10^-26!) The lawsuit failed, however, when statisticians examined each department separately. Graduate departments have independent admissions systems, so it makes sense to check them separately—and when you do, there appears to be a bias in favor of women.

How does this happen? The simple explanation is that women tended to apply to the departments that were hardest to get into, and men tended to apply to departments that were easier to get into. (Humanities departments tended to have less research funding to support graduate students, while science and engineering departments were awash with money.) So women were rejected more often than men overall. Presumably, the bias wasn’t at Berkeley but earlier in women’s education, when other biases led them to different fields of study than men.
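The arithmetic of the reversal is easy to check with a toy example. The counts below are invented purely to reproduce the effect with two hypothetical departments; they are not the Berkeley figures.

```python
# Made-up counts, chosen only to reproduce the reversal; NOT the Berkeley data.
# department -> group -> (applicants, admitted)
applications = {
    "easy": {"men": (80, 48), "women": (20, 14)},
    "hard": {"men": (20, 4), "women": (80, 24)},
}


def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants


def overall_rate(group):
    """Pool both departments before computing the admission rate."""
    applicants = sum(dept[group][0] for dept in applications.values())
    admitted = sum(dept[group][1] for dept in applications.values())
    return rate(applicants, admitted)


for dept, groups in applications.items():
    for group, (applicants, admitted) in groups.items():
        print(f"{dept:5s} {group:6s} {rate(applicants, admitted):.0%}")

for group in ("men", "women"):
    print(f"total {group:6s} {overall_rate(group):.0%}")
```

In this toy data women are admitted at a higher rate in each department (70% vs 60%, and 30% vs 20%), yet at a lower rate overall (38% vs 52%), because most of them applied to the department that rejects most applicants.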

Now, this example has been analyzed to death in many places: on Wikipedia, in various blogs, in many textbooks (including my own book), and pretty much everywhere else. I’m not going to present a new analysis of the data or of Simpson’s paradox.

I just want to point out something simpler: There never was a lawsuit!

The real Berkeley story

A Wall Street Journal interview with Peter Bickel, one of the statisticians involved in the original study, makes clear that Berkeley was never sued—it was merely afraid of being sued:

Simpson’s Paradox has fooled many. In the fall of 1973, for instance, the University of California, Berkeley’s graduate division admitted about 44% of male applicants and 35% of female applicants. That raised eyebrows among school officials, who feared bias and asked Peter Bickel, now a professor emeritus of statistics at Berkeley, to analyze the data.

“The associate dean of the graduate school thought that the university might be sued,” Mr. Bickel says.

When Mr. Bickel and his colleagues scrutinized the data, they found little evidence of gender bias. Instead, they discovered that more women had applied to departments that admitted a small percentage of applicants, like English, than to departments that admitted a large percentage of applicants, like mechanical engineering.

The core paradox matches the usual story, but no lawsuit was involved. I’ve done some digging and I haven’t been able to find the original source of the mythical lawsuit—perhaps an early textbook or journal article author misheard the original story, wrote about a lawsuit, and authors ever since have copied the story unchanged.

Academic urban legends

A scan through Google Books reveals this urban legend has infected many recent books, and undoubtedly many older ones:

Paradoxes in Scientific Inference (2012), by Mark Chang
Learning Statistics with R, by Daniel Navarro
R Graphics Cookbook (2013), by Winston Chang
Math on Trial: How Numbers Get Used and Abused in the Courtroom (2013), by Leila Schneps and Coralie Colmez. Surprisingly, this book makes the claim in an entire chapter dedicated to Simpson’s paradox, after carefully explaining a separate lawsuit about gender bias in Berkeley’s faculty hiring process. After explaining that suit in detail, the authors throw in the mythical graduate admissions lawsuit as an aside.
Einstein’s Riddle: 50 Riddles, Puzzles, and Conundrums to Stretch Your Mind (2009), by Jeremy Stangroom
Impossible? Surprising Solutions to Counterintuitive Conundrums (2011), by Julian Havil

It’s also present in the scientific literature:

Harris Cooper and Erika A. Patall. “The relative benefits of meta-analysis conducted with individual participant data versus aggregated data,” Psychological Methods (2009), vol 14(2), pp. 165-176. 10.1037/a0015565. “Perhaps the best known example of Simpson’s paradox involves a lawsuit brought against the University of California at Berkeley alleging bias against women in the selection of graduate school applicants”
Stanley A. Taylor and Amy E. Mickel. “Simpson’s Paradox: A Data Set and Discrimination Case Study Exercise,” Journal of Statistics Education (2014), vol 22(1). “An example of this phenomenon is when the University of California, Berkeley was sued for bias against women who had applied for admission to graduate schools in 1973.”
Matteo G A Paris. “Two quantum Simpson’s paradoxes,” Journal of Physics A (2012), vol 45(13). 10.1088/1751-8113/45/13/132001. “One of the best known real life examples of Simpson’s paradox occurred when the University of California, Berkeley was sued for bias against women who had applied for admission to graduate schools there.”

Now, this isn’t the first time scientists have unwittingly propagated a myth through decades of the literature. The best example is possibly the century-old myth that spinach is a rich dietary source of iron: even the myth is a myth, as it turns out the common explanation (that German chemists measuring the iron content of spinach had misplaced a decimal point) is also a myth, propagated through the literature by scientists copying references without checking up on their provenance.²

It seems Simpson’s paradox has experienced a similar problem. Someone, perhaps back in the 1970s or 1980s, before the story was easily Googleable, wrote about a lawsuit to spice up their example, and the story has been repeated ever since.³ I only detected the problem because I am the type of nerd who wonders “Is the court opinion in this case available online? I’d love to read what the judge thought of the statistics”, and so I started hunting for a nonexistent court case.

Perhaps when we use stories to illustrate common statistical errors, we should make sure our stories are not in error as well.

Bickel, P. J., Hammel, E. A., & O’Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187(4175), 398–404. http://doi.org/10.1126/science.187.4175.398 ↩
Rekdal, O. B. (2014). Academic urban legends. Social Studies of Science, 44(4), 638–654. http://doi.org/10.1177/0306312714535679 ↩
Kudos to the anonymous Wikipedian who noticed that none of the Simpson’s Paradox article’s sources could confirm a lawsuit and removed the mention.↩

Review: James Boyle's 'The Public Domain'

Alex Reinhart — Wed, 17 Feb 2016 00:00:00 UT

James Boyle is a law professor specializing in copyright law, and The Public Domain is some of the most readable legal writing I’ve seen in a long time. (He gets bonus points for making the ePub and PDF freely available on his website.) Boyle covers the recent expansion of copyright law to protect more works for longer terms, depriving our culture of the benefit of their free use. He starts by covering the foundational arguments of copyright law: just what is “intellectual property,” and what sorts of rights do authors have in their works?

Boyle is a fan of Thomas Jefferson’s thoughts on the subject. Jefferson argued that there is no inherent property right in intellectual works: once you read my book or hear my speech, you have an irrevocable mental copy, and your possession of the work does not prevent anyone else’s. Unlike physical property, many people can have the work simultaneously. Copyright is instead a limited grant of a monopoly by the government, with the intent of promoting artists and authors; because monopolies were axiomatically considered Bad Things at the time, the monopoly must be as limited as possible to carry out its purpose.

This stands in contrast to the common European (and, most often, French) views of copyright, usually phrased in terms of “author’s rights” instead. In this system, authors have an inherent perpetual right to works they produce—works that are the embodiment of their unique genius, works they poured tremendous creative energy into, works which others could only spoil with their tampering. The distinction is better explored in Peter Baldwin’s Copyright Wars, though just about everything else is done better by Boyle; Baldwin’s work, though exhaustive, was not nearly as fluid and readable, and showed symptoms of Academic Sentence Convolution Syndrome.¹

Boyle reframes the argument in favor of limited copyright terms in terms of government subsidies. Using government power to protect the revenues of the tiny fraction of works which still draw revenue several decades after publication has a cost: it renders the vast majority of unprofitable and abandoned works completely unavailable to anyone. We ban ourselves from the majority of culture to subsidize a few percent of published works.²

I find this argument convincing. Project Gutenberg shows that old culture can be digitized and reused for all sorts of interesting purposes, Wikipedia just how valuable free culture can be, the entire Internet what can happen when information can be shared freely and easily. Why not make the war correspondence and novels of the 1940s available to all? Why let the literature of the 1950s slowly rot away in libraries when it can be read and enjoyed again?³

Boyle also discusses the “Internet Threat”: that computers and the Internet will make copying and redistribution so easy that new rights must be granted to copyright holders to prevent serious loss of business. Digital Rights Management falls in this category: companies must be granted a right not only to monopolize distribution of their work but to prevent reuse, under fair use or not, of work already sold to a paying customer. Media companies describe DRM-breaking tools as burglary tools, while advocates point out that they’re breaking into works they already own. Boyle says the DMCA’s prohibitions on DRM-breaking are a new, and constitutionally unfounded, right granted to media companies: there is no similar scheme whereby I can put a mark on a book to say “you may not copy this, even under fair use”, so why should breaking DRM be any different?

(As an aside, Boyle’s comments on DeCSS and movie piracy are amusing. He quotes from the decision in Universal City Studios, Inc. v. Reimerdes, a 2000 court case against DeCSS, a description of the process of movie piracy: half an hour decrypting the DVD, 10 to 20 hours synchronizing the compressed audio and video, and six hours to transfer it to someone else over IRC. Compared to “renting the same movie at Blockbuster for $3”, Boyle thinks, this is no threat. Blockbuster’s shareholders would beg to differ, as would The Pirate Bay.)

Similarly, some companies tried to use copyright law in lieu of patent or trademark law, preventing competitors from making garage door remotes compatible with their openers or music stores compatible with their iPods. The courts looked down on this; while it may count as circumvention, there certainly was no intent to redistribute copyrighted works or otherwise violate the rights of the rightsholders.

Circumvention is followed by a discussion of copyright in music, where nearly every work is a derivative work: borrowing melodies, arrangements, styles, and even portions of recordings is an entirely normal part of the creative process. But I’m not very interested in the music world (to my detriment, perhaps), so I skipped past much of this discussion.

Next, Boyle covers open-source software and the Creative Commons movement, which has somehow produced high-quality works without any profit incentive whatsoever, and uses copyright law to enforce inclusivity (through copyleft), rather than to exclude uses. The existence of open-source software and free culture suggests a flaw in the usual arguments for intellectual property: quite a lot of intellectual property needs no profit motive to be created. People write code because they want to, not because they expect payment. People write entire books because they enjoy the process or because they believe the message is important (as I did). As a result, Boyle argues that proposed intellectual property legislation should consider not just its impact on commercial ventures but on open-source and free-culture works, because they are just as important a source of innovation; advance commercial interests at the cost of open-source software and you will damage a number of businesses unintentionally.⁴

Patents also apply to software, though the book was written before Alice Corp v. CLS Bank demolished the idea that the phrase “by means of a computer” turns unpatentable abstract ideas into patentable inventions. Some of the discussion is hence out of date. Software patents are still a legal minefield, though, and one (from my admittedly biased perspective) worth exploring in more depth.

The final chapters explore evidence-based intellectual property policy, like evaluations of Europe’s odd database rights scheme, and how policy is often based on the bald assertion that “more rights terms benefit authors” without any analysis of costs, incentives, or economics. (Retroactive term extensions are particularly inexcusable in economic terms, since the authors were clearly willing to produce the works without them.) As a statistician I’m particularly annoyed by database rights; most databases I encounter were created or funded by government agencies and hence by my tax dollars, and my current research project involves combining an assortment of databases to answer questions the original compilers would never have thought of.⁵ Database rights would make my work much more difficult, blocking important research, and bring very little benefit to the compilers of the databases.

Overall I was happy with The Public Domain, and I wish more people involved in intellectual property policy (politicians, scientists and publishers involved in the big open access debates, authors) would read it. Perhaps it doesn’t go into incredible academic depth on every issue, but it is likely a better book for it. To put it another way: I finished the book wanting to learn more, showing not that it was too short but that it was interesting and engaging enough to leave me wanting more.

Fred Rodell famously complained that lawyers are incapable of saying anything directly or forcefully. Criticizing a judicial decision, a lawyer might write “It would seem that a contrary conclusion might perhaps have been better justified.” Baldwin is guilty of this kind of writing; Boyle simply writes “If this [decision] were a law school exam, it would get a ‘D’. (Maybe a C given grade inflation.)”↩
Mark Twain argued that this was exactly why copyright terms should be long: you benefit the children of those few authors, and harm nobody else. But making culture unavailable is a harm, and a state subsidy for the children of successful authors should be recognized for what it is: a state subsidy for the children of successful people. Hardly popular in today’s political climate.↩
I once heard a vaguely plausible counter-argument: free access to 20th-century literature and art would mean new work has to compete with free older work. Publishers currently can put books out of print after a few years, removing them from competition, but if a vast back-catalog of out-of-copyright works suddenly appeared, would people still read new books? Watch new movies? Well, yes, probably; the marketing power of the publishers is considerable. But would you deny us the benefit of learning from our history to subsidize new work?↩
It’s interesting to see that there’s a modern backlash against copyleft in the open source community: many modern projects choose licenses like the BSD and MIT licenses, which have no copyleft terms and are compatible with reuse in proprietary software. Developers now regard adoption of their code by closed-source for-profit software as a good thing rather than a detriment to user freedom. Copyleft licenses like the GPL are seen as only restricting the potential uses of your software. I’m not sure how this shift happened; perhaps the wide success of open source software makes people believe copyleft terms are not necessary to ensure its continued existence?↩
A lot of statisticians can say this.↩

A flexible implementation of Conway's Game of Life

Alex Reinhart — Mon, 25 Jan 2016 00:00:00 UT

Conway’s Game of Life is a simple simulation with surprisingly complex behavior. We start with a simple 2D grid of cells. Each cell can be either alive or dead. The rules are:

A live cell stays alive if it has two or three live neighbors.
A dead cell becomes alive if it has exactly three live neighbors.

We start with a certain set of cell states and then iterate, applying the rules to every cell at each iteration. Despite their simplicity, these rules can produce amazingly complicated behavior. For example, here is Gosper’s Glider Gun, which produces “gliders” that wander off to infinity:¹

An animation of Gosper’s Glider Gun in action.

For our Statistical Computing course, Chris Genovese used the Game of Life as an example at the very beginning of the course to illustrate the flexible design of programs. Students first were told to write an ordinary implementation following the rules above – then asked how they’d abstract their algorithm so they could replace the rules or even work on a non-rectangular grid. Without careful planning, that kind of abstraction could be very difficult.

So here’s a sample implementation in Racket, my favorite little Lisp.

Setting up the rules

First we need to specify the rules of the game. Conway’s rules are the most popular, but it’s possible to use different rules to decide which cells live or die, so we’ll abstract the rules out to a function.

The rules function only needs to know two things: the number of live neighbors the cell has, and whether or not it’s currently alive. Then it returns a boolean indicating whether the cell will be alive in the next generation.²

(define (conway-rules neighbors alive?)
  (if alive? (or (= neighbors 2) (= neighbors 3))
      (= neighbors 3)))

If we want to try out different sets of rules, it’s easy to define entirely new functions – with arbitrary logic and complexity – which can replace conway-rules. As you’ll see below, conway-rules is passed as an argument to our other functions, so replacing it requires no source code changes at all.
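
As a sketch of what such a replacement might look like, here is HighLife, a well-known variant in which a dead cell is born with three or six live neighbors while survival works as in Conway's rules. The name highlife-rules is made up for this example; it keeps the same two-argument signature as conway-rules so it can be passed wherever conway-rules is.

```racket
#lang racket

;; HighLife (B36/S23): survival is the same as in Conway's rules,
;; but a dead cell is born with three OR six live neighbors.
;; Same signature as conway-rules: neighbor count, then alive?.
(define (highlife-rules neighbors alive?)
  (if alive?
      (or (= neighbors 2) (= neighbors 3))
      (or (= neighbors 3) (= neighbors 6))))

(highlife-rules 6 #f) ; #t: a dead cell with six live neighbors is born
(highlife-rules 6 #t) ; #f: a live cell with six live neighbors dies
```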

Operating on the grid

It’s tempting to store the state of the Game of Life using some kind of two-dimensional array or matrix. Then, on each step, we’d loop through the whole board, checking which cells should live or die. But this would become slow – the gliders we saw earlier can wander off arbitrarily far, so our board would have to be large and the loop slow. If our board isn’t large enough, cells could reach the edge, and we’d either need to increase the board size or accept that their behavior will be wrong.

Worse, the amount of computation needed would grow with the board size, regardless of how many cells in the board are actually alive. In the worst case, we’d have an enormous board of entirely dead cells, and at every step we’d still loop through the whole thing checking if anything should come alive.

But look back at the rules. All we really need to keep track of is the live cells. We can tell if a cell comes alive just by counting its live neighbors.

Imagine we have a list of currently live cells. If we find the neighbors of each live cell, we’ll have a big list of coordinates; if a cell is the neighbor of three live cells, it’ll appear in that list three times. So, for example, a currently dead cell which is the neighbor of three live cells will appear in the list, and we can mark it as alive on the next step.

On our rectangular grid, we can easily find the neighbors of a single cell with

(define (neighbors-rect location)
  (let ([x (first location)]
        [y (last location)])
    (for*/list ([dx '(-1 0 1)]
                [dy '(-1 0 1)]
                #:when (not (= dx dy 0)))
      (list (+ x dx) (+ y dy)))))

Coordinates are stored as (x, y) pairs. We loop through perturbations, adding or subtracting 1 from x and y, generating the eight neighbor coordinates.
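
A quick check of what this produces, with the definition repeated so the snippet runs on its own. Because coordinates are just pairs of integers, nothing special happens at any "edge", including at negative coordinates.

```racket
#lang racket

;; neighbors-rect as defined above, repeated so this runs on its own.
(define (neighbors-rect location)
  (let ([x (first location)]
        [y (last location)])
    (for*/list ([dx '(-1 0 1)]
                [dy '(-1 0 1)]
                #:when (not (= dx dy 0)))
      (list (+ x dx) (+ y dy)))))

(neighbors-rect '(0 0))
;; '((-1 -1) (-1 0) (-1 1) (0 -1) (0 1) (1 -1) (1 0) (1 1))

(length (neighbors-rect '(5 -3))) ; 8
```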

Taking the next step

I’m going to put together a key function from a few pieces here. Let’s start with the basic bits. To apply Conway’s rules, we start by getting the list of all neighbors of all cells:

(append-map neighbors (set->list live-cells))

We need to count how many times each coordinate appears in this list, so we’ll use a hash table. The result is a table of (item, count) pairs:

(define (count-occurrences neighbors)
  (for/fold ([ht (hash)])
            ([item neighbors])
    (hash-update ht item add1 0)))

hash-update takes a hash table, looks for a key (item), and updates it according to the function passed to it – in this case add1, so the count is incremented by 1. If the key doesn’t exist, we pass 0 to add1 as the default. As a result, we have a count of how many times each item appears in the neighbors list.
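
To see the counting in isolation, here is a small run on a made-up neighbor list (the coordinates are arbitrary, chosen only so that one of them repeats). The definition is repeated so the snippet runs on its own.

```racket
#lang racket

;; count-occurrences as defined above, repeated so this runs on its own.
(define (count-occurrences neighbors)
  (for/fold ([ht (hash)])
            ([item neighbors])
    (hash-update ht item add1 0)))

;; A hypothetical neighbor list in which (1 1) shows up three times.
(define counts
  (count-occurrences '((1 1) (0 1) (1 1) (2 2) (1 1))))

(hash-ref counts '(1 1))   ; 3
(hash-ref counts '(0 1))   ; 1
(hash-ref counts '(9 9) 0) ; 0: a key that never appeared needs an explicit default
```

Coordinates that never appear in the neighbor list are simply absent from the table, which is why the last lookup supplies its own default.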

Then it’s just a matter of applying the rules. If we have the number of neighbors and a set of live cells, then applying the rules is as simple as

(conway-rules (hash-ref num-neighbors cell)
              (set-member? live-cells cell))

The hash-ref simply looks up the cell in the table, returning how many neighbors it has.

Now we can assemble a function to take a set of live cells and return a new set:

(define (step rules neighbors live-cells)
  (let ([num-neighbors
         (count-occurrences (append-map neighbors (set->list live-cells)))])
    ;; Call the rules function we were given rather than conway-rules
    ;; directly, so other rule sets can be swapped in.
    (list->set (filter (lambda (cell)
                         (rules (hash-ref num-neighbors cell)
                                (set-member? live-cells cell)))
                       (hash-keys num-neighbors)))))
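
A simple sanity check is the "blinker", three cells in a row that flip between horizontal and vertical on every step. The sketch below repeats the definitions so it runs on its own; this copy of step calls whatever rules function it is passed, and the names blinker-h and blinker-v are made up for the example.

```racket
#lang racket

(define (conway-rules neighbors alive?)
  (if alive? (or (= neighbors 2) (= neighbors 3))
      (= neighbors 3)))

(define (neighbors-rect location)
  (let ([x (first location)]
        [y (last location)])
    (for*/list ([dx '(-1 0 1)]
                [dy '(-1 0 1)]
                #:when (not (= dx dy 0)))
      (list (+ x dx) (+ y dy)))))

(define (count-occurrences neighbors)
  (for/fold ([ht (hash)])
            ([item neighbors])
    (hash-update ht item add1 0)))

(define (step rules neighbors live-cells)
  (let ([num-neighbors
         (count-occurrences (append-map neighbors (set->list live-cells)))])
    (list->set (filter (lambda (cell)
                         (rules (hash-ref num-neighbors cell)
                                (set-member? live-cells cell)))
                       (hash-keys num-neighbors)))))

;; The blinker in its two phases.
(define blinker-h (set '(0 1) '(1 1) '(2 1)))
(define blinker-v (set '(1 0) '(1 1) '(1 2)))

(equal? (step conway-rules neighbors-rect blinker-h) blinker-v) ; #t
(equal? (step conway-rules neighbors-rect blinker-v) blinker-h) ; #t

;; A lone live cell has no live neighbors, so it never appears in the
;; neighbor table at all and is dropped, which is what the rules require.
(set-empty? (step conway-rules neighbors-rect (set '(5 5)))) ; #t
```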

Running more generations

Our step function only runs one iteration: it applies the rules, generates the new set of live cells, and returns it. How do we run the Game of Life for more steps?

A simple route would be a for loop. But we can be fancier. Racket supports lazy sequences: infinitely long sequences whose elements are calculated only when we ask for them. We can define one recursively.

(define (life cells)
  (stream-cons cells (life (step conway-rules neighbors-rect cells))))

stream-cons is what makes this recursion safe. It is a special form rather than an ordinary function, so it does not evaluate its second argument right away. It defines a stream (a sequence) that begins with cells and whose remaining elements come from evaluating that second argument only when they are needed, which creates another stream starting at the next step.

In summary, life applies Conway’s rules on a rectangular grid, making an infinite sequence of cell states. We can ask Racket to provide any element in the sequence and it’ll be calculated on demand.
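
For comparison, here is a sketch of the plain loop next to the stream version, both written against an arbitrary one-argument function. The names run-steps, generations, and advance are made up for this example, and a number-doubling function stands in for the real step so the snippet runs on its own; in the setting above, advance would be something like (lambda (cells) (step conway-rules neighbors-rect cells)).

```racket
#lang racket

;; Apply a one-argument `advance` function n times with an ordinary loop.
(define (run-steps advance start n)
  (for/fold ([state start])
            ([i (in-range n)])
    (advance state)))

;; The lazy-stream equivalent, written against the same `advance`.
(define (generations advance start)
  (stream-cons start (generations advance (advance start))))

;; Stand-in for a real step function: doubling a number plays the
;; role of advancing the board by one generation.
(define (double n) (* 2 n))

(run-steps double 1 10)                ; 1024
(stream-ref (generations double 1) 10) ; 1024
```

The loop only hands back the final state, while the stream lets us ask for any generation, including several different ones, from the same definition.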

The results

Let’s try our code on a sample set of live cells. We’ll use the “acorn”, a configuration which starts small and then grows for thousands of generations. I’ve imported a draw-game function which draws the results. At the beginning, it looks like this:

(define acorn (set '(1 1) '(2 1) '(2 3) '(4 2) '(5 1) '(6 1) '(7 1)))
(draw-game (set->list acorn))

The acorn configuration.

But after just fifty steps, we have:

(draw-game (set->list (stream-ref (life acorn) 50)))

The acorn configuration after fifty steps.

And after fifty more:

(draw-game (set->list (stream-ref (life acorn) 100)))

The acorn configuration after 100 steps.

Life on a hexagonal grid

One more thing. Previously we defined a neighbors function, neighbors-rect, which gets the list of neighbors of a cell. That function assumed we were working on a rectangular grid. But, if you look carefully, you’ll see it’s the only function which assumes that. If we want to change the grid, we need only swap out the neighbors function.

Suppose, then, we swap out the rectangular grid for a hexagonal one. There are many possible hexagonal grid coordinate systems, so I’ve chosen one arbitrarily. To keep it brief: each cell has six neighbors instead of eight, and we can generate them all with a slightly different function:

(define (neighbors-hex location)
  (let ([x (first location)]
        [y (last location)])
    (for*/list ([dx '(-1 0 1)]
                [dy '(-1 0 1)]
                #:when (and (not (= dx dy 1))
                            (not (= dx dy -1))
                            (not (= dx dy 0))))
      (list (+ x dx) (+ y dy)))))

By swapping out neighbors-rect for neighbors-hex in our definition of life, we can work on a hexagonal grid. No other code needs to change except for how we draw the grid.
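
One way to make that swap without editing life at all is to build the stream function from a rules function and a neighbors function. This is only a sketch: make-life, hex-life, and triangle are names made up here, the other definitions are repeated so the snippet runs on its own, and step is written to call the rules function it is given.

```racket
#lang racket

(define (conway-rules neighbors alive?)
  (if alive? (or (= neighbors 2) (= neighbors 3))
      (= neighbors 3)))

(define (neighbors-hex location)
  (let ([x (first location)]
        [y (last location)])
    (for*/list ([dx '(-1 0 1)]
                [dy '(-1 0 1)]
                #:when (and (not (= dx dy 1))
                            (not (= dx dy -1))
                            (not (= dx dy 0))))
      (list (+ x dx) (+ y dy)))))

(define (count-occurrences neighbors)
  (for/fold ([ht (hash)])
            ([item neighbors])
    (hash-update ht item add1 0)))

(define (step rules neighbors live-cells)
  (let ([num-neighbors
         (count-occurrences (append-map neighbors (set->list live-cells)))])
    (list->set (filter (lambda (cell)
                         (rules (hash-ref num-neighbors cell)
                                (set-member? live-cells cell)))
                       (hash-keys num-neighbors)))))

;; Build a lazy stream of generations for any rules/neighbors pair.
(define (make-life rules neighbors)
  (define (life cells)
    (stream-cons cells (life (step rules neighbors cells))))
  life)

(define hex-life (make-life conway-rules neighbors-hex))

(length (neighbors-hex '(0 0))) ; 6

;; Three mutually adjacent hex cells: each has exactly two live
;; neighbors, and no dead cell touches all three, so under Conway's
;; rules this little triangle never changes.
(define triangle (set '(0 0) '(1 0) '(0 1)))
(equal? (stream-ref (hex-life triangle) 5) triangle) ; #t
```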

Animation from Wikimedia Commons.↩
I apologize for the minimal syntax highlighting. pandoc’s syntax highlighting library doesn’t support Racket yet, so I’m using the Clojure highlighting, which doesn’t recognize most of Racket’s keywords.↩

Review: Neil Postman's 'Amusing Ourselves to Death'

Alex Reinhart — Sat, 23 Jan 2016 00:00:00 UT

Amusing Ourselves to Death is interesting both for its arguments and the realization that things have gotten much, much worse since it was published in 1985. The key to Postman’s argument is that the medium affects the message: yes, you can say the same thing on TV as you would in print, but you probably won’t. To illustrate this, Postman begins by describing typographic America: America in the 1800s, before the invention of photography or the telegraph, when nearly every aspect of public discourse occurred in print. Print is a great format for argumentation—for laying out logical propositions and supporting evidence—because it is susceptible to careful analysis: arguments laid in front of you on a page can be dissected and examined any way you want.

Discourse not in print, like the many popular public lectures and speeches, was heavily inspired by print—public speakers used language and syntax that very much sounds like it was written, with complex multiclause sentences and subtle arguments.¹ People were expected to carefully listen to hours of public debates and lectures. The inherent biases of print shaped the rest of our culture.

Then came the telegraph and the explosion of irrelevance. Suddenly we were deluged with information which, we claimed, stood on its own merits—that is, information with no inherent use except as information. What are we to do about the latest Middle East conflict or political dispute? Or the price of beans in New Brunswick? We can vote, but voting is a blunt instrument, and our only other option is, essentially, to answer opinion polls and simply become more news. Worse, rapid-fire media like radio and television are intrinsically unsuitable for detailed exposition and argument, and so they now aim to amuse, not to inform. Postman puts this change pithily:

“We might even say that America was founded by intellectuals, from which it has taken us two centuries and a communications revolution to recover.”

I can see the point here. Read the New York Times (or, hell, the Daily Mail) every day for a week and you’ll find very little you can actually act upon. Maybe something that will influence how you vote,² maybe a book or product you want to buy, but most of the news items are irrelevant. Since 1985, things have gotten worse: the Daily Mail, Buzzfeed, and most of online media are aimed at brief moments of amusement (through listicles and animated GIFs), so much so that long-form articles are rare enough to have their own website curating them. We no longer carefully examine political discourse, and politicians know better than to attempt using sophisticated and subtle arguments in TV debates where they have only 30 seconds to provide a soundbite.

The Internet would horrify Postman. Now, not only do we demand that everything amuse us, we leave if it fails to amuse us within two or three seconds. Slow page load? Back! Reddit is the epitome of this: the default front page is people voting on what amuses them most. Comment threads are either puns (rapid-fire amusement) or funny stories, a few paragraphs at most. Political discourse is reduced to one-line expressions of outrage or disbelief. Serious discussion cannot survive unless determined moderators ruthlessly delete amusing posts.³

In short, we’re way past the Kodak moment. We live in search of the tweetable moment.

This much is easy to see. But: of all the long-form writing and arguments the so-called typographic Americans were digesting, what fraction was relevant? What fraction was actually informative? Were they reading for recreation, or were they reading about issues that mattered to them? Postman ignores this question entirely. Perhaps they were all reading about getting the best soybean crop or reading about politics and vigorously participating in the local city council, but I have no idea what people did on average, and it’s not clear that all the careful analysis amounted to anything. Just how rosy is Postman’s view of the 1800s?

I also wonder about quantitative support for Postman’s argument. As far as I know, books are selling more than ever, and I recall a recent article suggesting that fiction has been getting longer and longer over the years. Is this what we’d expect given Postman’s hypothesis? Or are the books less complex, more padded, despite the increased length? I don’t know how you’d evaluate that, though modern popular nonfiction does often seem content-free.

More broadly, there’s a possible quantitative counter-argument to Postman: in “typographic America”, print was the dominant medium, but it was also small. There wasn’t anything like the volume of publishing we see today, with everything from big publishing houses down to self-published authors and random people with blogs. More people participate in typographic media than ever before. It may be a smaller fraction of overall media consumption than it used to be, drowned in hours of Netflix and HBO, but in absolute terms it is enormously larger than it was in the 1800s. Of course, television and the GIF side of the Internet are crushingly larger, and they dictate popular culture and views. Donald Trump exists because he doesn’t need a coherent argument, just fabulous sound bites. Politics, news, and business are driven by non-typographic media. It doesn’t matter how large print is—it has lost the battle.

On an unrelated note,⁴ I also wonder how Postman would react to the recent emergence of TV dramas with long, complicated plots spread over entire seasons. They are, of course, still intended to amuse instead of inform, but they expect more than complete passivity from their viewers, who are expected to keep track of legions of characters and plot details over weeks or months. I suppose this is the medium shaping the message again: Netflix means a binge-watcher can down an entire season in an afternoon, so instead of four hours of unrelated programs, he can get four or five hours of a single continuous plot. Or, in other words, long story arcs became popular as soon as it was possible to time-shift them with DVRs, Netflix, and DVD boxed sets.

Overall, I enjoyed Amusing Ourselves to Death, and it’s a fascinating subject to think about.⁵ I just wish Postman were around to update the book to the modern era, and that he could marshal more evidence for his positions.

I assume this is why dialogue in old novels always seems so improbable—nobody talks in long convoluted sentences anymore. They probably don’t even know how to properly pronounce punctuation.↩
That’s optimistic. Normally we just see things we use to justify the voting decisions we’ve already made.↩
Perhaps as a reaction to reddit, Hacker News has a strong social taboo against puns and jokes, which are instantly downvoted. Instead, its users avoid serious discussion with middlebrow dismissals and pointless arguments about pointless details.↩
Postman’s most hated feature of modern media is its tendency to say “Now, this”—to drop one subject after thirty seconds of consideration and move to something completely different, tacitly admitting that you won’t pay attention unless they keep you amused with new things. This paragraph hence feels transgressive.↩
He says, in between checking Twitter, Facebook, and Hacker News for updates. Apparently it didn’t stick.↩