36-651/751: Software Licensing

Spring 2019, mini 3 (last updated February 21, 2019)

So you’ve developed some statistical software for your research (or your job or for fun). You’d like to publish your software so others can use it, maybe as supplementary material for a paper or maybe just for the sake of open science.

You can send the source code to the journal, post a Zip file on your website, or make your GitHub repository public. That’s easy enough.

But what are people allowed to do with that code? What conditions can you set on its use? What does “open source” mean, anyway?

Trademarks and Patents Aren’t Copyrights

This isn’t strictly relevant, but it’s worth making the distinction.

Trademarks are different from copyrights. A trademark is a name, brand, logo, or design that identifies a product or service and distinguishes it from others. Its purpose is to identify the origin of the product or service (the business, company, or person who controls it); trademarks exist primarily to protect consumers and inform free markets.

A trademark ensures that if you like a certain company’s products, no other company can sell products under their name without their permission, thus appropriating your trust in the brand and convincing you to buy something you otherwise wouldn’t. Trademarks can be licensed, meaning one company can grant another company permission to use their trademarks (e.g. a movie studio granting permission for Lego to make Lego sets for their movies).

No, you don’t need to use the little ™ or ® symbols any time you refer to a trademarked name. Companies use the symbols basically to give notice that their brands are trademarks; you don’t need to use the symbols when you refer to their names. Don’t crust up your writing with little barnacles of legalese every time you mention PepsiCo®™.

A patent is also different from a copyright. Patents protect inventions: a patent grants you the exclusive right to make, use, and sell an invention for roughly twenty years. In exchange for that exclusive right, you must file a detailed document (the patent) describing your invention, so the public can benefit from it. If you hold a patent, you may license others to use it (and charge them money for the privilege); after the patent expires, anyone may use the patented idea however they want.

Inventions in the form of software can be patented, but this is an area of some controversy. “Patent troll” companies tried to make money by patenting ordinary things with an extra “but done on a computer”, then suing other companies that did those things on a computer. In Alice Corp. v. CLS Bank International (2014), the Supreme Court ruled that adding “but done on a computer” to an abstract idea does not make it a patentable invention, but exactly which software inventions are patentable is still disputed.

You likely will never need to patent or trademark your work, but you automatically hold copyright in it.

Who Owns Your Intellectual Property?

Questions sometimes arise about who, actually, holds the copyright in your work. If you’re a student at CMU and you write an R package, do you own the copyright, or does CMU own it? If you work for a big tech company and write some code, are you allowed to release it or do you need their permission?

CMU has an Intellectual Property Policy covering its students and employees. Essentially, unless CMU paid specifically for your code, specifically hired you to write it, or provided expensive facilities to help you produce it, it’s yours, subject to a few conditions (e.g. “if you make lots of money from it, we get a cut”).

Certain grants may impose requirements, but usually the conditions are things like “the granting agency shall be allowed to use your code”, not “we own your copyright.”

If a company in the United States hires you for a job that involves writing code or other copyrightable material, the work you produce as part of that job may qualify as “work made for hire”: work created by an employee within the scope of their employment. In that case, the company owns the copyright, not you, and you have no special rights to the work.

Alternatively, you may sign a contract with a company specifying who holds the rights. Companies may have you do this even if your work qualifies as work made for hire, just so the rules are very clear. You can also sign contracts transferring your copyright to others; you might do this when publishing in a journal, for example. You’d have to read the contract to know what rights you’re giving up under what terms.

Software Licenses

Code usually qualifies as copyrightable work, and since your code is automatically copyrighted, releasing it to the public means specifying exactly what you’d like people to be allowed to do with it. This usually means writing a license: a legally binding set of terms and conditions. If people abide by the conditions, they have your permission to do things that copyright law would otherwise forbid.

You can write your own license terms, but you shouldn’t. Large projects with expensive lawyers have already written plenty of licenses with different types of terms, and you should pick the most suitable one.

A great deal of scientific and general-purpose software is open source. “Open source” is sometimes interpreted to mean “the source code is visible”, but it means more than that. The Open Source Definition gives clarity: open source software is available in source code form, can be modified freely by its users, can be freely redistributed in original or modified form, and can be used freely in any field and by any person. We make software open source not just so people can see its source code but so they have the freedom to use the software in many ways, with only minimal restrictions.

When it comes to licensing, you have a continuum of choices.

“All rights reserved”
When people write this, they mean “I grant you no permission to do anything.” Fair use still applies, but you can’t redistribute the work, translate it, or do anything else outside the bounds of fair use.
“Free for noncommercial use”
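People sometimes write vague terms like this in the hope that researchers will use their software and businesses will pay for it. Unfortunately, vagueness and law do not go well together: does a university lab funded by an industry grant count as noncommercial? What about a company using the software for internal research? Terms like this will likely cause you and your users problems.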
Permissive (BSD-style) licenses
Permissive open source licenses grant broad permission to reuse the software with only a few conditions. The MIT License, for example, says anyone may “use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software”, provided they include your copyright notice and the license’s permission notice in all copies or substantial portions of the software.
Copyleft licenses

These open source licenses grant broad permissions, like permissive licenses, but also require that any copies or modified versions be distributed under the same terms. Anyone may copy or modify your code, but if they release their version to others, they must grant others the same permissions. The GNU General Public License is the most prominent example; Linux, R, Git, and many other major projects are licensed under its terms.

Copyleft licenses are intended to preserve the freedom of users to see the source code of the software they use, modify it, and redistribute it, while ensuring the source always stays open and free.

There are a lot of premade licenses. Don’t waste time trying to pick between ISC, Apache, BSD, MIT, LGPL, GPL, CDDL, and a zillion others; use the Choose a License website and move on. Unless your project has particularly odd or special requirements, you’ll be fine with its recommendations.

A notable license is the CRAPL, specifically designed for academic software, including such important terms as “You agree to hold the Author free from shame, embarrassment or ridicule for any hacks, kludges or leaps of faith found within the Program.”

To license your software:

  1. Place text in the README file indicating the license. You can write something like “Copyright 2019, Your Name Here. Released under the terms of the [license] license.”
  2. Add a LICENSE.txt file containing the full text of the software license. Put this in the root directory, along with the README.

Specific licenses may have other recommendations; the GPL recommends placing a licensing notice in a comment at the top of every file, for example.
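To make the two steps above (and the GPL’s per-file recommendation) concrete, here is a minimal Python sketch of a checker you might run before publishing. The function names `check_project` and `has_notice`, the file names it looks for (`README*`, `LICENSE.txt`, `*.py`), and the keyword heuristic for recognizing a notice are all assumptions for illustration. It only checks that notices exist, not that they are legally adequate.

```python
from pathlib import Path

# Hypothetical heuristic: a notice must mention both words. This is an
# assumption for illustration and does not validate the license itself.
NOTICE_WORDS = ("copyright", "license")


def has_notice(text: str) -> bool:
    """Return True if the text mentions both a copyright and a license."""
    lowered = text.lower()
    return all(word in lowered for word in NOTICE_WORDS)


def check_project(root: Path, header_lines: int = 15) -> list[str]:
    """List problems with how a project is licensed (hypothetical helper)."""
    problems = []

    # Step 1: the README should state the copyright holder and the license.
    readmes = sorted(root.glob("README*"))
    if not readmes:
        problems.append("no README file in project root")
    elif not any(has_notice(p.read_text(encoding="utf-8")) for p in readmes):
        problems.append("README does not mention copyright and license")

    # Step 2: the full license text should sit in the root directory.
    if not (root / "LICENSE.txt").is_file():
        problems.append("no LICENSE.txt in project root")

    # GPL-style recommendation: a notice in a comment at the top of each file.
    for source in sorted(root.rglob("*.py")):
        first = source.read_text(encoding="utf-8").splitlines()[:header_lines]
        comments = "\n".join(
            line for line in first if line.lstrip().startswith("#")
        )
        if not has_notice(comments):
            problems.append(f"{source.relative_to(root)}: no license notice in header")

    return problems
```

Run from your repository root, `check_project(Path("."))` returns an empty list once the README, LICENSE.txt, and per-file headers are all in place; otherwise each string describes one missing piece.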

Licenses for Non-Software

Software licenses are written specifically for software, containing lots of legal language about source code and compiled code and executables and so on. But you may also want to release other things: documentation, papers, images, your thesis…

Copyright applies to all these works as well, and software licenses aren’t well-suited for them.

(Remember, in the US data itself can’t be copyrighted unless it involves creativity, such as a creative selection or arrangement of the data. Mere statements of fact do not qualify.)

You could say “All rights reserved”, and readers would only have the right to read the copies you give them. There are again a zillion more permissive licenses you could choose from, and you could even write your own. But if you want to grant others permission to reuse and redistribute your work, choose a Creative Commons license instead. Creative Commons has a simple license chooser that asks a few questions:

  1. Do you want people to be able to distribute modified versions of your work? (They always have to acknowledge you as the original author and explain any changes they made.) You can also require that they distribute modified versions only under the same terms.
  2. Do you want to allow commercial use of your work?

Answer those two questions and the chooser directs you to a pre-written license, with instructions on how to use it.
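The chooser’s first question has three possible answers (yes, no, or yes if others share alike) and the second has two, giving six licenses. As a rough sketch, assuming the 4.0 versions of the licenses and their standard abbreviations (BY for attribution, NC for NonCommercial, SA for ShareAlike, ND for NoDerivatives), the mapping can be written in Python. `Adaptations` and `choose_cc_license` are hypothetical names, not part of any Creative Commons tool, and public domain dedications such as CC0 are left out.

```python
from enum import Enum


class Adaptations(Enum):
    """Possible answers to the chooser's question about modified versions."""
    YES = "yes"
    SHARE_ALIKE = "share_alike"  # yes, but only under the same terms
    NO = "no"


def choose_cc_license(adaptations: Adaptations, commercial: bool) -> str:
    """Map the two chooser answers to a Creative Commons 4.0 license name."""
    # All six Creative Commons licenses require attribution (BY).
    parts = ["BY"]
    if not commercial:
        parts.append("NC")  # NonCommercial: no commercial use allowed
    if adaptations is Adaptations.SHARE_ALIKE:
        parts.append("SA")  # ShareAlike: adaptations must use the same license
    elif adaptations is Adaptations.NO:
        parts.append("ND")  # NoDerivatives: modified versions may not be shared
    return "CC " + "-".join(parts) + " 4.0"
```

For instance, allowing adaptations only under the same terms while permitting commercial use, `choose_cc_license(Adaptations.SHARE_ALIKE, True)`, gives "CC BY-SA 4.0".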

Many people discourage answering “no” to the commercial use question; if you don’t like the idea of other people making money off your work, select the option requiring others to “share alike” instead. This way, if a company takes your work and adapts it into a textbook they sell for $150, they must release that adapted version under the same license, so anyone who obtains a copy is free to redistribute and adapt it.

I advocate for releasing things under open, non-restrictive terms, including your papers; I recommend Boyle’s The Public Domain to understand why you’d want to do this.

Publishing Papers

Traditional academic journals make their money by selling subscriptions to access papers, so they do not want their papers available under open, non-restrictive terms. Typically, they require authors to sign a copyright transfer agreement granting the publisher the exclusive right to publish the work; authors only retain minimal rights to, say, use the paper in their classes and give copies to friends.

The rise of preprints and open access journals has challenged this. Most journals now realize that authors will post their preprints on the arXiv (or the equivalent for their field), and their copyright transfer agreements allow this. Some also allow authors to post preprints on their own websites or in university repositories, though they may require a delay of several months after publication. Check SHERPA/RoMEO to find the policies of any journal.

When you submit to arXiv, the default license for your papers is a “non-exclusive license to distribute”, which just means you grant arXiv permission to distribute the article, but retain your copyright and can publish the article elsewhere (like a journal).

Open-access journals, on the other hand, do not charge subscriptions and release their articles under open licenses, often the Creative Commons Attribution license, which allows articles to be redistributed freely as long as they are credited to the original authors. You can also mark your arXiv preprint with the same license.

(Why does a free license benefit articles? Maybe someone wants to scrape your work and do classification with it, or make a website that automatically suggests new papers to readers based on their prior interests. Maybe you want to extract bibliographic data and use it for something. Maybe you want to make a joke website that presents real paragraphs from papers next to ones generated by Markov chains and challenge readers to tell the difference. An open license permits all these uses, and many others, whereas a traditional journal license does not.)