11 May 2017, 15:38

Introduction
What is Plain Text?
Why Plain Text?
An Example of the Plain Text Workflow
Parting Thoughts

Introduction

The Plain Text Workflow is an alternative to writing with a word processor. Mind you, I said writing, not typesetting or formatting, which is a major part of what word processors do. The idea of the plain text workflow is that you separate the act of writing from that of producing a formatted, typeset final document. You initially capture your words using a plain text editor, perhaps using a lightweight formatting language like Markdown. Then, using freely available software such as pandoc, you translate your plain text document into whatever file format you need (for a colleague, reviewer, literary agent, journal editor, blog post, email, website, etc.), be it Microsoft Word, LibreOffice, PDF, HTML, or whatever. You might also bring this translated file into your word processor to continue tweaking the formatting. However, your original words are captured in one or more plain text files, which remain the source from which the various other document formats flow. With the plain text workflow, you work in plain text, and all of those other document formats are outputs from your plain text source document.

This concept was described in 1999 by Allin Cottrell:

I am suggesting, therefore, that [there] should be two distinct “moments” in the production of a printed text using a computer. First one types one’s text and gets its logical structure right, indicating this structure in the text via simple annotations. This is accomplished using a text editor, a piece of software not to be confused with a word processor . . . Then one “hands over” one’s text to a typesetting program, which in a very short time returns beautifully typeset copy.

Although Cottrell was talking about printed copy here, he goes on to say that his remarks apply to digital documents as well.

Writing in plain text is not new; it dates back at least to 1978, when Donald Knuth created the TeX computer typesetting system. What is new is that writers are increasingly turning (or re-turning) to plain text because they are sick of having to deal with complicated, bloated, expensive word processing software, proprietary file formats that constantly change, incompatibilities across word processor versions and computer platforms (including, now, smartphones, tablets, and other mobile devices), collaborators and co-authors who each use a different word processor, et sic porro. Plain text files have been around since the birth of modern computing. HTML, the foundation of the World Wide Web, is a plain-text markup language. Virtually all computer programming source code is written in plain text. And now, as W. Caleb McDaniel has written in Why (and How) I Wrote My Academic Book in Plain Text, it “seems like the ancient past of personal computing is becoming the wave of the future.”

What is Plain Text?

A plain text file differs from a word processor file because plain text contains only characters and nothing else. In its simplest form these are the letters, numbers, and symbols that appear on a standard keyboard, known as the ASCII character set; modern plain text files are usually encoded as Unicode (UTF-8), which extends ASCII to cover other alphabets and symbols. Word processor files, however, contain additional, invisible formatting commands that the software uses to produce fonts, text styles such as boldface and italics, bulleted lists, placeholders for endnotes, footnotes, and bibliographic citations, and so on. Each word processor vendor uses different formatting commands, so a document file from one vendor’s word processor often cannot be used directly in another vendor’s word processor. At the very least, the file format of word processor A needs to be converted into the format of word processor B. Often, the files cannot be read at all, or if they can, some of the formatting may be lost in translation.
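To see what "only characters" means in practice, here is a minimal sketch for a Unix-style shell (macOS Terminal or Linux). The file name hello.txt is made up for the illustration. The file holds exactly the characters typed and nothing more.

```shell
# Write a one-line plain text file (hello.txt is an arbitrary name).
printf 'Hello\n' > hello.txt

# Count the bytes: five letters plus one newline character, so 6.
wc -c < hello.txt

# Dump the file byte by byte; no hidden formatting codes appear.
od -c hello.txt
```

A word processor file containing the same single word would typically run to thousands of bytes, most of them formatting and bookkeeping.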

Plain ASCII text files, on the other hand, are universal and should be readable on any type of computing device that humanity is likely to produce in the foreseeable future. Although we are not sure at this time how much of the future is foreseeable, the plain text format sidesteps nearly all forward and backward compatibility issues. Plain text files are multi-platform: you can edit them on a Mac, iPad, iPhone, Windows PC, Android device, or Unix/Linux machine, all without compatibility worries.

Why Plain Text?

Without plain text files, if you are collaborating on a book with several co-authors and each uses a different word processor (Microsoft Word, Apple Pages, and LibreOffice, say), then each author is working in a different document format. This complicates collaboration: for all authors to contribute to the book, each of their word processor files has to be translated and the changes integrated back into the original document, raising the possibility of different versions of the same file getting mixed up, etc., etc. But if all authors are writing in plain text, the file compatibility problem is eliminated because every computer, and all editing software, can read and write plain text. One copy of the plain text document could be kept on a cloud storage service, such as Dropbox, and every co-author could then edit that same text file directly, using whatever plain text editing software they desired.

And because it is “ubiquitously compatible and futureproof,” plain text is an excellent archival format. In Forget fancy formatting: Why plain text is best, David Sparks writes:

Although modern word processing programs can do some amazing things—adding charts, tables, and images, applying sophisticated formatting—there’s one thing they can’t do: Guarantee that the words I write today will be readable ten years from now.

If you’ve ever dusted off a 3.5-inch floppy disk (or, God forbid, a 5.25-incher), and then dusted off your old external floppy disk drive, fired it up and retrieved that brilliant essay that you wrote in WordStar back in 1984, you know the problem. That old word processor format is no longer readable, because good old WordStar has gone extinct. You could try opening the WordStar file in a plain text editor, but all those proprietary formatting commands have turned your words into gibberish. If WordStar had used the plain text file format, this problem would not exist.

Another advantage of plain text is that it works well with version control systems (VCS), software that writers can use to archive and retrieve drafts of their writing projects. Long used by software developers to store and track changes in their programming source code, VCSs such as Draft are now available specifically for writers. Benefits of a VCS include tracking every change in a document; tracking writing experiments, while keeping the main file intact; tracking co-authoring and collaboration; and tracking individual contributions. The proprietary, often binary file formats used for word processor files generally do not fare well in a VCS, but plain text files are what VCSs were designed for.
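As a rough illustration of why plain text suits this kind of tracking, the sketch below compares two drafts line by line with the standard diff utility. This is not a VCS, only the sort of comparison a VCS presents to you. The file names and sentences are invented for the example.

```shell
# Two drafts of the same passage (file names are made up for this sketch).
printf 'The quick brown fox.\nIt jumped.\n' > draft1.txt
printf 'The quick brown fox.\nIt jumped over the lazy dog.\n' > draft2.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# the script going if it happens to run under "set -e".
diff -u draft1.txt draft2.txt > changes.txt || true

# Lines starting with "-" were removed; lines starting with "+" were added.
cat changes.txt
```

Run against two versions of a binary word processor file, the same command can usually say little more than that the files differ.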

There are distinct advantages to setting down your words in plain text instead of in a word processor. Because plain text editing software is by design fairly minimal and uncomplicated, a writer can concentrate on writing and not be distracted by the myriad of formatting and typesetting options, icons, menus, widgets, gadgets, buttons, and other digitalia that a typical word processor presents. For example, here is a screenshot of Microsoft Word running on a Mac:

And the corresponding screenshot of MacDown, a free plain-text Markdown editor for the Mac:

Any questions?

Another plus is that most computers come with a free editor for plain text, for example Notepad for Windows and TextEdit for the Mac. There are also many third-party free and paid text editors with added features, for Windows, Mac, iOS, and Android.

Furthermore, plain text files can be used with reference management software, such as Zotero, to insert literature citations into the text and to produce formatted bibliographies. Zotero and other reference managers have toolbar plugins for Microsoft Word, LibreOffice, and OpenOffice, which enable writers to access their references while writing with those word processors, but Zotero can also insert plain-text citation markers into plain-text documents. With a couple of additional steps those markers can be converted into formatted literature citations and bibliographies for both footnotes and endnotes. (Of course, users of LaTeX and BibTeX have been doing this for decades, but that’s another blog post.)

In addition to containing links to bibliographic databases, plain text files can include embedded computer code for producing data analyses, simulations, graphics, and other data-based content. For example, using programming statements from R, the open-source statistical and graphics platform, along with a variant of the Markdown language, one can write a single, plain-text document that will produce elegant formatting, an automatically generated table of contents, well-formatted mathematical expressions, tables, crisp figures, and an automatically generated bibliography. (Click here for an example.) With this approach, your plain-text file becomes not only the source of your finished, typeset document, but also maintains a record of the data management and analysis steps taken to arrive at your final result. For more details, see RStudio as a Research and Writing Platform.

By now I hope that I have convinced you that there are numerous advantages to writing in plain text. We now turn to an example.

An Example of the Plain Text Workflow

The following example uses Markdown as the starting point. The name Markdown is a play on the word markup. Probably the best-known markup language is HTML, or HyperText Markup Language, which is used to “mark up” a plain text file to create Web pages. HTML creates its markup with “tags.” If you look at a list of HTML tags, you will see that there are a lot of them. Markdown, in contrast, is what is known as a lightweight markup language, with a simple syntax that is easy to type in any text editor. Because of this lightweight syntax, Markdown is easy to read in its raw form, unlike markup languages such as HTML and XML.
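For a quick feel of that syntax, the following sketch writes a few lines of Markdown to a file from the command line and prints them back. The file name sample.md and its contents are invented for illustration. The HTML equivalents mentioned in the comments are the standard tags for those elements.

```shell
# Create a small Markdown file (sample.md is a made-up name).
cat > sample.md <<'EOF'
# Sample Document

This is *emphasized* and this is **strong**.

- first item
- second item
EOF

# In HTML the heading would be <h1>Sample Document</h1>,
# the emphasis <em>emphasized</em>, and the list a <ul> with two <li> items.
cat sample.md
```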

Click here to view an example of a Markdown file, and here to view the file rendered as HTML. You could also open the Markdown text in a Markdown-friendly editor, like MacDown, and be able to view the original Markdown and its HTML rendering side-by-side. More information on Markdown and its syntax can be found here.

Markdown’s lightweight syntax allows you to create a plain text source document on any text editor, on any computing device. The aim of the plain text workflow is to provide a simple means of setting down words efficiently. As mentioned earlier, keeping this Markdown file on a cloud storage account such as Dropbox would make it accessible, and editable, from any computing device that was synced to the Dropbox account, including mobile devices.

Using the MacDown editor, we can export this Markdown file to either HTML or PDF. However, the pandoc file conversion utility lets us convert Markdown files to any of the major word processor document formats (as well as a number of other formats). These include HTML, docx, ODT, EPUB, LaTeX, and PDF.

Pandoc is free and runs on all major computer platforms. Installation instructions are here, and the essential usage instructions can be found here. Far more detailed instructions, meant for people who are “command-line experts,” are here.

Yes, pandoc is a command-line tool. There is no graphical user interface, no menu system, no point-and-click to pandoc. It’s a blinking cursor, waiting for you to type a command. But do not be afraid. We will work through a simple example, which most of the time will be all that you need.

Once pandoc is installed, you need to open up a terminal window, also called a command prompt or command shell. The exact way you would open up a command-line interface will vary according to your computer platform, but all of them basically look something like this:

To verify that pandoc is installed correctly, type:

pandoc --version

and you should see something that looks like:

pandoc 1.18
Compiled with pandoc-types 1.17.0.4, texmath 0.8.6.6, highlighting-kate 0.6.3
Default user data directory: /Users/rlent/.pandoc
Copyright (C) 2006-2016 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
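If the version command produces an error instead, the shell probably cannot find pandoc. A minimal check, assuming a Unix-style shell (macOS Terminal or Linux), might look like this. The variable name status is just a label chosen for the sketch.

```shell
# "command -v" prints the path of a program if the shell can find it.
if command -v pandoc >/dev/null 2>&1; then
    status="pandoc found at $(command -v pandoc)"
else
    status="pandoc not found on PATH"
fi
echo "$status"
```

If pandoc is reported as not found, closing and reopening the terminal window after installation is a reasonable first thing to try.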

Let’s try converting our Markdown file to a Microsoft Word docx file. If you download the file, rename it to markdown.md first. The conversion command is as follows:

pandoc -s -f markdown -t docx markdown.md -o markdown.docx

The first item typed on the command line is the name of the application, pandoc. The items that follow are command-line options and file names that tell pandoc what we want it to do. Thus the -s means we want a “standalone” file, that is, a complete file and not just a piece of a file. The -f means “from,” meaning we want to convert from Markdown (the next item) to (the -t) a docx file (Starting to get it?). Next is the name of the input file (markdown.md), followed by -o for “output,” then the name we want to give to the converted file (markdown.docx).
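The short options also have longer, more readable spellings. The sketch below runs the same kind of conversion using pandoc's long option names. To stay self-contained it creates its own tiny input file, sample.md (a made-up name, not the article's markdown.md), and it skips the conversion if pandoc is not installed.

```shell
# Make a tiny input file so the sketch is self-contained.
printf '# Title\n\nSome *emphasized* text.\n' > sample.md

if command -v pandoc >/dev/null 2>&1; then
    # Long spellings of the short options used above:
    # --standalone = -s, --from = -f, --to = -t, --output = -o
    pandoc --standalone --from=markdown --to=docx --output=sample.docx sample.md
    ls -l sample.docx
else
    echo "pandoc is not installed; skipping the conversion"
fi
```

The long forms take more typing but are easier to understand when you come back to a saved command months later.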

For all of this to work, you need to be in the folder (directory) of your computer’s filesystem where the input file is located. In the command window you would have to use the cd command to navigate to the proper location. See Getting started with pandoc for more details.
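For readers new to the command line, here is a minimal sketch of that navigation in a Unix-style shell (macOS Terminal or Linux). The folder name plain-text-demo and the file notes.md are invented stand-ins for wherever your own Markdown file lives.

```shell
# Create a folder and file to stand in for your own writing project.
mkdir -p plain-text-demo
printf '# Notes\n' > plain-text-demo/notes.md

# Move into the folder, confirm the current location, and list its files.
cd plain-text-demo
pwd
ls
```

On Windows, the Command Prompt also uses cd to change folders, but lists files with dir rather than ls.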

You will also need the picture of the guy with the big fish downloaded into the same folder.

Assuming that everything worked, you should now have a file called markdown.docx, which if opened in Microsoft Word should look like a bona fide Word file, with proper formatting as created in the original, plain-text Markdown file.

What if a colleague didn’t have Word and instead wanted a PDF file? Just type:

pandoc -s -f markdown markdown.md -o markdown.pdf

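Pandoc can figure out that you want PDF output from the filename extension specified on the output file. Note that pandoc produces PDF by way of LaTeX, so this command works only if a LaTeX installation is present on your computer.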
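The same extension-based guessing works for other formats, which makes it easy to produce several outputs from one source. The following is a sketch under a few assumptions: the input file sample.md is created on the spot for the example, and the three target formats are simply ones that need no extra software beyond pandoc.

```shell
# Make a tiny input file so the sketch is self-contained.
printf '# Title\n\nSome *emphasized* text.\n' > sample.md

if command -v pandoc >/dev/null 2>&1; then
    # One source file, three output formats; pandoc infers the input
    # format from .md and each output format from its extension.
    for ext in html odt docx; do
        pandoc -s sample.md -o "sample.$ext"
    done
    ls sample.*
else
    echo "pandoc is not installed; skipping the conversions"
fi
```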

Parting Thoughts

Other workflows, of course, are possible. You could capture your words in longhand with pen and paper and then type your manuscript into a computer file. You could dictate into a computer or smartphone, and software will automagically convert your spoken words to text. If you write exclusively on one computer, are happy with your word processing software, and you’re getting published, then that system is working and you may not want to mess with it. If a group of co-authors all agree to use the same word processing software, then the collaborative writing project is made easier. (Unless the word processor vendor goes out of business, changes its file format, or releases a new version of the software that no longer works with your favorite reference manager.) You should use whatever writing workflow enables you to initially capture your words, edit them, and then produce a finished product, be it printed on paper or stored in a digital file.

Like many things in computing, the plain text workflow is just one of numerous options. (Remember that Yahoo stands for You Always Have Other Options.) But unlike many of the alternatives, plain text has been around a long time, will continue to be around, and is resistant to change. Using the tools described here, the plain text workflow can give you “the reckless freedom to write anywhere.”

The Plain Text Workflow

Microsoft Word must die.
