File and Directory Layout for Storing a Scientific Paper in Subversion

In this first article of my collaborative paper authoring series, I propose a directory layout for storing a single paper. The paper is stored in a Subversion repository, so I assume that you are sufficiently familiar with Subversion. The motivation for using Subversion and a proposal for storing a whole collection of papers from different authors will be presented in a later installment of this series.

Storing paper sources in a version control system is essential when multiple authors are editing papers concurrently. But storing your papers in a Subversion repository makes sense even if you are the sole author. In my experience, the general benefits of version control, such as archiving and restoring different versions of all files, documenting changes, and painlessly synchronizing data between multiple computers, pay off quickly, even in a single-user environment.

For this article, I'm assuming that you are writing your paper with LaTeX and that you are using a Unix-derived operating system. But you can use the same directory layout for other document types as well, e.g., Word, FrameMaker or PowerPoint.

File and Directory Structure for each Paper

I propose the following directory and file structure for storing the paper and its related resources in the Subversion repository. Each paper is assigned a unique identifier (in this example: plessl2007a). This identifier is also used as the name of the directory that contains the paper.

plessl2007a/
   plessl2007a.bib
   plessl2007a.pdf
   trunk/
      Makefile
      plessl2007a.tex
      bib/
         plessl.bib
      data/
         dataset1.txt
      fig/
         fig1.pdf
         fig2.pdf
   tags/
      ...

The paper's top-level directory stores the final version of the paper (plessl2007a.pdf) as well as the bibliographic information describing the paper (plessl2007a.bib). These files are added to the top-level directory when the final version of the paper is completed.

The latest version of the paper is stored in the "trunk" directory. This directory holds the LaTeX sources of the paper (plessl2007a.tex), a "fig" directory containing the figures used in the paper, and a Makefile for building the paper from source using LaTeX. Additional subdirectories, e.g., a "data" directory for storing experimental results, can be added according to your needs. All changes to the current version of the paper and the related data are made in the trunk directory. Whenever you feel that the current state of the paper should be archived, you commit the changes in "trunk" to the Subversion repository.
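The layout described above can be scaffolded with a few shell commands. The following is only a minimal sketch: the function name scaffold_paper and the use of empty placeholder files are my own assumptions for illustration, and the demo runs in a temporary directory rather than in a real working copy. The top-level .bib and .pdf files are deliberately not created, since they are only added once the paper is final.

```sh
#!/bin/sh
# Sketch: create the proposed directory skeleton for one paper.
# "scaffold_paper" is a hypothetical helper, not part of Subversion.
set -eu

scaffold_paper() {
    id=$1      # paper identifier, e.g. plessl2007a
    root=$2    # parent directory in which the paper directory is created

    # trunk holds the working version; tags holds labelled copies of trunk
    mkdir -p "$root/$id/trunk/bib" \
             "$root/$id/trunk/data" \
             "$root/$id/trunk/fig" \
             "$root/$id/tags"

    # Empty placeholders for the files every paper is expected to have
    touch "$root/$id/trunk/Makefile" "$root/$id/trunk/$id.tex"
}

# Demo in a throwaway directory
workdir=$(mktemp -d)
scaffold_paper plessl2007a "$workdir"
find "$workdir/plessl2007a" | sort
```

Inside a real working copy, the new directory would then be put under version control with "svn add plessl2007a" followed by "svn commit".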

In addition to the "trunk" directory, the top-level directory also contains a "tags" directory that is used to label major revisions of the paper. Whenever the paper has passed an important stage of editing (e.g., after it has been submitted for review, or after a major revision or the final version has been completed), a copy of the trunk directory is placed in the tags directory.

A repository with a "submission" and a "final" tag looks like this:

plessl2007a/
   plessl2007a.bib
   plessl2007a.pdf
   trunk/
      Makefile
      plessl2007a.tex
      bib/
         plessl.bib
      data/
         dataset1.txt
      fig/
         fig1.pdf
         fig2.pdf
   tags/
      submission/
          Makefile
          plessl2007a.tex
          bib/
             plessl.bib
          data/
             dataset1.txt
          fig/
             fig1.pdf
             fig2.pdf
      final/
          ...

Tags are created with Subversion's copy command. For example, after the final version of the paper has been completed, you can create a tag named "final_version" by copying "trunk" to "tags/final_version":

    svn copy trunk tags/final_version

Note that Subversion stores copies efficiently in the repository: copying "trunk" to "tags/final_version" does not duplicate the data in the repository.
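One detail is easy to overlook: a copy made inside a working copy is only scheduled, and the tag appears in the repository once it has been committed. The sketch below spells out both steps. The helper tag_commands is hypothetical and only prints the commands instead of running them, so that it can be tried without a repository. The commit message is likewise just an example.

```sh
#!/bin/sh
# Sketch: print the Subversion commands for tagging the current trunk.
# "tag_commands" is a hypothetical dry-run helper; it never calls svn.
set -eu

tag_commands() {
    tag=$1
    # To be run from the paper's top-level directory in a working copy
    printf 'svn copy trunk tags/%s\n' "$tag"
    # The copy is only scheduled locally; the commit records it in the repository
    printf 'svn commit -m "Tag %s" tags/%s\n' "$tag" "$tag"
}

tag_commands final_version
```

Using the same two steps for every tag ("submission", "final_version", and so on) keeps the tags directory consistent across papers.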

A big advantage of using the same directory layout for all papers is that many steps in processing the publications can be automated easily, such as running LaTeX to build the papers or publishing all papers to a web server.
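As a small illustration of such automation, the following sketch lists all directories that follow the layout, recognizing a paper by the presence of trunk/<id>.tex and trunk/Makefile. The function name list_papers, this recognition rule, and the second paper name in the demo are my own assumptions for the example.

```sh
#!/bin/sh
# Sketch: list every paper directory that follows the proposed layout.
# "list_papers" is a hypothetical helper written for this example.
set -eu

list_papers() {
    root=$1
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        id=$(basename "$dir")
        # A paper is recognized by trunk/<id>.tex and trunk/Makefile
        if [ -f "$dir/trunk/$id.tex" ] && [ -f "$dir/trunk/Makefile" ]; then
            printf '%s\n' "$id"
        fi
    done
}

# Demo on a throwaway tree; "example2007b" and "notes" are made-up names
demo=$(mktemp -d)
mkdir -p "$demo/plessl2007a/trunk" "$demo/example2007b/trunk" "$demo/notes"
touch "$demo/plessl2007a/trunk/plessl2007a.tex" "$demo/plessl2007a/trunk/Makefile"
touch "$demo/example2007b/trunk/example2007b.tex" "$demo/example2007b/trunk/Makefile"

list_papers "$demo"
```

The resulting list of identifiers can drive further steps, for instance running make in each paper's trunk directory, assuming that every Makefile provides a sensible default target.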

You may have noticed that each directory that stores a copy of the paper contains a "Makefile" for building the paper and a "bib" directory that contains the author's bibliographic library in BibTeX format. In the next articles in this series, I will present a generic Makefile for building the paper and show you a method for including your bibliographic information (which is stored in another branch of the repository) by reference.

Written January 11th, 2007