ask a scatterbrain: managing workflow.

As I have admitted before, I am a terrible electronic file-keeper. If I were to count up the minutes I have wasted in the last 15 years searching for files that should have been easy to find, typing and retyping Stata code that would have (and should have) been a simple do-file, or doing web searches for things I had read and wanted to include in lectures or PowerPoints or articles but couldn’t place, I fear I would discover many months of my life wasted as a result of my organizational ineptitude.

For a long while, these bad habits only affected me (and the occasional collaborator). It was my wasted time and effort. Now, though, expectations are changing and this type of disorganization can make or break a career. I think about my dissertation data and related files, strewn about floppy disks and disparate folders, and I feel both shame and fear. What would I do if someone asked for all my work? Would I be able to produce what they would need, to explain how things are coded, to share the program I ran in the lab? If it weren’t for all the detail that I actually include in my dissertation – significantly more detail than I normally include in my work given page and time constraints – could I replicate what I had done? More importantly, could someone else?

When I’ve occasionally asked others over the years if they’ll share notes or data and they have been unable to because of changing technology or office moves or some other excuse, I haven’t batted an eye. That was then; this is now. Given recent controversies (and gross errors) in academia, and a related shift toward more transparency in research and data-sharing in the sciences, effectively managing workflow (and making more than the final product available to others) has transformed from a useful skill to a fundamental job requirement for both qualitative and quantitative scholars.

I am trying to do better in my own life and I hope to get to my students before they have time to acquire bad habits. To help students learn to manage workflow effectively, I would like to incorporate relevant techniques into our graduate training. Scott Long’s book, The Workflow of Data Analysis Using Stata, recommended by a colleague and mentioned previously here, is an excellent place to start. Long covers planning, organizing, and documenting – from naming files to writing research logs – in addition to how to write do-files and automate much of your work. With his insight, files are clean and orderly and ready to share. I highly recommend this as a place for anyone interested in changing their ways – or learning the trade – to start and I wish I had the time to offer a course like he does or to incorporate the entire book into our Proseminar or Stats sequence. In the meantime, and as a complement, I am on the hunt for other resources, helpful tips, and ideas about how best to weave these topics into graduate education. Do readers have examples of where they’ve learned these habits, or where they wish they had? Best practices for qualitative researchers, to complement the approaches geared toward quantitative methods? Alternatives to full-fledged courses as a way to teach these things (e.g., summer reading groups)?

10 thoughts on “ask a scatterbrain: managing workflow.”

  1. There’s a free Coursera course on reproducible research offered by the Johns Hopkins Data Science team. I haven’t taken that one yet, but their other courses are very high quality and quite useful (if you’re into R, and reasonably savvy to start with). The workload tends to be on the order of 3-5 hours per week for four weeks, or you can do it all in a couple days.


    1. Thanks, Dan. This looks like it could be helpful for students, especially those using R. I wonder how much insight on the streamlining of work pre-submission/sharing it provides.

      I would love to find something that is helpful to qualitative folks. Do you have a model that you’ve used to organize your research and collaboration on qualitative projects?

      1. I have a … system for qualitative/historical, collaborative work, but it’s kludgy. It mostly involves doing everything in a shared Dropbox folder, and being careful to check out key documents (draft text) so that two people don’t edit the same document at the same time. I like long, informative file names (usually including a short date). For co-authors comfortable with LaTeX, ShareLaTeX is a nice collaborative editing site, but most of the time it’s overkill for work that involves minimal formatting (the typical historical journal article with few tables, no equations, etc.).

        I have some slightly more refined thoughts on archival document storage – basically, create a file structure that mirrors the archive itself, with a folder for every folder from the archive. So, if you’re working at the UM Bentley archive, in the Admissions Lawsuit Collection, Box 10, Folder “Motions”, your file structure would be Bentley/Admissions Lawsuits/Box 10/Motions. Then in that innermost folder, you have one or more pdf files with the images captured from that folder. I group files together using Acrobat Professional and run OCR on them (which also straightens images and reduces file size, usually helpful). While I’m gathering the data, I keep a running list of every document I photograph, and I keep this as a text file in the top-level of the archive (i.e. under “Bentley” I’d have a word file, sometimes one per day or visit, with a list of everything I grabbed). This system is pretty robust – you have two records of where everything came from – and intuitive, and works fine for collaboration (just stick the top-level folder into an “archival data” folder in your shared Dropbox and you’re off to the races).
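For readers who would rather script this layout than build it by hand, here is a minimal Python sketch of the scheme described above. The function names (`archive_folder`, `log_document`) and the `capture_log_<date>.txt` file name are hypothetical choices made for illustration, not part of the commenter's actual setup; the sketch also assumes one plain-text log per visit at the top level of the archive.

```python
from datetime import date
from pathlib import Path


def archive_folder(root, archive, collection, box, folder):
    """Create (if needed) and return the local folder mirroring one archive folder."""
    path = Path(root) / archive / collection / box / folder
    path.mkdir(parents=True, exist_ok=True)
    return path


def log_document(root, archive, collection, box, folder, description, visit=None):
    """Append one entry to the per-visit log kept at the top level of the archive.

    The log file name is an assumption for this sketch: capture_log_<ISO date>.txt
    """
    visit = visit or date.today()
    log_path = Path(root) / archive / f"capture_log_{visit.isoformat()}.txt"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # One tab-separated line per photographed document: where it came from, what it is.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{collection}/{box}/{folder}\t{description}\n")
    return log_path
```

Calling `archive_folder("archival data", "Bentley", "Admissions Lawsuits", "Box 10", "Motions")` would create `archival data/Bentley/Admissions Lawsuits/Box 10/Motions`, and each `log_document(...)` call adds one line to that visit's log under `Bentley`, so the folder path and the log give two independent records of where a document came from.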

  2. Liking this discussion of qualitative research. I can’t “like” too much all the suggestions about orderly storage of data. As someone who has been around long enough to have obsolete boxes of 4×6 index cards with reading notes and file cabinets of photocopies, I’ll remind people that today’s best technology can be the future equivalent of an unreadable magnetic tape or floppy disk 20 years from now. And a system that works well for your dissertation can get out of control as the years and projects accumulate. Trying to plan ahead for expansion of your system and export or rolling over to as-yet-unknown new systems adds wrinkles.

    I also wish I had a better way to organize the oddments I accumulate as potentially relevant to teaching, or the reading notes that turned out not to be useful in one year’s project but that I would like to locate more readily in the future. Right now I’m kind of limping along with a pretty good EndNote structure for academic articles, which I like, plus the excellent suggestion from a previous round of discussion to store all PDFs in one folder using author date journal title as the file name, coupled with Evernote for storing web links and emails. But my system is not well cross-indexed (I mostly have to rely on keyword searches) and I am still vulnerable to technological evolution.

    The natural sciences have a much more well-developed culture of using lab notebooks and the technological equivalents. I wish I had been taught that way years ago and had been teaching that to my students.

    1. Thanks, OW.

      To clarify, you save all your .pdfs (regardless of whether they’re for courses or papers or just because) in the same file folder, simply organizing them by author_date_journal label?


      1. I do the same thing (only substituting a brief description for journal title), and then link to the pdf from Endnote. And then I just use that shorthand (Collett 2011 Simmel) when writing notes to self, drafts, etc., elsewhere.


      2. The protocol is to have all PDFs in one folder. This really is easier, it turns out, than scattering them across a bunch of other folders tied to specific papers, and also easier than naming the file by what interests you about it. The PDF folder becomes your library and it is sorted in the most logical way, by citation. You use EndNote or your other bibliography manager to handle the keywords and notes and other stuff that would let you know you want the article. In practice, I actually have two big folders for the two main types of work I do. That is a historical artifact, but even that gets cumbersome because there are overlaps in the projects and I sometimes have to look in two places for an article or toggle back and forth about where I am saving things. One folder really is best. I don’t add the PDFs to my EndNote file itself; I use the option to link to the file on my computer. That means the link will be bad if I’m accessing the EndNote folder from another computer, but that is acceptable to me because I have over 5000 references and having the PDFs in the EndNote file itself was making it way too big to copy. The latest version of EndNote with cloud synchronization should make that less of a problem, but I’m still using my old file storage system. And having the PDFs in their own storage means that I could switch technologies. If they are stored as part of the EndNote file structure, it would be more of a hassle if I want to switch software.
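The one-folder, name-by-citation convention in this thread is easy to automate. Below is a rough Python sketch, assuming the "author year label" pattern mentioned above (as in "Collett 2011 Simmel"). The helper names `citation_filename` and `file_into_library` are hypothetical, and the set of characters stripped from names is just one reasonable guess at what is unsafe on common file systems.

```python
import re
import shutil
from pathlib import Path


def citation_filename(author, year, label):
    """Build a file name such as 'Collett 2011 Simmel.pdf' from citation parts."""
    parts = [author, str(year), label]
    # Drop characters that are not allowed in file names on common systems.
    cleaned = [re.sub(r'[\\/:*?"<>|]', "", part).strip() for part in parts]
    return " ".join(part for part in cleaned if part) + ".pdf"


def file_into_library(pdf_path, library_dir, author, year, label):
    """Move a downloaded PDF into the single library folder under its citation name."""
    library = Path(library_dir)
    library.mkdir(parents=True, exist_ok=True)
    target = library / citation_filename(author, year, label)
    # Refuse to overwrite: two files with the same citation name need a manual fix.
    if target.exists():
        raise FileExistsError(f"{target.name} is already in the library")
    shutil.move(str(pdf_path), str(target))
    return target
```

Because every PDF ends up in one folder under a predictable name, a bibliography manager only needs to store a link to that path, which keeps the PDFs independent of any particular software.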


  3. Jessica, I started writing out my data organization for workflow a couple of years ago on my blog. I was planning to use it with graduate students, but haven’t mentored any that would need this kind of data management system (we don’t have PhD students in my department). I was thinking of updating it since I have started working with a few people who have asked me about how I manage workflow. This might give me encouragement to actually do it.

