lots of tables?

Let me be very clear what I’m asking. This is about tables of means, lots of them, 65 – 100 pages of them, integrated with explanatory text. Most social scientists don’t do this kind of stuff, but I do in my “public sociology” work. I need to generate reports that integrate text with LOTS of tables of means cross-classified by race, offense, geographic location, type of statistic. I need to be able to control what headers get put on columns and I need to have every table be output with enough information to be absolutely sure what it is. I have a lot of better things to do with my time than spending hours merging files and reformatting tables.  I really need to automate this stuff. I use Stata, I love Stata, but I’m about ready to kill either myself or Stata. It is a program for doing umpty-ump zillion flavors of regression and there are some wonderful ado files for reformatting regression results (outreg2 and estout are lovely for regressions, but useless for this problem). I’ve gotten quite good at making pretty graphs from it too.  But tables remain a nightmare. The Stata commands that generate means include tabstat and table with the c() option. Both are pretty restrictive in what they can produce and have the additional badness that they print the tables with spaces instead of tabs so that the tables in the log file cannot be automatically reformatted by a spreadsheet or word processor. I even looked at the smcl output to see if the tables in it were tab delimited (no, they are not) or had maker codes that could be used to insert tabs (not that I could see). You can get the tabs with the “copy table” option, but doing it manually that way for hundreds of tables is horrible, to say the least. An ado named tabout works pretty well when it works, but it is also pretty restrictive in what it can do has has its own internal logic (invisible to the naive user) about what it will put in the column headings.  I’ve done some stuff with file write , but this is very cumbersome and I keep making mistakes — it is very hard to get it right.  I’ve tried all sorts of Google searches to see if someone in the Stata community has written something to deal with this, and I’ve gotten some useful leads, but not enough to actually solve the problem. I see references to collapsing things and then outsheet-ing or putting it in a matrix and then somehow translating the matrix, but I have just gotten my brain all jumbled up and I can’t get things to work right. So, that is the question.  Do you know how or do you know of a tutorial that explains how to do this kind of thing?  Not just vague references to it being possible to do it, but actual examples of the code that does it. Or, is there some other software that does it easily enough that it would be worth my while porting the data to a new program?

For my last effort — done to a deadline — I just worked with the log file. I had to keep everything in Courier or the tables screwed up, and the only thing I could control to get things to fit right on the page was the font size, so I just manually went through the log file putting in page breaks, manually choosing portrait or landscape for particular sections, and changing the font sizes as needed to get the tables to print decently. I had to keep the original ugly variable names because, of course, I could not control the output well enough to edit them. Ugh.  Anyway, if you have some useful pointers, please drop them in the comments.

Author: olderwoman

I'm a sociology professor but not only a sociology professor. I keep my name out of this blog because I don't want my name associated with it in a Google search. Although I never write anything in a public forum like a blog that I'd be ashamed to have associated with my name (and you shouldn't either), it is illegal for me to use my position as a public employee to advance my religious or political views, and the pseudonym helps to preserve the distinction between my public and private identities. The pseudonym also helps to protect the people I may write about in describing public or semi-public events I've been involved with. You can read about my academic work on my academic blog http://www.ssc.wisc.edu/soc/racepoliticsjustice/ --Pam Oliver

21 thoughts on “lots of tables?”

  1. Have you looked into the tabout ado program? I think it can do at least half of what you are hoping for.


  2. As the post mentioned, yes I have. It works well when it works and I’ve solved some problems with it, but is quite restrictive and it is very hard to control its headers properly for some of the tables that I need.


  3. I wish I could help, but I’ve had the same sort of difficulty, and I’m not enough of a Stata jock to know the solution.

    But for the record, R + LaTeX + Sweave works great for doing this kind of thing once you get up the learning curve a bit. (h/t Kieran Healy for putting me onto that combo.) I know this isn’t what you want to hear, but I put it out for the benefit of anyone at the choosing-software stage of their career.


  4. Apologies for not reading carefully through the long paragraph about options you explored. I suggest crossposting this question on Statalist.


  5. Possibly not what you want to hear, but SAS is very nice for this. You have a lot of options to alter what’s in a table, it can spit tables straight to Word or Excel or to data (for more fiddling). In my work I’m occasionally asked to produce zillions of tables, and now it’s always SAS for me. (And Stata for the regressions, etc).


  6. OW – I have had similar problems in the past. I decided that my best option was to use matrices. You can run your descriptives, save the results from the return functions (e.g. r(mean)) and save those to the a matrix. I wrote a program that writes these results to tables that can have unique titles, column headers, row names and group columns and rows. I am more than willing to share, although to call it even a “beta” version would be doing programs everywhere and injustice. It is clunky, but pretty easy to use for automation of things like this. If you would be interested, let me know if there is a way to contact you and I would be happy to send it along.


  7. It sounds like you want -espost-, which is a relatively new part of the -estout- package. It allows you to store output from -tabstat- or -summarize- and then make a table using -estout-. See Ben Jann’s website here: http://repec.org/bocode/e/estout/hlp_estpost.html.

    If you want actual code, you’ll have to give more detail. If you go that route, Statalist will also give you more potential respondents.

    Hope this helps.


  8. correction: -estpost-

    Also, the -matrix- option might be helpful, but it’s hard to say without more detail.


  9. this is probably a lot clumsier than everything else that’s been suggested, but my instinct is to write a regular expression that converts a fixed width table to a tab-delimited table. you could do this with TextPad or N++ but ideally you’d have Stata do it automatically using the “shell” command. also, as a much dumbed down version of the R+LaTeX+Sweave solution, i’ve had some luck getting Stata to stitch everything together using Lyx (aka, LaTeX for dummies).


  10. ps, another way to do it would be to not think of it as a table but a dataset. almost anything you can do with table or tabulate you can also do with contract or collapse and then outsheet.


  11. Thanks for the ideas, folks. I know this is basically do-able, as I’ve done it before, but each production run is different from the previous one and involves new twists not present in the prior instance. And I’m easily confused by things I don’t fully understand (as I guess everyone is.) One of the very depressing things about this business is it seems like a lot of what I do is 10% sociology and 90% formatting — I’ve had this feeling about regular paper-writing, too.

    I’m going to email Mike to ask for a copy of his program. Even if it is rough, it may give me ideas for how to write better do files. I’m also going to look into estpost. I signed up for statalist and looked at their archives, mostly establishing that there isn’t any easy solution and that wanting things to work they way you want them can get you chewed out by other responders (e.g. http://www.stata.com/statalist/archive/2007-12/msg00234.html) , although I learned some stuff in the process. E.g. you cannot put strings into Stata matrices unless you work in Mata. I also looked at bit at SAS tutorials as programming in public agencies is done in SAS and if I want to talk them into producing these reports, I’ll probably have to learn to speak a little SAS. But I’m not going in that direction without assistance.

    I previously thought about and tested Gabriel’s idea, which almost worked and I may go back to it; it was easily fooled by row/column labels with spaces in them (just as Excel’s table parsing can be). My main objection to that was that I’d still have to go back to clean up the table labels.

    FYI I realized from reading that another “trick” is to embed weird characters into the beginning and end of tables so that you can use macros to automate the selection and conversion of text to tables in a word processor. This only works if you are rolling your own, though, not if you are trying to get someone else’s ado to work for you.


  12. Mike3550: You know, if you wouldn’t mind sending that program to me (drek_the_uninteresting@hotmail.com) I would appreciate it. I’m trying to do something fancy with matrices in Stata right now and am having the devil’s own time with it. Your code might help me pass my problem.

    No pressure, though.


  13. > it was easily fooled by row/column labels with spaces in them

    you might be able to solve this by writing the regexp so it treats spaces differently depending on where they stand relative to the pipe character (“|”). a table row in stata follows this format:
    so you’d want a regexp that treats space between the beginning of line as whitespace, otherwise before the pipe as literal, and after the pipe as whitespace.
    in TextWrangler or TextPad the syntax would probably be this
    ^ +(.+) | +([-\d\.]+) +([-\d\.]+) | +([-\d\.]+)$


  14. Don’t have any constructive comments, but OW, I could totally relate to this: “One of the very depressing things about this business is it seems like a lot of what I do is 10% sociology and 90% formatting — I’ve had this feeling about regular paper-writing, too.”


  15. Hi,

    I’ve had some luck with comma delimited tables, though this probably falls into the category of cumbersome solutions with file write. Combined with foreach loops, the code can be much shorter and easier to troubleshoot. Maybe play with something like the following:

    set more off
    ** add a note about what this file does
    cap file close results
    ** create a folder called “out” in your active directory
    file open results using “out/table1.csv”, write replace
    file write results “Means by …” _n
    file write results “Header 1, Header 2…” _n
    for each subpop in A B C {
    for each type in first second third {
    file write results “mean of `type’ for `subgroup,’
    summ `type’ if subgroup == `subgroup’
    file write results _n
    file close results


  16. Update: The file write solution using commas as delimiters (which Epigina suggested)turns out to be the easiest and most flexible for this kind of application, in which the rows/columns vary by subsets of data, although there are some details not in Epigina’s model that you have to watch out for. I posted a long sample of my own successful code over on my own blog in case it is helpful.

    Inside a word processor, you mark the table and use the built in “convert text to table” feature available in the word processor. (Trust, me it is there if you look for it.) This is still not 100% because I cannot get a properly-working macro in Word or WordPerfect: a recorded macro does not work the same when replayed.

    LaTeX appears not to be great for this kind of application, as on line help files indicate that LaTeX will not automatically wrap text in a column unless you pre-specify column width. Spreadsheets are not good targets for this work because each table has a different number of columns and each column has a different width; in any event, Excel for some reason did not recognize the _page code and neither inserts a page break nor permited the code to be search/replaced for a page break.

    Gabriel’s code did not work for me in Textpad, probably because I failed to fully specify what I was trying to do. I’m going to keep and refer to Mike’s program to learn more about Stata matrices (which are still largely a mystery to me), but it’s assumptions about what you are trying to do were not well suited for this particular problem.


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.