cumulative frequency in stata

This would have saved me HOURS of work several months ago. Where I grew up, the word “sum” means “add things up.” In Stata, it turns out, sum() used as a generate function means what Stata-speakers call a “running sum” (a term I have never heard or used, although I do speak about the running total in my bank account) and what I was always taught to call a “cumulative frequency.” The egen sum function I have been using is REALLY named “total”: if you use “sum” in egen, it works as an undocumented alias for total.

I can only guess that this generate function got named “sum” because the obvious abbreviation for a cumulative frequency is an obscene word. But if you look up sum in the Statalist threads, you’ll see that many people assume the function does what the word means, and this mis-naming is a source of endless confusion. In fact, the regular posters (with no apparent sense of irony) call this the most under-utilized function in Stata. Do you suppose mis-naming it might have something to do with this? Cumsum or, if you are squeamish, runsum, would have been a better name.

Even just better cross-referencing in the help files would improve the documentation. If you search “cumulative” inside Stata, the function does not come up (probably because the word cumulative is never used in its description). The closest you can get is on the second page of hits, where you’ll get this FAQ: “How do I tabulate cumulative frequencies?” and a link to:
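For anyone else bitten by this, here is a minimal sketch of the difference between the two functions. It assumes a made-up five-row dataset, and the variable names x, runsum, tot and cumprop are mine, purely for illustration.

```stata
* Toy data (hypothetical): five observations with x = 1, 2, 3, 4, 5
clear
set obs 5
generate x = _n

* With generate, sum() is a running (cumulative) sum down the rows
generate runsum = sum(x)

* With egen, total() puts the same grand total on every row
egen tot = total(x)

* Cumulative proportion: running sum divided by the grand total
generate cumprop = runsum / tot

list x runsum tot cumprop
```

With x running from 1 to 5, runsum comes out as 1, 3, 6, 10, 15, while tot is 15 on every row. As far as I can tell, generate's sum() also treats missing values as zero rather than propagating them, which is worth checking against the help file for your version.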

Author: olderwoman

I'm a sociology professor but not only a sociology professor. I keep my name out of this blog because I don't want my name associated with it in a Google search. Although I never write anything in a public forum like a blog that I'd be ashamed to have associated with my name (and you shouldn't either), it is illegal for me to use my position as a public employee to advance my religious or political views, and the pseudonym helps to preserve the distinction between my public and private identities. The pseudonym also helps to protect the people I may write about in describing public or semi-public events I've been involved with. You can read about my academic work on my academic blog --Pam Oliver

1 thought on “cumulative frequency in stata”

  1. Ugh, having two functions that take the same input and use the same name but do different things depending on whether you type “gen” or “egen” is a clear software design mistake. In general, the existence of “egen” may be the single worst design decision within Stata; egen’s functionality always should have been invoked just by using gen.


