cumulative frequency in stata

This would have saved me HOURS of work several months ago. Where I grew up, the word “sum” means “add things up.” In Stata, it turns out, “sum” used as a generate function means what Stata-speakers call a “running sum” (a term I have never heard or used, although I do speak about the running total in my bank account) and what I was always taught to call a “cumulative frequency.” The egen sum function I have been using is REALLY named “total”: if you use “sum” in egen, it works as an undocumented alias for total.

I can only guess that this generate function got named “sum” because the obvious abbreviation for a cumulative frequency is an obscene word. But if you look up sum in the Statalist threads, you’ll see that many people assume that the function does what the word means, and this mis-naming is a source of endless confusion. In fact, the regular posters (with no apparent sense of irony) call this the most under-utilized function in Stata. Do you suppose mis-naming it might have something to do with this? Cumsum or, if you are squeamish, runsum, would have been a better name.

Even just better cross-referencing in the help files would improve the documentation. If you search “cumulative” inside Stata, the function does not come up (probably because the word cumulative is never used in its description). The closest you can get is on the second page of the hits, where you’ll get this FAQ: “How do I tabulate cumulative frequencies?” and a link to:

Author: olderwoman

I'm a sociology professor but not only a sociology professor. I keep my name out of this blog because I don't want my name associated with it in a Google search. Although I never write anything in a public forum like a blog that I'd be ashamed to have associated with my name (and you shouldn't either), it is illegal for me to use my position as a public employee to advance my religious or political views, and the pseudonym helps to preserve the distinction between my public and private identities. The pseudonym also helps to protect the people I may write about in describing public or semi-public events I've been involved with. You can read about my academic work on my academic blog. --Pam Oliver

2 thoughts on “cumulative frequency in stata”

  1. Ugh, having two functions that take the same input and use the same name but do different things depending on whether you type “gen” or “egen” is a clear software design mistake. In general, the existence of “egen” may be the single worst design decision within Stata; egen’s functionality always should have been invoked just by using gen.


  2. Sympathies here. As you say, you’re far from the first person to be confused by this. There’s still a question of whether the documentation was ever confusing or ambiguous on this point.

    Mata now uses `runningsum()` for what you will guess from the name.

    But using `sum()` to mean cumulative sum (not cumulative frequency, because `sum()` cumulatively adds whatever is fed to it, which need not be a frequency) — except that under `egen` it means total — was indeed a bad choice by the company.

    In Stata 9 this was fixed to the extent that was easy: the `egen` function `sum()` went undocumented. It still works but as a synonym for `total()`. There is a good case that the mainstream function `sum()` should go undocumented in favour of `runningsum()` — but then there would be a persistent sequence of posts from those puzzled by mentions of `sum()` in code in textbooks, teaching material, etc.

    Pleased you found the cited FAQ helpful, as that was the author’s intent.
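For readers who want to see the distinction discussed in the post and comments, here is a minimal sketch. The toy data and the variable names `x`, `runsum`, and `grand` are made up for illustration; only `generate` with `sum()` and `egen` with `total()` come from the discussion above.

```stata
* Hypothetical toy data: x holds the values 1 through 4
clear
input x
1
2
3
4
end

* generate's sum() is a running (cumulative) sum down the rows
generate runsum = sum(x)

* egen's total() puts the grand total on every row
egen grand = total(x)

list x runsum grand
```

With this data, `runsum` should come out as 1, 3, 6, 10, while `grand` is 10 in every row. Typing `egen grand = sum(x)` instead should give the same result as `total()`, since the comments above note that it survives as an undocumented synonym.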

