advice you never asked for: volume 1

I often find that people don’t ask me for advice. This is a good thing, as I don’t know anything. Still, it means that in order to provide people with advice, I have to give it unsolicited. As you might expect, this works hand-in-glove with blogging which, after all, is a pastime for people who want to offer their opinions to those who haven’t asked for them.

Today I want to briefly touch on a topic that some of my fellow grad students out there might be struggling with: writing loops in Stata.

A loop is a way to make software (in this case Stata) repeat the same behavior or behaviors over and over until it finishes some monotonous task. This is good because humans are not good at doing the same precise thing over and over again without mistakes. I, for example, often accidentally misspell “again” as “agin”, which is still intelligible but makes me sound Scottish. In Stata, loops are invoked using the “foreach” or “forvalues” commands, which you can look up in the help menu. A typical foreach command might look something like this:

generate paper=0

foreach year in 1 2 3 4 5 6 7 {
    replace paper=paper+`year' if tenure~=1
    replace job=0 if `year'==7 & tenure==0
}

The foreach command defines a local macro, “year”, which takes each of the values 1 through 7 in turn. Each time the program goes through the loop,* the next value of “year” is substituted wherever you see `year' in a command. So, the first time through, the first replace statement would read “replace paper=paper+1 if tenure~=1”, while the second time it would read “replace paper=paper+2 if tenure~=1”.
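One way to watch that substitution happen is to have Stata print the command each pass would run instead of running it. This is only a sketch: it uses forvalues with the range 1/7 in place of the spelled-out list, and display just prints text, so it needs no particular dataset.

```stata
* Print the command that each pass of the loop would execute
forvalues year = 1/7 {
    display "replace paper=paper+`year' if tenure~=1"
}
```

This should print seven lines, running from paper+1 through paper+7.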

Now, the tricky thing with loops** is syntax. It may look in some fonts like I’m typing 'year' instead of `year'. If you can’t see the difference between the terms I just wrote then, congratulations, this post is for you. See, in a correctly typed usage (i.e. `year') that first punctuation mark is not an apostrophe. Instead, it’s that funny little mark that usually appears with the tilde. The closing mark, by contrast, is an ordinary apostrophe. I’ve included a picture for your convenience:

[Image: funnysymbol]

If you use it properly whenever you invoke the macro, you’re fine. If, on the other hand, you use an apostrophe, Stata returns an unhelpful error message that “' is not a valid name”. So, whenever looping, just remember that ` ~= ' and you’ll be fine.
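For a quick check of your own typing, a throwaway loop like the following should do. It is a minimal sketch, not part of any real analysis, and it only prints the numbers 1 to 3.

```stata
* Correct quoting: backtick to open the macro, apostrophe to close it
foreach year in 1 2 3 {
    display `year'
}
```

If this runs, your quotes are right. If you retype the display line with an apostrophe on both sides of year, you should get the error described above.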

[Image: the_more_you_know2]

* The first time through, it is referred to as "looping", the second time as "loop-de-looping", the third time as "loop-de-loop-de-looping", and so on until you go mad.

** Yeah, right, because there's only ONE tricky thing with loops.

5 thoughts on “advice you never asked for: volume 1”

  1. Drek — unsolicited or not, I wish that you had given this advice three or four years ago. I might be closer to disproving the theorem the loop tests if you had…

    The other tricky thing is “foreach … in … {” versus “foreach … of … {”. Use the latter if you want to loop through variable lists. I also found this out the hard way, wondering why I would get all kinds of wacky things out of my loops, if they worked at all.

  2. Since discovering this myself a couple of years ago, I’ve often puzzled about what the difference between a ` and a ' is outside of Stata. Obviously, there must be one, since they felt the need to put it on the keyboard in the first place, but it’s one I don’t know the first thing about.

  3. The one next to the semi-colon is the ASCII apostrophe or “single-quote” (even though it isn’t really a proper quote mark or apostrophe, but never mind). The one next to the 1 key is different. For typographical purposes you can think of it as the grave accent: you use it to create things like è, à, etc. But in the Unix shell it’s known as the backtick or backquote. When used there, pairs of backquotes replace the command/expression they enclose with the output of that command/expression. It does similarly meta stuff in other programming languages, such as Stata’s.

    In R, loops are generally inefficient, and using loops generally consigns you to the second or third circles of The R Inferno. On the other hand, the fourth circle is reserved for people who over-vectorize things.

  4. I’ve gotten really good at writing code using Stata foreach loops and passing local macros to label graphs and create unique file names. I’m not sure this is a good thing. It is possible to generate way more interesting graphs than can be digested and analyzed in any useful way.
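Two points from the comments above, “foreach … of …” for variable lists and passing local macros into graph labels and file names, can be sketched together. This is an illustration under stated assumptions, not anyone's actual code: it uses the auto example dataset that ships with Stata, and the title text and hist_ file names are made up for the example.

```stata
* Assumption: the auto example dataset that ships with Stata
sysuse auto, clear

* of varlist tells Stata the list is a list of variables
foreach v of varlist price mpg weight {
    * The local macro v labels each graph and names each file
    histogram `v', title("Distribution of `v'")
    graph export "hist_`v'.png", replace
}
```

With “of varlist”, Stata checks that each name is an existing variable and expands abbreviations and wildcards such as p*. With “in”, the words in the list are taken literally.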
