stata: roll your own palettes

I realize all the cool kids have switched to R, but if you still work with Stata, you may be interested in some routines I worked up to generate color and line pattern palettes and customize graphs fairly easily with macros and loops. This is useful to me because I am generating line graphs showing the trends for 17 different offense groups. Some preliminary tricks, then the code.

Trick 1 is to generate self-labeling lines by creating a variable that holds the label only at the last value of the x-axis variable (year, in my case), e.g. gen xvalue15=Label if xvalue==15. You can make self-labeling scatterplots the same way by giving every observation a label.

Trick 2 is to use Stata macros to generate the lines of a plot. The general scheme is:

local plotlist ""
foreach val in `list of values' {
    local plotlist "`plotlist' (code_for_one_line )"
}
twoway `plotlist', 

In this code, each line gets added to the macro plotlist. Pro tip: remember to reset the plot macro to "" (empty), or use a new macro name each time. Otherwise the plots from the previous graph are still in the macro and you will get unpleasant results with repeated graphs.
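Here is a minimal sketch that combines the two tricks. It is not my production code: it assumes the uslifeexp example dataset that ships with Stata, and the label variables lab_le_male and lab_le_female are names made up for the illustration.

```stata
* Sketch: self-labeling lines built in a loop (assumes the shipped uslifeexp data)
sysuse uslifeexp, clear
quietly summarize year
local lastyear = r(max)

local plotlist ""                      // reset before building the graph
foreach v in le_male le_female {
    * Trick 1: the label exists only at the last x value
    gen lab_`v' = "`v'" if year == `lastyear'
    * Trick 2: append the code for one line to the macro
    local plotlist "`plotlist' (connected `v' year, msymbol(i) mlabel(lab_`v'))"
}
twoway `plotlist', legend(off)
```

The invisible marker (msymbol(i)) is what lets a connected plot carry a marker label without drawing any markers.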

Color Swatch Generator

Although Stata can generate colors using any set of RGB values, for a variety of reasons* I found it easiest to work with the built-in named colors. Named colors can be modified with the syntax color*#, where # is an intensity multiplier. Numbers less than 1 lighten the color and numbers greater than 1 darken the color. The ado file full_palette generates a swatch of the 66 named colors in Stata, with their RGB values (you can access it by installing the ado and then typing help full_palette), and the built-in ado palette color will show color samples and the RGB values for two colors (type help palette color to see the syntax of the command). But I wanted to see ranges of colors using the intensity values across several different named colors.**
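As a quick illustration of the color*# syntax, here is a small sketch that assumes the auto example dataset that ships with Stata; the choice of ebblue and of the two multipliers is arbitrary.

```stata
* Sketch: one named color at two intensities (assumes the shipped auto data)
sysuse auto, clear
twoway (scatter mpg weight if foreign == 0, mcolor(ebblue*0.5)) ///
       (scatter mpg weight if foreign == 1, mcolor(ebblue*1.5)), ///
       legend(order(1 "ebblue*0.5" 2 "ebblue*1.5"))
```

The first set of markers should come out lighter than plain ebblue and the second darker.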


* Stata 14.2
local colorlist "orange orange_red red ebblue eltblue purple"
local intenlist ".5 .75 1 1.25 1.5 1.75 2"
local ncolor=wordcount("`colorlist'")
local ninten=wordcount("`intenlist'")
local ncases=`ncolor'*`ninten'
disp "ncolor `ncolor' ninten `ninten' ncases `ncases'"
set more off
clear  // start from an empty dataset; the locals defined above are kept
set obs `ncases'
gen case=_n
gen ncases=_N
gen color=""
gen intenS=""
gen colorname=""
** fill in the strings with colors and intensities
local ii=1
forval color=1/`ncolor' {
    forval inten=1/`ninten' {
        replace color=word("`colorlist'",`color') if case==`ii'
        replace intenS=word("`intenlist'",`inten') if case==`ii'
        local ii=`ii'+1
    }
}
* build the full color name once every case has been filled in
replace colorname=color+"*"+intenS
*** the num variables are sequential
encode color, gen(colornum)
encode intenS, gen(intennum)
encode colorname, gen(col_int_num)
gen inten=real(intenS) // this is the actual numeric value of intensity

local plot ""
summ col_int_num
local nplots=r(max)
forval point=1/`nplots' {
    qui summ col_int_num if col_int_num==`point'
    local labelnum=r(mean)
    local colorname: label col_int_num `labelnum'
    qui summ colornum if col_int_num==`point'
    local colnum=r(mean)
    local color: label colornum `colnum'
    qui summ intennum if col_int_num==`point'
    local intnum=r(mean)
    local inten: label intennum `intnum'
    local plot "`plot' (scatter inten colornum if col_int_num==`point', mcolor(`colorname') msize(huge) mlab(colorname) mlabc(`colorname') mlabsize(tiny) mlabpos(6))"
}
*disp "`plot'" 
local xmax=`ncolor'+1 
twoway `plot' , legend(off) ylab(.25 (.25) 2) xlab(0 (1) `xmax', val) xtitle(color) ytitle(intensity)
graph export sample_color_swatch.png, replace

Color Line Generator


My application has too many values to use just color (or so I judged) so I also used line type. Thus the code to generate sample lines.

* Stata 14.2
* insert colors, intensities, patterns in the lists as desired

local colorlist "orange_red ebblue"
local intenlist ".5  1 1.75 "
local lplist "solid dash shortdash"
local ncolor=wordcount("`colorlist'")
local ninten=wordcount("`intenlist'")
local nlp = wordcount("`lplist'")
local ncases=`ncolor'*`ninten'*`nlp'
clear  // start from an empty dataset; the locals defined above are kept
set obs `ncases'
gen case=_n
gen Ncases=_N
gen hue=""
gen inten=""
gen linepat=""
set more off
set scheme s1color  // white background
*** fill in the color values, text variables
local xx=1
forval col=1/`ncolor' {
    forval int=1/`ninten' {
        forval lpat=1/`nlp' {
            replace hue=word("`colorlist'", `col') if case==`xx'
            replace inten=word("`intenlist'", `int') if case==`xx'
            replace linepat=word("`lplist'", `lpat') if case==`xx'
            local xx=`xx'+1
        }
    }
}
** CREATE 16 values for the X axis ******
* Duplicate observations
expand 2, gen(copy1)
expand 2, gen(copy2)
expand 2, gen(copy3)
expand 2, gen(copy4)
gen xvalue=copy1 + 2*copy2 + 4*copy3 + 8*copy4

* generate text from other text
gen color=hue+"*"+inten
gen definition=hue+"*"+inten+" "+linepat
gen def15=definition if xvalue==15
* create numeric variables with the strings as values
encode color, gen(colornum)
encode linepat, gen(lpnum)
qui sum colornum
local ncol=r(max)
forval colnum=1/`ncol' {
    local col`colnum' = `colnum'
}
forval lpnum=1/`nlp' {
    local lp`lpnum'=`lpnum'
}

local plotlist ""
disp "ncases `ncases'"
forval case=1/`ncases' { 

    qui summ colornum if case==`case' 
    local cn=r(mean) 
    local color: label colornum `cn' 

    qui summ lpnum if case==`case' 
    local ln=r(mean) 
    local lpat: label lpnum `ln' 

    local plotlist "`plotlist' (connected case xvalue if case==`case', msym(i) mlab(def15) lc(`color') mlabc(`color') lp(`lpat'))"
}
twoway `plotlist', legend(off) xlab(0 (2) 22)
graph export color_lines_sample.png, replace

Offense line palette

This is the problem that started me on this path. I have 17 offenses for which I want to graph imprisonment over time. Letting Stata choose the colors generates an unreadable hash. And brewscheme won't help because I want to assign particular markers/colors to particular offenses, not create a general order of colors. After working on this problem a while, I realized the graph could be more meaningful if similar offenses had related colors. Generating a variable-specific palette is easy using the skills developed above.


Step 1: Create a spreadsheet with the variable names and labels plus columns for variable groups, color name (hue), intensity, line type, and the order in which I wanted the graphs to appear in my sample. This last is to put the colors that might be difficult to distinguish next to each other in the sample. In my spreadsheet, I put different possible color schemes in different tabs. Here is one sample.

OffLab      offdetail  group     hue         intensity  line       order
Drugs       12         drugdwi   navy        2          solid      10
DWI         20         drugdwi   navy        2          dash       11
Escape_etc  21         misc      ebblue      0.5        solid      16
Family      22         misc      ebblue      0.5        shortdash  17
Larceny     8          property  ebblue      1.5        dash       12
MVTheft     9          property  ebblue      1.5        solid      13
Fraud       10         property  ebblue      1          shortdash  14
OthProp     11         property  ebblue      1          solid      15
Robbery     4          robbur    purple      1          solid      9
Burglary    7          robbur    purple      1          dash       8
Murder      1          violent   orange_red  1.75       solid      7
NegMansl    2          violent   orange_red  1.75       shortdash  6
Rape        3          violent   orange_red  1.75       dash       5
Assault     5          violent   orange_red  1          dash       4
OthViolent  6          violent   orange_red  1          solid      3
Weapon      23         violent   orange_red  0.5        solid      2
PubOrd      13         violent   orange_red  0.5        dash       1

The do file reads the spreadsheet (with a local parameter that selects the tab) and generates a sample plot.

* Stata 14.2
local group set1
import excel "offense_colors_lines.xlsx", sheet("`group'") firstrow allstring clear
gen color=hue+"*"+intensity
encode color, gen(colornum)
encode line, gen(linenum)
destring offdetail, replace
destring order, replace

** I save this as a Stata file so I can merge it into the data file for production runs

save "offense_lines_2017-6-1`group'.dta", replace

levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
    qui summ colornum if offdetail==`off'
    local cnum=r(mean)
    local col`off': label colornum `cnum'
    qui summ linenum if offdetail==`off'
    local lnum=r(mean)
    local line`off': label linenum `lnum'
}

expand 2, gen(copy1)
expand 2, gen(copy2)
expand 2, gen(copy3)
expand 2, gen(copy4)
gen xvalue=copy1 + 2*copy2 + 4* copy3 + 8*copy4
gen OffLab15=OffLab if xvalue==15

local plotlist ""
forval xx=1/17 {
   qui summ offdetail if order==`xx'
   local off=r(mean)
   local plotlist "`plotlist' (connected order xvalue if offdetail==`off', ml(OffLab15) ms(i) lc(`col`off'') mlabc(`col`off'') lp(`line`off''))"
}
disp "`plotlist'" 
twoway `plotlist', legend(off) xlab(0 (3) 20)
graph export "offense_lines_2017-6-1`group'.png", replace

Using this scheme in my production graphs involves this code:

use [data file]

merge m:1 offdetail using offense_lines_2017-6-1set1.dta

levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
     qui summ colornum if offdetail==`off'
     local cnum=r(mean)
     local col`off': label colornum `cnum'
     qui summ linenum if offdetail==`off'
     local lnum=r(mean)
     local line`off': label linenum `lnum'
}

These local macros can then be used in the production graphs with the same code logic as was used to generate the samples.
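To make that concrete, here is a sketch of what the production loop can look like. It is self-contained only for illustration: the tiny dataset, the variable names year and imprate, and the numbers in it are all made up, and the four local macros are typed in by hand as stand-ins for the ones the loop above builds from the palette file (offense 1 is Murder and offense 12 is Drugs in the sample tab).

```stata
* Sketch: use the col# and line# macros in a real line graph
* (toy data; year and imprate are hypothetical variable names)
clear
input offdetail year imprate
1  2010 5
1  2011 6
1  2012 7
12 2010 20
12 2011 18
12 2012 15
end

* stand-ins for the macros generated from the merged palette file
local col1   "orange_red*1.75"
local line1  "solid"
local col12  "navy*2"
local line12 "solid"

local plotlist ""
levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
    local plotlist "`plotlist' (line imprate year if offdetail==`off', lc(`col`off'') lp(`line`off''))"
}
twoway `plotlist', legend(off)
```

Because the macro names carry the offense code, the same offense gets the same color and line pattern in every graph regardless of which offenses are included.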


* I originally tried to use the RGB values from specific palettes I found online, but passing RGB values in a macro the way I do with my offense colors did not work. I think the problem is a subtle Stata bug/behavior about parsing quotes within quotes within quotes in macros referring to macros and/or the parsing of a list of numbers separated only by spaces. When I used the most straightforward syntax, Stata eliminated the spaces between the numbers (a very odd behavior!), and when I added the Stata special compound double quotes `" and "', that problem was solved but the resulting code generated an error. However, if you use ado files you can find online to create and save new colors with names, those new colors should work fine with this routine.

You create a new color by creating a file named color-NAME.style (where NAME is the name you want to give the color) in your personal ado path (I put it in a style folder that had previously been created but anywhere works); the content of this file must be

set rgb "255 255 255"

where you replace the 255's with the RGB codes for the color you want to name. If you examine the style files in your system files (which you can find by typing findfile followed by the file name in a Stata session and reading the resulting path) you will see that you can also include comments, labels, and other commands that don't get in the way of this core command, but this is the one you need.
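If you would rather create the file from Stata than from a text editor, something like the following sketch should do it. The assumptions are mine: the color name mygold and its RGB values are invented for the example, the file is written straight into the PERSONAL directory, and the file name follows the color-NAME.style pattern that the system color files use.

```stata
* Sketch: write a one-line style file for a hypothetical color "mygold"
local pdir "`c(sysdir_personal)'"      // PERSONAL directory, ends in a slash
capture mkdir "`pdir'"                 // create it if it does not exist yet
file open fh using "`pdir'color-mygold.style", write replace text
file write fh `"set rgb "255 215 0""' _n
file close fh

* check that Stata can see the file on the ado path
findfile color-mygold.style
```

You may need to restart Stata before the new name is recognized in graph commands; I have not checked every version.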

** I spent some time studying the code for the ado files palette.ado and full_palette.ado trying to figure out how the RGB values were generated from the color and intensity values so I could put them in my palette as well, but finally gave up. Both ado files read the RGB code for the base color from the color .style file, but I could not find the code in palette.ado that computes the derived RGB when there is an intensity factor. It must not look the way I'm expecting it to look.

By experimenting with values in palette color, I learned that an intensity greater than 1 consistently divides each RGB value by that number (e.g. ebblue is RGB 0 139 188 and ebblue*2 is 0 70 94). Lower RGB values are darker, with black being 0 0 0. An intensity less than 1 increases all three RGB values and pulls the color toward white, which has RGB 255 255 255. So, for example, red is 255 0 0, red*.5 is 255 128 128, red*.2 is 255 204 204, ebblue is 0 139 188, ebblue*.5 is 128 197 222, teal is 110 142 132, teal*.5 is 183 199 194, and teal*.2 is 226 232 230. If the color is pure and fully saturated, the intensity factor adds (1-int)*255 to the other colors. I am sure I could empirically work out the formula for intensities less than 1 for the more complex cases if I spent more time on it, but it is not immediately obvious to me. If you know the formula and put it in the comments, I would be grateful. I'm not sure it matters except to my curiosity. EDIT: The correct general formula for intensity<1 is: orig_RGBnum + (1-intensity)*(255-orig_RGBnum) for each of the three original RGB numbers. I still have not found the actual code that implements these formulas in the palette.ado file.
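For anyone who wants to compute the derived values rather than read them off palette color, here is a sketch that applies the two empirical rules above. The program rgb_intensity is a helper I made up for this post, not something that ships with Stata, and rounding to the nearest integer is my assumption about how the displayed values are produced.

```stata
* Sketch: derived RGB for a base color and an intensity factor
* (rgb_intensity is a hypothetical helper; the rules are the empirical ones above)
capture program drop rgb_intensity
program define rgb_intensity
    args r g b inten
    local out ""
    foreach c in r g b {
        if `inten' > 1 {
            * darker: divide each channel by the intensity
            local new = round(``c'' / `inten')
        }
        else {
            * lighter: move each channel toward 255
            local new = round(``c'' + (1 - `inten') * (255 - ``c''))
        }
        local out "`out' `new'"
    }
    display "`out'"
end

rgb_intensity 0 139 188 2     // ebblue*2
rgb_intensity 0 139 188 .5    // ebblue*.5
rgb_intensity 110 142 132 .2  // teal*.2
```

If the rules and the rounding assumption are right, the three calls should reproduce the values quoted above for ebblue*2, ebblue*.5, and teal*.2.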

Author: olderwoman

I'm a sociology professor but not only a sociology professor. It isn't hard to figure out my real name if you want to, but I keep it out of this blog because I don't want my name associated with it in a Google search. Although I never write anything in a public forum like a blog that I'd be ashamed to have associated with my name (and you shouldn't either!), it is illegal for me to use my position as a public employee to advance my religious or political views, and the pseudonym helps to preserve the distinction between my public and private identities. The pseudonym also helps to protect the people I may write about in describing public or semi-public events I've been involved with.
