I often want to standardize my variables before using them in models, especially if a variable is heavily right-skewed (then I like to log-transform it; e.g. number of protesters) or if its coefficient is hard to interpret anyway (then I tend to follow Gelman's (2008) approach; e.g. age and age^2, which result in very small coefficients in a regression). So far I have generated such standardizations by hand, one variable at a time.
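Done by hand, this takes a few lines per variable. A minimal sketch, assuming a dataset with numeric variables named age and protesters (both names are placeholders, and the names of the new variables are only illustrative):

```stata
* Sketch only: age and protesters are assumed placeholder variable names.

* Mean-center age, then divide by two standard deviations (Gelman 2008).
quietly summarize age
generate double mc_age   = age - r(mean)
generate double std2_age = (age - r(mean)) / (2 * r(sd))

* Log-transform a right-skewed count; the +1 keeps zero counts defined.
generate double log_protesters = log(protesters + 1)
```

summarize leaves the mean and standard deviation in r(mean) and r(sd), so each variable needs its own summarize call before the generate lines.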
However, since I tend to be lazy and try to minimize the amount of syntax I produce, I wrote a small program called standard2. It takes a list of variables after the command name "standard2":
standard2 variable1 variable2
It will then create three new variables for each variable specified, e.g. for variable1: std2_variable1 (variable1 standardized by two standard deviations), mc_variable1 (variable1 mean-centered), and log_variable1 (log(variable1+1)).
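To make the three transformations concrete, here is a rough sketch of how a program of this kind could be written. It is not the actual standard2 code: the name standard2_sketch is hypothetical, and it assumes that the two-standard-deviation version is mean-centered first, as in Gelman (2008).

```stata
* Hypothetical re-creation for illustration; not the author's standard2.
capture program drop standard2_sketch
program define standard2_sketch
    syntax varlist(numeric)
    foreach v of local varlist {
        * r(mean) and r(sd) are left behind by summarize
        quietly summarize `v'
        generate double mc_`v'   = `v' - r(mean)
        generate double std2_`v' = (`v' - r(mean)) / (2 * r(sd))
        * Missing for values below -1, where the log is undefined
        generate double log_`v'  = log(`v' + 1)
    }
end

* Usage with Stata's bundled example data
sysuse auto, clear
standard2_sketch price mpg
summarize std2_price mc_price log_price
```

Because the variable is divided by twice its standard deviation, std2_price should come out with a mean of about 0 and a standard deviation of 0.5. The sketch does no checking for existing variables or for names that become too long once the prefix is added, so it stops with an error in those cases.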
Note: Do not use the program "blindly". Familiarize yourself with the pros and cons of standardization in general, and think about which approach is most suitable in your case.
Gelman, Andrew. 2008. “Scaling Regression Inputs by Dividing by Two Standard Deviations.” Statistics in Medicine 27(15): 2865–73.
King, Gary. 1986. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science 30(3): 666–87.