Author Archives: Kai Chen
Stata command to calculate the area under ROC curve
If we want to evaluate the predictive ability of a logit or probit model, Kim and Skinner (2012, JAE, Measuring securities litigation risk) suggest that a better way of comparing the predictive ability of different models is to use the Receiver
Stata commands to calculate skewness
Suppose we are going to calculate the skewness of 12 monthly returns. The 12 returns may be stored in a row (Figure 1) or in a column (Figure 2). This post discusses how to calculate the skewness in these two
Use Python to download lawsuit data from Stanford Law School’s Securities Class Action Clearinghouse
Several papers borrow the litigation risk model supplied in Equation (3) of Kim and Skinner (2012, JAE, Measuring securities litigation risk). The logit model uses total assets, sales growth, stock return, stock return skewness, stock return standard deviation, and turnover to
Calculate idiosyncratic stock return volatility
I have noted two slightly different definitions of idiosyncratic stock return volatility in: Campbell, J. Y. and Taksler, G. B. (2003), Equity Volatility and Corporate Bond Yields. The Journal of Finance, 58: 2321–2350. doi:10.1046/j.1540-6261.2003.00607.x; Rajgopal, S. and Venkatachalam, M. (2011),
Commonly used Stata commands to deal with potential outliers
In accounting archival research, we often take it for granted that we must do something to deal with potential outliers before we run a regression. The commonly used methods are truncation, winsorization, studentized residuals, and Cook's distance. I discuss in
Use Python to extract URLs to HTML-format SEC filings on EDGAR
I wrote two posts to describe how to download TXT-format SEC filings on EDGAR: "Use Python to download TXT-format SEC filings on EDGAR (Part I)" and "Use Python to download TXT-format SEC filings on EDGAR (Part II)". Although TXT-format files have
Sample code for “outreg” command in Stata
outreg is a very powerful and time-saving command in Stata. The following code generates a ready-for-use results table:
outreg, stats(b p) sdec(3) ///
    summstat(r2_a \ N) summdec(3,0) summtitles("Adjusted R2" \ "N") ///
    starlevels(10 5 1) starloc(1) ///
    ctitles("", "Heading" \ "", "Subheading") ///
    keep(_cons x1 x2 x3 x4)
Please change x1, x2, x3, and x4 to the variables that you want to report.
Use Python to download data from the DTCC’s Swap Data Repository
I helped a friend download data from the DTCC's Swap Data Repository. I am not familiar with the data and did this only as programming practice. This article gives an introduction to the origin of the data: http://www.dtcc.com/news/2013/january/03/swapdatarepositoryrealtime The
Download FR Y-9C data from WRDS
WRDS currently populates FR Y-9C data quarter by quarter in individual datasets, like BHCF200803, BHCF200806, BHCF200809, and so on. WRDS has not stacked those individual datasets to form a single time-series dataset like COMPUSTAT. There are two ways to overcome
TAR-Style Word Template
I created a Word template that complies with The Accounting Review editorial style. My design philosophy is "simple but sufficient". I do not like templates that are heavy and fancy (e.g., macros everywhere). This is just version 1. It
