How to remove duplicate GVKEY-DATADATE when using Compustat Annually (FUNDA) and Quarterly (FUNDQ)?

The annual data is easy to deal with: you just need to add the following conditions:

indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

If you have downloaded FUNDA and converted it into Stata format, the uniqueness of GVKEY-DATADATE can be verified by a Stata command:

duplicates report gvkey datadate if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

This command will return “no duplicates”.
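For readers working outside Stata, the same filter-then-check logic can be sketched in plain Python. This is only an illustration: it assumes each record is a dict keyed by the lowercase Compustat variable names used above, and the helper names and sample rows are made up for the example.

```python
from collections import Counter

# The standard filter from the text: one value per format flag.
STANDARD_FLAGS = {"indfmt": "INDL", "datafmt": "STD", "popsrc": "D", "consol": "C"}


def apply_standard_filter(rows):
    """Keep only rows whose four format flags match the standard filter."""
    return [r for r in rows if all(r[k] == v for k, v in STANDARD_FLAGS.items())]


def duplicate_keys(rows):
    """Return the (gvkey, datadate) pairs that occur more than once."""
    counts = Counter((r["gvkey"], r["datadate"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample: the same firm-date delivered in two industry formats.
sample = [
    {"gvkey": "001000", "datadate": "2015-12-31", "indfmt": "INDL",
     "datafmt": "STD", "popsrc": "D", "consol": "C"},
    {"gvkey": "001000", "datadate": "2015-12-31", "indfmt": "FS",
     "datafmt": "STD", "popsrc": "D", "consol": "C"},
    {"gvkey": "001000", "datadate": "2016-12-31", "indfmt": "INDL",
     "datafmt": "STD", "popsrc": "D", "consol": "C"},
]

filtered = apply_standard_filter(sample)
print(duplicate_keys(sample))    # pairs duplicated before filtering
print(duplicate_keys(filtered))  # pairs duplicated after filtering
```

In this toy sample the only duplicate comes from the second industry format, so it disappears once the filter is applied.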

The quarterly data is a little more complicated. However, my test shows that as of December 5, 2017, after applying the same filter as above, 99.84% of GVKEY-DATADATE pairs on FUNDQ are unique. This means that no matter how you deal with duplicates, even if you simply delete all of them, your results probably will not be affected in a noticeable way.
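To make the two ideas in this paragraph concrete, here is a small Python sketch of measuring the share of unique pairs and of the brute-force option of deleting every record that belongs to a duplicated pair. It assumes the standard filter has already been applied; the function names and sample rows are hypothetical, and "share of unique pairs" is computed over distinct pairs, which is one possible reading of the figure quoted above.

```python
from collections import Counter


def key(row):
    """The identifier pair that should be unique."""
    return (row["gvkey"], row["datadate"])


def share_unique(rows):
    """Fraction of distinct (gvkey, datadate) pairs that appear exactly once."""
    counts = Counter(key(r) for r in rows)
    if not counts:
        return 1.0  # nothing to check, so nothing is duplicated
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def drop_all_duplicates(rows):
    """Brute force: drop every record whose pair occurs more than once."""
    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] == 1]


# Hypothetical sample: one pair (2015-06-30) is delivered twice.
sample = [
    {"gvkey": "001000", "datadate": "2015-03-31"},
    {"gvkey": "001000", "datadate": "2015-06-30"},
    {"gvkey": "001000", "datadate": "2015-06-30"},
    {"gvkey": "001000", "datadate": "2015-09-30"},
    {"gvkey": "002000", "datadate": "2015-03-31"},
]

print(f"{share_unique(sample):.2%} of pairs are unique")
print(len(drop_all_duplicates(sample)), "records remain after dropping duplicates")
```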

Having said that, if you want to do something about the duplicates, WRDS offers a clue in its definition of datafqtr:

Note: Companies that undergo a fiscal-year change may have multiple records with the same datadate. Compustat delivers those multiple records with the same datadate but each record relates to a different fiscal year-end period.

Rule: Select records from the co_idesind data group where datafqtr is not null, to view as fiscal data.

Unfortunately, this tip did not work in my test, so I came up with another way. The root cause of duplicate GVKEY-DATADATE pairs is firms changing their fiscal year end. For example, if a firm changes its fiscal year end from September 30 to December 31 when releasing its September 30 financial statements, there will be two records for September 30 on FUNDQ: one for fiscal quarter 4 based on the original year end, and the other for fiscal quarter 3 based on the new year end. We can elect to keep the record that represents the fiscal quarter based on the new fiscal year end. This can be done by imposing an additional condition, fyr==fyrc, on duplicate GVKEY-DATADATE pairs. fyr represents the then-current fiscal year end and is in FUNDQ; fyrc represents the most recent fiscal year end and can be extracted from the following dataset:


Please note there may be a minor problem: if a firm changes its fiscal year end more than once, we may lose some GVKEY-DATADATE pairs completely, because none of the duplicate records for an earlier change may have fyr equal to fyrc. But again, this probably does not matter much.
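The rule and its caveat can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a tested production routine: each record is assumed to be a dict that already carries fyrc next to fyr (that is, fyrc has been merged onto the FUNDQ records by gvkey), and the function name and sample firms are made up.

```python
from collections import defaultdict


def resolve_duplicates(rows):
    """Apply fyr == fyrc to duplicated (gvkey, datadate) pairs only.

    Returns (kept_rows, lost_keys). lost_keys lists the pairs for which no
    duplicate record satisfied fyr == fyrc, i.e. pairs that vanish entirely.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[(r["gvkey"], r["datadate"])].append(r)

    kept, lost = [], []
    for pair, group in groups.items():
        if len(group) == 1:
            kept.extend(group)  # unique pair: keep it whatever fyr is
            continue
        matches = [r for r in group if r["fyr"] == r["fyrc"]]
        if matches:
            kept.extend(matches)  # record based on the new fiscal year end
        else:
            lost.append(pair)     # caveat: no record survives
    return kept, sorted(lost)


# Hypothetical firm A: one change, September -> December (fyrc = 12).
# Hypothetical firm B: two changes, June -> September -> December (fyrc = 12).
sample = [
    {"gvkey": "A", "datadate": "2010-06-30", "fyr": 9, "fqtr": 3, "fyrc": 12},
    {"gvkey": "A", "datadate": "2010-09-30", "fyr": 9, "fqtr": 4, "fyrc": 12},
    {"gvkey": "A", "datadate": "2010-09-30", "fyr": 12, "fqtr": 3, "fyrc": 12},
    {"gvkey": "B", "datadate": "2008-06-30", "fyr": 6, "fqtr": 4, "fyrc": 12},
    {"gvkey": "B", "datadate": "2008-06-30", "fyr": 9, "fqtr": 3, "fyrc": 12},
]

kept, lost = resolve_duplicates(sample)
print(len(kept), "records kept")
print("pairs lost completely:", lost)
```

Returning the lost pairs separately makes it easy to count them and decide whether they are few enough to ignore.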


2 Responses to How to remove duplicate GVKEY-DATADATE when using Compustat Annually (FUNDA) and Quarterly (FUNDQ)?

  1. YANLEI ZHANG says:

    Hi, Kai. I think it depends on what you need: it could be either calendar quarters or fiscal quarters. I usually try to collect the information for each calendar quarter; in that case, I keep only the most recent convention for the duplicated observations.

  2. Thanh says:

    Hi Kai. I found that the number of observations differs between downloads from Compustat for the same period. For example, I downloaded 10 variables from Compustat for 1993-2016 with 380,0000 observations. Then I needed another variable and downloaded it separately from Compustat. However, the number of observations differs between the two files even though the period is the same, and after merging the two files some observations are not matched. Can you help explain why the number of observations in the two files differs for the same period in Compustat?
