How to remove duplicate GVKEY-DATADATE when using Compustat Annual (FUNDA) and Quarterly (FUNDQ)?

The annual data is easy to deal with, you just need to add conditions as follows:

indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

If you have downloaded FUNDA and converted it into Stata format, the uniqueness of GVKEY-DATADATE can be verified by a Stata command:

duplicates report gvkey datadate if indfmt=="INDL" & datafmt=="STD" & popsrc=="D" & consol=="C"

This command will return “no duplicates”.

The quarterly data is a little complicated. However, my test shows that as of December 5, 2107, after applying the same filter as above, 99.84% of GVKEY-DATADATE pairs on FUNDQ are unique. This means no matter how you deal with duplicates, even simply and brutally delete all of them, your results probably will not be impacted in a noticeable way.

Have that said, if you want to do something, WRDS gives this clue:

In the definition of datafqtr, WRDS notes that,

Note: Companies that undergo a fiscal-year change may have multiple records with the same datadate. Compustat delivers those multiple records with the same datadate but each record relates to a different fiscal year-end period.

Rule: Select records from the co_idesind data group where datafqtr is not null, to view as fiscal data.

Unfortunately, I find this tip does not work in my test. Thus, I come up with another way. The root cause of duplicate GVKEY-DATADATEs is firms changing their fiscal year end. For example, if a firm changes its fiscal year end from September 30 to December 31 when releasing its September 30 financial statements, there will be two records for September 30 on FUNDQ—one for fiscal quarter 4 based on the original year end and the other for fiscal quarter 3 based on the new year end). We can elect to keep the GVKEY-DATADATE that represents the fiscal quarter based on the new fiscal year end. This can be done by imposing a new condition fyr=fyrc for duplicate GVEKY-DATADATEs. fyr represents then-current fiscal year end and is in FUNDQ, and fryc represent the most recent fiscal year end and can be extracted from the following dataset:


Please note there may be a minor problem—if a firm changes it fiscal year end more than once, we may lose some GVKEY-DATADATEs completely. But again, this probably does not matter at all.

This entry was posted in Learning Resources. Bookmark the permalink.

4 Responses to How to remove duplicate GVKEY-DATADATE when using Compustat Annual (FUNDA) and Quarterly (FUNDQ)?

  1. YANLEI ZHANG says:

    Hi, Kai, I think it depends on what you need, it could be either calendar quarter or fiscal quarter. I usually try to collect the information for each calendar quarter, in that case, I only keep the recent convention for the duplicated observations.

  2. Thanh says:

    Hi Kai. I found that there is a difference in the number of observations every download from Compustat in the same period. For example, I downloaded 10 variables from Compustat in 1993-2016 with 380,0000 observations. Then I need another variable and download this variable separately from Compustat. However, I found that the number of observations are different in 2 files regardless of the same period. After that, I merge two files. After merging, some observations are not matched in 2 files. Can you help me explain why the number of observations in 2 files are different for the same period in Compustat?

  3. Vincent Zhang says:

    Hi, Kai. Thanks for sharing! I am wondering whether you notice that there are firms report financial data in Canadian dollars in COMPUSTAT (“curcd” = CAD). However, I barely saw papers mentioned that how to deal with it. What’s your opinion? Should we exclude the firms or exchange them to US dollars?

  4. Sujesh says:

    Hi Kai, the WRDS rule mentioned above mentions datacqtr (calendar quarter) and not datafqtr, if I am not mistaken. If all that you are interested in is exact duplicates, then the above method for quarterly data is fine. But there will still remain an issue of overlapping months, when firms change month of financial year end. In the strict sense, this too could be treated as a duplicate observation. For example, if in the year 2002 a firm changed financial year end from May to June, Compustat will capture both in subsequent rows. The May observation and the June observation will have 2 overlapping months (May and April). To remove all such cases of overlap, you will need to consider the observations where the datacqtr is a null/ blank. Doing so will subsume all cases of exactly duplicate observations as well. But just using the ‘datacqtr is a null’ criteria will not help uniquely identify exact duplicates. This is what I found out based on a subset of data, though I might have missed some additional possibilities.

Leave a Reply

Your email address will not be published. Required fields are marked *