Data sources

Annual reports for US companies

Company data:
  • Financial statements
  • Company news and information
Keywords: company filings, annual reports, 10-K, SEC, textual analysis, text mining, business descriptions, risk factors, management discussion and analysis, MD&A
Accessible to: Aalto users, Aalto Students & Staff
Available via Azure Search endpoint with credentials provided by the author.

Data source type

National registry


Academic and non-commercial research purposes only

Analysis unit


Geographical coverage

United States

Time period coverage









A 10-K filing is a comprehensive annual report on the financial performance of a U.S. publicly traded company, required by the U.S. Securities and Exchange Commission (SEC). The 10-K includes qualitative information such as

  • Company Business Description (Item 1)
  • Description of Risk Factors (Item 1A)
  • Management's Discussion and Analysis of Financial Condition and Results of Operations (Item 7)
  • Other financial information

Academically, 10-K filings, and especially the textual Items 1, 1A, and 7, have been the subject of extensive research.

The original data source is the SEC EDGAR database. The enhanced data source provided on Data Hub by the author contains 10-K Items 1, 1A, and 7 filed from January 2003 to April 2022, covering fiscal years 2002-2021, with approximately 6,500 filings per year. Each observation includes the following:

  • item textual content
  • number of words
  • filing year
  • CIK code (the SEC Central Index Key identifying the filer)
  • metadata (filing form and item code)
  • a search score, in case the user wants to use the full-text search capabilities of Azure Cognitive Search
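As a rough illustration of how the full-text search and the search score could be used, the sketch below builds a query against the Azure Cognitive Search REST API (Search Documents) using only the Python standard library. The service name, index name, API key, and the field name `filing_year` are placeholders and assumptions, not values documented here; the actual endpoint details come with the credentials provided by the author.

```python
import json
import urllib.request

# A published Azure Cognitive Search REST API version; the service may require another.
API_VERSION = "2020-06-30"


def build_search_request(service, index, api_key, query, filing_year=None, top=10):
    """Build (without sending) a POST request for the Search Documents call."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = {"search": query, "top": top}
    if filing_year is not None:
        # OData filter expression; 'filing_year' is a hypothetical field name.
        body["filter"] = f"filing_year eq {int(filing_year)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def parse_hits(response_json):
    """Return the matched documents sorted by descending search score."""
    hits = response_json.get("value", [])
    return sorted(hits, key=lambda h: h.get("@search.score", 0.0), reverse=True)


def search(service, index, api_key, query, **kwargs):
    """Send the request and return parsed hits (needs network access and valid credentials)."""
    request = build_search_request(service, index, api_key, query, **kwargs)
    with urllib.request.urlopen(request) as response:
        return parse_hits(json.load(response))
```

With valid credentials, a call such as `search("<service>", "<index>", "<key>", "supply chain disruption", filing_year=2021)` would return the best-matching items first, each carrying its `@search.score`.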

Note that CIK codes are compatible with the Compustat and CRSP databases in WRDS.
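A minimal sketch of such a link is shown below, assuming both sides carry a CIK and a year. CIKs appear sometimes as integers and sometimes as zero-padded strings, so the sketch first normalizes them to ten digits, a common convention. The key names `cik` and `year` and the sample rows are hypothetical, and how to align a filing year with a fiscal year is left to the user.

```python
def normalize_cik(cik):
    """Return a CIK as a 10-digit, zero-padded string."""
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"not a numeric CIK: {cik!r}")
    return digits.zfill(10)


def link_by_cik(filings, fundamentals, cik_key="cik", year_key="year"):
    """Inner-join two lists of dicts on (normalized CIK, year)."""
    lookup = {
        (normalize_cik(row[cik_key]), int(row[year_key])): row
        for row in fundamentals
    }
    linked = []
    for filing in filings:
        key = (normalize_cik(filing[cik_key]), int(filing[year_key]))
        if key in lookup:
            # Filing fields win on name clashes; the CIK is stored in normalized form.
            linked.append({**lookup[key], **filing, cik_key: key[0]})
    return linked


# Hypothetical rows for illustration only.
filings = [{"cik": 320193, "year": 2020, "n_words": 100}]
fundamentals = [{"cik": "0000320193", "year": 2020, "total_assets": 1.0}]
linked = link_by_cik(filings, fundamentals)
```

Filings without a matching fundamentals row are dropped, which mirrors an inner join; a left join would instead keep them with the accounting fields missing.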


Jukka Sihvonen


Jukka Sihvonen

Cite as



The data are provided as is or only linked to. The author takes no responsibility for any errors or misinterpretations arising from use of the data, and cannot guarantee the accuracy or completeness of the information provided.


Related data sources



Compustat North America provides annual and quarterly fundamentals (accounting statement information) for public companies in the US and Canada. Additional Compustat data files include bank fundamentals, historical business segment data and executive compensation (Execucomp).

Company data:
  • Financial statements
WRDS subscriptions

CRSP (Center for Research in Security Prices)

CRSP provides security price, return, and volume data for the US stock markets dating back to 1925. Additionally, the data files include the CRSP/Compustat Merged Database and US mutual fund data.

Financial markets:
  • Stock markets
WRDS subscriptions