Annual reports for US companies
Subject:
Company data:
- Financial statements
- Company news and information
Keywords:
company filings, annual reports, 10-K, SEC, textual analysis, text mining, business descriptions, risk factors, management discussion and analysis, MD&A
Accessible to:
Aalto users
Attention:
Available via Azure Search endpoint with credentials provided by the author.
Data source type
National registry
License
Academic and non-commercial research purposes only
Analysis unit
Company
Geographical coverage
United States
Time period coverage
2003-2022
Frequency
Annual
Format
JSON
Encoding
UTF-8
Description
A 10-K filing is a comprehensive annual report on a company's financial performance that U.S. publicly traded companies are required to file with the U.S. Securities and Exchange Commission (SEC). The 10-K includes qualitative information such as the company's Business Description (Item 1), Risk Factors (Item 1A), and Management's Discussion and Analysis of Financial Condition and Results of Operations (Item 7), alongside other financial information. In academic research, 10-K filings, and especially the textual Items 1, 1A, and 7, have been studied extensively (https://doi.org/10.1111/1475-679X.12123).
The original data source is the SEC EDGAR database. The enhanced data source provided on Data Hub by the author contains 10-K Items 1, 1A, and 7 filed from January 2003 to April 2022, covering fiscal years 2002-2021, with approximately 6,500 filings per year. Each observation includes the item's textual content, number of words, filing year, CIK (the SEC's Central Index Key company identifier), metadata (filing form and item code), and a search score for users who want to use the full-text search capabilities of Azure Cognitive Search. Note that the CIK codes are compatible with the Compustat and CRSP databases in WRDS.
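As an illustration of working with these observations, the sketch below parses a search response and averages the word count per filing year for one 10-K item, optionally ignoring low-scoring hits. It assumes the standard Azure Cognitive Search response layout (hits in a "value" array, each carrying an "@search.score"); the other field names (cik, year, item, words, content) and all values are hypothetical placeholders, so check the real names in the search result example listed under Supporting materials.

```python
import json
from collections import defaultdict

# Made-up response for illustration only. Azure Cognitive Search returns hits
# in a "value" array and adds "@search.score" to each hit; the remaining field
# names (cik, year, item, words, content) are hypothetical placeholders.
SAMPLE_RESPONSE = """
{"value": [
  {"@search.score": 3.2, "cik": "1", "year": 2020, "item": "1A", "words": 9000, "content": "..."},
  {"@search.score": 2.5, "cik": "2", "year": 2020, "item": "1A", "words": 7000, "content": "..."},
  {"@search.score": 1.9, "cik": "1", "year": 2021, "item": "1A", "words": 9500, "content": "..."},
  {"@search.score": 1.1, "cik": "2", "year": 2021, "item": "7", "words": 15000, "content": "..."}
]}
"""


def mean_words_by_year(response_text, item, min_score=0.0):
    """Average word count per filing year for one 10-K item code.

    Hits for other items, or scoring below min_score, are skipped.
    """
    hits = json.loads(response_text)["value"]
    totals = defaultdict(int)
    counts = defaultdict(int)
    for hit in hits:
        if hit["item"] != item or hit["@search.score"] < min_score:
            continue
        totals[hit["year"]] += hit["words"]
        counts[hit["year"]] += 1
    # Years are returned in ascending order.
    return {year: totals[year] / counts[year] for year in sorted(totals)}


print(mean_words_by_year(SAMPLE_RESPONSE, "1A"))  # {2020: 8000.0, 2021: 9500.0}
```

Filtering on the score is only a convenience here; whether a score threshold is meaningful depends on the query used.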
Distributor
Jukka Sihvonen
https://people.aalto.fi/jukka.sihvonen
Cite as
-
Supporting materials
Search tutorials with MATLAB, Python, and R:
https://aaltohub.blob.core.windows.net/$web/filings_blockchain_matlab.html
https://aaltohub.blob.core.windows.net/$web/filings_blockchain_python.html
https://aaltohub.blob.core.windows.net/$web/filings_blockchain_r.html
Introductory presentation:
https://aaltohub.blob.core.windows.net/$web/filings_data_intro.pdf
Search result example in JSON format:
https://aaltohub.blob.core.windows.net/$web/filings_search_result.json
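For readers who prefer plain HTTP over the linked tutorials, the following is a minimal sketch (not the tutorials' method) of a full-text query against the Azure Cognitive Search "Search Documents" REST endpoint using only the Python standard library. The service name, index name, query key, and the "year" field in the filter are hypothetical placeholders; the real values come with the credentials provided by the author. The request is only built here, and the commented lines show how it would be sent.

```python
import json
import urllib.request

# Hypothetical placeholders: the real service name, index name and query key
# are supplied by the author together with the access credentials.
SERVICE = "example-service"
INDEX = "example-index"
API_KEY = "<query-key>"
API_VERSION = "2020-06-30"


def build_search_request(query, top=10, filter_expr=None):
    """Build (but do not send) a POST request to the Search Documents REST API."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
           f"/docs/search?api-version={API_VERSION}")
    body = {"search": query, "top": top, "count": True}
    if filter_expr is not None:
        body["filter"] = filter_expr  # OData filter expression
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


# Phrase search; "year" is a hypothetical field name.
req = build_search_request('"climate change"', top=5, filter_expr="year eq 2021")
# With real credentials, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["value"]
```

Setting "count" to true asks the service to report the total number of matches in addition to the top hits, which is handy when sizing a sample before downloading the texts.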
Disclaimer
The data are provided as is or only linked to. The author takes no responsibility for any errors or misinterpretations arising from the use of the data and cannot guarantee the accuracy or completeness of the information provided.
Related data sources

Compustat North America
Compustat North America provides annual and quarterly fundamentals (accounting statement information) for public companies in the US and Canada. Additional Compustat data files include bank fundamentals, historical business segment data and executive compensation (Execucomp).

CRSP (Center for Research in Security Prices)
CRSP provides security price, return, and volume data for the US stock markets dating back to 1925. Additionally, the data files include CRSP/Compustat Merged Database and US mutual fund data.
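Because each filing carries a CIK, it can be joined to Compustat-style annual records that also store a CIK. A common practical snag is that CIKs appear both as plain integers and as 10-digit zero-padded strings, so the sketch below normalizes them before matching. It also assumes, as a simplification, that fiscal year = filing year - 1, which fits most December year-end filers but not all companies. The record layouts and values (including "fyear" and "at") are hypothetical examples, not actual Compustat or Data Hub extracts.

```python
def normalize_cik(cik):
    """Return a CIK as a 10-digit zero-padded string, e.g. 1234 -> '0000001234'."""
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"not a numeric CIK: {cik!r}")
    return digits.zfill(10)


def link_filings(filings, fundamentals, year_lag=1):
    """Attach annual fundamentals rows to filings by CIK and fiscal year.

    Assumes fiscal year = filing year - year_lag (a simplification).
    If several fundamentals rows share a key, the last one wins.
    """
    index = {(normalize_cik(row["cik"]), row["fyear"]): row for row in fundamentals}
    linked = []
    for filing in filings:
        key = (normalize_cik(filing["cik"]), filing["year"] - year_lag)
        match = index.get(key)
        if match is not None:
            # Merge both records and keep the normalized CIK.
            linked.append({**filing, **match, "cik": key[0]})
    return linked


# Hypothetical records with made-up values.
FILINGS = [
    {"cik": 1, "year": 2021, "item": "7", "words": 15000},
    {"cik": "0000000002", "year": 2021, "item": "7", "words": 8000},
]
FUNDAMENTALS = [
    {"cik": "0000000001", "fyear": 2020, "at": 500.0},
    {"cik": "3", "fyear": 2020, "at": 90.0},
]
print(link_filings(FILINGS, FUNDAMENTALS))
```

Filings without a matching row are dropped, which behaves like an inner join; an unmatched count is worth checking before drawing conclusions from the linked sample.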