WHAT TO FIX’s Meta Monetization Archive is the world’s first Monetization Archive prototype. It is a fully searchable compilation of Meta’s own partner-publisher disclosures, backed up since 2019.
Platforms:
Data source: https://www.facebook.com/brand_safety/publisher_lists (about the data)
GitHub: meta-monetization-archive (WHAT-TO-FIX, updated Sep 11, 2025)
Using the Archive
We can’t wait for you to take a look for yourself at the monetization partnerships Meta has gotten itself into.
Do let us know what you find. We would love to help amplify your findings.
If you publish anything, we would just ask that you attribute the data to the WHAT TO FIX Meta Monetization Archive.
The Meta Monetization Archive is released as is, and is meant as a prototype of what a monetization archive could look like.
At the speed Meta’s monetization programs are growing, we may not be able to continue backstopping transparency for long. We would love your help advocating for platforms to maintain their own Monetization Archives.
We run a monthly monetization call to explore pathways to improving transparency and accountability.
Get in touch at hello@whattofix.tech if you would like to be involved in the conversation.
You can also sign up for our monthly monetization newsletter.
Data Collection

Note: as of 2025, Meta is the only company releasing such data.
Data Processing
Did we make any changes to the data?
We wish all we had to do was consolidate the files into a database. But the truth is, Meta’s raw data files were far from perfect, and we had to set up validation and correction steps.
We did it all computationally, and kept copies of the raw files, so that our processing can be independently audited.
Data Quality Issues
Inconsistent Releases
Contrary to its stated commitment, Meta does not release lists daily. We also failed to capture files on certain dates.
Solution: We defaulted to using the date of the last known file in lieu of a session’s end_date.
Limitation: Sessions’ end_date may be a few days earlier than the real end_date.
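The fallback described above can be illustrated with a minimal sketch. It assumes that a "session" is a run of captured files in which a page appears, and that a session closes only when a captured file omits the page. The function name, the input shapes, and the handling of still-open sessions are hypothetical and are not taken from the archive's code.

```python
from datetime import date


def build_sessions(capture_dates, seen_dates):
    """Group one page's appearances into sessions (illustrative only).

    capture_dates: sorted list of dates for which a disclosure file was captured.
    seen_dates: set of dates on which the page appeared in a captured file.
    Dates with no captured file are skipped, so a gap in captures does not
    close a session by itself.
    """
    sessions = []
    start = last_seen = None
    for day in capture_dates:
        if day in seen_dates:
            if start is None:
                start = day
            last_seen = day
        elif start is not None:
            # The page is gone from this file: close the session on the
            # last file known to contain it.
            sessions.append({"start_date": start, "end_date": last_seen})
            start = last_seen = None
    if start is not None:
        # Assumption: a session still open at the last capture is also
        # closed on the last known file.
        sessions.append({"start_date": start, "end_date": last_seen})
    return sessions


# No files exist for Jan 3 and Jan 4; the page is absent from the Jan 5 file.
capture_dates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5), date(2021, 1, 6)]
seen_dates = {date(2021, 1, 1), date(2021, 1, 2)}
sessions = build_sessions(capture_dates, seen_dates)
```

In this example the session ends on Jan 2, although the page may really have stopped monetizing on Jan 3 or Jan 4. That is the "few days earlier" limitation noted above.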
Data Standardization
We ran into variant names for the same language (e.g. Bengali/Bangla, Punjabi/Panjabi, Sinhalese/Sinhala) and for the same country.
Solution: We standardized the fields as part of our data processing flow.
Limitation: Those fields may no longer match those from the Meta raw data.
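As a rough illustration of this kind of standardization, the sketch below maps variant language names to a single spelling. The alias table, the choice of which variant is treated as canonical, and the function name are assumptions for the example and do not describe the archive's actual mapping.

```python
# Hypothetical alias table: which spelling is canonical is an assumption.
LANGUAGE_ALIASES = {
    "bangla": "Bengali",
    "panjabi": "Punjabi",
    "sinhala": "Sinhalese",
}


def standardize_language(raw):
    """Return one spelling for a language name, leaving unknown names as they are."""
    cleaned = raw.strip()
    return LANGUAGE_ALIASES.get(cleaned.lower(), cleaned)


examples = [standardize_language(name) for name in (" Bangla ", "Punjabi", "Sinhala")]
```

Keeping the raw value alongside the standardized one would make it easier to trace a record back to Meta's original file.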
Missing Data
We encountered a number of entries which lacked an ID, making them impossible to process.
Solution: We disregarded these entries in the archive.
Limitation: The total number of records on impacted days may no longer match the Meta raw data.
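A filter of this kind might look like the sketch below. It assumes each entry is a dictionary with an `id` key, which is a hypothetical field name, and it keeps the dropped entries so that the difference from the raw daily totals can be counted.

```python
def split_by_id(rows):
    """Separate entries that have a usable ID from those that do not."""
    kept, dropped = [], []
    for row in rows:
        value = row.get("id")
        has_id = value is not None and str(value).strip() != ""
        (kept if has_id else dropped).append(row)
    return kept, dropped


rows = [
    {"id": "123", "publisher": "Example Page"},
    {"id": "", "publisher": "No ID Page"},
    {"publisher": "Missing Key Page"},
]
kept, dropped = split_by_id(rows)
```

Here `len(dropped)` is the number of records by which that day's archive total would differ from the raw file.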
Missing Date_Added
Until 2020-10-04, the date_added field was framed as “new in the last 30 days” with the option of a “no” input.
Solution: We applied an algorithm to deduce missing date_added fields based on later data.
Limitation: For a limited number of early pages, which did not monetize for long, we lacked a specific date_added. If the account was marked as active for more than 30 days on 2019-10-25, we defaulted to a date_added of 2019-09-25.
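The default rule can be sketched as follows. The deduction algorithm itself is not described here, so the sketch only shows the order of preference: use a specific date recovered from later data when there is one, otherwise fall back to the default. The function and parameter names are hypothetical. The two dates come from the text and are 30 days apart; deriving the default from that offset is an assumption about how it was chosen.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2019, 10, 25)
# 2019-09-25, the default quoted in the text, is 30 days before 2019-10-25.
DEFAULT_DATE_ADDED = REFERENCE_DATE - timedelta(days=30)


def infer_date_added(date_from_later_files, active_over_30_days_on_reference):
    """Pick a date_added for an early record that lacks one (illustrative only)."""
    if date_from_later_files is not None:
        return date_from_later_files
    if active_over_30_days_on_reference:
        return DEFAULT_DATE_ADDED
    return None  # nothing can be inferred under these assumptions


recovered = infer_date_added(date(2019, 8, 1), True)
defaulted = infer_date_added(None, True)
unknown = infer_date_added(None, False)
```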
Date_Added Errors
We faced several types of errors with the date_added field, which is meant to reflect the latest date of onboarding. These included date_added predating the existence of the platform/program, date_added in the future, and date_added contradicting the record of inclusion in disclosure files.
Solution: We applied an algorithm to correct the various identified errors.
Limitation: The total number of records on impacted days may no longer match the raw data.
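To make the three error types concrete, here is a minimal detection sketch. It only flags problems and does not attempt the correction step, which is not described. All names are hypothetical, the launch date is a placeholder rather than a real program date, and reading "contradicting the record of inclusion" as "date_added is later than the first file the page appears in" is one possible interpretation.

```python
from datetime import date


def date_added_issues(date_added, file_date, program_launch, first_seen):
    """List the problems found in one record's date_added (illustrative only)."""
    issues = []
    if date_added < program_launch:
        issues.append("predates_program")
    if date_added > file_date:
        issues.append("in_future")
    if date_added > first_seen:
        # The page already appeared in a file before its claimed onboarding date.
        issues.append("contradicts_inclusion")
    return issues


PLACEHOLDER_LAUNCH = date(2015, 1, 1)  # hypothetical value, not a real launch date
file_date = date(2021, 6, 1)
first_seen = date(2021, 5, 1)

too_early = date_added_issues(date(2010, 1, 1), file_date, PLACEHOLDER_LAUNCH, first_seen)
too_late = date_added_issues(date(2021, 5, 20), file_date, PLACEHOLDER_LAUNCH, first_seen)
clean = date_added_issues(date(2021, 4, 15), file_date, PLACEHOLDER_LAUNCH, first_seen)
```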
Ghost Records
We encountered a number of records which, upon inspection, did not appear to be live on the platform at the time of their disclosure. These records shared a consistent pattern, with their account name (publisher), handle (username) and subscribers being empty.
Solution: We disregarded these entries in the archive.
Limitation: The total number of records on impacted days may no longer match the raw data.
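The pattern described above lends itself to a simple check. The sketch below treats a record as a ghost only when publisher, username and subscribers are all empty. The dictionary keys follow the field names mentioned in the text, but their exact spelling in the data and the definition of "empty" used here are assumptions.

```python
GHOST_FIELDS = ("publisher", "username", "subscribers")


def is_blank(value):
    """True for a missing value or whitespace-only text; a numeric 0 is not blank."""
    return value is None or str(value).strip() == ""


def is_ghost(row):
    """True when every field in GHOST_FIELDS is empty for this record."""
    return all(is_blank(row.get(field)) for field in GHOST_FIELDS)


records = [
    {"id": "1", "publisher": "", "username": None, "subscribers": ""},
    {"id": "2", "publisher": "Example Page", "username": "example", "subscribers": 1200},
    {"id": "3", "publisher": "", "username": "", "subscribers": 0},
]
live_records = [row for row in records if not is_ghost(row)]
```

Requiring all three fields to be empty keeps a page with a name but no subscriber count, or with zero subscribers, in the archive.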
Got a question or feedback? Notice anything that doesn’t look quite right?
Get in touch at meta-monetization-archive@whattofix.tech.
Monetization.wtf is maintained by WHAT TO FIX, with financial support from Luminate.
©️CC BY-ND 4.0 | Terms of use