The Meta Publisher Archive is an exclusive resource consolidating Meta’s disclosures of which publishers are members of Facebook and Instagram revenue redistribution programs.
Data source: https://www.facebook.com/brand_safety/publisher_lists (about the data)
Github: https://github.com/WHAT-TO-FIX/meta-publishers-archive
Data Collection
The archive consolidates Meta’s partner-publisher disclosures, backed up by WHAT TO FIX since October 2019.
Note: as of 2025, Meta is the only company releasing such data.
Data Processing
Did we make any changes to the data?
We wish all we had to do was consolidate the files into a database. But the truth is, Meta’s raw data files were far from perfect, and we had to set up validation and correction steps. We did it all computationally, so that the process can be independently audited.
Data Quality Issues
Inconsistent Release
Contrary to its stated commitment, Meta does not release lists daily. We also failed to capture files on certain dates.
Solution: We defaulted to using the date of the last known file in lieu of a session’s end_date.
Limitation: Sessions’ end_date may be a few days earlier than the real end_date.
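As an illustration of the rule above, here is a minimal sketch in Python. It assumes a "session" is an unbroken run of captured files that list a given page; the function name `build_sessions` and the dictionary keys are hypothetical and are not taken from the archive's code.

```python
from datetime import date

def build_sessions(file_dates, seen_dates):
    """Group one page's appearances into sessions.

    file_dates: sorted list of dates for which a file was captured.
    seen_dates: set of dates on which the page appeared in a file.
    A session's end_date is the last captured file that still lists the page.
    """
    sessions = []
    start = last_seen = None
    for d in file_dates:
        if d in seen_dates:
            if start is None:
                start = d          # first file of a new session
            last_seen = d          # last known file so far
        elif start is not None:
            # The page is absent from this file: close the session at the
            # last file that listed it, not at the (unknown) real end.
            sessions.append({"start_date": start, "end_date": last_seen})
            start = last_seen = None
    if start is not None:
        # Page still listed in the most recent file; it may be ongoing.
        sessions.append({"start_date": start, "end_date": last_seen})
    return sessions
```

If files exist for 1, 2, 5 and 6 October and the page appears only on 1 and 2 October, the session ends on 2 October, even though the page may have stayed in the program until 3 or 4 October, when no file is available.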
Data Standardization
We ran into variant names for the same language (e.g. Bengali/Bangla, Punjabi/Panjabi, Sinhalese/Sinhala) and for the same country.
Solution: We standardized the language field as part of our data processing flow.
Limitation: Those fields may no longer match those from the Meta raw data.
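A standardization step of this kind can be sketched as a simple alias lookup. The table below is hypothetical: it uses the three pairs named above, and the choice of which spelling is canonical is an assumption, not necessarily the one made in the archive.

```python
# Hypothetical alias table. Which variant is treated as canonical is an
# assumption made for this sketch.
LANGUAGE_ALIASES = {
    "bangla": "Bengali",
    "panjabi": "Punjabi",
    "sinhala": "Sinhalese",
}

def standardize_language(raw):
    """Return the canonical name for a known variant, else the trimmed input."""
    cleaned = raw.strip()
    return LANGUAGE_ALIASES.get(cleaned.lower(), cleaned)
```

Values that are not in the table pass through unchanged apart from whitespace trimming, so only known variants diverge from the raw data.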
Missing Data
We encountered a number of entries which lacked an ID, making them impossible to process.
Solution: We deleted entries without an ID from the archive.
Limitation: The total number of records on impacted days may no longer match the Meta raw data.
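A minimal sketch of this filter, assuming each entry is a dictionary with an `id` key (a hypothetical field name). It also returns how many entries were dropped, which is the gap a reader would see when comparing daily totals with the raw files.

```python
def has_id(record):
    """True when the record carries a non-blank ID."""
    value = record.get("id")
    return value is not None and str(value).strip() != ""

def drop_missing_ids(records):
    """Return (kept_records, number_dropped)."""
    kept = [r for r in records if has_id(r)]
    return kept, len(records) - len(kept)
```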
Missing Date_Added
Until 2020-10-04, the date_added field was framed as “new in the last 30 days” with the option of a “no” input.
Solution: We applied an algorithm (see) to deduce missing date_added fields based on later data.
Limitation: For a limited number of early pages, which did not monetize for long, we lacked a specific date_added. If the account was marked as active for more than 30 days on 2019-10-25, we defaulted to a date_added of 2019-09-25.
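The fallback described above can be sketched as follows. This is not the archive's algorithm, only an illustration of the stated default: the function name and parameters are hypothetical, and "later data" is reduced to a single optional date.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2019, 10, 25)   # date cited in the text
NEW_WINDOW = timedelta(days=30)       # the "new in the last 30 days" window

def infer_date_added(later_date_added, new_in_last_30_days):
    """Pick a date_added for a record from the yes/no era.

    later_date_added: a date found in later files, or None if unavailable.
    new_in_last_30_days: the yes/no flag as a bool, or None if unknown.
    """
    if later_date_added is not None:
        return later_date_added            # later data wins when available
    if new_in_last_30_days is False:
        # Active for more than 30 days on the reference date:
        # default to the edge of the window (2019-09-25).
        return REFERENCE_DATE - NEW_WINDOW
    return None                            # nothing can be deduced
```

The default is a latest-possible bound rather than a measured value: a page flagged "no" on 2019-10-25 was onboarded on or before 2019-09-25.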
Date_Added Errors
We found several types of errors in the date_added field, which is meant to reflect the latest date of onboarding. These included date_added values predating the existence of the platform or program, date_added values in the future, and date_added values contradicting the dates of inclusion in disclosure files.
Solution: We applied an algorithm to correct the various identified errors (see).
Limitation: The total number of records on impacted days may no longer match the raw data.
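The three error types can be illustrated with a small detection sketch. It only flags problems and does not reproduce the archive's correction logic. All names are hypothetical, `program_start` must be supplied by the caller, and the third check reflects one possible reading of "contradicting dates of inclusion": a record disclosed in a file dated before its own date_added.

```python
from datetime import date

def date_added_issues(date_added, file_date, program_start, processed_on):
    """Return a list of labels for the problems found in one record.

    date_added:    value reported in the record.
    file_date:     date of the disclosure file containing the record.
    program_start: launch date of the program (caller-supplied assumption).
    processed_on:  date on which the check is run.
    """
    issues = []
    if date_added < program_start:
        issues.append("predates_program")
    if date_added > processed_on:
        issues.append("in_future")
    if date_added > file_date:
        # Disclosed as a member before the reported onboarding date.
        issues.append("after_disclosure_file")
    return issues
```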
Ghost Records
We encountered a number of records which, upon inspection, did not appear to be live on the platform at the time of their disclosure. These records shared a consistent pattern, with their account name (publisher), handle (username) and subscribers being empty.
Solution: We removed these entries from the archive.
Limitation: The total number of records on impacted days may no longer match the raw data.
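The pattern described above lends itself to a simple predicate. The sketch below assumes records are dictionaries keyed by the field names mentioned in the text (publisher, username, subscribers) and treats None and the empty string as "empty"; whether the raw files encode emptiness this way is an assumption.

```python
GHOST_FIELDS = ("publisher", "username", "subscribers")

def is_ghost(record):
    """True when publisher, username and subscribers are all empty.

    A subscriber count of 0 is not treated as empty in this sketch.
    """
    return all(record.get(field) in (None, "") for field in GHOST_FIELDS)

def remove_ghosts(records):
    """Keep only the records that do not match the ghost pattern."""
    return [r for r in records if not is_ghost(r)]
```

Requiring all three fields to be empty keeps the filter conservative: a record with a blank handle but a named publisher is retained.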
Got a question or feedback? Notice anything that doesn’t look quite right? Get in touch at meta-publisher-archive@whattofix.tech.
Monetization.wtf is maintained by WHAT TO FIX, with financial support from Luminate.
©️CC BY-ND 4.0 | Terms of use