
AI4Good announces a research data partnership with Canaria

Canaria is contributing in-kind access to its research-grade deduplicated job-postings corpus to power our Career Navigator retrieval layer, our Skills Translator evaluation set, and our public research briefs. An institution-to-institution collaboration.

AI4Good Foundation, Programs Team
6 min read
Partnerships, Research Infrastructure

We are announcing a research data partnership between AI4Good Foundation and Canaria, a US-based labor-market data company. Canaria is contributing in-kind access to its research-grade deduplicated job-postings corpus to power three pieces of our public-interest work: the retrieval layer behind our Career Navigator copilot, the evaluation set for our Skills Translator tool, and the public research briefs we will publish on workforce trends in underrepresented communities. This is an institution-to-institution collaboration, with no payment in either direction and no exchange of learner data.

Why labor-market data is the bottleneck

Career-navigation tools live or die on the quality of the labor-market signal underneath them. A copilot that confidently recommends a credential pathway for a job category that no longer hires in the learner's region is worse than no copilot at all, because it costs the learner trust and time. The same is true for skills-translation tools: the value of mapping a resume to a US employer-readable profile depends on whether the target taxonomy reflects what employers are actually hiring for today.

Until recently, the only ways for a small nonprofit to get research-grade labor-market data were to pay enterprise license fees that ran into six figures per year, scrape public job boards directly (with the legal, ethical, and quality issues that entails), or rely on government statistics that, while excellent, are published on a lag of months to quarters. None of those paths are practical for the institutions we serve.

What Canaria contributes

Under this partnership, AI4Good receives research-only access to Canaria's deduplicated job-postings corpus. The relevant properties of that corpus, drawn from Canaria's published documentation, are:

  • Over 900 million unique job postings after semantic deduplication, drawn from sources that include large job boards (such as Indeed and LinkedIn) and more than 200,000 employer applicant-tracking career portals on platforms such as Greenhouse, Lever, and Workday.
  • 82 enriched fields per posting, including normalized job title, standardized occupation classification, salary intelligence, extracted skills, work mode, employment type, and seniority.
  • A skills taxonomy of over 37,000 entries that goes well beyond technical skills to cover certifications and soft skills.
  • Standard Occupational Classification (SOC) labels that use job title plus job description context, not title alone, which is important for occupational categories where titles are ambiguous.
  • A salary prediction model with reported mean absolute percentage error below 15 percent, trained on tens of millions of paired observations.
  • US-focused coverage from 2022 to the present, refreshed daily.
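For readers unfamiliar with the metric in the salary bullet above: mean absolute percentage error (MAPE) averages each prediction's absolute error as a fraction of the true value. The sketch below uses made-up salary figures purely to show the arithmetic; it is not Canaria's model or data, and the function name `mape` is our own.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical posted salaries and model predictions (illustrative only).
actual_salaries = [50_000, 80_000, 100_000]
predicted_salaries = [55_000, 72_000, 110_000]

# Each prediction is off by 10 percent of the true value,
# so the MAPE for this toy sample is about 10 percent.
example_mape = mape(actual_salaries, predicted_salaries)
print(f"MAPE: {example_mape:.1f}%")
```

A reported MAPE below 15 percent means that, on average, predictions land within 15 percent of the observed salary; individual predictions can still miss by more.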

For background on Canaria's methodology and data supply chain, see decanaria.com.

How AI4Good will use it

Three program workstreams will draw on this data, and each one respects the boundaries of a research-only license.

Career Navigator retrieval layer. When a learner and an advisor work through next-step options in our copilot, the tool retrieves real, current job openings in the learner's region that match the target occupation, together with the prevailing skill requirements and salary range. Without a deduplicated corpus, the same posting echoed across a dozen aggregators would distort the picture; without an enriched corpus, the tool could not reason about which credentials would actually close the gap.
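To make the distortion concrete, here is a minimal sketch of how duplicate postings inflate apparent demand. It assumes a simple exact-match key on normalized employer, title, and metro as a stand-in; Canaria's corpus uses semantic deduplication, which this toy example does not attempt to reproduce. All postings and field names are hypothetical.

```python
from collections import Counter

# Hypothetical scraped postings: (source, employer, title, metro).
raw_postings = [
    ("board_a", "Acme Health", "Registered Nurse", "Houston"),
    ("board_b", "Acme Health", "registered nurse ", "Houston"),
    ("board_c", "ACME Health", "Registered Nurse", "Houston"),
    ("board_a", "Metro Clinic", "Medical Assistant", "Houston"),
]

def dedup_key(posting):
    """Normalize employer, title, and metro; ignore which board it came from."""
    _source, employer, title, metro = posting
    return tuple(field.strip().lower() for field in (employer, title, metro))

# Counting raw rows treats one opening echoed on three boards as three openings.
raw_counts = Counter(p[2].strip().lower() for p in raw_postings)

# Keeping one posting per key collapses the echoes.
unique_postings = {dedup_key(p): p for p in raw_postings}
dedup_counts = Counter(key[1] for key in unique_postings)

print("raw:", dict(raw_counts))
print("deduplicated:", dict(dedup_counts))
```

In the raw counts, nursing appears three times as common as medical assisting; after deduplication the two roles show one opening each.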

Skills Translator evaluation set. Our skills-translation tool takes a resume in any language and produces a US-employer-readable profile aligned to government taxonomies. Canaria's enriched postings give us a held-out evaluation set we can use to ask the right question: do the skills we extract from a learner's history actually correspond to the skills employers are listing in current postings for the target role?
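One simple way to turn that question into numbers is to compare the set of skills extracted from a resume with the set of skills listed in postings for the target role, and report precision and recall. The sketch below is an assumption about how such a check could look, not a description of our actual evaluation pipeline; the skill names and the `precision_recall` helper are hypothetical.

```python
def precision_recall(extracted, demanded):
    """Compare extracted skills with skills listed in postings for the role."""
    extracted, demanded = set(extracted), set(demanded)
    overlap = extracted & demanded
    precision = len(overlap) / len(extracted) if extracted else 0.0
    recall = len(overlap) / len(demanded) if demanded else 0.0
    return precision, recall

# Hypothetical skills pulled from one learner's resume.
extracted_skills = {"phlebotomy", "patient intake", "spanish", "excel"}

# Hypothetical skills aggregated from current postings for the target role.
demanded_skills = {"phlebotomy", "patient intake", "ehr systems", "cpr certification"}

precision, recall = precision_recall(extracted_skills, demanded_skills)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is the share of extracted skills that employers actually list, and recall is the share of employer-listed skills the tool managed to surface. A real evaluation would also need to handle synonyms and taxonomy mapping, which exact set matching ignores.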

Public research briefs. The first brief we are scoping with this data is a structural look at the education-to-employment pathway for internationally trained professionals in three large US metros. With a daily-refreshed, deduplicated corpus we can ground claims in current signal rather than the lagged averages that workforce briefs typically rely on.

The terms, in plain language

Honest framing matters, so the boundaries of this partnership are worth stating plainly.

  • The contribution is in-kind. AI4Good pays nothing for the data.
  • The data flow is one-directional. AI4Good receives research-only access to the postings corpus. No learner data, advisor conversation, evaluation result, or any other AI4Good information flows back to Canaria.
  • Use is limited to AI4Good's nonprofit research and program activities, not redistribution.
  • AI4Good will cite Canaria as the data source in published research, in the same way researchers cite any data provider.
  • Either organization may end the partnership at any time, and the arrangement carries no exclusivity in either direction.

What this unlocks

AI4Good is a launch-stage nonprofit, and we are not in a position to build a labor-market data backbone from scratch. This partnership means we do not have to. It lets us focus on the work where we add the most value: shipping open-source AI tools that the institutions serving underrepresented learners can actually deploy, and publishing the kind of practitioner-grade research that helps the broader workforce field decide where to invest. We are grateful for the contribution, and we look forward to publishing the first outputs that draw on it later this year.
