Job Segmentation Quality Improvement: Job Board Level Up

 

Created Date

May 30, 2023

Target PI

PI4

Target Release

Jul 28, 2023

Jira Epic

Document Status

Draft

Epic Owner

@Nick Studt (Deactivated)

Stakeholder

@Oree Wyatt @Abby Santos @Mark Taylor @Everett Bloch @Xiang Li

Engineering Team(s) Involved

Micro C&E NLP

PART 1

Customer/User Job-to-be-Done or Problem

When using the Lightcast Job Postings API to surface live job postings (Job Board), I need to display the job posting description and details in a way that is coherent, organized, and consistent. However, the current field (“body”) contains clear scraping issues and is not organized in a way that lets me customize the presentation of the various data segments.

 

Value to Customers & Users

Displaying job details in a consistent, organized way that users can filter is the primary purpose of a job board and the main value it brings to its users. Without clean text that summarizes the job details, clients cannot rely on our solution to power their job boards and will look elsewhere.

A good guideline to start with is Google’s rules for job posting structure, found here: https://developers.google.com/search/docs/appearance/structured-data/job-posting

Because the job posting is our job board clients’ product, they live and die by optimizing for these standards to drive traffic to their sites.
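To make the target concrete, here is a minimal sketch of how segmented fields could be assembled into a schema.org `JobPosting` JSON-LD object of the kind Google's guidelines describe. It is not the planned implementation: the posting record layout, the segment labels (`summary`, `responsibilities`, `qualifications`, `benefits`), and the display headings are hypothetical, and only a handful of the properties Google documents are filled in.

```python
import html
import json

# Hypothetical segment labels and display headings; the real classifier's
# label set may differ.
SEGMENT_HEADINGS = {
    "summary": "About the Role",
    "responsibilities": "Responsibilities",
    "qualifications": "Qualifications",
    "benefits": "Benefits",
}


def build_description_html(segments: dict[str, str]) -> str:
    """Render known segments as HTML in a fixed order; unknown labels (e.g. boilerplate) are dropped."""
    parts = []
    for label, heading in SEGMENT_HEADINGS.items():
        text = segments.get(label, "").strip()
        if text:
            # Escape scraped text so characters like "&" or "<" cannot break the markup.
            parts.append(f"<h3>{html.escape(heading)}</h3><p>{html.escape(text)}</p>")
    return "\n".join(parts)


def to_job_posting_jsonld(posting: dict) -> str:
    """Build a schema.org JobPosting JSON-LD string from a hypothetical posting record."""
    data = {
        "@context": "https://schema.org/",
        "@type": "JobPosting",
        "title": posting["title"],
        "description": build_description_html(posting["segments"]),
        "datePosted": posting["date_posted"],  # ISO 8601 date, e.g. "2023-05-30"
        "hiringOrganization": {"@type": "Organization", "name": posting["company"]},
        "jobLocation": {
            "@type": "Place",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": posting["city"],
                "addressRegion": posting["region"],
                "addressCountry": posting["country"],
            },
        },
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    example = {
        "title": "Data Analyst",
        "date_posted": "2023-05-30",
        "company": "Example Corp",
        "city": "Boise",
        "region": "ID",
        "country": "US",
        "segments": {
            "summary": "Join our analytics team.",
            "qualifications": "2+ years of SQL & Python.",
            "boilerplate": "Apply now! Sign up for job alerts.",
        },
    }
    print(to_job_posting_jsonld(example))
```

Because the description is rebuilt only from recognized segments, anything labeled as boilerplate never reaches the published markup, and each segment can be reordered or restyled by the client independently.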

 

Value to Lightcast

This work is important because we need to KEEP existing job board clients (primarily Legacy BG users) and grow this segment of Lightcast’s client base. Without a few key investments in this area, we will continue to lose renewals and deals involving job boards. This is a key piece of winning in this market, which represents a large existing book of business (well over $1M).

I propose that we use our existing job segmentation classifier as a starting point and build improvements on it to optimize for this use case. If we can clean up this classification, I expect it would also improve the input to our other classifiers, including skills extraction.

Cleaning up segmentation (especially boilerplate/junk removal) and having a way to run C&E classifiers on the cleaned-up body, with boilerplate and noise removed, would help with certain miscodes that we can’t fix today. It would also help global tagging: when a posting contains broken HTML, we could strip that portion out so that we don’t mistag skills or anything else within it.
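A minimal sketch of the idea, assuming the segmentation classifier returns an ordered list of labeled spans: segments labeled as noise are dropped, and only the remaining text is passed to downstream C&E classifiers such as skills tagging. The `Segment` type and the `boilerplate`/`broken_html` labels are hypothetical placeholders, not the classifier's actual output schema.

```python
from dataclasses import dataclass

# Hypothetical labels for noise that should never be tagged by any classifier.
EXCLUDED_LABELS = frozenset({"boilerplate", "broken_html"})


@dataclass(frozen=True)
class Segment:
    label: str  # segment type predicted by the segmentation classifier (hypothetical)
    text: str   # text span from the original posting body


def clean_body_for_tagging(segments: list[Segment]) -> str:
    """Join the non-noise segments in their original order, skipping empty spans."""
    kept = [
        seg.text.strip()
        for seg in segments
        if seg.label not in EXCLUDED_LABELS and seg.text.strip()
    ]
    return "\n\n".join(kept)


if __name__ == "__main__":
    posting = [
        Segment("summary", "We are hiring a data analyst."),
        Segment("boilerplate", "Apply now! Sign up for job alerts."),
        Segment("qualifications", "Experience with SQL and Tableau."),
        Segment("broken_html", '<div class="nav"><ul><li>Java</'),
    ]
    # Prints only the summary and qualifications, separated by a blank line.
    print(clean_body_for_tagging(posting))
```

Filtering by label, rather than trying to repair noisy text, keeps the rule simple: anything the segmenter marks as noise is invisible to every downstream classifier, so a skill name inside a navigation menu or a broken HTML fragment is never tagged.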

Sample Data:

https://docs.google.com/spreadsheets/d/17lWBSNc2Ro8gdRmteUJAq5hzlimhsAUi7A36IF4s50w/edit?usp=sharing

Issues we could fix by excluding boilerplate from tagging by any classifier:

https://economicmodeling.atlassian.net/browse/SUPPORT-30898

https://economicmodeling.atlassian.net/browse/SUPPORT-30450

 

Target User Role/Client/Client Category

Clients who need to display job posting text in their workflow (job boards). We will also be able to deliver this value in our own tools that display posting details (SkillFit, Analyst, Career Coach, etc.).

 

Delivery Mechanism

Improvement of Job Segmentation Classifier

US, UK and Canada Job Posting API (eventually Global Postings API)

 

Success Criteria & Metrics

The goal is to deliver an MVP of this segmentation that is good enough (~85% accuracy) to expose these fields to clients, and then iterate from there, leveraging key client partnerships for feedback.
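The ~85% bar could be checked against a hand-labeled sample such as the spreadsheet linked above. The sketch below assumes accuracy is measured per segment (predicted label compared with a human label for the same span), which this document does not yet define; the function names are illustrative. It also reports per-label recall, because a high overall score can hide a weak label, such as boilerplate being misclassified as qualifications.

```python
from collections import Counter

MVP_ACCURACY_TARGET = 0.85  # "~85% accuracy" from the success criteria


def _check_aligned(predicted: list[str], gold: list[str]) -> None:
    """Reject inputs that are not aligned one label per segment."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold labels must be aligned one-to-one")
    if not gold:
        raise ValueError("need at least one labeled segment")


def segment_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of segments whose predicted label matches the human label."""
    _check_aligned(predicted, gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def per_label_recall(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """For each human label, the fraction of its segments the classifier got right."""
    _check_aligned(predicted, gold)
    totals = Counter(gold)
    hits = Counter(g for p, g in zip(predicted, gold) if p == g)
    return {label: hits[label] / n for label, n in totals.items()}


def meets_mvp_bar(predicted: list[str], gold: list[str]) -> bool:
    """True when overall segment accuracy reaches the MVP target."""
    return segment_accuracy(predicted, gold) >= MVP_ACCURACY_TARGET


if __name__ == "__main__":
    gold = ["summary", "qualifications", "boilerplate", "benefits", "boilerplate"]
    pred = ["summary", "qualifications", "qualifications", "benefits", "boilerplate"]
    print(segment_accuracy(pred, gold))  # 0.8
    print(per_label_recall(pred, gold))  # boilerplate recall is only 0.5
    print(meets_mvp_bar(pred, gold))  # False
```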

 

Aspects that are out of scope (of this phase)

 

 

PART 2

Solution Description

Early UX (wireframes or mockups)

<FigmaLink>

 

Non-Functional Attributes & Usage Projections

Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements

 

Dependencies

Is there any work that must precede this? Feature work? Ops work? 

 

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

  • Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]

    • If specific beta customers, will it be for a specific survey launch date or report availability date?

  • How will this guide the rollout of individual stories in the epic?

  • The rollout strategy should be discussed with CS, Marketing, and Sales.

  • How long would we tolerate a “partial rollout” (rolled out to some customers but not all)?

 

Risks

Focus on risks unique to this feature, not overall delivery/execution risks. 

 

Open Questions

What are you still looking to resolve?

 


Complete with Engineering Teams

 

Effort Size Estimate

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

 

Team Effort

Each team involved should give a general t-shirt size estimate of its work. As the epic proceeds, teams can add a link to the Jira epic/issue associated with their portion of this work.

Team | Effort Estimate (T-shirt sizes) | Jira Link
C&E | XL | https://economicmodeling.atlassian.net/browse/CE-1459