Job Segmentation Quality Improvement: Job Board Level Up
Created Date | May 30, 2023 |
---|---|
Target PI | PI4 |
Target Release | Jul 28, 2023 |
Jira Epic | |
Document Status | Draft |
Epic Owner | @Nick Studt (Deactivated) |
Stakeholder | @Oree Wyatt @Abby Santos @Mark Taylor @Everett Bloch @Xiang Li |
Engineering Team(s) Involved | Micro C&E NLP |
PART 1
Customer/User Job-to-be-Done or Problem
When using the Lightcast Job Postings API to surface live job postings (Job Board), I need to be able to display the job posting description and details in a way that is coherent, organized, and consistent. However, the current field (“body”) contains clear scraping issues and is not organized in a way that lets me customize the presentation of the various segments of data.
Value to Customers & Users
Displaying job details that can be filtered down in a consistent, organized way is the primary purpose of a job board and the value that job boards bring to their users. Without clean text to summarize the job details, the client cannot really leverage our solution to power their job board and will need to look elsewhere.
A good guideline to start with is Google’s rules for job posting structure, found here: https://developers.google.com/search/docs/appearance/structured-data/job-posting
Since our job board clients’ product is the job posting, they will live and die by optimizing to these standards to drive traffic to their sites.
Value to Lightcast
This work is important because we need to KEEP existing job board clients (primarily Legacy BG users) and grow this segment of our client base for Lightcast. Without a few key investments in this area, we will continue to lose renewals and deals involving job boards. This is a key piece of being able to win in this market, which represents a large existing book of business (well over $1M).
I’d propose we leverage our existing job segmentation classifier as a starting point and build improvements on it to optimize for this use case. I think cleaning up this classification would also improve the input to our other classifiers, including better skills extraction.
Cleaning up segmentation (especially around boilerplate/junk removal) and having a way to have C&E classifiers tag a cleaned-up body with boilerplate/noise removed would really help with certain miscodes that we can’t fix currently. This would also help global tagging: when we get broken HTML in a posting, we could scoop it out so we don’t mistag skills or anything else within the broken HTML portion of a document.
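To make the idea concrete, here is a minimal sketch of dropping noise segments before any downstream classifier sees the text. It assumes the segmentation classifier returns labeled text segments; the `Segment` type, the label names (`boilerplate`, `broken_html`, etc.), and `clean_body` are all hypothetical and not the actual classifier output or API.

```python
from dataclasses import dataclass

# Hypothetical noise labels; the real segmentation label set is an assumption here.
NOISE_LABELS = {"boilerplate", "broken_html"}


@dataclass
class Segment:
    label: str  # segment type assigned by the (assumed) segmentation classifier
    text: str   # raw text of the segment


def clean_body(segments):
    """Join the text of every segment that is not labeled as noise."""
    return "\n".join(s.text for s in segments if s.label not in NOISE_LABELS)


# Illustrative posting: the broken HTML fragment contains a stray skill term.
segments = [
    Segment("description", "We are hiring a data analyst."),
    Segment("broken_html", "<div class='x'><span>Python"),
    Segment("requirements", "Experience with SQL required."),
    Segment("boilerplate", "We are an equal opportunity employer."),
]

cleaned = clean_body(segments)
print(cleaned)
```

In this sketch, a skills tagger run on `cleaned` would never see the "Python" token that only appears inside the broken HTML fragment.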
Sample Data:
https://docs.google.com/spreadsheets/d/17lWBSNc2Ro8gdRmteUJAq5hzlimhsAUi7A36IF4s50w/edit?usp=sharing
Issues we could fix by excluding boilerplate from tagging by any classifier:
https://economicmodeling.atlassian.net/browse/SUPPORT-30898
https://economicmodeling.atlassian.net/browse/SUPPORT-30450
Target User Role/Client/Client Category
Clients who need to display job posting text in their workflow (job boards). We will also be able to deliver this value to our own tools where we display posting details (SkillFit, Analyst, Career Coach, etc.).
Delivery Mechanism
Improvement of Job Segmentation Classifier
US, UK and Canada Job Posting API (eventually Global Postings API)
Success Criteria & Metrics
The goal is to deliver an MVP of this segmentation that is good enough (~85% accuracy) that we can expose these fields to clients and iterate from there, leveraging key client partnerships to get feedback.
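One possible way to check the ~85% target is sketched below. It assumes accuracy means the share of segments whose predicted label matches a hand-labeled reference; that definition, the label names, and the `segment_accuracy` helper are assumptions for illustration, not an agreed metric.

```python
MVP_THRESHOLD = 0.85  # the ~85% accuracy target stated above


def segment_accuracy(predicted, expected):
    """Fraction of segments whose predicted label equals the reference label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Illustrative labels only; not real evaluation data.
expected = ["description", "requirements", "benefits", "boilerplate", "description",
            "requirements", "boilerplate", "benefits", "description", "requirements"]
predicted = ["description", "requirements", "benefits", "boilerplate", "description",
             "requirements", "boilerplate", "benefits", "description", "benefits"]

accuracy = segment_accuracy(predicted, expected)
meets_mvp = accuracy >= MVP_THRESHOLD
print(f"accuracy={accuracy:.2f}, meets MVP threshold: {meets_mvp}")
```

Whether accuracy should be measured per segment, per posting, or per character is an open choice that would need to be settled with the engineering team.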
Aspects that are out of scope (of this phase)
PART 2
Solution Description
Early UX (wireframes or mockups)
<FigmaLink>
Non-Functional Attributes & Usage Projections
Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements
Dependencies
Is there any work that must precede this? Feature work? Ops work?
Legal and Ethical Considerations
Just answer yes or no.
High-Level Rollout Strategies
Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]
If specific beta customers, will it be for a specific survey launch date or report availability date?
How will this guide the rollout of individual stories in the epic?
The rollout strategy should be discussed with CS, Marketing, and Sales.
How long would we tolerate having a “partial rollout” (rolled out to some customers but not all)?
Risks
Focus on risks unique to this feature, not overall delivery/execution risks.
Open Questions
What are you still looking to resolve?
Complete with Engineering Teams
Effort Size Estimate |
---|
Estimated Costs
Direct Financial Costs
Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?
Team Effort
Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.
Team | Effort Estimate (T-shirt sizes) | Jira Link |
---|---|---|
C&E | XL | |
|
|
|