ONET 2019 values from LOT Spocc classifier (US Postings)
Target PI | PI#6/7 |
---|---|
Created Date | Oct 26, 2022 |
Target Release | |
Jira Epic | https://economicmodeling.atlassian.net/browse/CE-334 https://economicmodeling.atlassian.net/browse/DT-2896 https://economicmodeling.atlassian.net/browse/TAX-1092 |
Document Status | Draft |
Epic Owner | @Hal Bonella |
Stakeholder | @Ben Bradley @Matt McNair @Jackson Schuur @Nathan Triepke @Tatiana Harrison |
Engineering Team(s) Involved | Documents, C&E Taxonomy, Data Quality |
Customer/User Job-to-be-Done or Problem
The Scope of the user problem should be narrowed to the scope you are planning to solve in this phase of work. There may be other aspects you are aware of and plan to solve in the future. For now, put those in the Out of Scope section.
When tagging documents with ONET 2019 values, I want the values to be connected to the Specialized Occupation values from LOT, so I can have consistency in occupation tagging in our postings data.
JTBD (external): As a user of US Posting data in Lightcast products (Analyst, Snowflake, API, etc.), I expect the occupation tagging to be of high quality, especially for government taxonomies like O*NET 2019 and SOC. With the new release of the Lightcast Occupation Taxonomy, major shifts must be explained, especially if I have relied on ONET 2019 and SOC tagging for many of my previous reports.
JTBD (internal): As one of the internal “clients” of our US postings (i.e. Documents), I want to simplify our occupation tagging and reduce the number of occupation fields we have. I also aim to become independent from NOVA so we can reduce cost. Switching to an ONET 2019 classifier will allow us to 1) remove the `onet` and `soc_emsi_2019` fields, simplifying how many fields are processed as part of our pipeline run, and 2) get closer to getting off of NOVA. In addition, by connecting ONET 2019 (and thus SOC 2021) to the LOT/Spocc classifier, we essentially have one classifier for occupations for US postings, making it easier to track and resolve occupation tagging issues.
Value to Customers & Users
In the JTBD framework, these are the “pains” and “gains” your solution will address. Other ways to think about it: What’s the rationale for doing this work? Why is it a high priority problem for your customers and how will our solution add value?
Customers and users of our software (Analyst, API, Snowflake, Career Coach, etc.) will be able to compare our LOT taxonomy with how we tag US postings with ONET 2019 and see more consistency, based on how the two will be connected in our enrichment process. This also means that fixing a LOT tagging issue will immediately correct the corresponding ONET tagging issue, and vice versa.
Rationale for doing this work:
- The primary value is to Lightcast (see below).
- Help make sure clients get consistent results whether they use ONET 2019, SOC, or LOT for their reports.
Value to Lightcast
Sometimes we do things for our own benefit. List those reasons here.
This will have the following value for Lightcast:
- Removes a large NOVA/LENS dependency for US postings
  - ensures the sunset of NOVA dependencies and drives down costs
  - currently, ONET 2019 values are based on ONET 2010 values from NOVA/LENS
- Allows us to remove fields currently tagged on documents, making the pipeline process a bit easier
- One classifier for all our occupations on US jpa
  - specifically, all occupation tagging can be traced back to the specialized occupation the posting is tagged with
- More consistency in our data
- Allows us to more easily fix occupation tagging issues
  - currently, NOVA treats many tagging issues in their values as low priority unless they cause a major disruption for the majority of clients
Monetary Value to Lightcast
This project will mainly help reduce costs for Lightcast. Currently, having US postings connected to NOVA through NOVA-dependent fields costs the company a minimum of $17,000 per month. This project removes a major NOVA dependency for US jpa, bringing us closer to getting US jpa off of NOVA.
Target User Role/Client/Client Category
Who are we building this for?
Occupation is a field used across all BUs and products, hence all US JPA users are target users. That said, the main concern for them is communication about any data shifts.
Internally, the target user is the Documents team, to help them get off of NOVA and make pipeline runs for US a bit easier.
Delivery Mechanism
How will users receive the value?
- The `onet_2019` field will be populated from the Spocc classifier from C&E during the enrichment phase of the pipeline run. This will automatically update SOC 2021 values on documents as well (sketched below).
- Since the values will populate an existing field (once the data quality has been approved), no changes are needed from Micro/API, Analyst, Snowflake, etc. The delivery mechanism for clients will be unchanged on their end.
- Once the switch is made by Documents, all other teams will automatically get the updated data.
- We will need to message clients about the changes before the switch.
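A minimal sketch of what this enrichment step could look like, assuming hypothetical crosswalk tables (`SPOCC_TO_ONET_2019`, `ONET_2019_TO_SOC_2021`) standing in for the real crossover rules from Taxonomy/C&E; the names and values are illustrative, not the actual pipeline code:

```python
# Illustrative sketch only: derive onet_2019 (and soc_2021) for a US posting from the
# LOT specialized occupation it is already tagged with. The two dictionaries are
# placeholders for the real crossover rules maintained by Taxonomy/C&E.
from typing import Optional

SPOCC_TO_ONET_2019 = {
    "spec_occ_software_developer": "15-1252.00",  # placeholder specialized occupation ID
}
ONET_2019_TO_SOC_2021 = {
    "15-1252.00": "15-1252",                      # ONET 2019 code -> SOC 2021 code
}

def enrich_posting(posting: dict) -> dict:
    """Populate onet_2019 and soc_2021 from the posting's specialized occupation."""
    spocc = posting.get("specialized_occupation")
    onet: Optional[str] = SPOCC_TO_ONET_2019.get(spocc)
    posting["onet_2019"] = onet
    # soc_2021 follows automatically from the onet_2019 assignment.
    posting["soc_2021"] = ONET_2019_TO_SOC_2021.get(onet) if onet else None
    return posting
```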
Success Criteria & Metrics
How will you know you’ve completed the epic? How will you know if you’ve successfully addressed this problem? What usage goals do you have for these new features? How will you measure them?
- `onet_2019` values are consistent with the LOT Specialized Occupation values seen by clients, as defined by the crossover rules from the taxonomy.
- All shifts in posting counts for `onet_2019` are defensible based on the LOT release documents for the US.
- Recall on O*NET should not fall (currently around 95%).
- Accuracy on O*NET should be greater than the current accuracy (current accuracy is 81%; given that the current accuracy of LOT in the US is 84%, this should be achievable!). See the sketch after this list for how these checks could be run.
- Previous fixes that have been applied via patches should also be examined specifically as part of QA; regression on issues that clients have previously had fixed is very damaging.
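A minimal sketch of how the recall and accuracy targets above could be checked during QA, assuming a hand-labeled gold sample; "recall" is read here as coverage (the share of postings that receive any ONET code), which is an assumption about how the 95% figure is defined:

```python
# Illustrative QA check against the targets above: recall (read here as coverage) should
# stay around 95%, and accuracy should beat the current ~81% baseline.
from typing import List, Optional

def onet_recall(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Share of postings that received any ONET 2019 code at all."""
    return sum(p is not None for p in predictions) / len(gold)

def onet_accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Share of postings whose predicted ONET 2019 code matches the gold label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy example: 2 of 3 postings coded, 2 of 3 coded correctly.
preds = ["15-1252.00", None, "29-1141.00"]
gold = ["15-1252.00", "41-2031.00", "29-1141.00"]
print(onet_recall(preds, gold), onet_accuracy(preds, gold))  # 0.666..., 0.666...
```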
Aspects that are out of scope (of this phase)
What is explicitly not a part of this epic? List things that have been discussed but will not be included. Things you imagine in a phase 2, etc.
- UK SOC based on the LOT classifier - done in PI#6
- CA NOC based on the LOT classifier - done in PI#4
- US ONET 2019 and SOC 2021 on Profiles
- Global Postings national taxonomies (ISCO and others)
- LOT taxonomy being tagged on postings/profiles
  - for US JPA, this should already be completed
  - for other data sets, this can be done in parallel but is out of scope for this phase of the LOT project
Solution Description
Early UX (wireframes or mockups)
<FigmaLink>
Non-Functional Attributes & Usage Projections
Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements
Dependencies
Is there any work that must precede this? Feature work? Ops work?
- US LOT specialized occupation must have met the acceptable data quality criteria.
- ONET 2019 tagging will be based on the specialized occupation.
- The initial analysis on ONET coding from LOT is here. More work will need to be done on this analysis and on the mapping, as we see a couple of items that are not correct: 1) a number of nulls in ONETs, and 2) a lot of generic ONETs not receiving any coding (this is due to the more granular mapping from spec occ). A sketch of these checks follows this list.
- Due to the large change in LOT, it is expected that ONET will also see a large change.
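A minimal sketch of the two mapping checks flagged above (nulls in ONETs, and generic ONETs that never receive coding), assuming a hypothetical spec-occ-to-ONET mapping dictionary; all names are illustrative:

```python
# Illustrative sanity report on a spec-occ -> ONET 2019 mapping, mirroring the two issues
# from the initial analysis: (1) spec occs mapped to a null ONET, and (2) generic ".00"
# ONET codes that no spec occ maps to (expected when the spec-occ mapping is more granular).
from collections import Counter
from typing import Dict, Optional, Set

def mapping_report(spec_occ_to_onet: Dict[str, Optional[str]], all_onet_codes: Set[str]) -> dict:
    mapped = [o for o in spec_occ_to_onet.values() if o is not None]
    unused = all_onet_codes - set(mapped)  # ONET codes never assigned by the mapping
    return {
        "spec_occs_total": len(spec_occ_to_onet),
        "spec_occs_with_null_onet": sum(o is None for o in spec_occ_to_onet.values()),
        "onet_codes_never_assigned": len(unused),
        "generic_onets_never_assigned": sorted(o for o in unused if o.endswith(".00")),
        "most_common_onets": Counter(mapped).most_common(5),
    }
```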
Legal and Ethical Considerations
Just answer yes or no.
High-Level Rollout Strategies
Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]
If specific beta customers, will it be for a specific survey launch date or report availability date
How will this guide the rollout of individual stories in the epic?
The rollout strategy should be discussed with CS, Marketing, and Sales.
How long we would tolerate having a “partial rollout” -- rolled out to some customers but not all
- Populate `rc_onet_2019` fields; do the same for all levels of `soc_2021`.
- Compare `rc_onet_2019` fields against the current `onet_2019` values (see the sketch below).
- Once approved, the `rc_onet_2019` values populate the `onet_2019` field on US jpa.
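A minimal sketch of how the `rc_onet_2019` vs. `onet_2019` comparison could be summarized before the switch, so the largest shifts can be checked against the LOT release documents; this is illustrative, not the actual tooling:

```python
# Illustrative shift report: compare posting counts per ONET code between the current
# onet_2019 field and the release-candidate rc_onet_2019 field.
from collections import Counter
from typing import Iterable, List, Optional, Tuple

def shift_report(pairs: Iterable[Tuple[Optional[str], Optional[str]]],
                 top_n: int = 10) -> List[Tuple[Optional[str], int]]:
    """pairs yields (onet_2019, rc_onet_2019) for each US posting."""
    current, candidate = Counter(), Counter()
    for cur, rc in pairs:
        current[cur] += 1
        candidate[rc] += 1
    deltas = {code: candidate[code] - current[code] for code in set(current) | set(candidate)}
    # Largest absolute shifts first; these are the codes that need explaining to clients.
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
```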
Risks
Focus on risks unique to this feature, not overall delivery/execution risks.
If there are major changes/data shifts for ONET 2019 that are not explainable to clients, it could be a risk to Lightcast's reputation around data quality. Clients may lose trust in our product.
Occupation tagging is one of the top features used by job postings clients.
Open Questions
What are you still looking to resolve?
Complete with Engineering Teams
Effort Size Estimate | M |
---|---|
Estimated Costs
Direct Financial Costs
Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?
Team Effort
Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.
Team | Effort Estimate (T-shirt sizes) | Jira Link |
---|---|---|
Documents | Small | |
 | | |
 | | |
 | | |