https://economicmodeling.atlassian.net/browse/TX-1406

Target PI	PI#6/7
Created Date	Sep 15, 2023
Target Release	End of 2023
Jira Epic	https://economicmodeling.atlassian.net/browse/DATA-1881 https://economicmodeling.atlassian.net/browse/DATA-2102 https://economicmodeling.atlassian.net/browse/TX-1406
Document Status
Epic Owner	@Hal Bonella @john.miner (Deactivated)
Stakeholder	@Ben Bradley @Dave Wallace (Deactivated) @Lendl Meyer (Deactivated) @Lottes Salter @Tatiana Harrison @Rachael Larsen @Caleb Paul @Gavin Esser
Engineering Team(s) Involved	Documents Micro C&E Analyst Taxonomy Data Solutions NLP ML

Customer/User Job-to-be-Done or Problem

The Scope of the user problem should be narrowed to the scope you are planning to solve in this phase of work. There may be other aspects you are aware of and plan to solve in the future. For now, put those in the Out of Scope section.

As a client accessing Lightcast’s global data, I want to profile data with LOT-based filters for increased granularity and specificity compared to (for example) ONET or SOC. As a recruiter, corporate HR user, or similar, I want to understand and compare the supply of talent in markets across the globe at a level that describes the job responsibilities of individuals.

JTBD 1: As a user of Pathways data, I want to report my program outcomes in user-friendly job title terminology but at a meaningful level of aggregation, so that I can produce insightful reports without significantly customizing them before distribution. This is something that is currently possible in Lightcast, but at an extra cost to the company.

JTBD 2: As a user of Pathways data, I want to be able to connect my program outcomes to the specialized occupations-based reporting available in other Lightcast products, so that I can gain insights into how my programs are interacting with the regional labor market.

Value to Customers & Users

In the JTBD framework, these are the “pains” and “gains” your solution will address. Other ways to think about it: What’s the rationale for doing this work? Why is it a high priority problem for your customers and how will our solution add value?

Value to Lightcast

Sometimes we do things for our own benefit. List those reasons here.
From Lightcast’s perspective, tagging global profiles with LOT’s Specialized Occupations would provide granular supply data for our tools and analysis.

Monetary value for Lightcast

Adding LOT to profiles will bring in some more revenue, though how much is yet to be determined (Chris K estimated $4-6m). Part of the revenue will be from SkillScape.

The cost reduction for LOT itself is minimal. The main cost reduction will come from:

removing work/time dedicated to updating title roles for Title releases, Title-SOC mapping, etc.
(Out of Scope) removing work/time dedicated to updating the ONET TensorFlow mapping once we are on the new ONET 2019 tagger (which depends on having LOT on profiles)

There is no NOVA dependency for this dataset.

Target User Role/Client/Client Category

Who are we building this for?

Global BU
Analyst projects, particularly Profile Analytics and any JPA report that also shows Profile data
SkillScape
Alumni Pathways clients who use title roles.

Delivery Mechanism

How will users receive the value?

Scope of PI#6

Must have:

US LOT Profiles classifier working with Global profiles, to produce one LOT value per “experience” point on each profile.
Sense checking a sample of LOT on Global profiles from the postings classifier to assess recall and accuracy - @Lottes Salter ?
QA on determined specific sample of Occ/Title frequencies - @Hal Bonella to co-ordinate with Rachel S (AR team) and @Nathan Triepke
Feedback and improvement loop established and tested
Snowflake to accept LOT on global profiles from the classifier. @Oree Wyatt

Nice to have:

Not in scope:

Hard launch of LOT in global profile analytics - This will be a PI#7 task.

Success Criteria & Metrics

How will you know you’ve completed the epic? How will you know if you’ve successfully addressed this problem? What usage goals do you have for these new features? How will you measure them?

LOT classifier on profiles is relatively accurate
- It may be at a lower rate than postings, but still needs to be at least 70% accurate
- Should not have any obvious errors in top 100 titles
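
One way to check these two criteria against a hand-reviewed QA sample is sketched below. It is only an illustration: the row layout (title, predicted LOT, reviewer-approved LOT) and the function name `evaluate_sample` are assumptions, not part of any existing Lightcast tooling. "Top 100 titles" is interpreted here as the 100 most frequent titles in the sample.

```python
from collections import Counter


def evaluate_sample(rows, top_n=100, threshold=0.70):
    """Score a reviewed sample of (title, predicted_lot, expected_lot) rows.

    The row layout is hypothetical; adapt it to the real QA export.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("sample is empty")

    # Overall accuracy: share of rows where the classifier agrees with the reviewer.
    correct = sum(1 for _, predicted, expected in rows if predicted == expected)
    accuracy = correct / len(rows)

    # Most frequent titles in the sample, then any disagreements among them.
    title_counts = Counter(title for title, _, _ in rows)
    top_titles = {title for title, _ in title_counts.most_common(top_n)}
    top_title_errors = sorted(
        {(t, p, e) for t, p, e in rows if t in top_titles and p != e}
    )

    return {
        "accuracy": accuracy,
        "meets_threshold": accuracy >= threshold,
        "top_title_errors": top_title_errors,
    }
```

An empty `top_title_errors` list together with `meets_threshold` being true would correspond to both bullets above being satisfied for that sample.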

Aspects that are out of scope (of this phase)

What is explicitly not a part of this epic? List things that have been discussed but will not be included. Things you imagine in a phase 2, etc.

Hard launch of global profiles LOT to clients.

Solution Description

Early UX (wireframes or mockups)

Non-Functional Attributes & Usage Projections

Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements

Dependencies

Is there any work that must precede this? Feature work? Ops work?

Classifier working on US profiles

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]
- If specific beta customers, will it be for a specific survey launch date or report availability date
How will this guide the rollout of individual stories in the epic?
The rollout strategy should be discussed with CS, Marketing, and Sales.
How long we would tolerate having a “partial rollout” -- rolled out to some customers but not all

Tag LOT classifier on documents

tag model

Data quality checks

Build classifier

Data quality for LOT classifier approved

Add fields into API and Snowflake

Add fields to Analyst where profile data appears and occupation filters are used

Have fields be part of Alumni Outlook data delivery

Make sure fields are compatible with Matcher and AO clients can receive data

Risks

Using the “Postings” classifier for Profiles

Classifier Regex / maintenance:

How do we write a rule for a SpecOcc to tag on profiles and not postings (within the same rule_set)?
- Build additional field for specifying whether the classifier rule should be used on postings data vs profiles data
- Call it is_postings?
What model do we use?
- Postings or Profiles?
  - I think the profiles model could work well since it’s geared for postings data.
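
A minimal sketch of the proposed flag follows, assuming a rule is a regex plus a SpecOcc label. The `Rule` class, the `classify` helper, and the SpecOcc names are hypothetical; only the field name `is_postings` comes from the suggestion above. Because a plain boolean cannot express "applies to both", the sketch assumes `None` means the rule is shared by postings and profiles.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    pattern: str                         # regex matched against the title
    spec_occ: str                        # SpecOcc label (hypothetical names below)
    is_postings: Optional[bool] = None   # True: postings only, False: profiles only,
                                         # None: both (an assumption of this sketch)


def rule_applies(rule: Rule, source: str) -> bool:
    """Return True if the rule should run for the given data source."""
    if source not in ("postings", "profiles"):
        raise ValueError(f"unknown source: {source!r}")
    if rule.is_postings is None:
        return True
    return rule.is_postings == (source == "postings")


def classify(title: str, rules: list, source: str) -> Optional[str]:
    """Return the SpecOcc of the first applicable matching rule, else None."""
    for rule in rules:
        if rule_applies(rule, source) and re.search(rule.pattern, title, re.IGNORECASE):
            return rule.spec_occ
    return None


# Illustrative rules within one rule_set; labels are invented for the example.
RULES = [
    Rule(r"\bfrogman\b", "Military Diver", is_postings=False),
    Rule(r"\bfrogman\b", "Retail Sales Associate", is_postings=True),
    Rule(r"\b(founder|owner)\b", "Founder/Owner", is_postings=False),
    Rule(r"\bregistered nurse\b", "Registered Nurse"),
]
```

With these rules, the same title can resolve differently per source, and a profiles-only rule never fires on postings.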

Classifier Code / classifier:

If Input requirements remain the same, then no significant change will be needed.
- “experience title” put into the raw_title field
- “experience description” and “company” info put into the body field
The postings classifier does not currently have functionality to specify if a regex rule should be applied to postings data or profiles data
- This is needed because not all occupations exist in postings data (Founder/Owner, Students, Housewives/husbands)
- This is also needed because not all titles for roles are applicable in both postings and profiles
  - “Frogman” may be a military title in profiles data, but a sales associate title for a pet store in postings data.
We need the ability for the classifier to create two separate artifacts, one for postings use and one for profiles use.
- This way downstream teams do not need to customize extra inputs in any way to use the appropriate classifier.
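
The two ideas in this list (mapping a profile "experience" onto the existing `raw_title` and `body` inputs, and emitting one artifact per data source) could look roughly like the sketch below. The rule dictionaries, the `is_postings` key, the experience keys (`title`, `description`, `company`), and both function names are assumptions for illustration; only `raw_title` and `body` are taken from the text above.

```python
def build_artifacts(rule_set):
    """Split one shared rule set into a postings artifact and a profiles artifact.

    Each rule is a dict; a missing or None "is_postings" value is assumed
    to mean the rule belongs in both artifacts.
    """
    artifacts = {"postings": [], "profiles": []}
    for rule in rule_set:
        flag = rule.get("is_postings")
        if flag is None or flag is True:
            artifacts["postings"].append(rule)
        if flag is None or flag is False:
            artifacts["profiles"].append(rule)
    return artifacts


def experience_to_input(experience):
    """Map one profile experience onto the classifier's existing input fields."""
    parts = [experience.get("description"), experience.get("company")]
    # Join whatever description/company text is present into the body field.
    body = "\n".join(p.strip() for p in parts if p and p.strip())
    return {
        "raw_title": (experience.get("title") or "").strip(),
        "body": body,
    }
```

Splitting at build time means a downstream team only picks the artifact that matches its data and passes the same two input fields as before.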

Open Questions

What are you still looking to resolve?

Complete with Engineering Teams

Effort Size Estimate	L

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Team	Effort Estimate (T-shirt sizes)	Jira Link	Work to be done
C&E	M?	No specific work anticipated	Alter regex rules to allow for postings functionality; alter artifact creation to allow for postings and profiles artifacts individually; alter change logs to reflect postings rules updates vs profiles rules updates
DS	M		Sense check and report issues as above
WF Taxonomies	S		Turn on profile specific LOT SpecOccs for profiles and turn them off for postings

Data Strategy

LOT classification of Global Profiles