LOT classification of Global Profiles


https://economicmodeling.atlassian.net/browse/TX-1406

Target PI

PI#6/7

Created Date

Sep 15, 2023

Target Release

End of 2023

Jira Epic

https://economicmodeling.atlassian.net/browse/DATA-1881

https://economicmodeling.atlassian.net/browse/DATA-2102

https://economicmodeling.atlassian.net/browse/TX-1406

Document Status

 

Epic Owner

@Hal Bonella @john.miner (Deactivated)

Stakeholder

@Ben Bradley @Dave Wallace (Deactivated) @Lendl Meyer (Deactivated) @Lottes Salter @Tatiana Harrison @Rachael Larsen @Caleb Paul @Gavin Esser

Engineering Team(s) Involved

Documents Micro C&E Analyst Taxonomy Data Solutions NLP ML

Customer/User Job-to-be-Done or Problem

The Scope of the user problem should be narrowed to the scope you are planning to solve in this phase of work. There may be other aspects you are aware of and plan to solve in the future. For now, put those in the Out of Scope section.

As a client accessing Lightcast’s global data, I want profile data with LOT-based filters for greater granularity and specificity than (for example) ONET or SOC. As a recruiter, corporate HR user, etc., I want to understand and compare the supply of talent in markets across the globe at a level that describes individuals’ job responsibilities.


JTBD 1: As a user of Pathways data, I want to report my program outcomes in user-friendly job title terminology but at a meaningful level of aggregation, so that I can produce insightful reports without significantly customizing them before distribution. This is something that is currently possible in Lightcast, but at an extra cost to the company.

JTBD 2: As a user of Pathways data, I want to be able to connect my program outcomes to the specialized occupations-based reporting available in other Lightcast products, so that I can gain insights into how my programs are interacting with the regional labor market.

Value to Customers & Users

In the JTBD framework, these are the “pains” and “gains” your solution will address. Other ways to think about it: What’s the rationale for doing this work? Why is it a high priority problem for your customers and how will our solution add value?

 

Value to Lightcast

Sometimes we do things for our own benefit. List those reasons here. 
From Lightcast’s perspective, tagging global profiles with LOT’s Specialized Occupations would provide granular supply data for our tools and analysis.

Monetary value for Lightcast

Adding LOT to profiles will bring in some more revenue, though how much is yet to be determined (Chris K estimated $4-6m). Part of the revenue will be from SkillScape.

The cost reduction for LOT itself is minimal. The main cost reduction will come from:

  • removing work/time dedicated to updating title roles for Title releases, Title-SOC mapping, etc.

  • (Out of scope) removing work/time dedicated to updating the ONET TensorFlow mapping once we are on the new ONET 2019 tagger (which depends on having LOT on profiles)

There is no NOVA dependency for this dataset.

Target User Role/Client/Client Category

Who are we building this for?

  • Global BU

  • Analyst projects, particularly Profile Analytics and any JPA report that also shows Profile data

  • SkillScape

  • Alumni Pathways clients who use title roles.

Delivery Mechanism

How will users receive the value?

 

 Scope of PI#6

Must have:

  • US LOT Profiles classifier working with Global profiles, to produce one LOT value per “experience” point on each profile.

  • Sense checking a sample of LOT on Global profiles from the postings classifier to assess recall and accuracy - @Lottes Salter ?

  • QA on a specific, predetermined sample of Occ/Title frequencies - @Hal Bonella to co-ordinate with Rachel S (AR team) and @Nathan Triepke

  • Feedback and improvement loop established and tested

  • Snowflake to accept LOT on global profiles from the classifier. @Oree Wyatt

Nice to have:

  •  

Not in scope:

  •  Hard launch of LOT in global profile analytics - This will be a PI#7 task.

Success Criteria & Metrics

How will you know you’ve completed the epic? How will you know if you’ve successfully addressed this problem? What usage goals do you have for these new features? How will you measure them?

  • LOT classifier on profiles is relatively accurate

    • It may be at a lower rate than postings, but still needs to be 70% accurate

    • Should not have any obvious errors in top 100 titles
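One way to check both criteria against a hand-reviewed QA sample is sketched below. This is only an illustration under stated assumptions: the record keys (`title`, `predicted_lot`, `expected_lot`) and the function name are hypothetical, and "obvious errors" is approximated here as any mismatch between the classifier output and the reviewer's label.

```python
from collections import Counter

ACCURACY_THRESHOLD = 0.70  # the 70% target from the success criteria
TOP_N = 100                # "top 100 titles" from the success criteria


def evaluate_sample(sample, top_n=TOP_N):
    """Summarise a reviewed sample of classified profile experiences.

    `sample` is a list of dicts with hypothetical keys:
    'title', 'predicted_lot' (classifier output), 'expected_lot' (reviewer label).
    """
    if not sample:
        raise ValueError("sample is empty")

    # Share of records where the classifier agrees with the reviewer.
    correct = sum(1 for r in sample if r["predicted_lot"] == r["expected_lot"])
    accuracy = correct / len(sample)

    # The most frequent titles in the sample, by record count.
    title_counts = Counter(r["title"] for r in sample)
    top_titles = {title for title, _ in title_counts.most_common(top_n)}

    # Any top title with at least one mismatch is flagged for review.
    top_title_errors = sorted(
        {
            r["title"]
            for r in sample
            if r["title"] in top_titles and r["predicted_lot"] != r["expected_lot"]
        }
    )
    return {
        "accuracy": accuracy,
        "meets_threshold": accuracy >= ACCURACY_THRESHOLD,
        "top_title_errors": top_title_errors,
    }
```

The flagged titles would still need human review to decide which mismatches count as obvious errors.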

Aspects that are out of scope (of this phase)

What is explicitly not a part of this epic? List things that have been discussed but will not be included. Things you imagine in a phase 2, etc.

  • Hard launch of global profiles LOT to clients.

 

Solution Description

Early UX (wireframes or mockups)

<FigmaLink>

 

Non-Functional Attributes & Usage Projections

Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements

 

Dependencies

Is there any work that must precede this? Feature work? Ops work? 

 

  • Classifier working on US profiles

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

  • Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]

    • If specific beta customers, will it be for a specific survey launch date or report availability date 

  • How will this guide the rollout of individual stories in the epic?

  • The rollout strategy should be discussed with CS, Marketing, and Sales.

  • How long we would tolerate having a “partial rollout” -- rolled out to some customers but not all

Tag LOT classifier on documents
tag model
Data quality checks
Build classifier
Data quality for LOT classifier approved
Add fields into API and Snowflake
Add fields to Analyst where profile data appears and occupation filters are used
Have fields be part of Alumni Outlook data delivery
Make sure fields are compatible with Matcher and AO clients can receive data

Risks

Using the “Postings” classifier for Profiles

Classifier Regex / maintenance:

  • How do we write a rule for a SpecOcc to tag on profiles and not postings (within the same rule_set)?

    • Build additional field for specifying whether the classifier rule should be used on postings data vs profiles data

    • Call it is_postings?

  • What model do we use?

    • Postings or Profiles?

      • I think the profiles model could work well since it’s geared for postings data.
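The proposed rule-level field could look roughly like the sketch below. Only the name `is_postings` comes from the notes above. The `Rule` structure, the three-state meaning of the flag (`None` for both sources), the helper names, and the SpecOcc placeholder labels are all assumptions for illustration, not the classifier's actual rule format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape; the real rule_set format may differ."""
    pattern: str                        # regex matched against the title
    spec_occ: str                       # SpecOcc assigned on a match
    is_postings: Optional[bool] = None  # None: both, True: postings only, False: profiles only


def applies_to(rule, source):
    """Return True if the rule should run for the given source."""
    if source not in ("postings", "profiles"):
        raise ValueError(f"unknown source: {source!r}")
    if rule.is_postings is None:
        return True
    return rule.is_postings == (source == "postings")


def classify(title, rules, source):
    """Return the SpecOcc of the first applicable matching rule, else None."""
    for rule in rules:
        if applies_to(rule, source) and re.search(rule.pattern, title, re.IGNORECASE):
            return rule.spec_occ
    return None


# Placeholder labels, not real LOT SpecOcc names.
RULES = [
    Rule(r"\bfrogman\b", "MILITARY_PLACEHOLDER", is_postings=False),
    Rule(r"\bfrogman\b", "RETAIL_SALES_PLACEHOLDER", is_postings=True),
    Rule(r"\bnurse\b", "NURSE_PLACEHOLDER"),
]
```

With these rules, `classify("Frogman", RULES, "profiles")` and `classify("Frogman", RULES, "postings")` resolve to different placeholders, while the unflagged nurse rule fires for both sources.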

Classifier Code / classifier:

  • If input requirements remain the same, then no significant change will be needed.

    • “experience title” put into the raw_title field

    • “experience description” and “company” info put into the body field

  • The postings classifier does not currently have functionality to specify if a regex rule should be applied to postings data or profiles data

    • This is needed because not all occupations exist in postings data (Founder/Owner, Students, Housewives/husbands)

    • This is also needed because not all titles for roles are applicable in both postings and profiles

      • “Frogman” may be a military title in profiles data, but a sales associate title for a pet store in postings data.

  • We need the ability for the classifier to create two separate artifacts, one for postings use and one for profiles use.

    • This way downstream teams do not need to customize extra inputs in any way to use the appropriate classifier.
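The two ideas in this list, mapping a profile experience onto the existing input fields and emitting separate artifacts from one rule set, might be sketched as follows. The field names `raw_title` and `body` and the flag name `is_postings` come from the notes above. The experience keys, the function names, the dict-based rule shape, and the convention that a missing flag means "both sources" are assumptions for illustration.

```python
def profile_experience_to_input(experience):
    """Map one profile experience onto the classifier's existing input fields.

    `experience` is a dict with hypothetical keys 'title', 'description', 'company'.
    """
    description = (experience.get("description") or "").strip()
    company = (experience.get("company") or "").strip()
    return {
        # Experience title goes into raw_title.
        "raw_title": (experience.get("title") or "").strip(),
        # Experience description and company info go into body.
        "body": "\n".join(part for part in (description, company) if part),
    }


def build_artifacts(rules):
    """Split one shared rule set into a postings artifact and a profiles artifact.

    Each rule is a dict; 'is_postings' is True (postings only),
    False (profiles only), or absent/None (both).
    """
    return {
        "postings": [r for r in rules if r.get("is_postings") is not False],
        "profiles": [r for r in rules if r.get("is_postings") is not True],
    }
```

Because the split happens at build time, a downstream team would load only the artifact for its data source and would not need to pass a source flag at classification time.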

Open Questions

What are you still looking to resolve?

 


Complete with Engineering Teams

 

Effort Size Estimate

L

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

 

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Team

Effort Estimate (T-shirt sizes)

Jira Link

Work to be done

C&E

M?

No specific work anticipated

  • Alter regex rules to allow for postings functionality

  • Alter artifact creation to allow for postings and profiles artifacts individually

  • Alter change logs to reflect postings rules updates vs profiles rules updates

DS

M

 

  • Sense check and report issues as above

WF Taxonomies

S

 

  • Turn on profile specific LOT SpecOccs for profiles and turn them off for postings

 

 
