LOT classification of non-English Postings

 

Target PI

PI#6/7

Created Date

Sep 15, 2023

Target Release

End of 2023

Jira Epic

 

Document Status

 

Epic Owner

@Hal Bonella @Abby Santos

Stakeholder

@Alexandra Malfant @Tatiana Harrison

Engineering Team(s) Involved

Documents Micro C&E Taxonomy Data Solutions NLP ML

Customer/User Job-to-be-Done or Problem

The Scope of the user problem should be narrowed to the scope you are planning to solve in this phase of work. There may be other aspects you are aware of and plan to solve in the future. For now, put those in the Out of Scope section.

When users use our tools and products to access our global data, they want non-English postings data to appear when they use LOT-based filters, so they can expect results similar to those for English postings.

The following languages are to be tested:

Spanish
French
German
Dutch
Italian
Portuguese
Polish
Danish
Czech
Swedish
Romanian

Value to Customers & Users

In the JTBD framework, these are the “pains” and “gains” your solution will address. Other ways to think about it: What’s the rationale for doing this work? Why is it a high priority problem for your customers and how will our solution add value?

 

Value to Lightcast

Sometimes we do things for our own benefit. List those reasons here. 
From Lightcast's perspective, tagging global non-English postings with LOT's Specialized Occupations would provide a greater breadth of information on the global labor force, giving granular demand data for our tools and analysis.

Monetary value for Lightcast

Adding LOT to non-English postings may bring in additional revenue, though how much is yet to be determined. More generally, we cannot be a global authority while focusing only on the English language.

There is no NOVA dependency for this dataset.

Target User Role/Client/Client Category

Who are we building this for?

  • Analyst projects, and any JPA report that shows global postings data

  •  

Delivery Mechanism

How will users receive the value?

 

 Scope of PI#6

Must have:

  • Global LOT Postings classifier working with non-English postings (or translations thereof), to produce one LOT value per posting.

  • Following languages:
    Spanish
    French
    German
    Dutch
    Italian
    Portuguese
    Polish
    Danish
    Czech
    Swedish
    Romanian

Nice to have:

 

Not in scope:

  •  Hard launch of LOT on global non-English postings to all clients.

Success Criteria & Metrics

How will you know you’ve completed the epic? How will you know if you’ve successfully addressed this problem? What usage goals do you have for these new features? How will you measure them?

  • The LOT classifier on postings is acceptably accurate

    • Accuracy may be lower than for English postings, but must still be at least 70%

    • There should be no obvious errors in the top 100 titles
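
The accuracy criterion above could be checked per language along the following lines. This is a minimal sketch, not an agreed evaluation method: it assumes a hand-reviewed sample in which each posting has a language, a title, the LOT value the classifier predicted, and the LOT value a reviewer considers correct. The tuple layout and the function names are hypothetical; only the 70% threshold and the top-100-titles check come from this document.

```python
from collections import Counter, defaultdict

# Threshold taken from the success criteria; everything else is assumed.
ACCURACY_THRESHOLD = 0.70


def accuracy_by_language(samples):
    """Return {language: share of postings where predicted LOT == reviewed LOT}.

    samples: iterable of (language, title, predicted_lot, reviewed_lot) tuples
    (a hypothetical layout for a hand-reviewed sample).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, _title, predicted, reviewed in samples:
        total[language] += 1
        if predicted == reviewed:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


def languages_below_threshold(samples, threshold=ACCURACY_THRESHOLD):
    """List the languages whose sample accuracy falls short of the threshold."""
    scores = accuracy_by_language(samples)
    return sorted(language for language, score in scores.items() if score < threshold)


def top_titles(samples, n=100):
    """Return the n most frequent titles, as a starting list for manual review."""
    counts = Counter(title for _language, title, _predicted, _reviewed in samples)
    return [title for title, _count in counts.most_common(n)]
```

Running `languages_below_threshold` on a reviewed sample for each of the eleven languages would show which ones miss the 70% bar, and `top_titles` would give reviewers the list of titles to inspect for obvious errors.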

Aspects that are out of scope (of this phase)

What is explicitly not a part of this epic? List things that have been discussed but will not be included. Things you imagine in a phase 2, etc.

 

Solution Description

Early UX (wireframes or mockups)

<FigmaLink>

 

Non-Functional Attributes & Usage Projections

Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements

 

Dependencies

Is there any work that must precede this? Feature work? Ops work? 

 

  • Classifier working on US profiles

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

  • Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]

    • If specific beta customers, will it be for a specific survey launch date or report availability date 

  • How will this guide the rollout of individual stories in the epic?

  • The rollout strategy should be discussed with CS, Marketing, and Sales.

  • How long would we tolerate having a “partial rollout”, rolled out to some customers but not all?

Tag LOT classifier on documents
tag model
Data quality checks
Build classifier
Data quality for LOT classifier approved
Add fields into API and Snowflake
Add fields to Analyst

Risks

Using translated titles and translated skills to classify postings may reduce recall and accuracy. The size of this effect has yet to be measured.

Open Questions

What are you still looking to resolve?

 


Complete with Engineering Teams

 

Effort Size Estimate

L

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

 

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Team

Effort Estimate (T-shirt sizes)

Jira Link

Work to be done

C&E

L

 

  •  

  •  

NLP

M

 

  •  

WF Taxonomies

M

 

  •