LOT Specialized Occupations in Italian and Spanish
Created Date | Nov 17, 2022 |
---|---|
Target PI | 1 |
Target Release | |
Jira Epic | |
Document Status | Draft |
Epic Owner | @Duncan Brown (Unlicensed) |
Stakeholder | @Mauro Pelucchi @John Pernsteiner @Hal Bonella @Alexandra Malfant |
Engineering Team(s) Involved | Documents C&E Taxonomy Models |
Customer/User Job-to-be-Done or Problem
The Scope of the user problem should be narrowed to the scope you are planning to solve in this phase of work. There may be other aspects you are aware of and plan to solve in the future. For now, put those in the Out of Scope section.
When analysing Italian data in Global Postings and Global Profiles, I want to be able to categorise by LOTv6, so I can achieve greater granularity and compare with English-language postings classified by LOTv6.
Lightcast’s taxonomies are vital assets that add value to our data, and as we move into non-English-language postings we want to make those taxonomies operable on them. While we are already investing significantly in translations of Lightcast Skills, LOT so far operates only on English-language postings.
This Epic’s goal is to create a replicable methodology and model for classifying documents by LOTv6 SpOcc in two languages: Italian and Spanish. It builds on previous work by the Global Data Science team to classify LatAm data for the IADB project. That work focused primarily on ESCO occupation classification, but was widened to cover SpOcc as well.
The aim here is to:
Identify quality benchmarks for MVP SpOcc classification in Italian and Spanish
Improve classification performance over the previous experiment (e.g. by improving training data, evaluating other models, and potentially obtaining translated SpOcc titles)
Work with Documents and C&E to ensure that the model developed is ready for production on the Omni pipeline
Document the process so that it can serve as a template for scaling to other languages, moving towards LOT coverage of all major European languages during 2023
Value to Customers & Users
In the JTBD framework, these are the “pains” and “gains” your solution will address. Other ways to think about it: What’s the rationale for doing this work? Why is it a high priority problem for your customers and how will our solution add value?
LOT is a substantial improvement in granularity over national taxonomies in most geos, allowing users much greater depth of analysis of postings and profiles data
LOT is currently available only for English-language documents; a scalable model for extending it to other languages would make its benefits available to a much wider range of use cases
Value to Lightcast
Sometimes we do things for our own benefit. List those reasons here.
Multiplies the value of our investments in LOT as a taxonomy if it can be used across European language postings
Opens the door for LOT to be used to drive national occupation taxonomy classification in a wider range of places, rather than maintaining multiple occupation classifiers
Potentially provides a model for working with other enrichments across languages
Target User Role/Client/Client Category
Who are we building this for?
All users who want to use LOT on Italian and Spanish language data
All users who want to use LOT on non-English language data, if the work is successful in creating a methodology which can be scaled
Delivery Mechanism
How will users receive the value?
A model will be created, ready for production on the Omni pipeline, which can classify Italian or Spanish language documents into LOTv6 SpOccs.
Success Criteria & Metrics
How will you know you’ve completed the epic? How will you know if you’ve successfully addressed this problem? What usage goals do you have for these new features? How will you measure them?
SpOcc classification will meet quality benchmarks to be decided at the start of the Epic
The model is ready to implement on Omni
The model and methodology can be used for other European languages
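As an illustration of how the first criterion might be checked once benchmarks are agreed, the sketch below computes per-language accuracy on a hand-labelled sample and compares it with a threshold. Everything in it is an assumption for illustration only: the function names, the `(language, gold, predicted)` record layout, the placeholder SpOcc labels, the 0.75 threshold, and the choice of plain accuracy as the metric. The actual metric and benchmark values are still to be decided at the start of the Epic.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return accuracy per language.

    records: iterable of (language, gold_spocc, predicted_spocc) tuples.
    This record layout is hypothetical, not an existing pipeline format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, gold, predicted in records:
        total[language] += 1
        if gold == predicted:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def meets_benchmark(records, benchmark):
    """Return, per language, whether accuracy is at or above the benchmark."""
    scores = accuracy_by_language(records)
    return {lang: score >= benchmark for lang, score in scores.items()}


# Hypothetical hand-labelled sample; the labels are placeholders, not real LOT codes.
sample = [
    ("it", "SPOCC_A", "SPOCC_A"),
    ("it", "SPOCC_B", "SPOCC_A"),
    ("es", "SPOCC_A", "SPOCC_A"),
    ("es", "SPOCC_C", "SPOCC_C"),
]

print(accuracy_by_language(sample))
print(meets_benchmark(sample, benchmark=0.75))
```

Reporting the result per language rather than as one pooled figure keeps a weak language from being masked by a strong one, which matters if the same check is reused when further languages are added.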
Aspects that are out of scope (of this phase)
What is explicitly not a part of this epic? List things that have been discussed but will not be included. Things you imagine in a phase 2, etc.
The model will not be implemented, and SpOccs will not be published on Italian or Spanish language postings at the end of the Epic
Solution Description
Early UX (wireframes or mockups)
<FigmaLink>
Non-Functional Attributes & Usage Projections
Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements
Dependencies
Is there any work that must precede this? Feature work? Ops work?
Legal and Ethical Considerations
Just answer yes or no.
High-Level Rollout Strategies
Initial rollout to [internal employees|sales demos|1-2 specific beta customers|all customers]
If specific beta customers, will it be for a specific survey launch date or report availability date?
How will this guide the rollout of individual stories in the epic?
The rollout strategy should be discussed with CS, Marketing, and Sales.
How long would we tolerate a “partial rollout” (rolled out to some customers but not all)?
Risks
Focus on risks unique to this feature, not overall delivery/execution risks.
Open Questions
What are you still looking to resolve?
Complete with Engineering Teams
Effort Size Estimate |
---|
Estimated Costs
Direct Financial Costs
Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?
Team Effort
Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.
Team | Effort Estimate (T-shirt sizes) | Jira Link |
---|---|---|
C&E | XL | |
|
|
|