NOVA Bypass UK & CA ACME
Created Date | Jan 31, 2023 |
---|---|
Target PI | PI 3 |
Target Release | |
Jira Epic | https://economicmodeling.atlassian.net/browse/TX-19 https://economicmodeling.atlassian.net/browse/DT-3621 https://economicmodeling.atlassian.net/browse/DATA-1231 https://economicmodeling.atlassian.net/browse/TX-20 |
Document Status | Draft |
Epic Owner | @Thomas Worden |
Stakeholder | @Abby Santos @Matt McNair @Tatiana Harrison @Chris Dedels (Deactivated) |
Engineering Team(s) Involved | C&E Taxonomy DATA QUALITY EDAC |
Customer/User Job-to-be-Done or Problem
Our classifier currently ingests the legacy Burning Glass company names, known as “Canon”, as its version of “raw_name”. Canon names depend on NOVA and are derived from the true raw names, which are free of the NOVA constraints.
To-do
Compare the coverage loss of the company classifier when it ingests “raw” vs “canon”
Assess where the company classifier falls short while ingesting “raw”
Adjust the classifier where possible to bring coverage of “raw” processed to parity with “canon” processed
Where a difference is a good, more granular change, explain the change
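The raw vs canon coverage comparison could be sketched as follows. This is a minimal illustration, assuming the same postings have been run through the classifier twice (once on the raw name, once on the canon name) and that an unclassified posting is represented as `None`. The function names and input shape are hypothetical, not the classifier's actual interface.

```python
from typing import Optional, Sequence


def coverage(labels: Sequence[Optional[str]]) -> float:
    """Share of postings that received any company label (None = unclassified)."""
    if not labels:
        return 0.0
    return sum(label is not None for label in labels) / len(labels)


def coverage_loss(raw_labels, canon_labels):
    """Compare coverage for the same postings classified from raw vs canon names."""
    if len(raw_labels) != len(canon_labels):
        raise ValueError("both runs must cover the same postings")
    raw_cov = coverage(raw_labels)
    canon_cov = coverage(canon_labels)
    # Postings classified from canon but not from raw: where the classifier falls short
    lost = [
        i
        for i, (raw, canon) in enumerate(zip(raw_labels, canon_labels))
        if raw is None and canon is not None
    ]
    return {
        "raw": raw_cov,
        "canon": canon_cov,
        "loss": canon_cov - raw_cov,
        "lost_indices": lost,
    }
```

The `lost_indices` list points at the postings worth reviewing when deciding where the classifier can be adjusted.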
Value to Customers & Users
None
Value to Lightcast
Canada should be prioritised in this PI over UK
Removing a large NOVA/LENS dependency for UK postings
Currently, UK company values are derived from NOVA/LENS canon_intermediary and canon_employer values
Cost saving
New features can follow once the above dependency has been removed
Target User Role/Client/Client Category
All clients using JPA & Profiles
Delivery Mechanism
COMPANY_RAW_NAME field will be populated from scraped/raw company names
COMPANY_ACME field will be populated by using ACME and normalizing from the COMPANY_RAW_NAME field
Since values will populate an existing field, no changes are needed from Micro/API, Analyst, Snowflake, etc. Delivery mechanism for clients will be unchanged from their end.
Once the switch is made by Documents, all other teams will automatically get updated data
We will need to message clients about the changes before the switch
Success Criteria & Metrics UK
Accuracy of coding for employers must be retained. To ensure the right number of postings is allocated to each company, the change must be within -5% to +10% for the top 1000 employers across the UK
Ensure that the top employers (top 25) specific to each NUTS1 region, excluding the 1000 above, are within the -5% to +10% change range
Documented explanations for changes above +/-10%
Recall of at least 70% should be observed
Industry impact needs to be measured, and the industry change at the 2-digit code level due to this change needs to be within -5% to +10%
In addition to the above, we should go through the client fixes made in 2022 and check that they are still valid, as clients will be annoyed if they are not. This must be done prior to go-live.
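The -5% to +10% band in the criteria above could be checked with something like the sketch below. It assumes posting counts per employer are available as plain dictionaries, with the current canon-based counts as the baseline and the raw-based counts as the new values. The function names, the dictionary shape, and the choice of baseline are assumptions for illustration only.

```python
from collections import Counter

LOWER, UPPER = -5.0, 10.0  # acceptance band in percent, taken from the criteria above


def pct_change(before: int, after: int) -> float:
    """Percentage change in posting count relative to the baseline."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


def out_of_band(canon_counts, raw_counts, top_n=1000):
    """Return {employer: pct change} for top-N employers (by baseline count) outside the band."""
    top = [name for name, _ in Counter(canon_counts).most_common(top_n)]
    flagged = {}
    for name in top:
        # An employer missing from the new run counts as zero postings (-100%)
        change = pct_change(canon_counts[name], raw_counts.get(name, 0))
        if not LOWER <= change <= UPPER:
            flagged[name] = round(change, 1)
    return flagged
```

Every employer returned by `out_of_band` is one that would need a documented explanation. The same check could be applied to 2-digit industry codes by passing counts keyed by industry instead of employer.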
Success Criteria & Metrics CAN
Accuracy of coding for companies must be retained. To ensure the right number of postings is allocated to each company, the change must be within -5% to +10% for the top 1000 companies across CA
Ensure that the top employers (top 25) specific to each Canadian province, excluding the 1000 above, are within the -5% to +10% change range, with explanations for all that fall outside this range
Documented explanations for changes above +/-10%
Recall of at least 78% should be observed
Industry impact needs to be measured, and the industry change at the 2-digit code level due to this change needs to be within -5% to +10%
In addition to the above, we should go through the client fixes made in 2022 and check that they are still valid, as clients will be annoyed if they are not. This must be done prior to go-live.
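The regional criterion needs the top 25 employers per province (or per NUTS1 region for the UK) after the national top 1000 have been set aside. A minimal sketch of that selection, assuming postings are available as (region, employer) pairs. The function name and input shape are hypothetical.

```python
from collections import Counter, defaultdict


def regional_top_employers(postings, national_top_n=1000, regional_top_n=25):
    """postings: iterable of (region, employer) pairs, one pair per posting."""
    postings = list(postings)

    # Employers already covered by the national top-N check are excluded
    national = Counter(employer for _, employer in postings)
    excluded = {name for name, _ in national.most_common(national_top_n)}

    # Count the remaining employers separately within each region
    by_region = defaultdict(Counter)
    for region, employer in postings:
        if employer not in excluded:
            by_region[region][employer] += 1

    return {
        region: [name for name, _ in counts.most_common(regional_top_n)]
        for region, counts in by_region.items()
    }
```

The resulting per-region lists can then be checked against the same -5% to +10% change range as the national list.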
Definition:
Company = any company
Employer = non-staffing
Aspects that are out of scope (of this phase)
Customer go live for UK/CA ???
How many companies will be outside the tagging criteria (extending or subtracting time from the overall curation process)
How well the company extractor will perform on UK data
High-Level Rollout Strategies
rc_acme fields.
rc_acme fields
rc_acme populate company_acme field on UK jpa, CA jpa
Risks
Focus on risks unique to this feature, not overall delivery/execution risks.
Open Questions
What are you still looking to resolve?
Instructions need to be tightened up:
https://economicmodeling.atlassian.net/wiki/spaces/DQ/pages/2631729390
Complete with Engineering Teams
Team Effort
Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.
Team | Effort Estimate (T-shirt sizes) | Jira Link |
---|---|---|
EDAC | XL | |
TLC | L | https://economicmodeling.atlassian.net/browse/TX-19 https://economicmodeling.atlassian.net/browse/TX-20 |
Data Solutions | XL | |
Documents | S | |
Micro | | |
Analyst | | |
Snow | | |