NOVA Bypass UK & CA ACME


 

Created Date

Jan 31, 2023

Target PI

PI 3

Target Release

 

Jira Epic

https://economicmodeling.atlassian.net/browse/TX-19 https://economicmodeling.atlassian.net/browse/DT-3621 https://economicmodeling.atlassian.net/browse/DATA-1231 https://economicmodeling.atlassian.net/browse/TX-20

Document Status

Draft

Epic Owner

@Thomas Worden

Stakeholder

@Abby Santos @Matt McNair @Tatiana Harrison @Chris Dedels (Deactivated)

Engineering Team(s) Involved

C&E Taxonomy, DATA QUALITY, EDAC

Customer/User Job-to-be-Done or Problem

Our company classifier currently ingests the legacy Burning Glass company names, known as “Canon”, as its version of “raw_name”. Canon names depend on NOVA and are derived from the true raw names, which are free of the NOVA constraints.

To-do

  • Do a coverage loss comparison of the companies classifier where it ingests “raw” vs “canon”

    • Assess where the company classifier is falling short while ingesting “raw”

  • Make adjustments to the classifier where possible to bring coverage when processing “raw” to parity with coverage when processing “canon”

  • Where a difference is a good change at a granular level, explain the change
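The coverage comparison in the to-do list could be sketched roughly as follows. This is a minimal illustration, not the team's actual tooling: the `Posting` record, the `classify` callable (returning a company ID or `None` when nothing matches), and the toy lookup classifier are all hypothetical stand-ins for the real company classifier and posting data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Posting:
    """Hypothetical posting record carrying both name variants."""
    raw_name: str
    canon_name: str


# Assumed classifier shape: name in, company ID out, None when unmatched.
Classifier = Callable[[str], Optional[str]]


def coverage(names: Iterable[str], classify: Classifier) -> float:
    """Share of names for which the classifier returns a company."""
    names = list(names)
    if not names:
        return 0.0
    matched = sum(1 for name in names if classify(name) is not None)
    return matched / len(names)


def coverage_loss(postings: List[Posting], classify: Classifier) -> Tuple[float, float, float]:
    """Return (canon coverage, raw coverage, canon minus raw)."""
    canon_cov = coverage((p.canon_name for p in postings), classify)
    raw_cov = coverage((p.raw_name for p in postings), classify)
    return canon_cov, raw_cov, canon_cov - raw_cov


def uncovered_raw(postings: List[Posting], classify: Classifier) -> List[str]:
    """Raw names that are missed even though the canon form is matched."""
    return [
        p.raw_name
        for p in postings
        if classify(p.canon_name) is not None and classify(p.raw_name) is None
    ]


# Toy stand-in for the real classifier: exact lookup on a lowercased name.
_KNOWN = {"tesco": "company-1", "royal bank of canada": "company-2"}


def toy_classify(name: str) -> Optional[str]:
    return _KNOWN.get(name.strip().lower())


sample = [
    Posting("Tesco", "Tesco"),
    Posting("TESCO PLC - Leeds", "Tesco"),
    Posting("Royal Bank of Canada", "Royal Bank of Canada"),
    Posting("Unknown Ltd", "Unknown Ltd"),
]

canon_cov, raw_cov, loss = coverage_loss(sample, toy_classify)
gaps = uncovered_raw(sample, toy_classify)
```

With this toy data the list returned by `uncovered_raw` points at the raw names worth inspecting when assessing where the classifier falls short on “raw” input.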

 

Value to Customers & Users

None

Value to Lightcast

  • Canada should be prioritised in this PI over UK

  • Removing a large NOVA/LENS dependency for UK postings

    • Currently, UK company values are derived from NOVA/LENS canon_intermediary and canon_employer values

    • Cost saving

  • New features can follow once the above dependency has been removed

Target User Role/Client/Client Category

All users of JPA & Profiles

Delivery Mechanism

  • COMPANY_RAW_NAME field will be populated from scraped/raw company names

  • COMPANY_ACME field will be populated by using ACME and normalizing from COMPANY_RAW_NAME field

  • Since values will populate an existing field, no changes are needed from Micro/API, Analyst, Snowflake, etc. The delivery mechanism will be unchanged from the clients' end.

    • Once the switch is made by documents, all other teams will automatically get the updated data

    • Clients will need to be messaged about the changes before the switch

Success Criteria & Metrics UK

  • Accuracy of coding for employers must be retained. To ensure the right number of postings is allocated to each company, the change must be within -5% to +10% for the top 1000 employers across the UK

  • Ensure that the top 25 employers specific to each NUTS1 region, excluding the 1000 above, are within the -5% to +10% change range

  • Document explanations for any change beyond +/-10%

  • Recall of at least 70% should be observed

  • Industry impact needs to be measured, and the industry change at the 2-digit code level due to this change needs to be within -5% to +10%

  • In addition to the above, we should go through the client fixes made in 2022 and check that they are still valid, as clients will be annoyed if they are not. This must be done prior to go-live.
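The -5% to +10% band could be checked with something like the sketch below. It assumes posting counts per employer are available as plain dictionaries before and after the switch; the function names, the input shape, and the choice to rank the top employers by their "before" counts are illustrative assumptions, not part of the epic.

```python
from typing import Dict

# Band taken from the success criteria above.
LOWER_PCT = -5.0
UPPER_PCT = 10.0


def pct_change(before: int, after: int) -> float:
    """Percentage change in posting count for one employer."""
    if before <= 0:
        raise ValueError("baseline posting count must be positive")
    return 100.0 * (after - before) / before


def flag_out_of_band(
    counts_before: Dict[str, int],
    counts_after: Dict[str, int],
    top_n: int = 1000,
) -> Dict[str, float]:
    """Return {employer: % change} for top-N employers outside the band.

    Assumption: "top" is ranked by the baseline (canon-based) counts.
    Employers missing after the switch are treated as having 0 postings.
    """
    top = sorted(counts_before, key=counts_before.get, reverse=True)[:top_n]
    flagged = {}
    for employer in top:
        change = pct_change(counts_before[employer], counts_after.get(employer, 0))
        if not (LOWER_PCT <= change <= UPPER_PCT):
            flagged[employer] = change
    return flagged


# Made-up counts purely for illustration.
before = {"Employer A": 1000, "Employer B": 500, "Employer C": 200}
after = {"Employer A": 1080, "Employer B": 450, "Employer C": 230}

needs_explanation = flag_out_of_band(before, after)
```

The same check could be reused for the per-region top 25 and for 2-digit industry codes by passing counts keyed by region-specific employer or by industry code.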

     

Success Criteria & Metrics CAN

  • Accuracy of coding for companies must be retained. To ensure the right number of postings is allocated to each company, the change must be within -5% to +10% for the top 1000 companies across CA

  • Ensure that the top 25 employers specific to each Canadian province, excluding the 1000 above, are within the -5% to +10% change range, with explanations for all that fall outside this criterion

  • Document explanations for any change beyond +/-10%

  • Recall of at least 78% should be observed

  • Industry impact needs to be measured, and the industry change at the 2-digit code level due to this change needs to be within -5% to +10%

  • In addition to the above, we should go through the client fixes made in 2022 and check that they are still valid, as clients will be annoyed if they are not. This must be done prior to go-live.
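The recall minimums (70% for UK, 78% for CA) could be verified along these lines. The epic does not say how recall is measured, so this sketch assumes a hand-labelled sample mapping posting IDs to the expected company, and counts a posting as recalled only when the predicted company matches the label exactly. All names and data here are hypothetical.

```python
from typing import Dict, Optional

# Minimum recall per region, taken from the success criteria.
MIN_RECALL = {"UK": 0.70, "CA": 0.78}


def recall(gold: Dict[str, str], predicted: Dict[str, Optional[str]]) -> float:
    """Fraction of labelled postings whose expected company was assigned.

    gold:      posting ID -> expected company (assumed hand-labelled sample)
    predicted: posting ID -> company assigned by the classifier, or None
    """
    if not gold:
        raise ValueError("labelled sample is empty")
    hits = sum(1 for pid, company in gold.items() if predicted.get(pid) == company)
    return hits / len(gold)


def meets_recall(region: str, gold: Dict[str, str], predicted: Dict[str, Optional[str]]) -> bool:
    """True when measured recall reaches the region's minimum."""
    return recall(gold, predicted) >= MIN_RECALL[region]


# Made-up labelled sample: 3 of 4 expected companies are recovered.
gold_sample = {"p1": "Company A", "p2": "Company B", "p3": "Company C", "p4": "Company D"}
predicted_sample = {"p1": "Company A", "p2": "Company B", "p3": "Company C", "p4": None}

sample_recall = recall(gold_sample, predicted_sample)
```

In this toy case a recall of 0.75 would clear the UK minimum but not the CA one.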

 

Definition:

Company = any company

Employer = non-staffing

Aspects that are out of scope (of this phase)

  • Customer go live for UK/CA ???

  • How many companies will be outside the tagging criteria (extending or subtracting time from the overall curation process)

  • How well the company extractor will perform on UK data

 

High-Level Rollout Strategies

Tag new ACME values onto documents under rc_acme fields.
Curate data to match the success criteria per company for data quality of rc_acme fields.
Document all changes above 10% that will occur on UK/CA JPA for clients, CS, etc.
Document any major changes that will occur for clients, CS, etc.
Have values from rc_acme populate the company_acme field on UK JPA and CA JPA.

Risks

Focus on risks unique to this feature, not overall delivery/execution risks. 

 

Open Questions

What are you still looking to resolve?

Instructions need to be tightened up:

https://economicmodeling.atlassian.net/wiki/spaces/DQ/pages/2631729390


Complete with Engineering Teams

 

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Team | Effort Estimate (T-shirt sizes) | Jira Link