Alumni Pathways: Next Generation Data Pipeline


 

Created Date

Mar 31, 2023

Last Major Update

Aug 24, 2023 (in preparation for PI 6)

Target PI

PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)
PI 4:

  • Docs (Hunter): Continue building with aggressive/stretch target of completing an end-to-end parallel process to what CDOT does today

  • Micro (Chris K.):

    • Add in new NSC fields in old version of the API

    • Finish out the work for schema v2

PI 5 & on: see https://docs.google.com/document/d/1Z__f34WEBTn5_qGuqyeVmv2pHX9ddJG1-ylcEebLkC8/edit

PI-6: see https://docs.google.com/document/d/1K11u825jxt2gmtA2JmgxhRAbiHasYkYekfNnDi6T_JA/edit?usp=sharing

Target Release

Base functionality completed in PI 5 (based on expectations set with customers & stakeholders that these would be released by the end of Q3 ’23)

PI-6: Release targeted enhancements and improve quality to meet standards

Jira Epics

Documents -

CDOT - https://economicmodeling.atlassian.net/browse/AOD-740

Micro -

Analyst RED -

RAPTOR -

Document Status

REVIEW

Epic Owner

@Gavin Esser

Stakeholder

@Kaleb Trotter @Hunter Burk @Chris Kellogg @Dave Wallace (Deactivated) @Matt McNair

Engineering Team(s) Involved

Documents, Micro, Analyst RED / possibly: CDOT & RAPTOR

PART 1

Customer/User Job-to-be-Done or Problem

Lightcast’s Education business unit is working aggressively to ramp up Alumni Pathways sales and retention while simultaneously increasing data update frequencies. However, the data pipeline for Alumni Pathways was never designed to handle this combined load. It is also fragile because of the scrappy way it was originally built, which relies heavily on Microsoft Excel and manual processes.

We need to build a next-generation data pipeline and swap it out before this becomes a bottleneck and/or breaking point for delivering Alumni Pathways. This is foundational / critical-path work to:

  1. enable more frequent data updates

  2. achieve the scale we are targeting (active customers x update frequency)

  3. support future features plus the new data structures and sources required for them

See this presentation for a visual summary of the vision for the new data pipeline versus its current state.

 

Value to Customers & Users

A next-generation data pipeline will benefit customers through:

  • Faster deliveries (shorter time between sale and when matched data is available in Alumni Pathways)

  • More frequent data updates (in Alumni Pathways)

  • Future software features that are dependent on this new pipeline

  • Future data sources that are dependent on this new pipeline

 

Value to Lightcast

A next-generation data pipeline will benefit Lightcast through:

  • The added value to customers (described above) supports sales acceleration, market penetration, and retention

  • Increased opportunity to add or integrate additional data sources

  • Reduced delivery risk (due to fragility and/or bottlenecks)

  • Reduced work-in-process (due to faster pipeline)

  • Reduced reliance on manual processes (reduced quality risks, reduced labor costs per customer)

 

Target User Role/Client/Client Category

Buyers/users (External):

  • Institutional Research, Academic, Enrollment Marketing, President’s Office, and Advancement/Alumni Relations/Foundation teams across all Education segments

Lightcast Internal Teams

  • Customer Delivery Operations Team (“CDOT”) and Documents are the primary internal users

 

Delivery Mechanism

Both the current and proposed new data pipelines will deliver data to Lightcast applications and/or customers via API at a minimum. Adding delivery via Snowflake would be a separate, future consideration.

We are intentionally moving away from delivering data pipeline outputs in static files (Excel/CSV) as part of our strategy to pull users towards using the dynamic software platforms, APIs, and potentially Snowflake instead.

 

Success Criteria & Metrics

Definition of Done

Alumni Pathways is powered by a new, next-generation data pipeline.

 

North Star Metric

Alumni Pathways’ current North Star Metric is Filtered Reports per Month by Account (as a proxy for answering institutional users' questions and supporting their work). We believe this new data pipeline will move that metric via:

  • Getting customers access to matched data (across multiple reports) sooner

  • Providing customers with a reason to regularly return to the tool (because data is updating more frequently)

Our target improvements for this metric are:

  • >10% more Filtered Reports per Month by Account within the first two months after the first feature is released

  • The higher level of Filtered Reports per Month by Account is sustained (i.e., it does not drop back to previous levels over time)

  • Significant improvements to CDOT team workload and efficiency
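The first target above (more than a 10% increase in Filtered Reports per Month by Account) can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the report counts are hypothetical, and it assumes the comparison is made per account against a pre-release monthly baseline, which this document does not specify.

```python
def pct_change(baseline, current):
    """Relative change from baseline to current, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) * 100.0 / baseline


def meets_target(baseline, current, target_pct=10.0):
    """True when the increase is strictly greater than the target (">10%")."""
    return pct_change(baseline, current) > target_pct


# Made-up filtered-report counts per month for a single account.
baseline_reports = 40
post_release_reports = 46

print(pct_change(baseline_reports, post_release_reports))
print(meets_target(baseline_reports, post_release_reports))
```

Because the target is stated as ">10%", an increase of exactly 10% does not count as meeting it in this sketch.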

 

Aspects that are out of scope (of this phase)

As currently described, this epic covers a minimum viable product (MVP) release of this new feature. Subsequent extensions and enhancements are not yet defined (or included).

Related Epics:
Each of these focuses on optimizing one step/component of the new data pipeline; as such, each potentially rolls up into this epic or represents parallel/supporting work for it.

 

PART 2

Solution Description

Documentation

In/for PI 3:

 

Early priority work items:

  1. Load the Profile data using a pre-defined schema. Ever since the data was changed to JSON, it has loaded slowly. Based on testing in other pipelines, we know that we can speed this up to near-Parquet speeds by defining a schema when we load the data. If you need help testing the code, reach out to Skye. He knows how to run the project, which is a little bit different because of how AO works.

  2. Change how we manage the encoding of the school's contact info file.  Talk to Dave Wallace to get more details.
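The first work item relies on a general point: when JSON is loaded without a declared schema, the loader has to discover field names and types from the data itself, usually through an extra pass. A minimal plain-Python sketch of that idea follows. The field names (`profile_id`, `school`, `grad_year`), the `PROFILE_SCHEMA` mapping, and both helper functions are hypothetical; the actual engine and Profile schema used in the pipeline are not described in this document.

```python
import json

# Hypothetical schema: field name -> type used to coerce each value.
PROFILE_SCHEMA = {"profile_id": str, "school": str, "grad_year": int}


def infer_schema(lines):
    """Extra pass over the data: record the first observed type of each field."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, type(value))
    return schema


def load_with_schema(lines, schema):
    """Single pass: keep only declared fields and coerce them to declared types."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            name: (cast(record[name]) if record.get(name) is not None else None)
            for name, cast in schema.items()
        })
    return rows


# Illustrative JSON Lines input (made-up records, not real profile data).
SAMPLE_LINES = [
    '{"profile_id": "p1", "school": "Example U", "grad_year": "2019"}',
    '{"profile_id": "p2", "school": "Example U", "grad_year": 2021, "extra": 1}',
]

rows = load_with_schema(SAMPLE_LINES, PROFILE_SCHEMA)
print(rows)
print(infer_schema(SAMPLE_LINES))
```

With the declared schema, `grad_year` is coerced to an integer in both rows and the undeclared `extra` field is dropped, all in one pass. The inference path reads every record first and records `grad_year` as a string because that is the first type it happens to see.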


In/for PI 3:
TBD

For PI-6:

  • Get information flowing through the pipeline to power customer data and deliverables (if not complete in PI-5)

  • Significant time spent closing the quality and tech debt gap to improve results for customers. The goal is to reach a similar level of quality to the current process and, in later PIs, surpass it. Important measures to consider here are:

    • Breadth: Number of profiles found and surfaced for the customer

    • Confidence/Quality: The level of confidence we have that each record contains correct information and is correctly matched to school records.

    • Depth: The amount of information we can provide about each individual alumnus/profile

  • Enhancing and adding to the data points available for profiles in the alumni database to unlock new features and capabilities.
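The breadth, confidence, and depth measures listed above could be tracked per customer delivery along the following lines. This is only a sketch: the record layout, the `match_confidence` field, the information field names, and the `quality_summary` helper are all hypothetical, and the real definitions of each measure are for the teams to agree on.

```python
from statistics import mean


def quality_summary(profiles, info_fields):
    """Summarize matched profiles by breadth, confidence, and depth."""
    if not profiles:
        return {"breadth": 0, "confidence": None, "depth": None}
    return {
        # Breadth: number of profiles found and surfaced for the customer.
        "breadth": len(profiles),
        # Confidence: average match confidence across records (0.0 to 1.0).
        "confidence": mean(p["match_confidence"] for p in profiles),
        # Depth: average count of populated information fields per profile.
        "depth": mean(
            sum(1 for f in info_fields if p.get(f) not in (None, ""))
            for p in profiles
        ),
    }


# Made-up records for illustration only.
profiles = [
    {"match_confidence": 0.75, "job_title": "Analyst", "employer": "Acme", "city": None},
    {"match_confidence": 0.25, "job_title": "Engineer", "employer": "", "city": "Boise"},
]
summary = quality_summary(profiles, ["job_title", "employer", "city"])
print(summary)
```

Running the same summary against the output of the current process and the new pipeline for the same customer would give a like-for-like view of the quality gap.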

 

Early UX (wireframes or mockups)

N/A

 

Non-Functional Attributes & Usage Projections

Privacy / Security Implications

Customer- (and/or National Student Clearinghouse-) provided student records contain personally identifying information and are protected by:

  • FERPA regulations

  • [If applicable]: National Student Clearinghouse partnership contract terms

Any data transmission, access, and usage on this project will require special safeguards, including additional background checks for team members who interact with National Student Clearinghouse data.

 

Localization Requirements

Alumni Pathways is USA-only. Wherever National Student Clearinghouse data is used, development must be restricted to the USA only (due to security requirements in our partnership agreement with the Clearinghouse).

 

Performance Characteristics

As part of this work, the data pipeline performance should be optimized relative to its current state.

 

Dependencies

Finish-to-finish dependencies on:

 

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

The new data pipeline will be used when ready and customers will see the new and improved data in the software and other deliverables that are fed by this pipeline (a similar release approach to new versions of Skills or other taxonomies).

Assuming we can achieve our target improvements, we will work with Sales, Marketing, and Success to promote the improvements.

 

Risks

Appropriate safeguards and correct handling of protected information (PII & National Student Clearinghouse)

 

Open Questions

What are you still looking to resolve?

 

 


Complete with Engineering Teams

 

Effort Size Estimate

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

  • Small cost (~$75/person) of National Student Clearinghouse-required background checks

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Team

PI + Definition of Done

Effort Estimate (T-shirt sizes)

Jira Link

Notes

DOCUMENTS

Completed: PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)

 

PI 4: Continue building with aggressive/stretch target of completing an end-to-end parallel process to what CDOT does today

 

 

 

CDOT

Completed: PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)

Medium (1-2 weeks)

https://economicmodeling.atlassian.net/browse/AOD-740

If stretch goal happens, then we will build NSC-match input revisions into our process

Stretch for PI 4: Test & confirm/finalize the inputs of new end-to-end parallel process

 

 

 

Micro

Completed: PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)

 

 

 

Planned for PI 4:

  • Add in new NSC fields in old version of the API

  • Finish out the work for schema v2

Stretch for PI 4: Test & confirm/finalize the outputs of new end-to-end parallel process

 

 

 

Analyst RED

Completed: PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)

 

 

 

Stretch for PI 4: Test & confirm/finalize the outputs of new end-to-end parallel process

 

 

 

RAPTOR

Completed: PI 3: Research (architecture), prioritization, planning, and begin building (not yet user-facing)

 

 

 

TBD - No work anticipated at this time, but there may be support / collaboration needs from CDOT and/or other teams

 

 

 

 

 
