Target PI	PI2 PI3
Created Date	Oct 31, 2022
Target Release	Q1 2023
Jira Epic	Micro: https://economicmodeling.atlassian.net/browse/MIC-1628 Analyst Red: https://economicmodeling.atlassian.net/browse/ARK-9001 RaPTOR: https://economicmodeling.atlassian.net/browse/RT-2531 CDOT: https://economicmodeling.atlassian.net/browse/AOD-607 , https://economicmodeling.atlassian.net/browse/AOD-608
Document Status	Done
Epic Owner	@Lendl Meyer (Deactivated)
Stakeholder	@Dave Wallace (Deactivated) @Kaleb Trotter
Engineering Team(s) Involved	Micro, Analyst Red, RaPTOR, CDOT

Customer/User Job-to-be-Done or Problem

As a product manager responsible for Alumni Pathways, I want to ensure we are able to collect, process, and connect all the relevant data sources and deliver meaningful data in the Alumni Pathways interface on the Analyst platform.

As a customer of Alumni Pathways, I want access to all the relevant data about each of my alumni including data provided by my institution (contact info, demographics, program, grad year), data provided by NSC (enrollments and completions at my and other institutions), and Lightcast profile data (jobs, employers, and occupations), so that I can measure the efficacy of career-oriented support and interventions being provided by my institution.

Value to Customers & Users

This epic upgrades the end-to-end data pipeline for Alumni Pathways, a stepping stone toward the product's full value. It is the foundation needed to deliver value to a variety of customer roles through the new features envisioned for Alumni Pathways, and it is critical-path work required to unlock the value to customers and Lightcast outlined in AO 2.0: Build the Alumni Outcomes 2.0 Foundation.

Value to Lightcast

See the Value to Customers & Users section above.

Target User Role/Client/Client Category

Institutional Research, Academic Leadership, Enrollment Marketing, and Advancement/Foundations teams across all current HE segments.

Delivery Mechanism

Multiple – see Solution Description below.

Success Criteria & Metrics

Definition of done:

A. We can work from an NSC-provided file end-to-end to accurate and functioning reports in Analyst

In the event we don’t have NSC-provided data, we can still work end-to-end to accurate and functioning reports in Analyst

B. Accurately handle higher volume of records received by Alumni Pathways

Any calculations (esp. percentages) and/or descriptions that use or refer to the volume of records accurately reflect the higher volume and distinction between matched/unmatched records

C. No broken packets, reports, and/or other functionality on Alumni Pathways

Descriptive error messages if the Alumni Pathways front-end does not receive the expected information

D. Alumni Pathways dynamically shows or hides NSC-specific elements based on whether NSC is a data source

Aspects that are out of scope (of this phase)

We expect to add more fields (likely from all three sources) over the course of Alumni Pathways development. While our approach should be flexible, only the fields called out in the Solution Description below are in scope for this epic.

Related Epic:

https://economicmodeling.atlassian.net/wiki/spaces/DPM/pages/2483060740 - Documents team collaboration with CDOT to improve match rate in the CDOT data processing step

Solution Description

Multiple data structure changes are happening that need to be handled end-to-end throughout the entire data flow (as follows):

Customers and/or National Student Clearinghouse (NSC) provide data to CDOT (David Wallace’s team)
CDOT processes the data
CDOT uploads the processed dataset and account information to Micro’s Database + API using Raptor’s upload script(s)
Micro delivers the new dataset and account information to Analyst Red via new (and possibly updated) endpoints
Analyst Red displays the new data in customer-facing reports on the Alumni Pathways vertical (on Analyst)

The types of data structure changes include:

A new data source (NSC) is used for most (but not all) Alumni Pathways customers
New (additional) data fields provided by CDOT on a matched data set (see “Fields from David” below)
A new Database + API structure:
- Supporting greater flexibility on uploading/storing schemas
- Having separate database entities for “Accounts” and “Matched Data Sets”
- Indicating whether an account and/or matched data set was generated using NSC data
More records per matched set
- Both matched (as before) and unmatched (new) records will be included
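The entity split described above could be sketched roughly as follows. This is a minimal illustration only, not Micro's actual schema: the class names, attribute names, and the `uses_nsc` flag are hypothetical stand-ins for "separate entities for Accounts and Matched Data Sets" and "indicating whether NSC data was used".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One row in a matched data set (hypothetical shape)."""
    row_id: str
    matched: bool  # False for the new "unmatched" records


@dataclass
class MatchedDataSet:
    """A matched data set, stored separately from the account."""
    dataset_id: str
    uses_nsc: bool  # assumed flag: was NSC data used to build this set?
    records: List[Record] = field(default_factory=list)


@dataclass
class Account:
    """A customer account that points at one or more matched data sets."""
    account_id: str
    uses_nsc: bool  # assumed flag at the account level
    dataset_ids: List[str] = field(default_factory=list)


# Example: one NSC-backed account with a data set holding both record types.
dataset = MatchedDataSet(
    dataset_id="ds-1",
    uses_nsc=True,
    records=[Record("r1", matched=True), Record("r2", matched=False)],
)
account = Account(account_id="acct-1", uses_nsc=True, dataset_ids=[dataset.dataset_id])
```

Keeping the NSC indicator on both entities mirrors the "account and/or matched data set" wording; whether both flags are needed is a design decision for Micro.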

The Alumni Pathways vertical on the Analyst platform will need to handle the new changes, including (at least):

No broken packets, reports, and/or other existing functionality
Updating any calculations (esp. percentages) and/or descriptions to reflect the higher volume of records
Dynamically showing/hiding NSC-specific elements depending on whether NSC is a data source
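As a rough illustration of the last two items, the sketch below computes a match percentage against the full record count (matched plus unmatched) and filters out NSC-only elements when NSC is not a data source. The function names and the `nsc_only` key are hypothetical; the real logic lives in Analyst Red's front end and may look quite different.

```python
def match_summary(match_flags):
    """Summarize a data set given one boolean per record (True = matched).

    The percentage uses ALL records as the denominator, so unmatched
    records lower the match rate instead of being silently excluded.
    """
    total = len(match_flags)
    matched = sum(1 for flag in match_flags if flag)
    matched_pct = round(100 * matched / total, 1) if total else 0.0
    return {
        "total": total,
        "matched": matched,
        "unmatched": total - matched,
        "matched_pct": matched_pct,
    }


def visible_elements(elements, uses_nsc):
    """Drop elements marked NSC-only when NSC is not a data source."""
    return [e for e in elements if uses_nsc or not e.get("nsc_only", False)]


summary = match_summary([True, True, False, False])
page = [
    {"name": "employment_report", "nsc_only": False},
    {"name": "nsc_citation", "nsc_only": True},
]
without_nsc = visible_elements(page, uses_nsc=False)
```

With two matched and two unmatched records, the summary reports a 50.0 percent match rate over four records, and the NSC citation is hidden for a non-NSC account.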

This epic captures the Alumni Pathways updates required to handle the new data structure.

Fields from David = New (added) data fields provided by CDOT on a matched data set

inst_rowid = lc_rowid as the key (currently AO 1.0 is probably using match_id field as the key)
inst_studentidemsi = lc_person_id (unique identifier for a person from a particular institution, regardless of data source; also ties back to institution data when we don’t want to use PII)
inst_studentid = inst_student_id_pii; can be empty (from the institution, so this may be considered PII at most institutions, but it may be more complicated when getting it from NSC: it will come from their Your Unique ID field, but we have found at least one instance where the institution gave NSC an identifier that didn’t match what the customer gave us as the “student ID”)
inst_alumniid = inst_alumni_id_pii can be empty (from institution’s alumni database)
profile_id = profile_id (ties back to profile data)
inst_educationstatus = inst_education_status (went on for more education or not or unknown …see questions below)
matchstatus = match_status (currently Boolean or “Match”/“No Match”…see questions below)
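The field list above can be read as a rename map. The sketch below assumes the left-hand name is what CDOT currently provides and the right-hand name is the name in the new schema; that direction is an interpretation, not something this page states. The helper names are hypothetical, only the row key is treated as required, and the Boolean normalization of match status is one possible answer to a question that is still open.

```python
# Assumed direction: CDOT-provided name -> name in the new schema.
FIELD_MAP = {
    "inst_rowid": "lc_rowid",
    "inst_studentidemsi": "lc_person_id",
    "inst_studentid": "inst_student_id_pii",
    "inst_alumniid": "inst_alumni_id_pii",
    "profile_id": "profile_id",
    "inst_educationstatus": "inst_education_status",
    "matchstatus": "match_status",
}


def normalize_match_status(value):
    """Accept a Boolean or the strings "Match" / "No Match"."""
    if isinstance(value, bool):
        return value
    if value == "Match":
        return True
    if value == "No Match":
        return False
    raise ValueError(f"Unrecognized match status: {value!r}")


def rename_record(raw):
    """Rename one record's fields; missing optional fields become None."""
    if not raw.get("inst_rowid"):
        raise ValueError("inst_rowid is the key and cannot be empty")
    renamed = {new: raw.get(old) for old, new in FIELD_MAP.items()}
    if renamed["match_status"] is not None:
        renamed["match_status"] = normalize_match_status(renamed["match_status"])
    return renamed


example = rename_record({"inst_rowid": "r1", "matchstatus": "No Match"})
```

Here `example` keeps `r1` as `lc_rowid`, carries `match_status` as `False`, and leaves the two `_pii` identifiers as `None`, reflecting that they can be empty.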

Early UX (wireframes or mockups)

The current Alumni Pathways vertical serves as the base case for this work. User experience changes will:

Hide existing elements (e.g. citations of NSC as a data source)
Correct calculations/descriptions that are record-dependent.

Non-Functional Attributes & Usage Projections

Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements

All end-to-end changes will need to comply with our contractual requirements described in a previous epic ( AO 2.0: Meet Contractual Requirements for National Student Clearinghouse Data )

Dependencies

Preceding Alumni Pathways epics:

Legal and Ethical Considerations

Just answer yes or no.

Have you thought through these considerations (e.g. data privacy) and raised any potential concerns with the Legal team?

High-Level Rollout Strategies

Depending on the timeline for this work, we may already have customers on the Alumni Pathways platform when these changes are released. We don’t plan to formally announce the changes, but anything they break will be visible to customers.

Risks

Focus on risks unique to this feature, not overall delivery/execution risks.

All end-to-end changes will need to comply with our contractual requirements described in a previous epic ( AO 2.0: Meet Contractual Requirements for National Student Clearinghouse Data )

Open Questions

What are you still looking to resolve?

The following questions relate to this epic, but are not dependencies for this epic and are expected to be addressed in a subsequent epic to be written:

How do we present “outcomes” or “pathways” to the customer, now that every dataset will have NSC data? “Matched”/“not matched” is no longer purely a question of matched profiles; it could be extended to NSC records too, as long as we don’t obscure the difference entirely. Factors include what counts as an outcome, how we prioritize education (NSC subsequent enrollments, graduations, or both) against Lightcast profile employments, and the timeframes involved (before, during, or after grad year, for example; or do we look only at the most recent employment and education and see which one started later?).
One possible, not yet finalized, set of outcome statuses for students follows (likely based on two other fields, one for employment status and one for education status). Still undecided is how much the start date of the most recent employment matters, e.g., before/after graduation; we have an indicator for that, which defaults to displaying only those records where it is TRUE:
1. NSC enrollment exists in current academic year = “Currently Enrolled”
2. NSC enrollment does not exist in current academic year, and most recent employment end date in Lightcast profiles is null = “Currently Employed”
3. NSC enrollment does not exist in current academic year, and most recent employment end date in Lightcast profiles is not null = “Previously Employed”
4. NSC enrollment exists in an academic year subsequent to the grad year from the original institution = “Previously Enrolled”
5. Found in Lightcast profiles, but no subsequent enrollment found in NSC and no identifiable employment found = “No Further Education”
6. Not found in Lightcast profiles, but found in NSC with no subsequent enrollment = “Unknown Employment Status”
7. Not found in Lightcast profiles and not found in subsequent enrollment data from NSC = “No Record Found”
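To make the example statuses concrete, here is a minimal sketch that evaluates the seven rules in the order listed, with the first matching rule winning. That ordering, the input field names, and the reading of rules 2 and 3 as requiring an identifiable employment are all assumptions for illustration; none of this is finalized logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlumniFacts:
    """Hypothetical precomputed inputs for one alumni record."""
    enrolled_current_year: bool      # NSC enrollment in the current academic year
    enrolled_after_grad_year: bool   # NSC enrollment after the original grad year
    found_in_nsc: bool
    found_in_profiles: bool
    has_employment: bool             # identifiable employment in Lightcast profiles
    latest_employment_end: Optional[str] = None  # None means no end date


def outcome_status(f: AlumniFacts) -> str:
    """Apply the example rules top to bottom; first match wins (assumed)."""
    if f.enrolled_current_year:
        return "Currently Enrolled"
    if f.has_employment and f.latest_employment_end is None:
        return "Currently Employed"
    if f.has_employment:
        return "Previously Employed"
    if f.enrolled_after_grad_year:
        return "Previously Enrolled"
    if f.found_in_profiles:
        return "No Further Education"
    if f.found_in_nsc:
        return "Unknown Employment Status"
    return "No Record Found"


status = outcome_status(
    AlumniFacts(
        enrolled_current_year=False,
        enrolled_after_grad_year=True,
        found_in_nsc=True,
        found_in_profiles=True,
        has_employment=True,
        latest_employment_end="2022-06-30",
    )
)
```

Under this ordering the example record resolves to “Previously Employed” even though a subsequent enrollment exists, which shows why the prioritization between education and employment still needs a decision.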
What additional NSC data is useful to the customer for the JTBD that we plan to support? The current field “inst_educationstatus” is simple and precalculated, and fields for most recent enrollment and most recent graduation may or may not be sufficient to answer key questions. We could provide much more robust reporting if we identified the particular questions that we plan to answer and then designed a way to upload the data that we need for each, such as:

nscenroll_… (various possible, such as …collegecode, …collegestate, …2or4year, …publicorprivate, …statusname, …begindate, …enddate, …collegesequence, or standardized version of these fields; fields for classlevelname, major, and cip appear to be empty)
nscgrad_… (various possible, such as …collegecode, …collegestate, …2or4year, …publicorprivate, …graduationdate, …degreetitle, …degreemajor, …degreecip, …collegesequence, or standardized version of these fields)

Complete with Engineering Teams

Effort Size Estimate

Estimated Costs

Direct Financial Costs

Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?

Team Effort

Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.

Back-End

Assuming:
- our previous ticket defines boundaries well
- front-end application team knows what they want/need
- we’re focused on research & architecture design
- we have the form of the data in the uploader
- we have the database ready to send info to the front-end team
T-shirt size estimate: Medium (1-2 weeks) based on https://economicmodeling.atlassian.net/wiki/spaces/DPM/pages/2506981423 (not including the large investment of time from Raptor and CDOT)
Discussions with front-end application team will be needed to identify the desired approach for ease of API consumption, performance, etc.

Team	Effort Estimate (T-shirt sizes)	Jira Link	Other Notes (from 3iab)
Micro		https://economicmodeling.atlassian.net/browse/MIC-1628 Best case: 1/2 of PI 2 Worst case: all of PI 2 Now until PI 2 Sprint #1 Size: N/A Deliverables: Schema shared with Analyst Red & Lendl None Expected - focused on Engineering Excellence & Planning May pull other deliverables forward in week 2 PI 2 Sprint #1 Size: M (1-2 weeks) Deliverables: Committed for sprint 1: Schema Upload Process (ready for use by Raptor) Data Upload Process (ready for use by Raptor) Best case in sprint 1 / worst case sprint 2: Populated with dummy data (in partnership with the Raptor team) Application API consumption endpoint changes (ready for use by Analyst Red) Main changes expected are endpoint name changes in API wrappers New fields should just work because they’re in meta PI 2 Sprint #2 Size: M (1-2 weeks) Deliverables: Committed for sprint 2 Analyst able to see new meta & data (start of sprint as best case, end of sprint as worst case) Best case in sprint 2 / worst case sprint 3: Address needs identified by Raptor and/or Analyst Red (fixes and/or requested changes) PI 2 Sprint #3 Size: M (1-2 weeks) or less Deliverables: Committed for sprint 3 Address needs identified by Raptor and/or Analyst Red (fixes and/or requested changes)	Additions & enhancements to Elastic Search Database & API endpoints Very Rough Estimate: 1-4x 1 week sprints [t-shirt sizing for PI?] Target completion in coming PI, with bulk finished earlier rather than later Dependency on scope of changes Timeline: Chris hoping to have info to David & Jason by end-of-week Dependencies: Sample data (in progress) Guidance on schema (provided last week) Some ad-hoc back and forth with David’s CDOT team to land this Flag PII fields in schema [tickets needed?] 
Communicate & collaborate with CDOT, Raptor, and Analyst Red on API design specs Build new (and possibly updated) API endpoints as needed for Raptor and Analyst Red Verify that new test dataset and new test account information uploaded correctly First schema to be designed by Micro Subsequent tickets back/forth between Analyst Red / Raptor / CDOT Notes from Chris 2/20/23 Chris is the only one building this (longer build time) Currently finalizing and testing new endpoints for upload Next step is running sample data through upload process The API now has dummies of all the data After initial upload process, more people from Micro will be able to help Endpoint naming structure updates Minimal changes to front-end display Accounts, Schema, and Data are agnostic Any account can have any dataset A dataset can use any schema Chris will create and upload the first schema
Analyst Red	Previous estimate (PI 1): Medium Updated (for PI 2): XL (given unknowns on changes)	https://economicmodeling.atlassian.net/browse/ARK-9001 Previous notes (PI 1): Packets should all have already been duplicated Issues: Update Wrapper - M (Small + Risk) To verify: Structural changes? Likely to be breaking changes Research Structural Changes - S	Very Rough Estimate: 2-6x 2 week sprints (1-2 PIs) [t-shirt sizing for PI?] Timeline Dependency on Micro delivery timeline Dependency on scope of changes If Micro finishes in short order, higher likelihood that Analyst can complete in PI 2 Dependencies: API specs (to get started) Production modified API (to finish) With representative data (to test with) Product guidance on expected changes Logic for what to show/hide re: NSC Calculations/descriptions [tickets needed?, e.g.] Identify and make front-end API call updates needed to handle new API specs, including testing If we switch URL structure (away from Org ID only), it adds work (lookups, logic, etc.) More vetting & safeguards likely needed. Identify and make front-end calculation and/or description changes needed to accurately reflect more records (incl. unmatched), including testing Identify and build new front-end API calls that show / hide NSC data as applicable, e.g. ARK-9099 (Reports dynamic credits for NSC) ARK-9114 (New homepage section dynamic)
RaPTOR	Extra Large (4-8 weeks)	https://economicmodeling.atlassian.net/browse/RT-2531	T-shirt sizing for PI 2-4x 2-week sprints (confirmed 2/9) Tickets needed Modify upload scripts (https://economicmodeling.atlassian.net/browse/RT-2531 ) Dependencies API specs (to get started) Production modified API (to finish) With representative data (to test with) Note: CDOT collaborating with Raptor weekly
CDOT	Extra Large (4-8 weeks)	https://economicmodeling.atlassian.net/browse/AOD-607 https://economicmodeling.atlassian.net/browse/AOD-608	T-shirt sizing for PI 2-4x 2-week sprints (new estimate 2/9) Tickets needed Alumni ID needs to be deliberately collected and preserved, whether or not it gets uploaded for the first few AP customers https://economicmodeling.atlassian.net/browse/AOD-608 Modify data process to support NSC inputs and generate additional fields (investment in PI 1 https://economicmodeling.atlassian.net/browse/AOD-574 , and expected continued investment in PI 2 to bring up to production grade - https://economicmodeling.atlassian.net/browse/AOD-607 ) Generate and upload a representative test dataset Generate and upload representative new account information Dependencies Ultimately getting NSC data (but have a pretty good template) Demo data will help (but won’t identify exceptions or problems; will need 12-ish to identify and work out subsequent issues) API specs (to get started)
CDOT	Small (1 week)		Build NSC generation of student input file Build NSC generation of NSC fields for standard input files Work with Jason on ensuring clean student IDs and verifying clean alumni IDs.

Data Strategy

Alumni Pathways: New Data & Infrastructure (end-to-end updates)