Alumni Pathways: New Data & Infrastructure (end-to-end updates)
Target PI | PI2 PI3 |
---|---|
Created Date | Oct 31, 2022 |
Target Release | Q1 2023 |
Jira Epic | Micro: https://economicmodeling.atlassian.net/browse/MIC-1628 Analyst Red: https://economicmodeling.atlassian.net/browse/ARK-9001 RaPTOR: https://economicmodeling.atlassian.net/browse/RT-2531 CDOT: https://economicmodeling.atlassian.net/browse/AOD-607 , https://economicmodeling.atlassian.net/browse/AOD-608 |
Document Status | Done |
Epic Owner | @Lendl Meyer (Deactivated) |
Stakeholder | @Dave Wallace (Deactivated) @Kaleb Trotter |
Engineering Team(s) Involved | Micro, Analyst Red, RaPTOR, CDOT |
Customer/User Job-to-be-Done or Problem
As a product manager responsible for Alumni Pathways, I want to ensure we are able to collect, process, and connect all the relevant data sources and deliver meaningful data in the Alumni Pathways interface on the Analyst platform.
As a customer of Alumni Pathways, I want access to all the relevant data about each of my alumni including data provided by my institution (contact info, demographics, program, grad year), data provided by NSC (enrollments and completions at my and other institutions), and Lightcast profile data (jobs, employers, and occupations), so that I can measure the efficacy of career-oriented support and interventions being provided by my institution.
Value to Customers & Users
This epic is a stepping stone toward the full value of Alumni Pathways: it upgrades the end-to-end data pipeline, which is the foundation needed to deliver value to a variety of customer roles through the new features envisioned for the product. It is critical-path work that we need to unlock the value to customers and Lightcast outlined in AO 2.0: Build the Alumni Outcomes 2.0 Foundation.
Value to Lightcast
See value to Customers & Users section above.
Target User Role/Client/Client Category
Institutional Research, Academic Leadership, Enrollment Marketing, and Advancement/Foundations teams across all current HE segments.
Delivery Mechanism
Multiple – see Solution Description below.
Success Criteria & Metrics
Definition of done:
A. We can work from an NSC-provided file end-to-end to accurate and functioning reports in Analyst
In the event we don’t have NSC-provided data, we can still work end-to-end to accurate and functioning reports in Analyst
B. Accurately handle higher volume of records received by Alumni Pathways
Any calculations (esp. percentages) and/or descriptions that use or refer to the volume of records accurately reflect the higher volume and distinction between matched/unmatched records
C. No broken packets, reports, and/or other functionality on Alumni Pathways
Descriptive error messages if the Alumni Pathways front-end does not receive the expected information
D. Alumni Pathways dynamically shows or hides NSC-specific elements based on whether NSC is a data source
Aspects that are out of scope (of this phase)
We expect to add more fields (likely from all three sources) over the course of Alumni Pathways development. While our approach should be flexible, only the fields called out in the solution description below are in scope for this epic.
Related Epic:
https://economicmodeling.atlassian.net/wiki/spaces/DPM/pages/2483060740 - Documents team collaboration with CDOT to improve match rate in the CDOT data processing step
Solution Description
Several data structure changes need to be handled end-to-end across the entire data flow, which runs as follows:
Customers and/or National Student Clearinghouse (NSC) provide data to CDOT (David Wallace’s team)
CDOT processes the data
CDOT uploads the processed dataset and account information to Micro’s Database + API using RaPTOR’s upload script(s)
Micro delivers the new dataset and account information to Analyst Red via new (and possibly updated) endpoints
Analyst Red displays the new data in customer-facing reports on the Alumni Pathways vertical (on Analyst)
The types of data structure changes include:
A new data source (NSC) is used for most (but not all) Alumni Pathways customers
New (additional) data fields provided by CDOT on a matched data set (see “Fields from David” below)
A new Database + API structure:
Supporting greater flexibility on uploading/storing schemas
Having separate database entities for “Accounts” and “Matched Data Sets”
Indicating whether an account and/or matched data set was generated using NSC data
More records per matched set
Both matched (as before) and unmatched (new) records will be included
The Alumni Pathways vertical on the Analyst platform will need to handle the new changes, including (at least):
No broken packets, reports, and/or other existing functionality
Updating any calculations (esp. percentages) and/or descriptions to reflect the higher volume of records
Dynamically showing/hiding NSC-specific elements depending on whether NSC is a data source
This epic captures the Alumni Pathways updates required to handle the new data structure.
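As a rough illustration of the record-dependent calculations and the NSC show/hide behavior described above, the following is a minimal sketch. The `MatchedDataSet` class, its field names, and the `nsc_only` flag on interface elements are all hypothetical; the actual Micro API payload and Analyst Red component model are not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class MatchedDataSet:
    """Hypothetical summary of one matched data set (names are assumptions)."""
    total_records: int    # matched + unmatched records in the set
    matched_records: int  # records matched to a Lightcast profile
    uses_nsc: bool        # whether NSC data was used to generate the set


def match_rate(ds):
    """Percentage of all records (matched and unmatched) that are matched."""
    if ds.total_records == 0:
        return 0.0  # avoid dividing by zero on an empty data set
    return 100.0 * ds.matched_records / ds.total_records


def visible_elements(ds, elements):
    """Keep NSC-specific elements only when NSC is a data source."""
    return [e for e in elements if ds.uses_nsc or not e["nsc_only"]]
```

The point of the sketch is that the denominator is the full record count, not the matched count, so percentages stay correct once unmatched records are included.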
Fields from David = New (added) data fields provided by CDOT on a matched data set
inst_rowid = lc_rowid, used as the key (AO 1.0 is probably using the match_id field as the key today)
inst_studentidemsi = lc_person_id (unique identifier for a person from a particular institution, regardless of data source; also ties back to institution data when we don’t want to use PII)
inst_studentid = inst_student_id_pii; can be empty. It comes from the institution, so it may be considered PII at most institutions. It may be more complicated when we get it from NSC: it will come from their Your Unique ID field, but we have found at least one instance where the identifier the institution gave NSC did not match what the customer gave us as the “student ID”.
inst_alumniid = inst_alumni_id_pii; can be empty (from the institution’s alumni database)
profile_id = profile_id (ties back to profile data)
inst_educationstatus = inst_education_status (went on for more education, did not, or unknown; see questions below)
matchstatus = match_status (currently Boolean or “Match”/“No Match”; see questions below)
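The field pairs above can be expressed as a simple rename table. This is a sketch only: the document does not state which side of each pair is the incoming name, so the code assumes the left-hand names arrive from CDOT and the right-hand names are used downstream. The helper names `rename_record` and `normalize_match_status` are hypothetical, and requiring only the key field to be non-empty is an assumption.

```python
# Assumed direction: left-hand name (incoming) -> right-hand name (downstream).
FIELD_MAP = {
    "inst_rowid": "lc_rowid",
    "inst_studentidemsi": "lc_person_id",
    "inst_studentid": "inst_student_id_pii",
    "inst_alumniid": "inst_alumni_id_pii",
    "profile_id": "profile_id",
    "inst_educationstatus": "inst_education_status",
    "matchstatus": "match_status",
}

KEY_FIELD = "lc_rowid"  # described above as the key


def normalize_match_status(value):
    """Accept either a Boolean or the strings "Match" / "No Match"."""
    if isinstance(value, bool):
        return value
    if value == "Match":
        return True
    if value == "No Match":
        return False
    raise ValueError(f"Unexpected match status: {value!r}")


def rename_record(record):
    """Rename known fields, drop unknown ones, and require a non-empty key."""
    renamed = {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}
    if renamed.get(KEY_FIELD) in (None, ""):
        raise ValueError(f"Record is missing its key field {KEY_FIELD!r}")
    if "match_status" in renamed:
        renamed["match_status"] = normalize_match_status(renamed["match_status"])
    return renamed
```

The two PII fields are allowed to be empty, matching the notes above; whether other fields may be empty for unmatched records is left open here.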
Early UX (wireframes or mockups)
The current Alumni Pathways vertical serves as the base case for this work. User experience changes will:
Hide existing elements (e.g. citations of NSC as a data source)
Correct calculations/descriptions that are record-dependent.
Non-Functional Attributes & Usage Projections
Consider performance characteristics, privacy/security implications, localization requirements, mobile requirements, accessibility requirements
All end-to-end changes will need to comply with our contractual requirements described in a previous epic ( AO 2.0: Meet Contractual Requirements for National Student Clearinghouse Data )
Dependencies
Preceding Alumni Pathways epics:
Legal and Ethical Considerations
Just answer yes or no.
High-Level Rollout Strategies
Depending on the timeline for this work, customers may already be using the Alumni Pathways platform when these changes ship. We don’t plan to formally announce the changes, but anything they break will be visible to customers.
Risks
Focus on risks unique to this feature, not overall delivery/execution risks.
All end-to-end changes will need to comply with our contractual requirements described in a previous epic ( AO 2.0: Meet Contractual Requirements for National Student Clearinghouse Data )
Open Questions
What are you still looking to resolve?
The following questions relate to this epic, but are not dependencies for this epic and are expected to be addressed in a subsequent epic to be written:
How do we present “outcomes” or “pathways” to the customer, now that every dataset will have NSC data? The context of “matched”/“not matched” is no longer a clear question of matched profiles, but instead could be extended to NSC records too, as long as we don’t obscure the difference entirely. Factors include what counts as an outcome, how we prioritize education (NSC subsequent enrollments, graduations, or both) against Lightcast profile employments, and the timeframes involved (before, during, or after grad year, for example; or do we look only at the most recent employment and education and see which one started later?).
One example of a possible outcome status for students (likely based on two other fields, one for employment status and one for education status), not yet finalized. For instance, how much does the start date of the most recent employment matter relative to graduation? We have an indicator for that, which defaults to displaying only those records where it is TRUE:
NSC enrollment exists in current academic year = “Currently Enrolled”
NSC enrollment does not exist in current academic year, and most recent employment end date in Lightcast profiles is null = “Currently Employed”
NSC enrollment does not exist in current academic year, and most recent employment end date in Lightcast profiles is not null = “Previously Employed”
NSC enrollment exists in an academic year subsequent to the grad year from the original institution = “Previously Enrolled”
Found in Lightcast profiles, but no subsequent enrollment found in NSC and no identifiable employment found = “No Further Education”
Not found in Lightcast profiles, but found in NSC with no subsequent enrollment = “Unknown Employment Status”
Not found in Lightcast profiles and not found in subsequent enrollment data from NSC = “No Record Found”
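To make the example rules concrete, here is a minimal sketch that evaluates them in the order listed. The rules are explicitly not finalized, so the evaluation order (which puts employment ahead of previous enrollment) is an assumption, as are the `AlumniRecord` class and all of its field names. The sketch also assumes a separate `has_employment` flag, because a null end date alone cannot distinguish a current job from no job at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlumniRecord:
    """Hypothetical per-person inputs; none of these names come from the pipeline."""
    found_in_profiles: bool = False        # matched to a Lightcast profile
    has_employment: bool = False           # an identifiable employment exists
    employment_end_date: Optional[str] = None  # end date of most recent employment
    found_in_nsc: bool = False             # person appears in NSC data
    enrolled_current_year: bool = False    # NSC enrollment in current academic year
    enrolled_after_grad: bool = False      # NSC enrollment in a year after grad year


def outcome_status(r):
    """Apply the example rules in listed order (the ordering is an assumption)."""
    if r.enrolled_current_year:
        return "Currently Enrolled"
    if r.has_employment:
        if r.employment_end_date is None:
            return "Currently Employed"
        return "Previously Employed"
    if r.enrolled_after_grad:
        return "Previously Enrolled"
    if r.found_in_profiles:
        return "No Further Education"
    if r.found_in_nsc:
        return "Unknown Employment Status"
    return "No Record Found"
```

Changing the prioritization between education and employment, which remains an open question, would only require reordering the checks.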
What additional NSC data is useful to the customer for the JTBD that we plan to support? The current field “inst_educationstatus” is simple and precalculated, and fields for most recent enrollment and most recent graduation may or may not be sufficient to answer key questions. We could provide much more robust reporting if we identified the particular questions that we plan to answer and then designed a way to upload the data that we need for each, such as:
nscenroll_… (various possible, such as …collegecode, …collegestate, …2or4year, …publicorprivate, …statusname, …begindate, …enddate, …collegesequence, or standardized version of these fields; fields for classlevelname, major, and cip appear to be empty)
nscgrad_… (various possible, such as …collegecode, …collegestate, …2or4year, …publicorprivate, …graduationdate, …degreetitle, …degreemajor, …degreecip, …collegesequence, or standardized version of these fields)
Complete with Engineering Teams
Effort Size Estimate |
---|
Estimated Costs
Direct Financial Costs
Are there direct costs that this feature entails? Dataset acquisition, server purchasing, software licenses, etc.?
Team Effort
Each team involved should give a general t-shirt size estimate of their work involved. As the epic proceeds, they can add a link to the Jira epic/issue associated with their portion of this work.
Back-End
Assuming:
our previous ticket defines boundaries well
front-end application team knows what they want/need
we’re focused on research & architecture design
we have the form of the data in the uploader
we have the database ready to send info to the front-end team
T-Shirt estimate size: Medium (1-2 weeks) based on https://economicmodeling.atlassian.net/wiki/spaces/DPM/pages/2506981423 (not including the large investment of time from RaPTOR and CDOT)
Discussions with front-end application team will be needed to identify the desired approach for ease of API consumption, performance, etc.
Team | Effort Estimate (T-shirt sizes) | Jira Link | Other Notes (from 3iab) |
---|---|---|---|
Micro | | https://economicmodeling.atlassian.net/browse/MIC-1628 Best case: 1/2 of PI 2; Worst case: all of PI 2; Now until PI 2 Sprint #1; PI 2 Sprint #1; PI 2 Sprint #2; PI 2 Sprint #3 | Additions & enhancements to Elasticsearch Database & API endpoints; Notes from Chris 2/20/23 |
Analyst Red | Previous estimate (PI 1): Medium; Updated (for PI 2): XL (given unknowns on changes) | https://economicmodeling.atlassian.net/browse/ARK-9001 Previous notes (PI 1): | ARK-9114 (New homepage section dynamic) |
RaPTOR | Extra Large |
| |
CDOT | Extra Large | https://economicmodeling.atlassian.net/browse/AOD-607 https://economicmodeling.atlassian.net/browse/AOD-608 |
|
Small (1 week) |
|