We continued our virtual summer internship program in data science this year, finishing yesterday!
Although we originally intended for the internship to be face to face, we had to move online in 2020 due to COVID. This year, we included survey questions about being online to see how important it was for increasing access and what technology problems our interns faced.
All interns surveyed said that being virtual was important for their ability to participate. In fact, we had two interns who participated from other cities. Unfortunately, internet connectivity was a problem for about half of our interns, which made it difficult for them to fully participate in bandwidth-heavy activities like Zoom meetings. Going forward, our current plan is to have both face-to-face and synchronous virtual tracks so we can continue to increase access while providing the best experience for those who can be in person.
We changed our technology stack this year, ditching Slack and OKpy in favor of Discord and direct grading of notebooks on our JupyterHub. Discord made it easier for subgroups of students (and student mentors, a new role we introduced this year) to have simultaneous conversations in whatever format best suited their work, including screen sharing when needed to troubleshoot problems. We split the interns into pods, each with a student mentor who could provide some help in answering questions, and then faculty/graduate students manned the help rooms to answer more challenging problems. Instead of asking interns to upload their work to OKpy, we simply wrote a script that pulled the notebook in question from their home directories and then ran simple checks against it like how much they had completed. This let us identify problem spots, at which point we could open the corresponding notebook to try to see exactly what the problem was. A listing of the technology resources and our schedule is at https://intern.olney.ai/.
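For readers curious what such a check might look like, here is a minimal sketch, not our actual script. It assumes each intern's home directory sits under a common root and that the notebook lives at the same relative path for everyone; the function names, the `/home` root, and the example path are all hypothetical. It treats "how much they had completed" as the fraction of non-empty code cells that have been run, which is only one possible measure.

```python
import json
from pathlib import Path


def completion_fraction(notebook_path):
    """Return the fraction of non-empty code cells that have been executed."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)  # .ipynb files are plain JSON
    code_cells = [
        c for c in nb.get("cells", [])
        # "source" may be a string or a list of strings; join handles both
        if c.get("cell_type") == "code" and "".join(c.get("source", [])).strip()
    ]
    if not code_cells:
        return 0.0
    done = sum(1 for c in code_cells if c.get("execution_count") is not None)
    return done / len(code_cells)


def check_all(home_root, relative_path):
    """Map each user directory name to a completion fraction.

    A value of None means the notebook was missing or unreadable.
    """
    results = {}
    for home in sorted(Path(home_root).iterdir()):
        if not home.is_dir():
            continue
        try:
            results[home.name] = completion_fraction(home / relative_path)
        except (OSError, json.JSONDecodeError):
            results[home.name] = None
    return results
```

Under these assumptions, something like `check_all("/home", "day1/intro.ipynb")` would give a per-intern summary, and low or missing values point to the notebooks worth opening by hand.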
We kept the same instructional format as last year but made a few changes to the curriculum and lunch speaker series. On the curriculum, we added a few days of “programming basics” on the theory that this would give students a better grounding. Because the length of the internship stayed the same and we didn’t cut any instructional materials, we had to cut the project phase short by one week to compensate. I think this worked, but I’m not sure I’d want to do it this way again, because it made the projects much more time-pressured than last year. Our lunch speaker series this year had better industry representation (3 of the 8 speakers were from industry), which, according to our surveys, seems to have helped the interns a lot in thinking about a career in data science.
This year we were very fortunate to have two community partners share their data with us so that we could have community-based final projects, one in harm reduction for substance abuse and the other in providing resources to survivors of domestic violence. Both partners were great to work with and gave our interns an opportunity to work on real-world problems (which the interns found very motivating, according to our surveys) and to see real-world data, which is typically very messy. In the case of these two projects, data for one had been manually keyed in from forms, and data for the other was primarily in the form of contact notes with clients, so substantial data cleaning and transformation were necessary for both projects. Because the latter has so much potential alignment with NLP, we also discussed creating a special curriculum module for NLP as part of our larger set of instructional materials.
I was pleased that we could double the number of interns this year and hope that we can grow the program again next year. It was a privilege to work with such talented students!
The DataWhys Project and internship are supported by the National Science Foundation through Grant 1918751 to the University of Memphis.