Data Science NLP Course in Python Notebooks

21 Feb 2020 . #research #education-research #datawhys #data-science #machine-learning #programming #statistics

We have created a mini course on NLP for data science in Python using JupyterLab notebooks. It assumes students have already completed our core course. The course is divided into 8 modules, most of which are split into a partial worked example notebook and a problem-solving notebook.

The partial worked examples don’t show the solutions; instead, they give step-by-step directions for writing the code. The problem-solving notebooks are largely isomorphic to the worked examples but omit the directions.

The notebooks are designed to be used with Blockly but can be used without it. Blockly is a visual programming language primarily designed to teach coding.

The course outline covers core topics in NLP for data science:

  • Text as data
    • Text-as-data.ipynb
  • Preprocessing text
    • Preprocessing-text.ipynb
    • Preprocessing-text-PS.ipynb
  • Descriptive statistics: length-based metrics
    • Descriptive-statistics-length
    • Descriptive-statistics-length-PS
  • Descriptive statistics: distribution-based metrics
    • Descriptive-statistics-distribution
    • Descriptive-statistics-distribution-PS
  • Vectorization and weighting
    • Vectorization-weighting
    • Vectorization-weighting-PS
  • Single word transformations
    • Single-word-transformations
  • Multi-word transformations
    • Multiword-transformations
  • Latent variable vectorization
    • Latent-variable-vectorization
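
To give a flavor of the first few modules (preprocessing, length-based metrics, and vectorization with weighting), here is a minimal sketch that uses only the Python standard library. It is not taken from the course notebooks: the function names `tokenize`, `length_metrics`, and `tfidf`, the sample sentences, and the particular TF-IDF variant (relative term frequency times the natural log of inverse document frequency) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def length_metrics(tokens):
    """Simple length-based descriptive statistics for one tokenized document."""
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,
        "n_types": len(set(tokens)),  # number of distinct words
        "mean_word_length": (
            sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0
        ),
    }


def tfidf(docs_tokens):
    """Weight each term by relative frequency times log inverse document frequency.

    This is one common TF-IDF variant, chosen here for illustration only.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs_tokens:
        term_counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in term_counts.items()
            }
        )
    return vectors


# Hypothetical sample documents, not from the course materials.
docs = ["The cat sat.", "The dog barked loudly."]
tokenized = [tokenize(d) for d in docs]

for tokens in tokenized:
    print(length_metrics(tokens))
print(tfidf(tokenized))
```

A word that appears in every document, such as "the" in the sample above, gets a weight of zero under this scheme, which is the point of weighting: terms that do not distinguish documents contribute nothing to the vector.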