Art of Data

2.5 Cleaning and Preparing Data

Once we’ve obtained data from our samples, we have to clean and prepare it before we can write programs to analyze it.

Cleaning Data (slides)

What does messy data look like? How do we evaluate the quality of data, and how do we get to good data from messy data?

⊕ validity, accuracy, completeness, consistency, uniformity
⊕ standardization
⊕ data validation
⊕ what to do with: missing data, unwanted data, outliers
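
To make the topics above concrete, here is a minimal sketch in Python of one possible cleaning pass. Everything in it is an assumption made for illustration and does not come from the slides: the records, the field names (name, city, height_cm), the helper functions, the validation rule, and the choice to drop (rather than fill in) missing values. Outliers are flagged with the common 1.5 × IQR rule, which is only one of several reasonable choices.

```python
import statistics

# Hypothetical messy records; names, fields, and values are invented for illustration.
raw = [
    {"name": " Ada ", "city": "new york", "height_cm": "170"},
    {"name": "Ben", "city": "New York", "height_cm": "165"},
    {"name": "Ben", "city": "New York", "height_cm": "165"},   # duplicate (unwanted data)
    {"name": "Cy", "city": "BOSTON", "height_cm": ""},          # missing value
    {"name": "Di", "city": "Boston", "height_cm": "abc"},       # invalid value
    {"name": "Ed", "city": "boston", "height_cm": "172"},
    {"name": "Flo", "city": "Boston", "height_cm": "168"},
    {"name": "Gus", "city": "Boston", "height_cm": "1700"},     # suspicious value
]


def standardize(record):
    """Standardization: trim whitespace and write city names in one consistent form."""
    return {
        "name": record["name"].strip(),
        "city": record["city"].strip().title(),
        "height_cm": record["height_cm"].strip(),
    }


def is_valid(record):
    """Validation rule (an assumption): height must be written as a whole number."""
    return record["height_cm"].isdigit()


def clean(records):
    """Standardize each record, then drop missing, invalid, and duplicate rows."""
    seen = set()
    cleaned = []
    for record in map(standardize, records):
        if record["height_cm"] == "":   # missing data: this sketch simply drops the row
            continue
        if not is_valid(record):        # fails validation
            continue
        key = (record["name"], record["city"], record["height_cm"])
        if key in seen:                 # unwanted data: exact duplicate of an earlier row
            continue
        seen.add(key)
        cleaned.append({**record, "height_cm": int(record["height_cm"])})
    return cleaned


def flag_outliers(records, field="height_cm"):
    """Return records whose value lies more than 1.5 * IQR outside the quartiles."""
    values = [r[field] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in records if not (low <= r[field] <= high)]


cleaned = clean(raw)
outliers = flag_outliers(cleaned)
print(len(cleaned), "clean records")
print("possible outliers:", [r["name"] for r in outliers])
```

Note that the sketch only flags outliers instead of deleting them. Whether a flagged value is a typo to correct, an error to discard, or a genuine measurement to keep is a judgment call that depends on the data and the question being asked.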