Responsible Data Use: Ethics and Privacy Protection

Data analysis that ignores privacy and fairness can cause significant harm to businesses.

For example, in 2019, Google and YouTube agreed to pay $170 million to settle allegations by the U.S. Federal Trade Commission (FTC) and the New York Attorney General that YouTube collected personal information from children without parental consent.

Ethical and responsible data usage is an essential skill for every data analyst.


What Should You Consider for Ethical Data Use?

When analyzing data, always review the following key points:

  • Privacy: Are personally identifiable details safely protected and not exposed?
  • Consent: Was proper consent obtained when collecting the data?
  • Bias: Is the dataset skewed or underrepresenting certain groups?
  • Security: Is the data stored and managed securely?
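The bias check in the list above can be sketched as a count of how many records fall into each group. This is a minimal illustration, not a complete fairness audit: the `region` field, the sample records, the `group_shares` helper, and the 25% threshold are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical survey records; the "region" field is an assumption for this sketch
records = [
    {"region": "north", "age": 25},
    {"region": "north", "age": 31},
    {"region": "north", "age": 42},
    {"region": "north", "age": 29},
    {"region": "south", "age": 37},
]


def group_shares(rows, key):
    """Return each group's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


shares = group_shares(records, "region")

# Flag groups whose share falls below an arbitrary, illustrative threshold
underrepresented = [group for group, share in shares.items() if share < 0.25]

print(shares)
print(underrepresented)
```

A low share does not prove the dataset is biased, but it is a signal to check whether conclusions drawn from the data would hold for that group.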

Sensitive information such as names, emails, and ages must be collected legally and should generally go through anonymization before analysis or sharing.


What Is Anonymization?

When working with sensitive data, analysts often perform anonymization.

Anonymization is the process of removing or masking personally identifiable information so that individuals cannot be identified.
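The difference between removing and masking can be sketched as follows. The helper names `remove_fields` and `mask_email`, the `email` field, and the sample record are hypothetical and exist only for this illustration; masking that keeps the email domain is a partial measure and may not be sufficient on its own.

```python
def remove_fields(record, sensitive_keys):
    """Return a copy of the record without the listed sensitive fields."""
    return {key: value for key, value in record.items() if key not in sensitive_keys}


def mask_email(email):
    """Keep the first character and the domain; replace the rest with asterisks."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"  # Not a recognizable email address, so hide it entirely
    return local[0] + "*" * (len(local) - 1) + "@" + domain


# Hypothetical record used only for this sketch
record = {"name": "Lina", "email": "lina@example.com", "age": 25}

# Removing: the name field is dropped completely
without_name = remove_fields(record, {"name"})

# Masking: the email keeps its shape but hides most of the local part
masked = dict(without_name, email=mask_email(without_name["email"]))

print(masked)
```

Both helpers return new values instead of changing the original record, so the raw data stays untouched until it is deliberately discarded.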


Example: Anonymizing Personal Data

Here's a simple Python example demonstrating how to anonymize names in personal data:

# Example data containing names and ages
data = [
    {"name": "Lina", "age": 25},
    {"name": "Marcus", "age": 30},
]

# Replace names with a generic placeholder to protect privacy
for person in data:
    person["name"] = "REDACTED"  # Anonymize the name

# Print anonymized data
print(data)

  • The example dataset includes names and ages collected through a survey.
  • To protect privacy, names are replaced with "REDACTED".
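  • This is a common first step before sharing or analyzing sensitive data, although remaining fields such as age can still help identify someone when combined with other information.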
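As a possible next step beyond redacting names, the sketch below gives each person a placeholder ID, so records remain distinguishable, and replaces exact ages with ranges. The `age_band` helper, the `person_N` ID format, and the 10-year band width are assumptions chosen for illustration, not a prescribed method.

```python
# Same example data as in the redaction example
data = [
    {"name": "Lina", "age": 25},
    {"name": "Marcus", "age": 30},
]


def age_band(age, width=10):
    """Map an exact age to a range label such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Build new records that contain neither the name nor the exact age
anonymized = [
    {"id": f"person_{index}", "age_band": age_band(person["age"])}
    for index, person in enumerate(data, start=1)
]

print(anonymized)
```

Wider bands reveal less about each person but also make the analysis less precise, so the width is a trade-off to choose per dataset.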
