Dangus's Blog


Fleiss's Kappa - Data Labeling Validation in Machine Learning

Labeling by Agreement Consensus

Fleiss’s Kappa - The Key to Validating Data Labeling in Machine Learning. The Problem to Solve: Part of the power of Machine Learning relies on how data is labeled. It is crucial that, beforehand, ...

N-Grams - NLP in Python

Beyond single words, text insights with n-grams

Natural Language Processing with Python - N-Grams. When analyzing a text, it is crucial to identify the words that are relevant. We can assume that a word is more relevant if it appears mor...

Data Leakage in Data Science

The model cheats by seeing unwanted information

Data Leakage in Data Science. What is data leakage? Data leakage in data science occurs when information from the test (or validation) set inadvertently “leaks” into the training process. It esse...

Gitds-flow - Daily Cheatsheet Work Flow

Cheat sheet to master gitds-flow

Gitds-flow Explanation. Work Flow: detailed description of the gitds-flow work flow. (Init): Repository with two principal branches: ⚛️main (production) and 🧪dev (development). New developme...

Gitds-flow - Data Science Git Work Flow

Introduction to gitds-flow

Gitds-flow (Data science git work flow). This is a Git workflow proposed by me, Daniel Guitron (aka @danguitron), to provide a minimalist way to work on your data science projects. ...

Decision Trees: Your Data’s Choose

Why You Should Too

🌳 Decision Trees: Your Data’s “Choose Your Own Adventure” Book (Spoiler: The Dragon is Overfitting) “Why Decision Trees Are Like Dating Apps” Imagine swiping left/right based on: 🐶 Pet pre...