Data Science Tools
Tools for exploring data, assessing its quality, and understanding your datasets.
Available Tools
| Tool | Description | Access |
|---|---|---|
| Data Profiler | Comprehensive data overview | Tools > Data Science > Data Profiler |
| Correlation Matrix | Feature relationship analysis | Tools > Data Science > Correlation Matrix |
| Missing Value Analysis | Handle missing data | Tools > Data Science > Missing Values |
| Outlier Detection | Find anomalies | Tools > Data Science > Outlier Detection |
Data Profiler
Generate a comprehensive report on your dataset, including data types, summary statistics, distributions, and quality metrics.
Features
- Overview Statistics: Row/column counts, memory usage
- Data Types: Automatic type detection
- Missing Data: Percentage and patterns
- Distributions: Histograms for numeric columns
- Unique Values: For categorical columns
- Warnings: Data quality issues
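To make the feature list concrete, here is a minimal sketch of how similar metrics could be computed with pandas. It is not the Data Profiler's implementation; the `profile` helper, its dictionary keys, and the sample data are assumptions made for illustration.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect a few overview metrics for a DataFrame."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        # Overview statistics
        "rows": len(df),
        "columns": df.shape[1],
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # Data types as detected by pandas
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Missing data, as a percentage per column
        "missing_pct": (df.isna().mean() * 100).round(1).to_dict(),
        # Distribution summary (count, mean, std, quartiles) for numeric columns
        "numeric_summary": numeric.describe().to_dict(),
        # Unique value counts for non-numeric columns (missing values excluded)
        "unique_values": categorical.nunique().to_dict(),
    }

# Illustrative data only
df = pd.DataFrame({
    "age": [34, 51, None, 29],
    "city": ["Oslo", "Lima", "Oslo", None],
})
report = profile(df)
print(report["missing_pct"])
print(report["unique_values"])
```

The sketch omits histograms and quality warnings; those would typically be built on top of the same summaries.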
Correlation Matrix
Visualize relationships between numerical features to identify redundant or highly related variables.
Correlation Methods
| Method | Description | Best For |
|---|---|---|
| Pearson | Linear relationships | Continuous, roughly normally distributed data |
| Spearman | Monotonic relationships (rank-based) | Ordinal data, non-linear but monotonic trends |
| Kendall | Rank concordance (Kendall's tau) | Small samples, many tied values |
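The difference between the methods is easiest to see on a relationship that is monotonic but not linear. The sketch below uses pandas' `DataFrame.corr`, which accepts `method="pearson"`, `"spearman"`, or `"kendall"`; the `redundant_pairs` helper, its 0.9 threshold, and the sample columns are assumptions for illustration, not part of the tool.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "x_cubed": [1, 8, 27, 64, 125, 216],  # monotonic in x, but not linear
    "noise": [3, 1, 4, 1, 5, 9],
})

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

def redundant_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> list:
    """Hypothetical helper: list column pairs whose |correlation| >= threshold."""
    cols = list(corr.columns)
    return [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(corr.loc[a, b]) >= threshold
    ]

# Spearman sees a perfect monotonic relationship; Pearson scores it lower
print(pearson.loc["x", "x_cubed"], spearman.loc["x", "x_cubed"])
print(redundant_pairs(spearman))
```

Passing `method="kendall"` works the same way, although pandas appears to rely on SciPy for that method, so it may need SciPy installed.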
Missing Value Analysis
Identify, visualize, and handle missing data in your dataset.
Imputation Methods
| Method | Description | Use When |
|---|---|---|
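| Drop | Remove rows with missing values | Few values missing, missing completely at random (MCAR) |
| Mean | Replace with the column mean | Numeric data, roughly normal distribution |
| Median | Replace with the column median | Numeric data, skewed distribution |
| Mode | Replace with the most frequent value | Categorical data |
| Forward Fill | Carry the previous value forward | Time series |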
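Each method in the table has a direct pandas counterpart. The following is a rough sketch of the equivalents, assuming a small made-up DataFrame; the column names are illustrative and the tool itself may behave differently.

```python
import pandas as pd

df = pd.DataFrame({
    "income": [40.0, None, 55.0, 300.0],    # skewed numeric column
    "height": [170.0, 165.0, None, 180.0],  # roughly symmetric numeric column
    "segment": ["a", "b", None, "b"],       # categorical column
    "reading": [1.0, None, None, 4.0],      # ordered, time-series-like column
})

# Drop: keep only rows with no missing values
dropped = df.dropna()

filled = df.copy()
# Mean: suits roughly normal numeric data
filled["height"] = df["height"].fillna(df["height"].mean())
# Median: more robust when the distribution is skewed
filled["income"] = df["income"].fillna(df["income"].median())
# Mode: most frequent value, for categorical data
filled["segment"] = df["segment"].fillna(df["segment"].mode().iloc[0])
# Forward fill: carry the previous observation forward
filled["reading"] = df["reading"].ffill()

print(len(dropped))
print(filled)
```

Note that forward fill leaves a leading missing value untouched, because there is no earlier observation to carry forward.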
Outlier Detection
Identify unusual data points that may be errors or interesting anomalies.
Detection Methods
| Method | Description | Parameters |
|---|---|---|
| Z-Score | Number of standard deviations from the mean | Threshold (default: 3) |
| IQR | Distance outside the quartiles, in interquartile ranges | Multiplier (default: 1.5) |
| Isolation Forest | Ensemble anomaly detection | Contamination (0.01-0.5) |
| LOF | Local Outlier Factor | n_neighbors (default: 20) |
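The two statistical methods in the table can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the z-score uses the population standard deviation, and quartiles use NumPy's default linear interpolation. Isolation Forest and LOF are model-based and are not covered here.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    x = np.asarray(values, dtype=float)
    std = x.std()  # population standard deviation (ddof=0)
    if std == 0:
        return np.zeros(x.shape, dtype=bool)  # constant data: nothing to flag
    return np.abs((x - x.mean()) / std) > threshold

def iqr_outliers(values, multiplier=1.5):
    """Flag points further than `multiplier` * IQR outside the quartiles."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - multiplier * iqr) | (x > q3 + multiplier * iqr)

# Illustrative data: eleven ordinary readings and one extreme value
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]
print(zscore_outliers(data))
print(iqr_outliers(data))
```

One caveat with the z-score: for n points, the largest possible population z-score is sqrt(n - 1), so with 10 or fewer points no value can exceed a threshold of 3. The IQR rule does not have this limitation.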
Common Workflows
Exploratory Data Analysis (EDA)
- Data Profiler: Get an overview
- Missing Values: Check and handle
- Outlier Detection: Identify anomalies
- Correlation Matrix: Find relationships
Data Cleaning Pipeline
- Handle missing values
- Remove or cap outliers
- Check distributions
- Verify correlations
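As a rough illustration, the cleaning steps above could be chained in pandas as follows. The `clean` function is a hypothetical helper that assumes median imputation and IQR-based capping for numeric columns; other choices from the tables above are equally valid.

```python
import pandas as pd

def clean(df: pd.DataFrame, multiplier: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: impute and cap every numeric column of a copy of df."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # Handle missing values: median imputation
        out[col] = out[col].fillna(out[col].median())
        # Cap outliers at the IQR fences instead of removing rows
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - multiplier * iqr, q3 + multiplier * iqr)
    return out

# Illustrative data only
raw = pd.DataFrame({
    "a": [1.0, 2.0, None, 4.0, 5.0, 100.0],
    "b": [2.0, 4.0, 6.0, None, 10.0, 12.0],
})
cleaned = clean(raw)

# Check distributions after cleaning
print(cleaned.describe())
# Verify correlations after cleaning
print(cleaned.corr())
```

Capping keeps every row, which matters for small datasets; removing the flagged rows instead is the stricter alternative.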