Data Science Tools

Tools for data exploration, quality assessment, and understanding your datasets.

Available Tools

| Tool | Description | Access |
| --- | --- | --- |
| Data Profiler | Comprehensive data overview | Tools > Data Science > Data Profiler |
| Correlation Matrix | Feature relationship analysis | Tools > Data Science > Correlation Matrix |
| Missing Value Analysis | Handle missing data | Tools > Data Science > Missing Values |
| Outlier Detection | Find anomalies | Tools > Data Science > Outlier Detection |

Data Profiler

Generate comprehensive reports about your dataset including data types, statistics, distributions, and quality metrics.

Features

  • Overview Statistics: Row/column counts, memory usage
  • Data Types: Automatic type detection
  • Missing Data: Percentage and patterns
  • Distributions: Histograms for numeric columns
  • Unique Values: For categorical columns
  • Warnings: Data quality issues
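
To make the feature list above concrete, here is a minimal sketch of the kind of per-column summary a profiler produces. It is not CyxWiz's implementation: the `profile` function, the list-of-dicts input format, and the report keys are all assumptions made for illustration, and it uses only the Python standard library.

```python
from collections import Counter

def profile(rows):
    """Summarize a list of dict records column by column (hypothetical helper)."""
    columns = sorted({key for row in rows for key in row})
    report = {"rows": len(rows), "columns": len(columns), "fields": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Coarse type detection: numeric if every non-missing value is a number.
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        field = {
            "type": "numeric" if is_numeric else "categorical",
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
            "unique": len(set(present)),
        }
        if is_numeric:
            field["min"], field["max"] = min(present), max(present)
            field["mean"] = sum(present) / len(present)
        else:
            # Most frequent value, or None when the column is entirely missing.
            field["top"] = Counter(present).most_common(1)[0][0] if present else None
        report["fields"][col] = field
    return report

rows = [
    {"age": 34, "city": "Lagos"},
    {"age": None, "city": "Accra"},
    {"age": 29, "city": "Lagos"},
    {"age": 41, "city": None},
]
report = profile(rows)
print(report["fields"]["age"])
print(report["fields"]["city"])
```

A real profiler would also report memory usage, histograms, and quality warnings; those are left out here to keep the sketch short.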

Correlation Matrix

Visualize relationships between numerical features to identify redundant or highly related variables.

Correlation Methods

| Method | Description | Best For |
| --- | --- | --- |
| Pearson | Linear relationships | Normally distributed data |
| Spearman | Monotonic relationships | Ordinal data, non-linear relationships |
| Kendall | Rank-based | Small samples, ties |
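
The difference between the three methods is easiest to see on data that is monotonic but not linear. The following sketch uses only the standard library and assumes nothing about how CyxWiz computes these values; the function names are hypothetical, and the Kendall variant shown is tau-b, chosen here because it corrects for ties.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, adjusted for ties."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((pairs - ties_x) * (pairs - ties_y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic, but not linear
print(pearson(x, y), spearman(x, y), kendall_tau_b(x, y))
```

On this data the two rank-based coefficients reach 1 because the ordering of `y` matches the ordering of `x` exactly, while Pearson stays slightly below 1 because the relationship is curved.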

Missing Value Analysis

Identify, visualize, and handle missing data in your dataset.

Imputation Methods

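| Method | Description | Use When |
| --- | --- | --- |
| Drop | Remove rows | Few missing values, missing completely at random (MCAR) |
| Mean | Replace with mean | Normal distribution |
| Median | Replace with median | Skewed distribution |
| Mode | Replace with most frequent value | Categorical data |
| Forward Fill | Replace with previous value | Time series |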
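
As a rough illustration of how the methods in the table behave on a single column, the sketch below applies each one to a list in which `None` marks a missing value. The `impute` function and its method names are assumptions for this example, not part of the CyxWiz interface.

```python
from statistics import mean, median, mode

def impute(values, method):
    """Return a copy of `values` with None entries handled by `method` (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if method == "drop":
        return present
    if method == "ffill":
        filled, last = [], None
        for v in values:
            # Carry the last observed value forward; leading gaps stay None.
            last = v if v is not None else last
            filled.append(last)
        return filled
    # Mean, median, and mode all replace gaps with one summary value.
    fill = {"mean": mean, "median": median, "mode": mode}[method](present)
    return [fill if v is None else v for v in values]

readings = [10, None, 12, 100, None]
print(impute(readings, "mean"))
print(impute(readings, "median"))   # [10, 12, 12, 100, 12]
print(impute(readings, "ffill"))    # [10, 10, 12, 100, 100]
print(impute(["a", None, "b", "a"], "mode"))  # ['a', 'a', 'b', 'a']
```

The single large reading (100) pulls the mean well above the typical value, while the median fill stays at 12, which is why the table suggests the median for skewed data.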

Outlier Detection

Identify unusual data points that may be errors or interesting anomalies.

Detection Methods

| Method | Description | Parameters |
| --- | --- | --- |
| Z-Score | Standard deviations from mean | Threshold (default: 3) |
| IQR | Interquartile range | Multiplier (default: 1.5) |
| Isolation Forest | Ensemble anomaly detection | Contamination (0.01-0.5) |
| LOF | Local Outlier Factor | n_neighbors (default: 20) |
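
The two statistical methods in the table are simple enough to sketch with the standard library. The code below is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the Z-score uses the population standard deviation, and the quartiles use the "inclusive" interpolation method, either of which may differ from what CyxWiz does. The defaults mirror the table (threshold 3, multiplier 1.5).

```python
from statistics import mean, pstdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

def iqr_outliers(values, multiplier=1.5):
    """Values outside [Q1 - m * IQR, Q3 + m * IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - multiplier * spread, q3 + multiplier * spread
    return [v for v in values if v < low or v > high]

data = [10, 11, 12, 11, 10, 12, 11, 13, 12, 11,
        10, 12, 11, 12, 11, 10, 13, 11, 12, 95]
print(zscore_outliers(data))  # [95]
print(iqr_outliers(data))     # [95]
```

Isolation Forest and LOF are model-based and are not sketched here.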

Common Workflows

Exploratory Data Analysis (EDA)

  1. Data Profiler - Get overview
  2. Missing Values - Check and handle
  3. Outlier Detection - Identify anomalies
  4. Correlation Matrix - Find relationships

Data Cleaning Pipeline

  1. Handle missing values
  2. Remove or cap outliers
  3. Check distributions
  4. Verify correlations
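
The first two steps of the cleaning pipeline can be sketched for a single numeric column as follows. This is one possible arrangement under stated assumptions: the `clean` function is hypothetical, missing values are filled with the median, and outliers are capped at the IQR fences instead of being removed. Checking distributions and correlations afterwards is left to the tools described above.

```python
from statistics import median, quantiles

def clean(values, multiplier=1.5):
    """Fill None with the median, then cap values at the IQR fences (hypothetical helper)."""
    # Step 1: handle missing values.
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    # Step 2: cap outliers at the fences so the row count is preserved.
    q1, _, q3 = quantiles(filled, n=4, method="inclusive")
    low = q1 - multiplier * (q3 - q1)
    high = q3 + multiplier * (q3 - q1)
    return [min(max(v, low), high) for v in filled]

raw = [10, None, 12, 11, 95, 13, None, 12]
cleaned = clean(raw)
print(cleaned)
```

Capping keeps every row, which matters when other columns in the same row are still valid; dropping the row is the alternative when the outlier is known to be an error.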