Data Input
Data Upload
Data Selection
📊 Data Quality Check
Data Structure
Missing Values by Column
✏️ Data Editor
🔍 Data Filtering
Filter Your Data
Select which rows to keep based on column values:
Transform Tools: Stack/Unstack/Subsets
🧰 Data Transform
Preview
Unstack: one categorical 'factor' (e.g., Supplier A/B/C), one 'value' column (e.g., Output). Optionally provide an index/key to align rows.
Preview
Stack: take multiple measurement columns and stack them into one long column, with an indicator of their origin.
Preview
Build simple one-condition subsets quickly. For complex filtering, use your main Filtering box.
Sample Size Calculator
Sample Size & Power Calculator
Plan your study with confidence - supports t-tests, ANOVA, proportions, correlation & more
STEP 1: Analysis Type
What are you comparing?
STEP 2: What to Calculate?
Choose your unknown
STEP 3: Basic Parameters
Standard settings
STEP 4: Effect Size
The difference you want to detect
STEP 5: Get Results
Results update automatically
Results recalculate automatically as you change inputs (2-second delay after the last change).
RESULTS
Power Curve
Effect Size Guide: What Numbers Should I Use?
What is Effect Size?
Effect size = How big is the difference you want to detect?
Think of it like this: If you're looking for a needle in a haystack...
- Large effect: Looking for a sword (easy to find, need fewer samples)
- Medium effect: Looking for a key (moderate difficulty)
- Small effect: Looking for a needle (hard to find, need MANY samples)
How to Choose Your Effect Size
- Best approach: Use historical data or a pilot study to estimate the real difference
- Six Sigma projects: Start with medium effect size for process improvements
- When unsure: Use small effect size (conservative, ensures adequate power)
- Breakthrough changes: You can expect large effects (e.g., new technology vs old)
Effect Size Reference Tables
Cohen's d — For Comparing Means (t-tests)
What it measures: The difference between two means, expressed in standard deviation units.
Formula: d = (Mean₁ - Mean₂) / Pooled Standard Deviation
| Size | d value | Real-World Example |
|---|---|---|
| Small | 0.2 | Height difference between 15- and 16-year-old girls (~0.5 inch) |
| Medium | 0.5 | Height difference between 14- and 18-year-old girls (~1.5 inch) |
| Large | 0.8 | Height difference between 13- and 18-year-old girls |
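The formula above can be sketched in Python as follows. This is a minimal illustration that assumes the pooled standard deviation as the denominator; the function name `cohens_d` and the sample values are hypothetical, not taken from the tool.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d with a pooled standard deviation (an assumption of this sketch)."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Hypothetical measurements before and after a process change.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [12.0, 14.0, 16.0, 18.0, 20.0]
d = cohens_d(after, before)  # mean shift of 2 units in standard deviation units
```

With pilot or historical data in hand, a value computed this way is usually a better input than a generic small/medium/large label.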
Cohen's f — For ANOVA (3+ Groups)
What it measures: The spread of group means relative to within-group variation.
When to use: Comparing 3 or more groups (e.g., 3 machines, 4 suppliers, 5 shift teams).
| Size | f value | Real-World Example |
|---|---|---|
| Small | 0.10 | Subtle difference between 4 suppliers (hard to notice visually) |
| Medium | 0.25 | Noticeable difference between machines (visible in box plots) |
| Large | 0.40 | Obvious difference between methods (anyone can see it) |
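One common way to obtain Cohen's f from planning values is to divide the standard deviation of the group means by the within-group standard deviation. The sketch below assumes equal group sizes; `cohens_f` and the numbers are illustrative only.

```python
from statistics import pstdev

def cohens_f(group_means, within_sd):
    """Cohen's f = (SD of group means) / (within-group SD), assuming equal group sizes."""
    # pstdev divides by k (the number of groups), matching Cohen's definition.
    return pstdev(group_means) / within_sd

# Hypothetical expected means for three machines and a within-machine SD of 4.
f = cohens_f([9.0, 10.0, 11.0], within_sd=4.0)
```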
Cohen's w — For Chi-Square (Categorical Data)
What it measures: How much the observed proportions differ from expected.
When to use: Contingency tables, testing independence (e.g., defect type vs shift, pass/fail vs supplier).
| Size | w value | Real-World Example |
|---|---|---|
| Small | 0.10 | Slight preference in customer survey (55% vs 45%, against an even split) |
| Medium | 0.30 | Clear pattern in defect distribution (65% vs 35%, against an even split) |
| Large | 0.50 | Strong relationship (e.g., 75% defects from one machine) |
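Cohen's w can be computed directly from expected and observed proportions. The following is a small sketch of that calculation; the function name and the two-category example are assumptions chosen for illustration.

```python
from math import sqrt

def cohens_w(observed_props, expected_props):
    """Cohen's w = sqrt(sum((observed - expected)^2 / expected)) over all categories."""
    return sqrt(sum((o - e) ** 2 / e for o, e in zip(observed_props, expected_props)))

# Hypothetical case: 75% of defects come from one of two machines,
# while an even 50/50 split would be expected under the null hypothesis.
w = cohens_w([0.75, 0.25], [0.50, 0.50])
```

Both proportion lists should sum to 1 and list the categories in the same order.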
Correlation (r) — For Relationship Strength
What it measures: How strongly two variables move together (ranges from -1 to +1).
When to use: Testing if X and Y are related (e.g., temperature vs yield, training hours vs performance).
| Size | r value | Real-World Example |
|---|---|---|
| Small | ±0.10 | Weak link: coffee consumption vs productivity |
| Medium | ±0.30 | Moderate link: study time vs exam scores |
| Large | ±0.50 | Strong link: height vs weight, practice vs skill |
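To see how these r values translate into sample size, the sketch below uses the Fisher z approximation for a two-sided test of r = 0. This is one standard approximation and not necessarily the method the calculator uses; the function name and defaults are illustrative.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r (Fisher z approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    # atanh(r) is Fisher's z transform; its standard error is 1 / sqrt(n - 3).
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)

n_medium = n_for_correlation(0.30)
```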
Cohen's f² — For Regression (R² Significance)
What it measures: How much variance in Y is explained by your predictors (X variables).
Formula: f² = R² / (1 - R²)
| Size | f² value | Equivalent R² | Meaning |
|---|---|---|---|
| Small | 0.02 | ~2% | Model explains little variance (but may still be useful) |
| Medium | 0.15 | ~13% | Model explains moderate variance (typical for social sciences) |
| Large | 0.35 | ~26% | Model explains substantial variance (strong predictive model) |
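The conversion between f² and R² in the table follows from the formula above and its inverse, R² = f² / (1 + f²). A short sketch, with hypothetical function names:

```python
def f_squared(r2):
    """Cohen's f-squared from R-squared."""
    return r2 / (1 - r2)

def r_squared(f2):
    """Inverse conversion: R-squared from Cohen's f-squared."""
    return f2 / (1 + f2)

# Benchmarks expressed as the share of variance explained.
equivalents = {f2: round(r_squared(f2), 2) for f2 in (0.02, 0.15, 0.35)}
```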
Quick Decision Guide: Which Effect Size Should I Use?
| Situation | Recommendation |
|---|---|
| Don't know what to expect? | → Use SMALL (conservative, won't under-power your study) |
| Typical process improvement? | → Use MEDIUM (most common in Six Sigma) |
| Major change or new technology? | → Use LARGE (breakthrough improvements) |
| Have pilot data or historical data? | → CALCULATE your actual expected effect size! |
How Effect Size Impacts Sample Size
Example: Two-sample t-test at α=5%, Power=80%
| Effect Size | Cohen's d | n per group | Total N |
|---|---|---|---|
| Small | 0.2 | 393 | 786 |
| Medium | 0.5 | 64 | 128 |
| Large | 0.8 | 26 | 52 |
Notice: Detecting a small effect requires about 15 times as many samples as detecting a large effect (393 vs 26 per group)!
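The relationship between effect size and sample size can be approximated with the normal-distribution formula n = 2(z₁₋α/₂ + z_power)² / d². The sketch below is only that approximation; the table values come from the exact t-based calculation, so they can be one or two units larger.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

sizes = {d: n_per_group(d) for d in (0.2, 0.5, 0.8)}
```

Here `sizes` comes out as 393, 63, and 25 per group. Because n scales with 1/d², halving the effect size roughly quadruples the required sample.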
Analysis
Statistical Analysis
Six Sigma Inferential Statistics Tool
Enter your sample data to calculate confidence intervals or test hypotheses.
Perfect for DMAIC projects when you have collected measurements.
Step 1: Choose Your Analysis
Step 2: Enter Your Sample Data
Step 3: Analysis Settings
Results Summary
Visual Results
Detailed Analysis
Six Sigma Interpretation
Statistical Assumptions
Data Visualization
📊 Plot Mode
📋 Variable Selection
X-Axis Variable(s)
Y-Axis Variable(s) (Optional)
🎨 Choose Plot Type
📐 Layout Options
✨ Additional Mappings
🔍 Comparison Setup
📊 Distribution Overview
Advanced Visualization
📊 Advanced Plot Setup
📋 Data Requirements
🎯 Map Your Data Columns
🎨 Additional Mappings
📁 Upload Your Data
📋 Data Preview
💾 Export Plot
Statistical Process Control
Control Chart Selection Guide
Control Rules Selection
Select which rules to detect out-of-control conditions:
• Rule 1: Any point beyond control limits
• Rule 2: Process shift or bias detected
• Rule 3: Systematic trend in process
• Rule 4: Excessive variation or overcontrol
• Rule 5: Points near control limits
• Rule 6: Process moving away from center
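As an illustration of Rule 1, the sketch below flags points beyond 3-sigma limits on an individuals chart, estimating sigma from the average moving range. The chart type and limit calculation are assumptions; the tool may compute limits differently depending on the chart selected.

```python
from statistics import mean

def rule1_signals(values):
    """Return indices of points outside the 3-sigma limits of an individuals chart."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 1.128 is the d2 constant for moving ranges of two consecutive points.
    sigma = mean(moving_ranges) / 1.128
    ucl, lcl = center + 3 * sigma, center - 3 * sigma
    return [i for i, v in enumerate(values) if v > ucl or v < lcl]

# Hypothetical measurements with one obvious excursion at the end.
data = [10, 11, 10, 11, 10, 11, 10, 11, 10, 20]
signals = rule1_signals(data)
```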
Download Data Template
Control Charts
Process Statistics
Out of Control Signals
Pareto Analysis
Pareto Analysis Settings
Pareto Chart
Analysis Results
Pareto Summary
80/20 Analysis
Process Capability Analysis
Process Capability Analysis Settings
Process Capability Chart
Capability Metrics
Overall Capability
Potential (Within) Capability
Performance
Z Benchmark
Normal Probability Plot
Process Performance Metrics
Detailed Capability Analysis Results
Non-Normal Capability Analysis
Non-Normal Process Capability Analysis Settings
Non-Normal Process Capability Chart
Non-Normal Capability Analysis Results
Distribution Fitting Details:
About Non-Normal Capability Analysis
Non-normal capability analysis uses fitted distributions to properly calculate capability indices when data doesn't follow a normal distribution. Standard Cp and Cpk indices can lead to incorrect conclusions with non-normal data.
Metrics Provided:
- Z-bench: Calculates process capability from percentiles of the fitted distribution
- Pp(percentile): Process performance index based on percentiles
- Ppk(percentile): Process performance index taking into account process centering
- PPM (Parts Per Million): Expected defect rates based on the fitted distribution
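The percentile-based indices can be sketched as follows, using the 0.135th, 50th, and 99.865th percentiles of the fitted distribution in place of the mean ± 3σ. This example assumes a lognormal fit estimated from the log of the data; the function names are hypothetical and the tool's own fitting method may differ.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def lognormal_quantiles(data):
    """Fit a lognormal via the log-data mean and SD; return the three key percentiles."""
    logs = [log(x) for x in data]
    fitted = NormalDist(mean(logs), stdev(logs))  # normal on the log scale
    return tuple(exp(fitted.inv_cdf(p)) for p in (0.00135, 0.5, 0.99865))

def percentile_capability(data, lsl, usl):
    """Percentile-method Pp and Ppk for positive, right-skewed data."""
    q_low, q_med, q_high = lognormal_quantiles(data)
    pp = (usl - lsl) / (q_high - q_low)
    ppu = (usl - q_med) / (q_high - q_med)  # upper side, measured from the median
    ppl = (q_med - lsl) / (q_med - q_low)   # lower side, measured from the median
    return pp, min(ppu, ppl)

# Hypothetical surface-roughness readings and specification limits.
roughness = [1.0, 1.2, 1.5, 1.1, 0.9, 1.3, 2.0, 1.4, 1.0, 1.6]
pp, ppk = percentile_capability(roughness, lsl=0.5, usl=3.0)
```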
Distribution Selection:
- Auto (Best Fit): Automatically selects the best-fitting distribution using Anderson-Darling statistic
- Manual Selection: Choose a specific distribution that might be appropriate for your process
Non-normal capability analysis is particularly important for processes with natural skewness, such as those with physical boundaries at zero (e.g., diameter, surface roughness).
Distribution Fitting
Distribution Fitting & Identification
Identify the best-fitting distribution for your data - Minitab-style analysis
STEP 1: Select Data
Choose numeric variable to analyze
STEP 2: Select Distributions
Choose distributions to fit (Minitab-style)
Symmetric Distributions
Reliability & Life Data
Right-Skewed Distributions
For Min/Max Data
For Data with Outliers
For Bounded Data
Data Transformations
STEP 3: Options (Optional)
Specification limits & settings
Specification Limits (for Capability)
Advanced Settings
DISTRIBUTION FITTING RESULTS
Distribution Rankings
Legend:
Interpretation Guide
Histogram with Fitted Distributions
Probability Plots
Q-Q Plot (Quantile-Quantile)
P-P Plot (Probability-Probability)
All Distributions - Q-Q Grid
Parameter Estimates
All Parameters Summary:
Percentile Estimates
Inverse Lookup: Find Percentile for a Value
Process Capability (Non-Normal)
Random Data Generation
Generated Data Summary:
Download Data
Histogram of Generated Data:
Distribution Selection Guide
Continuous Data
- Normal: Symmetric, bell-shaped
- Lognormal: Right-skewed, positive values
- Gamma: Right-skewed, waiting times
Reliability/Lifetime
- Weibull: Failure times, bathtub curve
- Exponential: Constant failure rate
- Loglogistic: Accelerated life testing
Extreme Values
- Gumbel: Maximum values
- SEV: Minimum values
- Pareto: Heavy-tailed phenomena
Special Cases
- Beta: Bounded [0,1] proportions
- Uniform: Equal probability
- Cauchy: Heavy tails, no mean
Data Transformation
Data Transformation Tools
Before and After Transformation
Transformation Results
Normality Test Results:
Transformed Specification Limits:
Use these values for normal capability calculations:
One-Way ANOVA Settings
• Select a numeric response variable (continuous outcome)
• Select a categorical factor variable (groups to compare)
• Numeric variables with ≤10 unique values are included as potential factors
• For continuous predictors with >10 values, use regression analysis instead
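The rule for which columns qualify as factors can be sketched as below. The function name, the dictionary-of-lists data layout, and the example columns are assumptions for illustration; only the 10-level threshold comes from the text above.

```python
def potential_factors(columns, max_levels=10):
    """Return names of columns usable as factors.

    Non-numeric columns always qualify; numeric columns qualify only
    when they have at most `max_levels` unique values.
    """
    eligible = []
    for name, values in columns.items():
        is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                         for v in values)
        if not is_numeric or len(set(values)) <= max_levels:
            eligible.append(name)
    return eligible

# Hypothetical dataset: 'temp' has 12 distinct values, so it is treated as continuous.
dataset = {
    "machine": ["A", "B", "C"] * 4,
    "shift": [1, 2, 3] * 4,
    "temp": [20.0 + 0.5 * i for i in range(12)],
}
factors = potential_factors(dataset)
```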
Two-Way ANOVA Settings
• Select a numeric response variable (continuous outcome)
• Select two different categorical factor variables
• Numeric variables with ≤10 unique values are included as potential factors
• For continuous predictors with >10 values, use regression analysis instead
• Interaction term tests if the effect of one factor depends on the other
📊 Generalized ANOVA Settings
📈 Analysis Results
ℹ️ Generalized ANOVA Information
About Generalized ANOVA
Generalized ANOVA allows you to analyze the relationship between one continuous response variable and multiple factors and/or covariates.
- Factors: Categorical variables (groups)
- Covariates: Continuous variables used as controls
- Interactions: Test whether the effect of one variable depends on another
- Model Types: Choose between ANOVA, Linear Model, or Mixed Effects approaches
Model Interpretation
- Main Effects: Individual contribution of each factor/covariate
- Interaction Effects: Combined effects between variables
- F-statistic: Test of significance for each effect
- p-value < 0.05: Statistically significant effect
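For the simplest case of a single factor, the F-statistic is the ratio of between-group to within-group mean squares. The sketch below shows that calculation only; models with several factors, covariates, or interactions need a full linear-model fit, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA on a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical output from three suppliers.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
```

The p-value is then the upper-tail probability of an F distribution with k - 1 and n - k degrees of freedom.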
Gage R&R (Continuous)
Gage R&R Analysis
Data Input
New to Gage R&R?
Download a template file to see the required data format:
Download Template CSV
The template includes:
- Part column: Unique part identifiers
- Operator column: Operator names/IDs
- Measurement column: Numeric measurements
- Multiple measurements per part-operator combination
Nested: Each operator measures different/unique parts
Gage Evaluation
ANOVA Results
QCC Range Chart
QCC Xbar Chart
Generate Management Report
Create a comprehensive HTML report for management presentation.
Generate & Download Report
Report Preview
Click 'Generate & Download Report' to create and download a comprehensive HTML report with all analysis results and charts.
Attribute Agreement Analysis (Gage R&R for Attributes)
Before Regression Analysis
Check your data quality and assumptions before running regression
Regression Diagnostics Tool (Educational Edition)
This tool helps you understand and improve your linear regression model. Hover over icons for explanations.
📊 Analysis Workflow
Step 1: Upload Your Data
🔍 Diagnostic Tests
⚠️ Influential Observations
These rows have high influence on your model:
📥 Download Influential Rows
📊 Full Statistical Output
Complete output:
🔗 Correlation Analysis
Understanding relationships between your variables helps identify multicollinearity and redundant predictors.
📊 Correlation Matrix
Values closer to ±1 indicate stronger linear relationships.
🎯 Interpretation Guide
- 0.0 to ±0.3: Weak correlation
- ±0.3 to ±0.7: Moderate correlation
- ±0.7 to ±1.0: Strong correlation
- Absolute values above 0.8 between predictors suggest multicollinearity
🌡️ Correlation Heatmap
Color intensity shows correlation strength.
⚠️ High Correlations Alert
📋 Detailed Correlation Matrix
🎓 Understanding Regression Diagnostics
🎯 What is Regression Analysis?
Regression analysis helps you understand relationships between variables and make predictions. It answers questions like "How does X affect Y?" and "What will Y be when X changes?"
📊 Key Assumptions to Check
- Linearity: The relationship is roughly straight-line
- Independence: Observations are unrelated to each other
- Homoscedasticity: Variance stays constant across predictions
- Normality: Residuals follow a bell curve
- No Multicollinearity: Predictors are not too highly correlated
🚨 Red Flags to Watch For
- R-squared below 0.3 (weak model fit)
- VIF values above 10 (severe multicollinearity)
- Significant p-values in diagnostic tests (< 0.05)
- Cook's Distance above 1 (very influential points)
💡 Tips for Better Models
- Start simple - use fewer predictors initially
- Check data quality - look for outliers and errors
- Consider transformations if assumptions are violated
- Always validate on new data when possible
💡 Example Interpretations
Regression and Correlation Analysis
Correlation Highlighting
Categorical Variables
Ridge Regression Parameters
Advanced Analysis
Correlation Matrix
High Correlation Warnings
Correlation Visualization
Regression Plot
Regression Summary
Variance Inflation Factor (VIF) - Multicollinearity Analysis
VIF values indicate the degree of multicollinearity. Generally:
- VIF = 1: No correlation with the other predictors
- VIF between 1 and 5: Moderate correlation (acceptable)
- VIF between 5 and 10: High correlation (concerning)
- VIF > 10: Severe multicollinearity (problematic)
Multiple Regression Summary
Pareto Chart of Standardized Effects
Bars extending beyond the reference line indicate statistically significant predictors at α = 0.05
CI