Troubleshooting Guide
Common Issues and Solutions
Installation Problems
Issue: Virtual Environment Creation Fails
Error: Failed to create virtual environment
Diagnosis: The Python venv module is not installed, or you lack write permission for the target directory.
Solutions:
# Install virtual environment module
sudo apt-get install python3-venv # Ubuntu/Debian
brew install python3 # macOS
# Alternative: Use system Python directly (not recommended for production)
pip install -r requirements.txt
pip install -e .
Issue: Package Installation Failures
Error: Could not install packages due to an EnvironmentError
Common causes:
Network connectivity issues
Insufficient disk space
Package version conflicts
Solutions:
# Clear pip cache
pip cache purge
# Clean reinstall
./run_pipeline.sh --mode=setup --force
# Manual installation with verbose output
pip install -v -r requirements.txt
Issue: GitHub Access Denied
Error: Repository not found or access denied
Solution: Ensure your GitHub account has access to the private repository and that authentication is configured:
# Check SSH authentication to GitHub
ssh -T git@github.com
# Check the commit identity (user.name and user.email do not grant repository access)
git config --list | grep user
# Set the commit identity if needed
git config --global user.name "Your Name"
git config --global user.email "your.email@domain.com"
Runtime Errors
Issue: Pipeline Runs Too Quickly (2-3 seconds)
Pipeline completed in 3.2 seconds - no PyMC progress bars
Diagnosis: Pipeline is running Stage 2 (optimisation) only, not the full Bayesian estimation.
Cause: ./run_pipeline.sh --mode=italy uses pre-computed parameters, so the estimation stage is skipped.
Solution:
# For full end-to-end pipeline with Bayesian estimation
python scripts/run_full_italy_pipeline.py
# This should take 7-8 minutes and show PyMC progress bars
Issue: Memory Errors During MCMC
MemoryError: Unable to allocate array
RuntimeError: PyMC sampling failed
Diagnosis: Insufficient memory for Bayesian model fitting.
Solutions:
Reduce computational requirements:
# In configuration file
model:
  mcmc_samples: 1000 # Reduce from default 2000
  chains: 2 # Reduce from default 4
data:
  sample_fraction: 0.5 # Use 50% of data
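To gauge how much the reduced settings help, a rough estimate of the memory taken by the stored posterior draws can be sketched as below. This is only an illustration: the function name and the parameter count of 500 are hypothetical, it assumes mcmc_samples counts draws per chain, and it covers only float64 values kept in the trace. The input data, gradient computations, and per-chain worker processes all add to the real footprint.

```python
def trace_bytes(draws: int, chains: int, n_params: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to keep every posterior draw, assuming float64 storage."""
    return draws * chains * n_params * bytes_per_value


N_PARAMS = 500  # hypothetical parameter count, not taken from the pipeline

default = trace_bytes(draws=2000, chains=4, n_params=N_PARAMS)
reduced = trace_bytes(draws=1000, chains=2, n_params=N_PARAMS)

print(f"default settings: {default / 1024**2:.1f} MiB")
print(f"reduced settings: {reduced / 1024**2:.1f} MiB")
print(f"reduction factor: {default / reduced:.0f}x")
```

Halving both the draws and the chains cuts the stored trace by a factor of four, whatever the parameter count.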
System-level solutions:
# Monitor memory usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f python)"
# Increase swap space if needed (Linux)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Issue: MCMC Convergence Failures
Warning: R-hat values > 1.1 detected
Error: MCMC chains did not converge
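Diagnosis: The chains have not converged to the same distribution. R-hat compares between-chain and within-chain variance, and values well above 1 mean the chains disagree.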
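For intuition about the R-hat warning, the classic (non-split) Gelman-Rubin statistic can be sketched with the standard library alone. This is a simplified illustration, not the calculation the pipeline runs: ArviZ, which PyMC relies on for diagnostics, defaults to a rank-normalised split variant, so its numbers will differ. The function name and the toy chains are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def r_hat(chains: list[list[float]]) -> float:
    """Classic Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])  # draws per chain
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need two or more equal-length chains with two or more draws")
    within = mean(variance(c) for c in chains)          # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: n times variance of chain means
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return sqrt(pooled / within)


# Toy chains: the first pair explores the same region, the second pair does not.
agree = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

print(f"agreeing chains: {r_hat(agree):.2f}")
print(f"stuck chains:    {r_hat(stuck):.2f}")
```

The second pair lands far above the 1.1 threshold because the chain means differ by much more than the spread within each chain.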
Solutions:
Increase sampling:
model:
  mcmc_samples: 4000 # Double samples
  mcmc_tune: 2000 # More tuning
  chains: 8 # More chains
Adjust priors:
model:
  priors:
    beta0_std: 2.0 # Less restrictive priors
    beta1_std: 1.0
Data quality checks:
# Check for data issues
python scripts/validate_italy_model.py --diagnostics
Issue: Genetic Algorithm Not Improving
Warning: No improvement in 100 generations
Genetic algorithm terminated early
Solutions:
optimization:
  population_size: 200 # Larger population
  mutation_rate: 0.15 # Higher mutation
  crossover_rate: 0.9 # Higher crossover
  convergence_generations: 100 # More patience
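The early-termination warning follows from a no-improvement stopping rule. The sketch below illustrates the common form of such a rule, assuming convergence_generations acts as the patience value and that higher fitness is better. It is not the pipeline's implementation, and the function name and fitness history are hypothetical.

```python
def generation_of_early_stop(best_fitness_per_generation, patience):
    """Return the generation at which a no-improvement rule would stop,
    or None if the rule never triggers. Higher fitness is assumed better."""
    best = float("-inf")
    stale = 0  # consecutive generations without a new best
    for generation, fitness in enumerate(best_fitness_per_generation):
        if fitness > best:
            best, stale = fitness, 0
        else:
            stale += 1
        if stale >= patience:
            return generation
    return None


# Hypothetical history: a plateau of four generations, then a late improvement.
history = [1.0, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5]

print(generation_of_early_stop(history, patience=3))
print(generation_of_early_stop(history, patience=5))
```

With a patience of 3 the run stops on the plateau and never reaches the later improvement; with a patience of 5 it continues past it.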
Configuration Issues
Issue: Configuration File Not Found
FileNotFoundError: Configuration file not found
Solutions:
# Check file exists and path is correct
ls -la configs/italy_dag_config.yaml
# Use absolute path if needed
python scripts/run_full_italy_pipeline.py --config /full/path/to/config.yaml
Issue: Invalid Configuration Values
ValidationError: Invalid configuration
Common validation errors:
Negative budget values
Invalid probability ranges
Missing required fields
Solution:
# Validate configuration before running
python -c "
import yaml
from src.conversionflow.core.config import validate_config
# Open the file in a with-block so it is closed after loading
with open('configs/your_config.yaml') as f:
    config = yaml.safe_load(f)
validate_config(config)
print('Configuration is valid')
"
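The three common validation errors listed above can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the behaviour of validate_config: the required section names and the budget key are hypothetical, while mutation_rate and crossover_rate are borrowed from the optimization settings shown earlier in this guide.

```python
def find_config_problems(config: dict) -> list[str]:
    """Collect human-readable problems instead of stopping at the first one."""
    problems = []

    # Missing required fields (section names are assumed for illustration).
    for section in ("model", "optimization"):
        if section not in config:
            problems.append(f"missing required section: {section}")

    optimization = config.get("optimization", {})

    # Invalid probability ranges: rates must lie in [0, 1].
    for key in ("mutation_rate", "crossover_rate"):
        value = optimization.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"optimization.{key} must be between 0 and 1, got {value}")

    # Negative budget values ('budget' is a hypothetical key).
    budget = optimization.get("budget")
    if budget is not None and budget < 0:
        problems.append(f"optimization.budget must not be negative, got {budget}")

    return problems


bad = {"optimization": {"mutation_rate": 1.5, "budget": -100}}
good = {"model": {}, "optimization": {"mutation_rate": 0.15, "crossover_rate": 0.9}}

for problem in find_config_problems(bad):
    print(problem)
print(find_config_problems(good))
```

Returning every problem in one pass saves repeated edit-and-rerun cycles when a configuration has several mistakes.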
Performance Issues
Issue: Very Slow Execution
Symptoms: Pipeline taking hours instead of minutes
Diagnosis steps:
Check model selection:
# Avoid hurdle model unless necessary
python scripts/run_full_italy_pipeline.py # Uses standard model
# Only use hurdle if data is severely zero-inflated
python scripts/run_full_italy_pipeline.py --use-hurdle
Monitor resource usage:
# Check CPU and memory usage
htop
# or
top -p "$(pgrep -d, -f python)" # -d, joins multiple PIDs with commas
Profile execution:
# Run with profiling enabled
python scripts/run_full_italy_pipeline.py --profile
Solutions:
Use standard model instead of hurdle model
Reduce data sample size for initial testing
Increase CPU cores for parallel processing
Issue: Disk Space Problems
OSError: No space left on device
Solutions:
# Check disk usage
df -h
du -sh results/
# Clean up old results
rm -rf results/italy/*/run_*
# Write fewer output formats
python scripts/run_full_italy_pipeline.py --formats csv,json # Skip PNG/HTML
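A check for free space before starting a long run can catch this error early. The sketch below uses only the standard library. The helper name and the 5 GiB threshold are hypothetical, so choose a value based on the size of a typical results directory.

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gib` free."""
    free_gib = shutil.disk_usage(path).free / 1024**3
    return free_gib >= required_gib


# 5 GiB is an illustrative threshold, not a documented requirement.
if has_free_space(".", required_gib=5):
    print("Enough free space to start a run")
else:
    print("Low disk space: clean up old results before running")
```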
Output and Reporting Issues
Issue: Missing Visualisations
Warning: Could not generate charts
ModuleNotFoundError: matplotlib backend
Solutions:
# Install additional graphics libraries
sudo apt-get install python3-tk # Linux
brew install tcl-tk # macOS
# Or disable visualisations if running headless
python scripts/run_full_italy_pipeline.py --no-viz
Issue: Incorrect Results Format
Error: Cannot parse results JSON
Solutions:
# Check output directory permissions
ls -la results/italy/
# Regenerate results with specific format
python scripts/run_full_italy_pipeline.py --formats csv,json
# Manual result inspection
python -c "
import json
with open('results/italy/latest/optimization_results.json') as f:
    data = json.load(f)
print(json.dumps(data, indent=2))
"
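When the results file fails to parse, the position reported by the parser usually points to the cause, such as a truncated write. A minimal sketch using the standard json module follows. The helper name is hypothetical and the sample strings stand in for file contents.

```python
import json


def describe_json_problem(text: str):
    """Return None if `text` parses as JSON, else a short description of the failure."""
    if not text.strip():
        return "file is empty"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # msg, lineno and colno are attributes of json.JSONDecodeError
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None


truncated = '{"status": "ok",'   # stands in for a file cut off mid-write
complete = '{"status": "ok"}'

print(describe_json_problem(truncated))
print(describe_json_problem(complete))
```

To apply it to a real run, pass in the contents of results/italy/latest/optimization_results.json, for example via pathlib.Path(...).read_text().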
Diagnostic Commands
System Health Check
# Complete system validation
./run_pipeline.sh --mode=test --verbose
# Check Python environment
python --version
pip list | grep -E "(pymc|numpy|pandas)"
# Verify data access
python scripts/check_italy_db.py
Model Diagnostics
# MCMC convergence diagnostics
python scripts/validate_italy_model.py --convergence
# Parameter export validation
python scripts/validate_italy_model.py --parameters
# Data quality assessment
python scripts/validate_italy_model.py --data-quality
Performance Analysis
# Execution profiling
python scripts/run_full_italy_pipeline.py --profile
# Memory usage of the current Python process (as written, this measures its own interpreter, not a running pipeline)
python -c "
import psutil
import os
process = psutil.Process(os.getpid())
print(f'Memory: {process.memory_info().rss / 1024 / 1024:.1f} MB')
"
Getting Help
Log Analysis
Always check the logs first:
# View recent logs
tail -f logs/conversionflow.log
# Search for specific errors
grep -i error logs/conversionflow.log
grep -i warning logs/conversionflow.log
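For a quick summary rather than a full listing, the same case-insensitive matching can be sketched in Python. The log format is not documented here, so the helper matches substrings exactly as grep -i does. The function name is hypothetical, and the sample lines reuse messages quoted elsewhere in this guide.

```python
from collections import Counter


def count_levels(lines):
    """Count lines containing 'error' or 'warning', ignoring case (like grep -i)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        if "error" in lowered:
            counts["error"] += 1
        elif "warning" in lowered:
            counts["warning"] += 1
    return counts


sample = [
    "Warning: R-hat values > 1.1 detected",
    "Error: MCMC chains did not converge",
    "MemoryError: Unable to allocate array",
    "Pipeline completed in 3.2 seconds",
]

print(dict(count_levels(sample)))
```

For the real log, pass an open file object, for example count_levels(open("logs/conversionflow.log")), since iterating over a file yields its lines.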
Debug Mode
Run with verbose output for detailed debugging:
# Enable debug logging
python scripts/run_full_italy_pipeline.py --debug
# Unbuffered output, shown on screen and saved to debug.log
python -u scripts/run_full_italy_pipeline.py 2>&1 | tee debug.log
Environment Information
Collect system information for support:
# System information
python -c "
import sys, platform
print(f'Python: {sys.version}')
print(f'Platform: {platform.platform()}')
print(f'Architecture: {platform.architecture()}')
import pandas, numpy, pymc
print(f'Pandas: {pandas.__version__}')
print(f'NumPy: {numpy.__version__}')
print(f'PyMC: {pymc.__version__}')
"
Best Practices for Avoiding Issues
Development Workflow
Start small: Test with sample data before full datasets
Validate early: Check configuration and data format first
Monitor resources: Watch memory and CPU usage during runs
Use version control: Track configuration changes
Production Deployment
Environment consistency: Use identical Python versions
Resource allocation: Ensure adequate memory and storage
Monitoring: Set up automated health checks
Backup procedures: Save successful configurations and results
Data Management
Data validation: Implement systematic data quality checks
Version control: Track data changes and model versions
Documentation: Document data sources and transformations
Testing: Use consistent test datasets for validation
For complex issues not covered here, consult the User Guide and Architecture Documentation.