# Troubleshooting Guide

## Common Issues and Solutions

### Installation Problems

#### Issue: Virtual Environment Creation Fails

```
Error: Failed to create virtual environment
```

**Diagnosis:** The Python virtual environment module is not available, or permissions are insufficient.

**Solutions:**

```bash
# Install the virtual environment module
sudo apt-get install python3-venv  # Ubuntu/Debian
brew install python3               # macOS

# Alternative: use the system Python directly (not recommended for production)
pip install -r requirements.txt
pip install -e .
```

#### Issue: Package Installation Failures

```
Error: Could not install packages due to an EnvironmentError
```

**Common causes:**

- Network connectivity issues
- Insufficient disk space
- Package version conflicts

**Solutions:**

```bash
# Clear the pip cache
pip cache purge

# Clean reinstall
./run_pipeline.sh --mode=setup --force

# Manual installation with verbose output
pip install -v -r requirements.txt
```

#### Issue: GitHub Access Denied

```
Error: Repository not found or access denied
```

**Solution:** Ensure you have access to the private repository and that your GitHub credentials are configured:

```bash
# Check GitHub authentication
git config --list | grep user
ssh -T git@github.com

# Configure credentials if needed
git config --global user.name "Your Name"
git config --global user.email "your.email@domain.com"
```

### Runtime Errors

#### Issue: Pipeline Runs Too Quickly (2–3 seconds)

```
Pipeline completed in 3.2 seconds - no PyMC progress bars
```

**Diagnosis:** The pipeline is running Stage 2 (optimisation) only, not the full Bayesian estimation.

**Cause:** Using `./run_pipeline.sh --mode=italy`, which relies on pre-computed parameters.
**Solution:**

```bash
# For the full end-to-end pipeline with Bayesian estimation
python scripts/run_full_italy_pipeline.py

# This should take 7-8 minutes and show PyMC progress bars
```

#### Issue: Memory Errors During MCMC

```
MemoryError: Unable to allocate array
RuntimeError: PyMC sampling failed
```

**Diagnosis:** Insufficient memory for Bayesian model fitting.

**Solutions:**

1. **Reduce computational requirements:**

   ```yaml
   # In configuration file
   model:
     mcmc_samples: 1000   # Reduce from default 2000
     chains: 2            # Reduce from default 4
   data:
     sample_fraction: 0.5 # Use 50% of data
   ```

2. **System-level solutions:**

   ```bash
   # Monitor memory usage
   top -p $(pgrep -f python)

   # Increase swap space if needed (Linux)
   sudo fallocate -l 4G /swapfile
   sudo chmod 600 /swapfile
   sudo mkswap /swapfile
   sudo swapon /swapfile
   ```

#### Issue: MCMC Convergence Failures

```
Warning: R-hat values > 1.1 detected
Error: MCMC chains did not converge
```

**Diagnosis:** Bayesian model convergence problems.

**Solutions:**

1. **Increase sampling:**

   ```yaml
   model:
     mcmc_samples: 4000 # Double samples
     mcmc_tune: 2000    # More tuning
     chains: 8          # More chains
   ```

2. **Adjust priors:**

   ```yaml
   model:
     priors:
       beta0_std: 2.0 # Less restrictive priors
       beta1_std: 1.0
   ```
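The `> 1.1` threshold in the warning refers to the Gelman–Rubin R-hat statistic, which compares between-chain and within-chain variance. A minimal NumPy sketch on synthetic chains (illustrative only, not the pipeline's own diagnostics) shows how a single stuck chain inflates it:

```python
import numpy as np

def gelman_rubin(chains):
    """R-hat for an (m, n) array of m chains with n draws each."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))            # four well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck elsewhere

print(f"well-mixed R-hat:  {gelman_rubin(mixed):.3f}")  # close to 1.0
print(f"stuck-chain R-hat: {gelman_rubin(stuck):.2f}")  # well above 1.1
```

When chains agree, between-chain variance is negligible and R-hat sits near 1.0; a chain sampling a different region drives it well past the 1.1 warning threshold, which is why more tuning or less restrictive priors are the usual fixes.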
3. **Data quality checks:**

   ```bash
   # Check for data issues
   python scripts/validate_italy_model.py --diagnostics
   ```

#### Issue: Genetic Algorithm Not Improving

```
Warning: No improvement in 100 generations
Genetic algorithm terminated early
```

**Solutions:**

```yaml
optimization:
  population_size: 200         # Larger population
  mutation_rate: 0.15          # Higher mutation
  crossover_rate: 0.9          # Higher crossover
  convergence_generations: 100 # More patience
```

### Data-Related Issues

#### Issue: CSV Loading Errors

```
Error: Unable to parse CSV file
```

**Common causes:**

- Incorrect delimiter or encoding
- Missing or malformed headers
- Data type inconsistencies

**Solutions:**

```python
# Manual data inspection
import pandas as pd

df = pd.read_csv('data/your_file.csv', nrows=5)
print(df.info())
print(df.head())

# Try different encodings
df = pd.read_csv('data/your_file.csv', encoding='utf-8-sig')
df = pd.read_csv('data/your_file.csv', encoding='latin1')
```

#### Issue: Missing Customer Journey Data

```
Warning: No valid customer journeys found
Error: Insufficient data for model fitting
```

**Diagnosis steps:**

```bash
# Check data summary
python scripts/inspect_csv.py data/your_file.csv

# Validate data structure
python scripts/validate_italy_model.py --data-check
```

**Solutions:**

1. **Verify data format:**

   ```csv
   customer_id,timestamp,touchpoint,conversion
   CUST_001,2024-01-15T10:30:00,website_visit,0
   CUST_001,2024-01-16T14:22:00,test_drive,1
   ```
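As a quick standalone sanity check on this format, the sample rows can be loaded and each journey verified to be time-ordered with a binary conversion flag. These particular checks are illustrative only, not the project's validator:

```python
import io
import pandas as pd

# The sample rows from the expected format
raw = """customer_id,timestamp,touchpoint,conversion
CUST_001,2024-01-15T10:30:00,website_visit,0
CUST_001,2024-01-16T14:22:00,test_drive,1
"""

df = pd.read_csv(io.StringIO(raw), parse_dates=["timestamp"])

# A journey is usable if its events are in chronological order
# and the conversion column contains only 0/1 values
for customer, journey in df.groupby("customer_id"):
    ordered = journey["timestamp"].is_monotonic_increasing
    binary = journey["conversion"].isin([0, 1]).all()
    status = "ok" if ordered and binary else "malformed"
    print(f"{customer}: {len(journey)} events, {status}")
```

Running the same loop over a full dataset quickly surfaces customers whose rows would be dropped as invalid journeys.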
2. **Check column mapping in configuration:**

   ```yaml
   data:
     customer_column: "customer_id" # Verify correct column names
     timestamp_column: "timestamp"
     touchpoint_column: "touchpoint"
     conversion_column: "conversion"
   ```

### Configuration Issues

#### Issue: Configuration File Not Found

```
FileNotFoundError: Configuration file not found
```

**Solutions:**

```bash
# Check that the file exists and the path is correct
ls -la configs/italy_dag_config.yaml

# Use an absolute path if needed
python scripts/run_full_italy_pipeline.py --config /full/path/to/config.yaml
```

#### Issue: Invalid Configuration Values

```
ValidationError: Invalid configuration
```

**Common validation errors:**

- Negative budget values
- Invalid probability ranges
- Missing required fields

**Solution:**

```bash
# Validate configuration before running
python -c "
import yaml
from src.conversionflow.core.config import validate_config

config = yaml.safe_load(open('configs/your_config.yaml'))
validate_config(config)
print('Configuration is valid')
"
```

### Performance Issues

#### Issue: Very Slow Execution

**Symptoms:** Pipeline taking hours instead of minutes.

**Diagnosis steps:**

1. **Check model selection:**

   ```bash
   # Avoid the hurdle model unless necessary
   python scripts/run_full_italy_pipeline.py # Uses standard model

   # Only use hurdle if data is severely zero-inflated
   python scripts/run_full_italy_pipeline.py --use-hurdle
   ```

2. **Monitor resource usage:**

   ```bash
   # Check CPU and memory usage
   htop
   # or
   top -p $(pgrep -f python)
   ```
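Resource monitors show *that* a run is slow; Python's built-in `cProfile` shows *where* the time goes. A self-contained sketch profiling a stand-in workload (`simulate_stage` is hypothetical — substitute the real pipeline entry point):

```python
import cProfile
import io
import pstats

def simulate_stage(n=200_000):
    # Stand-in workload; replace with the real pipeline entry point
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
simulate_stage()
profiler.disable()

# Report the five most expensive calls by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Sorting by cumulative time is what identifies whether the hours are spent in sampling, optimisation, or I/O.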
3. **Profile execution:**

   ```bash
   # Run with profiling enabled
   python scripts/run_full_italy_pipeline.py --profile
   ```

**Solutions:**

- Use the standard model instead of the hurdle model
- Reduce the data sample size for initial testing
- Increase CPU cores for parallel processing

#### Issue: Disk Space Problems

```
OSError: No space left on device
```

**Solutions:**

```bash
# Check disk usage
df -h
du -sh results/

# Clean up old results
rm -rf results/italy/*/run_*

# Reduce output verbosity
python scripts/run_full_italy_pipeline.py --formats csv,json # Skip PNG/HTML
```

### Output and Reporting Issues

#### Issue: Missing Visualisations

```
Warning: Could not generate charts
ModuleNotFoundError: matplotlib backend
```

**Solutions:**

```bash
# Install additional graphics libraries
sudo apt-get install python3-tk # Linux
brew install tcl-tk             # macOS

# Or disable visualisations if running headless
python scripts/run_full_italy_pipeline.py --no-viz
```

#### Issue: Incorrect Results Format

```
Error: Cannot parse results JSON
```

**Solutions:**

```bash
# Check output directory permissions
ls -la results/italy/

# Regenerate results with a specific format
python scripts/run_full_italy_pipeline.py --formats csv,json

# Manual result inspection
python -c "
import json
with open('results/italy/latest/optimization_results.json') as f:
    data = json.load(f)
print(json.dumps(data, indent=2))
"
```

## Diagnostic Commands

### System Health Check

```bash
# Complete system validation
./run_pipeline.sh --mode=test --verbose

# Check Python environment
python --version
pip list | grep -E "(pymc|numpy|pandas)"

# Verify data access
python scripts/check_italy_db.py
```

### Model Diagnostics

```bash
# MCMC convergence diagnostics
python scripts/validate_italy_model.py --convergence

# Parameter export validation
python scripts/validate_italy_model.py --parameters

# Data quality assessment
python scripts/validate_italy_model.py --data-quality
```

### Performance Analysis

```bash
# Execution profiling
python scripts/run_full_italy_pipeline.py --profile

# Memory usage monitoring
python -c "
import psutil, os
process = psutil.Process(os.getpid())
print(f'Memory: {process.memory_info().rss / 1024 / 1024:.1f} MB')
"
```

## Getting Help

### Log Analysis

Always check the logs first:

```bash
# View recent logs
tail -f logs/conversionflow.log

# Search for specific errors
grep -i error logs/conversionflow.log
grep -i warning logs/conversionflow.log
```

### Debug Mode

Run with verbose output for detailed debugging:

```bash
# Enable debug logging
python scripts/run_full_italy_pipeline.py --debug

# Python debugging
python -u scripts/run_full_italy_pipeline.py 2>&1 | tee debug.log
```

### Environment Information

Collect system information for support:

```bash
# System information
python -c "
import sys, platform
print(f'Python: {sys.version}')
print(f'Platform: {platform.platform()}')
print(f'Architecture: {platform.architecture()}')

import pandas, numpy, pymc
print(f'Pandas: {pandas.__version__}')
print(f'NumPy: {numpy.__version__}')
print(f'PyMC: {pymc.__version__}')
"
```

## Best Practices for Avoiding Issues

### Development Workflow

1. **Start small:** Test with sample data before full datasets
2. **Validate early:** Check configuration and data format first
3. **Monitor resources:** Watch memory and CPU usage during runs
4. **Use version control:** Track configuration changes

### Production Deployment

1. **Environment consistency:** Use identical Python versions
2. **Resource allocation:** Ensure adequate memory and storage
3. **Monitoring:** Set up automated health checks
4. **Backup procedures:** Save successful configurations and results

### Data Management

1. **Data validation:** Implement systematic data quality checks
2. **Version control:** Track data changes and model versions
3. **Documentation:** Document data sources and transformations
4. **Testing:** Use consistent test datasets for validation

For complex issues not covered here, consult the [User Guide](user-guide.md) and [Architecture Documentation](architecture/overview.md).