Mathematical Methodology

Overview

This document provides the complete mathematical specification of the conversionflow-aggregate two-stage pipeline. The methodology pairs hierarchical Bayesian parameter estimation with genetic-algorithm budget optimisation, together with conservative, data-scoped attribution reporting for marketing budget allocation.

Stage 1: Bayesian Parameter Estimation

1.1 Problem Formulation

Let \(\mathbf{Y} = \{Y_{ij}\}\) denote the observed count data where:

  • \(i \in \{1, 2, \ldots, T\}\) indexes time periods (days)

  • \(j \in \{1, 2, \ldots, J\}\) indexes marketing touchpoints

  • \(Y_{ij} \in \mathbb{N}_0\) represents the count of events for touchpoint \(j\) on day \(i\)

The customer journey is modelled as a directed acyclic graph (DAG) \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\) where:

  • \(\mathcal{V} = \{v_1, v_2, \ldots, v_J\}\) represents touchpoints

  • \(\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}\) represents causal relationships

  • \(\text{pa}(j) = \{k : (v_k, v_j) \in \mathcal{E}\}\) denotes parent nodes of touchpoint \(j\)
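As a minimal sketch, the graph objects above can be represented directly in code; the touchpoint names and edges below are invented for illustration:

```python
# Hypothetical illustration of G = (V, E): pa(j) and an acyclicity check.
def pa(j, edges):
    """Parent set pa(j) = {k : (k, j) in E}."""
    return {k for (k, v) in edges if v == j}

def is_acyclic(vertices, edges):
    """Kahn's algorithm: the graph is a DAG iff every vertex can be
    removed in topological order."""
    indegree = {v: 0 for v in vertices}
    for (_, v) in edges:
        indegree[v] += 1
    queue = [v for v in vertices if indegree[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for (a, b) in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return seen == len(vertices)

vertices = ["display", "search", "site_visit", "lead"]
edges = {("display", "site_visit"), ("search", "site_visit"),
         ("site_visit", "lead")}
```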

1.2 Standard Poisson Model

Likelihood Specification

For each touchpoint \(j\) and time period \(i\):

\[Y_{ij} \sim \text{Poisson}(\lambda_{ij})\]

where the rate parameter follows a log-linear specification:

\[\log(\lambda_{ij}) = \alpha_j + \beta_{1j} \log\left(1 + \frac{B_j}{\kappa}\right) + \sum_{k \in \text{pa}(j)} \gamma_{kj} \log(1 + Y_{ik}) + \delta_j^{T} \mathbf{w}_i\]

Parameter Interpretation:

  • \(\alpha_j\): Baseline log-rate for touchpoint \(j\)

  • \(\beta_{1j}\): Budget sensitivity coefficient (diminishing returns via logarithm)

  • \(B_j\): Budget allocation to touchpoint \(j\)

  • \(\kappa > 0\): Budget scaling factor (default: 1000)

  • \(\gamma_{kj}\): Influence coefficient from parent touchpoint \(k\) to \(j\)

  • \(\delta_j\): Time-varying effect coefficients

  • \(\mathbf{w}_i\): Time covariate vector (e.g., day-of-week indicators)
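The log-linear predictor above can be evaluated as in the following sketch (not the production implementation); all parameter values are illustrative:

```python
import numpy as np

# Sketch of the Section 1.2 log-rate for one touchpoint j on day i.
def log_rate(alpha_j, beta1_j, B_j, kappa, gamma_kj, Y_parents, delta_j, w_i):
    """log(lambda_ij) = alpha_j + beta1_j*log(1 + B_j/kappa)
                        + sum_k gamma_kj*log(1 + Y_ik) + delta_j . w_i"""
    return (alpha_j
            + beta1_j * np.log1p(B_j / kappa)
            + np.dot(gamma_kj, np.log1p(Y_parents))
            + np.dot(delta_j, w_i))

# One parent with 4 events yesterday, budget equal to kappa, a flat week.
lam = np.exp(log_rate(alpha_j=3.0, beta1_j=1.0, B_j=1000.0, kappa=1000.0,
                      gamma_kj=np.array([0.2]), Y_parents=np.array([4]),
                      delta_j=np.zeros(7), w_i=np.eye(7)[0]))
```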

Prior Specifications

Baseline Effects: \(\alpha_j \sim \mathcal{N}(\mu_{\alpha,j}, \sigma_{\alpha,j}^2)\)

Budget Sensitivity: \(\beta_{1j} \sim \mathcal{N}(\mu_{\beta,j}, \sigma_{\beta,j}^2)\)

Parent Influences: \(\gamma_{kj} \sim \mathcal{N}(\mu_{\gamma,kj}, \sigma_{\gamma,kj}^2) \quad \forall k \in \text{pa}(j)\)

Time Effects: \(\delta_j \sim \mathcal{N}(\mathbf{0}, \sigma_{\delta}^2 \mathbf{I})\)

Default Hyperparameters:

  • \(\mu_{\alpha,j} = 3.0, \sigma_{\alpha,j} = 1.5\) (baseline intercepts)

  • \(\mu_{\beta,j} = 1.0, \sigma_{\beta,j} = 0.5\) (budget sensitivity)

  • \(\mu_{\gamma,kj} = 0.0, \sigma_{\gamma,kj} = 1.0\) (parent effects)

  • \(\sigma_{\delta} = 1.5\) (time effects)

1.3 Hurdle Model (Zero-Inflated Poisson)

For count data with excess zeros, we employ a two-part hurdle model:

Part 1: Hurdle Component (Bernoulli Process)

\[H_{ij} \sim \text{Bernoulli}(\pi_{ij})\]
\[\text{logit}(\pi_{ij}) = \alpha^{(h)}_j + \sum_{k \in \text{pa}(j)} \gamma^{(h)}_{kj} \mathbb{I}(Y_{ik} > 0) + \left(\delta^{(h)}_j\right)^{T} \mathbf{w}_i\]

where \(\mathbb{I}(\cdot)\) is the indicator function and \(\pi_{ij}\) represents the probability of any activity occurring.

Part 2: Count Component (Truncated Poisson)

\[Y_{ij} | H_{ij} = 1 \sim \text{TruncatedPoisson}(\mu_{ij}, \text{lower}=1)\]
\[\log(\mu_{ij}) = \alpha^{(c)}_j + \beta^{(c)}_{1j} \log\left(1 + \frac{B_j}{\kappa}\right) + \sum_{k \in \text{pa}(j)} \gamma^{(c)}_{kj} \log(1 + Y_{ik}) + \left(\delta^{(c)}_j\right)^{T} \mathbf{w}_i\]

Combined Likelihood

The complete-data likelihood combines the gate and the truncated count:

\[p(Y_{ij} = y \mid \pi_{ij}, \mu_{ij}) = \begin{cases} 1 - \pi_{ij}, & y = 0 \\ \pi_{ij} \, \dfrac{\mu_{ij}^{y} e^{-\mu_{ij}}}{y! \left(1 - e^{-\mu_{ij}}\right)}, & y \geq 1 \end{cases}\]

In implementation this is expressed through a zero-inflated Poisson parameterisation:

\[Y_{ij} \sim \text{ZeroInflatedPoisson}(\psi_{ij}, \mu_{ij})\]

where:

  • \(\psi_{ij} = 1 - \pi_{ij}\) (excess-zero probability)

  • \(\mu_{ij}\) is the Poisson rate when active

Strictly, a hurdle model attributes every zero to the gate, whereas a zero-inflated Poisson also allows zeros from the count component; the truncated form above is the exact hurdle likelihood, and the zero-inflated parameterisation serves as its practical stand-in.
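A minimal sketch of this likelihood for a single observation, with placeholder values for \(\pi\) and \(\mu\): zeros come only from the Bernoulli gate, positive counts from the zero-truncated Poisson.

```python
import math

# Hurdle log-pmf sketch for one observation (pi, mu are placeholders).
def hurdle_logpmf(y, pi, mu):
    if y == 0:
        return math.log(1.0 - pi)          # gate closed: no activity
    # zero-truncated Poisson: Poisson log-pmf minus log P(Y >= 1)
    log_pois = y * math.log(mu) - mu - math.lgamma(y + 1)
    return math.log(pi) + log_pois - math.log1p(-math.exp(-mu))

# sanity check: probabilities over y = 0..50 should sum to ~1
total = math.exp(hurdle_logpmf(0, 0.7, 2.5)) + sum(
    math.exp(hurdle_logpmf(y, 0.7, 2.5)) for y in range(1, 51))
```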

Hurdle Model Priors

Hurdle Component: \(\alpha^{(h)}_j \sim \mathcal{N}(0, 1.5^2), \quad \gamma^{(h)}_{kj} \sim \mathcal{N}(0, 1^2)\)

Count Component: \(\alpha^{(c)}_j \sim \mathcal{N}(2, 1.5^2), \quad \gamma^{(c)}_{kj} \sim \text{HalfCauchy}(5)\)

1.4 Posterior Inference

MCMC Sampling

Posterior inference uses Hamiltonian Monte Carlo (HMC) via PyMC:

Sampling Configuration:

  • Draws: \(S = 2000\) (production: 4000)

  • Tuning: \(T = 1000\) (production: 2000)

  • Chains: \(C = 4\) (production: 8)

  • Target acceptance rate: \(\rho = 0.9\) (production: 0.95)

  • Maximum tree depth: \(d_{\max} = 15\)

Convergence Diagnostics

R-hat Statistic: \(\hat{R} = \sqrt{\frac{\hat{V}^+}{W}}\)

where \(\hat{V}^+\) is the pooled posterior variance estimate (a weighted combination of within- and between-chain variability) and \(W\) is the mean within-chain variance.

Convergence Criterion: \(\hat{R} < 1.1\) for all parameters.

Effective Sample Size: \(\text{ESS} = \frac{CS}{1 + 2\sum_{t=1}^{\infty} \rho_t}\)

where \(\rho_t\) is the lag-\(t\) autocorrelation; in practice the sum is truncated once the autocorrelation estimates become negligible.

Quality Criterion: \(\text{ESS} > 400\) for all parameters.
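The R-hat diagnostic can be sketched as below. This is the basic Gelman-Rubin form without the split-chain and rank-normalisation refinements that library implementations such as ArviZ apply, and the chains are simulated:

```python
import numpy as np

# Basic R-hat for C chains of S draws each (array of shape chains x draws).
def rhat(chains):
    C, S = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = S * chain_means.var(ddof=1)         # between-chain variance
    V_plus = (S - 1) / S * W + B / S        # pooled variance estimate
    return np.sqrt(V_plus / W)

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 2000))                 # four well-mixed chains
bad = good + np.array([[0.], [0.], [0.], [5.]])   # one chain shifted away
```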

Model Comparison

Leave-One-Out Cross-Validation (LOO-CV): \(\text{ELPD}_{\text{LOO}} = \sum_{i=1}^{n} \log p(y_i | y_{-i})\)

where \(p(y_i | y_{-i})\) is the leave-one-out predictive density approximated using Pareto-smoothed importance sampling.

1.5 Parameter Export

The posterior samples are summarised into point estimates and uncertainty quantification:

For each parameter \(\theta\), we compute:

  • Point Estimate: \(\hat{\theta} = \mathbb{E}[\theta | \mathbf{Y}]\) (posterior mean)

  • Uncertainty: \(\text{SD}(\theta) = \sqrt{\text{Var}[\theta | \mathbf{Y}]}\) (posterior standard deviation)

  • Credible Intervals: \((\theta_{\alpha/2}, \theta_{1-\alpha/2})\) where \(\alpha = 0.05\)

Export Format:

{
  "parameters": {
    "touchpoint_j": {
      "beta0": {"mean": α̂_j, "std": SD(α_j)},
      "beta1": {"mean": β̂_{1j}, "std": SD(β_{1j})},
      "parents": ["touchpoint_k", ...],
      "parent_coeffs": [
        {"mean": γ̂_{kj}, "std": SD(γ_{kj})}, ...
      ],
      "alpha": α̂_j  // Conversion value weight
    }
  },
  "diagnostics": {
    "elpd_loo": ELPD_LOO,
    "rhat_max": max(R̂),
    "ess_min": min(ESS)
  }
}
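A sketch of how posterior draws might be summarised into this export format; the draws, touchpoint name, and values below are simulated, not real model output:

```python
import json
import numpy as np

# Summarise a vector of posterior draws into the {mean, std} export form.
def summarise(draws):
    return {"mean": float(np.mean(draws)), "std": float(np.std(draws, ddof=1))}

rng = np.random.default_rng(1)
posterior = {"beta0": rng.normal(3.0, 0.1, 4000),   # simulated alpha_j draws
             "beta1": rng.normal(1.0, 0.05, 4000)}  # simulated beta_1j draws
export = {"parameters": {"touchpoint_j": {
    "beta0": summarise(posterior["beta0"]),
    "beta1": summarise(posterior["beta1"]),
    "parents": [],
    "parent_coeffs": []}}}
payload = json.dumps(export)   # serialisable: comments are not valid JSON
```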

Stage 2: Genetic Algorithm Optimisation

2.1 Problem Formulation

Decision Variables: Let \(\mathbf{b} = (b_1, b_2, \ldots, b_J)^T\) where \(b_j \geq 0\) represents the budget allocation to touchpoint \(j\).

Budget Constraint: \(\sum_{j=1}^{J} b_j = B_{\text{total}}\)

Box Constraints: \(b_{\text{min},j} \leq b_j \leq b_{\text{max},j} \quad \forall j\)

2.2 Objective Function

Expected Conversion Calculation

For a given budget allocation \(\mathbf{b}\), the expected conversion probability for touchpoint \(j\) is:

\[p_j(\mathbf{b}) = \sigma\left(\hat{\alpha}_j + \hat{\beta}_{1j} \log\left(1 + \frac{b_j}{\kappa}\right) + \sum_{k \in \text{pa}(j)} \hat{\gamma}_{kj} p_k(\mathbf{b})\right)\]

where \(\sigma(z) = \frac{1}{1 + e^{-z}}\) is the sigmoid function. For numerical stability the argument is clipped before evaluation: \(\sigma(\max(-500, \min(500, z)))\).
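A sketch of the recursive evaluation, assuming touchpoints are visited in topological order so that parent probabilities are already available; the two-node chain and its parameters are invented:

```python
import math

def sigmoid(z):
    z = max(-500.0, min(500.0, z))   # overflow protection
    return 1.0 / (1.0 + math.exp(-z))

def conversion_probs(order, params, parents, b, kappa=1000.0):
    """Evaluate p_j(b) in topological order; params[j] = (alpha, beta1,
    parent coefficients aligned with parents[j])."""
    p = {}
    for j in order:
        a, beta1, gammas = params[j]
        z = a + beta1 * math.log1p(b[j] / kappa)
        z += sum(g * p[k] for g, k in zip(gammas, parents[j]))
        p[j] = sigmoid(z)
    return p

params = {"search": (-1.0, 0.8, []), "lead": (-2.0, 0.5, [1.5])}
parents = {"search": [], "lead": ["search"]}
p = conversion_probs(["search", "lead"], params, parents,
                     {"search": 40_000.0, "lead": 10_000.0})
```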

Fitness Function

The optimisation objective maximises the expected total conversion value:

\[f(\mathbf{b}) = \sum_{j=1}^{J} \alpha_j \cdot p_j(\mathbf{b}) - \Phi(\mathbf{b})\]

where:

  • \(\alpha_j\) is the conversion value weight for touchpoint \(j\)

  • \(\Phi(\mathbf{b})\) represents penalty terms for constraint violations

Penalty Function

\[\Phi(\mathbf{b}) = \lambda_{\text{min}} \sum_{j=1}^{J} \max(0, b_{\text{min},j} - b_j) + \lambda_{\text{business}} \Psi_{\text{business}}(\mathbf{b})\]

where:

  • \(\lambda_{\text{min}} > 0\) penalises under-budgeted touchpoints

  • \(\Psi_{\text{business}}(\mathbf{b})\) enforces business-specific constraints
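Under these definitions the penalised fitness might be sketched as follows; the conversion values and penalty weight are illustrative, and the business-constraint term \(\Psi_{\text{business}}\) is omitted for brevity:

```python
# Sketch of f(b) = sum_j alpha_j * p_j(b) - Phi(b), with Phi reduced to
# the under-budget penalty only (Psi_business omitted).
def fitness(p, alpha, b, b_min, lam_min=1000.0):
    value = sum(alpha[j] * p[j] for j in p)
    penalty = lam_min * sum(max(0.0, b_min[j] - b[j]) for j in b)
    return value - penalty

# Toy example: one touchpoint allocated 5 units below its minimum.
score = fitness(p={"search": 0.5}, alpha={"search": 100.0},
                b={"search": 5.0}, b_min={"search": 10.0})
```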

2.3 Genetic Algorithm Specification

Population Representation

Each individual \(\mathbf{x}^{(i)} \in \mathbb{R}^J\) represents a budget allocation satisfying: \(\mathbf{x}^{(i)} \in \mathcal{F} = \left\{\mathbf{b} \in \mathbb{R}_+^J : \sum_{j=1}^{J} b_j = B_{\text{total}}, \, b_{\text{min},j} \leq b_j \leq b_{\text{max},j}\right\}\)

Initialization

Importance-Based Sampling: Initial population members are generated as:

\[b_j^{(0)} = \frac{w_j}{\sum_{k=1}^{J} w_k} B_{\text{total}} + \epsilon_j\]

where:

  • \(w_j\) is the importance weight for touchpoint \(j\)

  • \(\epsilon_j \sim \mathcal{N}(0, \sigma_{\text{init}}^2)\) adds diversity

  • The result is projected onto \(\mathcal{F}\) via constraint enforcement

Selection Operator

Tournament Selection: Since the objective is maximised, for tournament size \(k\) the parent is selected as: \(\mathbf{x}^{\text{parent}} = \arg\max_{\mathbf{x} \in \mathcal{T}} f(\mathbf{x})\)

where \(\mathcal{T}\) is a random subset of size \(k\) from the current population.
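A sketch of tournament selection for the maximisation objective; the population and fitness values are toy data:

```python
import random

# Pick k random individuals and return the fittest (maximisation).
def tournament_select(population, fitness_values, k, rng):
    idx = rng.sample(range(len(population)), k)
    best = max(idx, key=lambda i: fitness_values[i])
    return population[best]

rng = random.Random(0)
pop = [[100.0, 900.0], [500.0, 500.0], [900.0, 100.0]]   # toy allocations
fits = [1.0, 3.0, 2.0]                                   # toy fitnesses
winner = tournament_select(pop, fits, k=3, rng=rng)      # full tournament
```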

Crossover Operator

Arithmetic (Blend) Crossover with Constraint Repair: For parents \(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}\), generate offspring as a convex combination:

\[\mathbf{x}^{\text{child}} = \alpha \mathbf{x}^{(1)} + (1-\alpha) \mathbf{x}^{(2)}\]

where \(\alpha \sim \text{Uniform}(0, 1)\).

Constraint Repair: Apply projection \(\Pi_{\mathcal{F}}(\mathbf{x}^{\text{child}}) \in \mathcal{F}\) via:

  1. Bound Enforcement: \(\tilde{b}_j = \max(b_{\text{min},j}, \min(b_{\text{max},j}, b_j))\)

  2. Budget Normalisation: \(b_j^* = \tilde{b}_j \cdot \frac{B_{\text{total}}}{\sum_{k=1}^{J} \tilde{b}_k}\)

  3. Iterative Adjustment: If constraints remain violated, apply iterative rebalancing
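The three repair steps can be sketched as an iterative clip-and-rescale projection; the iteration cap and tolerance below are illustrative, and the production repair scheme may differ:

```python
# Projection sketch: clip to [b_min, b_max], rescale to the total budget,
# and repeat until the bounds hold within tolerance.
def repair(b, b_min, b_max, total, iters=200, tol=1e-9):
    b = list(b)
    n = len(b)
    for _ in range(iters):
        b = [min(b_max[j], max(b_min[j], b[j])) for j in range(n)]  # bounds
        s = sum(b)
        b = [x * total / s for x in b]                    # budget conservation
        if all(b_min[j] - tol <= b[j] <= b_max[j] + tol for j in range(n)):
            break
    return b
```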

Mutation Operator

Budget Reallocation Mutation: With probability \(p_m\), apply:

\[b_j^{\text{new}} = b_j + \Delta_j\]

where \(\sum_{j=1}^{J} \Delta_j = 0\) (budget conservation) and \(\Delta_j\) follows a budget transfer scheme:

  1. Transfer Selection: Choose donor-recipient pairs with probability proportional to current allocations

  2. Transfer Amount: \(|\Delta_j| \sim \text{Uniform}(0.05 b_j, 0.3 b_j)\)

  3. Constraint Repair: Apply \(\Pi_{\mathcal{F}}(\cdot)\)
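A sketch of the transfer mutation; for simplicity the donor and recipient are chosen uniformly rather than proportionally to allocation, and the final bound repair is omitted:

```python
import random

# Move 5-30% of one touchpoint's budget to another, so sum(Delta) = 0.
def mutate(b, rng):
    b = list(b)
    donor, recipient = rng.sample(range(len(b)), 2)
    amount = rng.uniform(0.05, 0.3) * b[donor]
    b[donor] -= amount
    b[recipient] += amount
    return b

rng = random.Random(42)
child = mutate([400.0, 300.0, 300.0], rng)
```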

Evolutionary Parameters

Standard Configuration:

  • Population size: \(N = 100\)

  • Generations: \(G = 200\)

  • Tournament size: \(k = 5\)

  • Crossover rate: \(p_c = 0.8\)

  • Mutation rate: \(p_m = 0.15\)

  • Elite fraction: \(\eta = 0.1\)

2.4 Convergence Criteria

Fitness-Based Stopping: The algorithm terminates when: \(\frac{f_{\max}^{(g)} - f_{\max}^{(g-h)}}{|f_{\max}^{(g-h)}|} < \epsilon_{\text{conv}}\)

for \(h\) consecutive generations, where:

  • \(f_{\max}^{(g)}\) is the best fitness in generation \(g\)

  • \(h = 50\) (patience parameter)

  • \(\epsilon_{\text{conv}} = 0.001\) (convergence threshold)
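The stopping rule can be sketched as a check on the best-fitness history; this simplified version tests a single window of length \(h\) rather than \(h\) consecutive generations:

```python
# Stop once the relative improvement over the last h generations is
# below eps (a simplification of the consecutive-generation rule).
def has_converged(best_history, h=50, eps=0.001):
    if len(best_history) <= h:
        return False
    prev, curr = best_history[-h - 1], best_history[-1]
    return abs(curr - prev) / max(abs(prev), 1e-12) < eps

flat = [10.0] * 60                        # no improvement for 50+ generations
rising = [float(g) for g in range(60)]    # still improving steadily
```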

Stage Interface: Parameter Conversion

3.1 Bayesian to GA Parameter Mapping

The Stage 1 posterior estimates are converted to Stage 2 optimisation parameters:

Direct Mapping:

  • \(\hat{\alpha}_j \leftarrow \mathbb{E}[\alpha_j | \mathbf{Y}]\) (baseline effects)

  • \(\hat{\beta}_{1j} \leftarrow \mathbb{E}[\beta_{1j} | \mathbf{Y}]\) (budget sensitivities)

  • \(\hat{\gamma}_{kj} \leftarrow \mathbb{E}[\gamma_{kj} | \mathbf{Y}]\) (parent influences)

Uncertainty Propagation: For robust optimisation, parameter uncertainty can be incorporated by:

  1. Monte Carlo Sampling: Draw \(\{\theta^{(s)}\}_{s=1}^{S}\) from posterior

  2. Stochastic Fitness: \(f(\mathbf{b}) = \frac{1}{S} \sum_{s=1}^{S} f(\mathbf{b}; \theta^{(s)})\)

3.2 Constraint Specification

Business Constraints:

  • Minimum allocation: \(b_{\text{min},j} = \max(10{,}000, \; 0.001 \cdot B_{\text{total}})\)

  • Maximum allocation: \(b_{\text{max},j} = 0.95 \cdot B_{\text{total}}\)

  • Category limits: \(\sum_{j \in \mathcal{C}_k} b_j \leq \beta_k B_{\text{total}}\) for channel categories \(\mathcal{C}_k\)

Data-Grounded Attribution Framework

4.1 Principle of Scoped Projections

A core principle of the conversionflow-aggregate methodology is that all financial projections must be directly and defensibly tied to the scope of the data being analysed. This ensures analytical integrity and provides credible, realistic business insights.

4.2 The Digital Attribution Challenge

In many real-world scenarios, particularly in markets like luxury automotive, the available digital data (e.g., website interactions, ad clicks) only captures a small fraction of the total customer journey. For the Italy market analysis, this is a critical consideration:

  • Digital Data Scope: The model is built using data from digital touchpoints.

  • Sales Data Scope: This digital data is linked to only ~5% of total vehicle sales. The remaining 95% of sales occur through offline channels (e.g., dealer relationships, walk-ins) that are not present in the dataset.

4.3 Methodological Solution

To avoid making unsupported claims, our methodology strictly aligns the scope of the analysis with the scope of the data:

  1. Model Scope: The Bayesian network is built exclusively on the tracked digital journey data. It learns the conversion probabilities within this digital ecosystem.

  2. Optimisation Scope: The genetic algorithm optimises the marketing budget based on the conversion probabilities learned from the digital-only data. Its goal is to maximise conversions within the population of digitally engaged users.

  3. Business Impact Scope: Consequently, all financial projections, such as the “Expected additional revenue” from the uncertainty analysis, are calculated based on the portion of sales that can be reasonably attributed to these digital journeys.

Example Calculation:

  • Total Annual Sales: 5,067 units

  • Digitally Attributable Sales (Analysis Scope): 5,067 × 0.05 ≈ 253 units

  • Optimization Result: A 5.65% improvement in digital conversion efficiency.

  • Business Impact Calculation: The 5.65% improvement is applied to the revenue from ~253 cars, not the total 5,067 cars.
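The scoped calculation above, made explicit in code; the revenue-per-unit figure is a placeholder, not a real value:

```python
# Scoped vs naive uplift arithmetic from the example above.
total_sales = 5067
digital_share = 0.05          # ~5% of sales linked to digital journeys
improvement = 0.0565          # 5.65% conversion-efficiency improvement
revenue_per_unit = 100_000.0  # placeholder value, illustrative only

digital_units = total_sales * digital_share                 # ~253 units
scoped_uplift = digital_units * improvement * revenue_per_unit
naive_uplift = total_sales * improvement * revenue_per_unit  # overclaims
```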

This approach ensures that the system provides a realistic estimate of the value generated by optimising the digital marketing spend, rather than making speculative claims about its impact on the entire sales landscape.

Mathematical Assumptions

5.1 Key Modelling Assumptions

  1. DAG Structure: Customer journeys follow a directed acyclic graph with no cycles

  2. Poisson Counts: Event counts are Poisson-distributed conditional on rate parameters

  3. Log-Linear Effects: Budget and parent influences enter log-linearly

  4. Diminishing Returns: Budget effects follow \(\log(1 + b/\kappa)\) form

  5. Independence: Conditional independence of counts given parameters and structure

  6. Stationarity: Parameters are constant within the modelling period

  7. Additive Effects: Parent influences combine additively in log-rate

5.2 Convergence Guarantees

MCMC Convergence: Under standard regularity conditions (e.g., a continuously differentiable log-posterior density and geometric ergodicity of the chain), HMC converges to the target posterior distribution.

GA Convergence: The genetic algorithm converges to a local optimum with probability 1 under:

  • Positive mutation rates

  • Elite preservation

  • Finite feasible region

Global Optimality: No guarantee of global optimum due to non-convex objective function. Multiple runs with different random seeds recommended for robustness.

Implementation Notes

6.1 Numerical Stability

Overflow Protection:

  • Sigmoid function clipped to \([-500, 500]\)

  • Log-sum-exp tricks for stable probability calculations

  • Regularization terms for near-singular matrices

Constraint Handling:

  • Iterative projection algorithms for budget conservation

  • Feasibility restoration via quadratic programming

  • Numerical tolerance: \(\epsilon_{\text{tol}} = 10^{-9}\)

6.2 Computational Complexity

Stage 1 (MCMC): \(\mathcal{O}(S \cdot C \cdot J^2 \cdot T)\) where \(S\) is samples, \(C\) is chains, \(J\) is touchpoints, \(T\) is time periods

Stage 2 (GA): \(\mathcal{O}(G \cdot N \cdot J^2)\) where \(G\) is generations, \(N\) is population size

Total Pipeline: Dominated by MCMC sampling (typically ~7 minutes vs ~3 seconds for GA)