$$\def\RR{\bf R} \def\real{\mathbb{R}} \def\bold#1{\bf #1} \def\d{\mbox{Cord}} \def\hd{\widehat \mbox{Cord}} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cor}{cor} \newcommand{\ac}[1]{\left\{#1\right\}}$$

## Estimating Brain Information Flow via Pathway Lasso

### Yi Zhao, Xi (Rossi) Luo

Brown University
Department of Biostatistics
Center for Statistical Sciences
Computation in Brain and Mind
Brown Institute for Brain Science

CMStatistics, Seville, Spain
December 11, 2016

Funding: NIH R01EB022911; NSF/DMS (BD2K) 1557467; NIH P20GM103645, P01AA019072, P30AI042853; AHA

## Authors

Yi Zhao
(4th Yr PhD Student, on the job market this year)

Xi Luo
Brown University

• Story vs Math task: listen to story (treatment stimulus) or math questions (control), eye closed
• Not resting-state: "rest" in scanner

Goal: how brain processes story/math differently?

fMRI data: blood-oxygen-level dependent (BOLD) signals from each cube/voxel (~millimeters), $10^5$ ~ $10^6$ voxels in total.

## fMRI Studies

Sub 1, Sess 1

Time 1

2

~200

Sub i, Sess j

Sub ~100, Sess ~4

This talk: one subject, two sessions (to test replicability)

## Network Model with Stimulus

Question: quantify red, blue, and other pathways
from stimulus to orange outcome circle/region Heim et al, 09

## Pathway=Activation+Connectivity

• Activation: stimulus $\rightarrow$ brain region activity
• Connectivity: one brain region $\rightarrow$ another region
• Whether not two or more brain regions "correlate"
• Pathway: stimulus $\rightarrow$ brain region A $\rightarrow$ region B
• Strong path: strong activation and strong conn
• Zero path: zero activation or zero conn, including
• Zero activation + strong conn = zero
• Strong activation + zero conn = zero

# Model

## Mediation Analysis and SEM

\begin{align*}M &= Z a + {U + \epsilon_1}\\ R &= Z c + M b + {U g + \epsilon_2}\end{align*}
• Pathway effect: $a \times b$indirect; residual: $c$direct
• Mediation analysis
• Baron&Kenny, 86; Sobel, 82; Holland 88; Preacher&Hayes 08; Imai et al, 10; VanderWeele, 15;...

## Mediation Analysis in fMRI

• Parametric Wager et al, 09 and functional Lindquist, 12 mediation, under (approx.) independent errors
• Stimulus $\rightarrow$ brain $\rightarrow$ user reported ratings, one mediator
• Usual assumption: $U=0$ and $\epsilon_1 \bot \epsilon_2$
• Parametric and multilevel mediation Yi and Luo, 15, with correlated errors for two brain regions
• Stimulus $\rightarrow$ brain region A $\rightarrow$ brain region B, one mediator
• Correlations between $\epsilon_1$ and $\epsilon_2$
• This talk: multiple mediator and multiple pathways
• High dimensional: more mediators than sample size
• Alt: dimension reduction by arXiv1511.09354 Chen, Crainiceanu, Ogburn, Caffo, Wager, Lindquist, 15

## Full Pathway Model

• Stimulus $Z$, $K$ mediating brain regions $M_1, \dotsc, M_K$, Outcome region $R$
• Strength of activation ($a_k$) and connectivity ($b_k$, $d_{ij}$)
• Too complex, even for small $K = 2$ Daniel et al, 14

## Reduced Pathway Model

\begin{align}M_k & = Z A_k + E_{1k},\, k=1,\dotsc, K\\ R & = Z C + \sum_{k=1}^{K} M_k B_k + E_2 \end{align}
• $A_k$: total inflow to mediator $M_k$; $B_k$: total conn
• Pathway effect: $A_k \times B_k$; Residual: $C$

## Relation to Full Model

• Proposition: Our "total" parameters has explicit forms of "individual" flow parameters in the full model
• Proposition: Our $E_k$'s are correlated, but won't affect estimation (will affect variance)
• Reduced model: a step to select spatial mediators
• Strong overall inflow and strong conn flow
• Favor reduced: challenging to determine the temporal order because of low temporal resolution

# Method

## Regularized Regression

• Minimize the penalized least squares criterion
$$\sum_{k=1}^K \| M_k - Z A_k \|_2^2 + \| R - Z C - \sum_k M_k B_k \|_2^2 + \mbox{Pen}(A, B)$$
The choice of penalty $\mbox{Pen}(\cdot)$ to be discussed
• All data are normalized (mean=0, sd=1)
• Want to select sparse pathways for high-dim $K$
• Alternative approach: two-stage LASSO Tibshirani, 96 to select sparse inflow and connection separately: $$\sum_{k=1}^K \| M_k - Z A_k \|_2^2 + \lambda \sum_k | A_k | \\ \| R - Z C - \sum_k M_k B_k \|_2^2 + \lambda \sum_k |B_k|$$

## Penalty: Pathway LASSO

• Select strong pathways effects: $A_k \times B_k$
• TS-LASSO: shrink to zero when $A$&$B$ moderate but $A\times B$ large
• Penalty (prototype) $$\lambda \sum_{k=1}^K |A_k B_k|$$
• Non-convex in $A_k$ and $B_k$
• Computationally heavy and non-unique solutions
• Hard to prove theory
• We propose the following general class of penalties$$\lambda \sum_{k=1}^K ( |A_k B_k| + \phi A_k^2 + \phi B_k^2)$$
Theorem $$v(a,b) = |a b| + \phi (a^2 + b^2)$$ is convex if and only if $\phi\ge 1/2$. Strictly convex if $\phi > 1/2$.

Contour Plot of Different Penalties

• Non-differentiable at points when $a\times b = 0$
• Shrink $a\times b$ to zero
• Special cases: $\ell_1$ or $\ell_2$
• TS-LASSO: different $|ab|$ effects though $|a|+|b|$ same

• SEM/regression loss: $u$; Non-differnetiable penalty: $v$
• ADMM to address differentiability \begin{aligned} \text{minimize} \quad & u(\Theta,D)+v(\alpha,\beta) \\ \text{subject to} \quad & \Theta=\alpha, \\ & D=\beta, \\ & \Theta e_{1}=1, \end{aligned}
• Augmented Lagrangian for multiple constraints
• Iteratively update the parameters
• We derive theorem on explicit (not simple) updates

## Complications

• Mixed norm penalty $$\mbox{PathLasso} + \omega \sum_k (|A_k| + |B_k|)$$
• Tuning parameter selection by cross validation
• Reduce false positives via thresholding Johnston and Lu, 09
• Inference/CI: bootstrap after refitting
• Remove false positives with CIs covering zero Bunea et al, 10

# Simulations

• Our PathLasso compares with TSLasso
• Simulate with varying error correlations
• Tuning-free comparison: performance vs tuning parameter (estimated effect size)
• PathLasso outperforms under CV

## Pathway Recovery

ROC

F1 Score

MSE

Our PathLasso (red) outperforms two-stage Lasso (blue)

Other curves: variants of PathLasso and correlation settings

# Real Data: HCP

## Data: Human Connectome Project

• Two sessions (LR/RL), story/math task Binder et al, 11
• gICA reduces voxel dimensions to 76 brain maps
• ROIs/clusters after thresholding
• Apply to two sess separately, compare replicability
• Jaccard: whether selected pathways in two runs overlap
• $\ell_2$ diff: difference between estimated path effects
• Tuning-free comparisons

Jaccard

$\ell_2$ Diff

Regardless of tuning, our PathLasso (red) has smaller cross-sess diff (selection and estimation) than TSLasso (blue)

Stim-M25-R and Stim-M65-R significant shown largest weight areas

• M65 responsible for language processing, larger flow under story
• M25 responsible for uncertainty, larger flow under math

## Summary

• High dimensional pathway model
• Penalized SEM for pathway selection and estimation
• Convex optimization for non-convex products
• Sufficient and necessary condition
• Algorithmic development for complex optimization
• Improved estimation and selection accuracy
• Higher replicability in HCP data
• Manuscript: Pathway Lasso (arXiv 1603.07749)