3Dshear

A Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature
Z. Gimbutas and S. Veerapaneni
Submitted, Nov 2012.

We present a fast and accurate algorithm for evaluating singular integral operators on smooth surfaces that are globally parametrized by
spherical coordinates. Problems of this type arise, for example, in simulating Stokes flows with particulate suspensions and in multi-particle scattering calculations. For smooth surfaces, spherical harmonic expansions are commonly used for geometry representation and the evaluation of the singular integrals is carried out with a spectrally accurate quadrature rule on a set of rotated spherical grids. We propose a new algorithm that interpolates function values on the rotated spherical grids via hybrid nonuniform FFTs. The algorithm is nearly optimal in computational complexity and reduces
the cost of applying the quadrature rule from O(p^5 ) to O(p^4 log p) for a spherical harmonic expansion of degree p.


 

Integral Equation Methods for Unsteady Stokes Flow in Two Dimensions
S. Jiang, S. Veerapaneni and L. Greengard
SIAM Journal on Scientific Computing, Vol. 34(4), 2012.

We present an integral equation formulation for the unsteady Stokes equations in two dimensions. This problem is of interest in its own right, as a model for slow viscous flow, but perhaps more importantly, as an ingredient in the solution of the full, incompressible Navier-Stokes equations. Using the unsteady Green's function, the velocity evolves analytically as a divergence-free vector field. This avoids the need for either the solution of coupled field equations (as in fully implicit PDE-based marching schemes) or the projection of the velocity field onto a divergence free field at each time step (as in operator splitting methods). In addition to discussing the analytic properties of the operators that arise in the integral formulation, we describe a family of high-order accurate numerical schemes and illustrate their performance with several examples.


3Dshear

Dynamics of a compound vesicle in shear flow
S. Veerapaneni, Y.-N. Young, P. M. Vlahovska and J. Blawzdzeiwicz
Physical Review Letters, Vol. 106(15), 2011.

The dynamics of a compound vesicle (a lipid bilayer membrane enclosing a fluid with a suspended particle) in shear flow is investigated using both numerical simulations and theoretical analysis. We find that the non-linear hydrodynamic interaction between the inclusion and the confining membrane gives rise to new features of the vesicle dynamics: Transition from tank--treading to tumbling can occur in the absence of any viscosity mismatch, and a vesicle can swing if the enclosed particle is non-spherical. Our results highlight the complex effects of internal cellular structures have on cell dynamics in microcirculatory flows. For example, parasites in malaria-infected erythrocytes increase cytoplasmic viscosity, which leads to increase in blood viscosity.


3Dshear

Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures
A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra, L. Moon, R. Sampath, A. Shringarpure, J. Vetter, R.Vuduc, D. Zorin and G. Biros
ACM/IEEE Conference on Supercomputing, 2010. *Gordon Bell Prize*

We present a high fidelity numerical simulation of blood flow by directly resolving the interactions of 200 million deformable red blood cells flowing in plasma. This simulation amounts to 90 billion unknowns in space, with numerical experiments typically requiring O(1000) time steps. In terms of the number of cells, we improve the state-of-the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of 14 thousand cells. This breakthrough is based on novel algorithms that we designed to enable distributed memory, shared memory, and vectorized/streaming parallelism. We present results on CPU and hybrid CPU-GPU platforms, including the new NVIDIA Fermi architecture and 200,000 cores of ORNL's Jaguar system. For the latter, we achieve over 0.7 Petaflop/s sustained performance. Our work demonstrates the successful simulation of complex phenomena using heterogeneous architectures and programming models at the petascale.


3Dshear

Parallel fast Gauss transform
R. S. Sampath, H. Sundar and S. Veerapaneni
ACM/IEEE Conference on Supercomputing, 2010. *Best Paper finalist*

We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N^2) time. The parallel time complexity estimates for our algorithms are O(N/n_p) for uniform point distributions and O(N/n_p log (N/n_p) + n_p log n_p) for nonuniform distributions using n_p CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits ``diagonal translation''. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle nonuniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer at the Oak Ridge National Laboratory.

Our implementation is kernel-independent and can handle other ``Gaussian-type'' kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.


3Dshear

A fast algorithm for simulating vesicle flows in three dimensions
S. Veerapaneni, A. Rahimian, G. Biros and D. Zorin
Journal of Computational Physics, Vol. 230(14), 2011.

Vesicles are locally-inextensible fluid membranes that can sustain bending. In this paper, we extend ``A numerical method for simulating the dynamics of 3{D} axisymmetric vesicles suspended in viscous flows'', Veerapaneni et al. Journal of Computational Physics, 228(19), 2009 to general non-axisymmetric vesicle flows in three dimensions.

Although the main components of the algorithm are similar in spirit to the axisymmetric case (spectral approximation in space, semi-implicit time-stepping scheme), important new elements need to be introduced for a full 3D method. In particular, spatial quantities are discretized using spherical harmonics, and quadrature rules for singular surface integrals need to be adapted to this case; an algorithm for surface reparameterization is neeed to ensure sufficient of the time-stepping scheme, and spectral filtering is introduced to maintain reasonable accuracy while minimizing computational costs. To characterize the stability of the scheme and to construct preconditioners for the iterative linear system solvers used in the semi-implicit time-stepping scheme, we perform a spectral analysis of the evolutions operator on the unit sphere.

By introducing these algorithmic components, we obtain a time-stepping scheme that experimentally is unconditionally stable and has a low cost per time step. We present numerical results to analyze the cost and convergence rates of the scheme. To illustrate the applicability of method, we consider a few vesicle-flow interaction problems: a single vesicle in relaxation, sedimentation, and shear flows, and many-vesicle flows.


sweep

The fast generalized Gauss transform
M. Spivak, S. Veerapaneni and L. Greengard
SIAM Journal on Scientific Computing, Vol. 32(5), 2010.

The fast Gauss transform allows for the calculation of the sum of N Gaussians at M points in O(N + M) time. Here, we extend the algorithm to a wider class of kernels, motivated by quadrature issues that arise in using integral equation methods for solving the heat equation on moving domains. In particular, robust high-order product integration methods require convolution with O(q) distinct Gaussian-type kernels in order to obtain qth order accuracy in time. The generalized Gauss transform permits the computation of each of these kernels and, thus, the construction of fast solvers with optimal computational complexity.

We also develop plane-wave representations of these Gaussian-type fields, permitting the ``diagonal translation" version of the Gauss transform to be applied. When the sources and targets lie on a uniform grid, or a hierarchy of uniform grids, we show that the curse of dimensionality (the exponential growth of complexity constants with dimension) can be avoided. Under these conditions, our implementation has a lower operation count than the fast Fourier transform (FFT).


confined flow

Dynamic Simulation of Locally Inextensible Vesicles Suspended in an Arbitrary Two-dimensional Domain, a Boundary Integral Method
A. Rahimian, S. Veerapaneni and G. Biros
Journal of Computational Physics, Vol. 229(18), 2010.

...In this paper, we consider two important generalizations: (i) confined flows within arbitrary-shaped stationary/moving geometries and (ii) contrast flows, in which, the fluids enclosed by the vesicles and the ambient fluid can be different from one another. These two problems require solving additional integral
equations and cause modifications to the previous numerical scheme that are nontrivial. Our method does not have severe time-step stability constraints and its computational cost per time step is comparable to that of an explicit scheme. The discretization is pseudo-spectral in space, and multistep BDF in time. We
conduct numerical experiments to investigate the stability, accuracy and the computational cost of the algorithm. Overall, our method achieves several orders of magnitude speed-up compared to standard explicit schemes..



A numerical method for simulating the dynamics of 3D axisymmetric vesicles suspended in viscous flow
S. Veerapaneni, D. Gueyffier, G. Biros and D. Zorin
Journal of Computational Physics, Vol. 228(19), 2009.

We extend ``A boundary integral method for simulating the dynamics of inextensible vesicles suspended in a viscous fluid in 2D'', Veerapaneni et al. Journal of Computational Physics, 228(7), 2009 to the case of three dimensional axisymmetric vesicles of spherical or toroidal topology immersed in viscous flows. Although the main components of the algorithm are similar in spirit to the 2D case---spectral approximation in space, semi-implicit time-stepping scheme---the main differences are that the bending and viscous force require new analysis, the linearization for the semi-implicit schemes must be rederived, a fully implicit scheme must be used for the toroidal topology to eliminate a CFL-type restriction, and a novel numerical scheme for the evaluation of the 3D Stokes single-layer potential on an axisymmetric surface is necessary to speed up the calculations. By introducing these novel components, we obtain a time-scheme that experimentally is unconditionally stable, has low cost per time step, and is third-order accurate in time. We present numerical results to analyze the cost and convergence rates of the scheme. To verify the solver, we compare it to a constrained variational approach to compute equilibrium shapes that does not involve interactions with a viscous fluid. To illustrate the applicability of method, we consider a few vesicle-flow interaction problems: the sedimentation of a vesicle, interactions of one and three vesicles with a background Poiseuille flow.


A boundary integral method for simulating the dynamics of inextensible vesicles suspended in a viscous fluid in 2D
S. Veerapaneni, D. Gueyffier, D. Zorin and G. Biros
Journal of Computational Physics, Vol. 228(7), 2009.

We present a new method for the evolution of inextensible vesicles immersed in a Stokesian fluid. We use a boundary integral formulation
for the fluid that results in a set of nonlinear integro-differential equations for the vesicle dynamics. The motion of the vesicles is
determined by balancing the nonlocal hydrodynamic forces with the elastic forces due to bending and tension. Numerical simulations of
such vesicle motions are quite challenging. On one hand, explicit time-stepping schemes suffer from a severe stability constraint due to
the stiffness related to high-order spatial derivatives and a milder constraint due to a transport-like stability condition. On the other
hand, an implicit scheme can be expensive because it requires the solution of a set of nonlinear equations at each time step. We
present two semi-implicit schemes that circumvent the severe stability constraints on the time step and whose computational cost per time
step is comparable to that of an explicit scheme. We discretize the equations by using a spectral method in space, and a multistep
third-order accurate scheme in time. We use the fast multipole method (FMM) to efficiently compute vesicle-vesicle interaction forces in a
suspension with a large number of vesicles. We report results from numerical experiments that demonstrate the convergence and algorithmic complexity properties of our scheme.


Analytical and numerical solutions for shapes of quiescent 2D vesicles
S. Veerapaneni, R. Raj, G. Biros and P. K. Purohit
International Journal of Non-linear Mechanics, Vol. 44(3), 2009.

We describe an analytic method for the computation of equilibrium shapes for two-dimensional vesicles characterized by a Helfrich elastic energy. We derive boundary value problems and solve them analytically in terms of elliptic functions and elliptic integrals. We derive solutions by prescribing length and area, or displacements and angle boundary conditions. The solutions are compared to solutions obtained by a boundary integral equation-based numerical scheme. Our method enables the identification of different configurations of deformable vesicles and accurate calculation of their shape, bending moments, tension, and the pressure jump across the vesicle membrane. Furthermore, we perform numerical experiments that indicate that all these configurations are stable minima.


The Chebyshev fast Gauss and nonuniform fast Fourier transforms and their application to the evaluation of distributed heat potentials
S. Veerapaneni and G. Biros
Journal of Computational Physics, Vol. 227 (16), 2008.

We present a method for the fast and accurate computation of distributed heat potentials in two dimensions. The distributed
source is assumed to be given in terms of piecewise space–time Chebyshev polynomials. We discretize uniformly in
time, whereas in space the polynomials are defined on the leaf nodes of a quadtree. The quadtree can vary at each time
step. We combine a product integration rule with fast algorithms (fast heat potentials, nonuniform FFT, fast Gauss transform)
to obtain a high-order accurate method with optimal complexity. If N is the number of time steps, M is the maximum
number of leaf nodes over all the time steps and the input contains a qth-order polynomial representation of f, then,
our method requires O(q^3 MN logM) work to evaluate the heat potential at arbitrary MN space–time target locations. The
overall convergence rate of the method is of order q. We present numerical experiments for q = 4, 8, and 16, and we verify
the theoretical convergence rate of the method. When the solution is sufficiently smooth, the 16th-order variant results in
significant computational savings, even in the case in which we require only a few digits of accuracy.


A fast high-order integral equation solver for the heat equation with moving boundaries in 1D
S. Veerapaneni and G. Biros
SIAM Journal on Scientific Computing, Vol. 29(1), 2007.

We describe a fast high-order accurate method for the solution of the heat equation in domains with moving Dirichlet or Neumann boundaries and distributed forces. We assume that the motion of the boundary is prescribed. Our method extends the work of L. Greengard and J. Strain, "A fast algorithm for the evaluation of heat potentials", Comm. Pure & Applied Math. 1990. Our scheme is based on a time-space Chebyshev pseudo-spectral collocation discretization, which is combined with a recursive product quadrature rule to accurately and efficiently approximate convolutions with the Green's function for the heat equation. We present numerical results that exhibit up to eighth-order convergence rates. Assuming N time steps and M spatial discretization points, the evaluation of the solution of the heat equation at the same number of points in space-time requires O(NM logM) work. Thus, our scheme can be characterized as "fast", that is, it is work-optimal up to a logarithmic factor.