
Linear Algebra Tools

GPU-accelerated linear algebra operations powered by ArrayFire.

Overview

Matrix Operations

Multiplication, inversion, decomposition

Eigenvalue Problems

Eigenvalues, eigenvectors, SVD

Solvers

Linear systems, least squares

Utilities

Norms, rank, condition number

Matrix Operations

import cyxwiz.linalg as la

# Matrix multiplication
C = la.matmul(A, B)
C = A @ B  # Operator overload

# Transpose
At = la.transpose(A)
At = A.T

# Inverse
A_inv = la.inv(A)

# Determinant
det = la.det(A)

# Trace
tr = la.trace(A)

Matrix Decompositions

# SVD (Singular Value Decomposition)
U, S, Vt = la.svd(A)
U, S, Vt = la.svd(A, full_matrices=False)  # Economy

# Eigendecomposition
eigenvalues, eigenvectors = la.eig(A)
eigenvalues = la.eigvals(A)  # Values only

# LU Decomposition
P, L, U = la.lu(A)

# QR Decomposition
Q, R = la.qr(A)

# Cholesky (for symmetric positive definite matrices)
L = la.cholesky(A)
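
A quick way to sanity-check any of these factorizations is to multiply the factors back together and compare with the original matrix. The sketch below uses NumPy on the CPU as a stand-in; it assumes, without confirmation, that `cyxwiz.linalg` follows the same return conventions. The helper name `reconstruction_errors` is hypothetical, and `eigh` is used in place of `eig` because the test matrix is symmetric. LU is omitted because NumPy itself does not provide it.

```python
import numpy as np

def reconstruction_errors(A):
    """Return the max absolute reconstruction error per factorization.

    Hypothetical helper. A must be symmetric positive definite so that
    Cholesky and the symmetric eigendecomposition both apply.
    """
    errors = {}

    # Economy SVD: A = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    errors["svd"] = np.max(np.abs(U @ np.diag(S) @ Vt - A))

    # QR: A = Q @ R, with Q orthonormal and R upper triangular
    Q, R = np.linalg.qr(A)
    errors["qr"] = np.max(np.abs(Q @ R - A))

    # Cholesky: A = L @ L.T, with L lower triangular
    L = np.linalg.cholesky(A)
    errors["cholesky"] = np.max(np.abs(L @ L.T - A))

    # Symmetric eigendecomposition: A = V @ diag(w) @ V.T
    w, V = np.linalg.eigh(A)
    errors["eigh"] = np.max(np.abs(V @ np.diag(w) @ V.T - A))
    return errors

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)  # symmetric positive definite by construction
errors = reconstruction_errors(A)
```

Each error should sit near machine precision; a noticeably larger value usually points to a convention mismatch (for example, `V` returned instead of `Vt`).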

Linear System Solvers

# Direct solve (Ax = b)
x = la.solve(A, b)

# Least squares
x, residuals, rank, s = la.lstsq(A, b)

# Pseudo-inverse
A_pinv = la.pinv(A)
x = A_pinv @ b
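
The three routes above answer slightly different questions: a direct solve needs a square, non-singular matrix, while least squares and the pseudo-inverse also handle systems with more equations than unknowns. The following sketch illustrates the difference with NumPy as a CPU stand-in (an assumption; the `cyxwiz.linalg` calls are not verified to behave identically).

```python
import numpy as np

rng = np.random.default_rng(1)

# Square system, shifted along the diagonal to keep it away from singular:
# a direct solve is the right tool here.
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
b = rng.standard_normal(4)
x_solve = np.linalg.solve(A, b)

# Overdetermined system (10 equations, 3 unknowns): there is generally no
# exact solution, so minimize ||A_tall @ x - b_tall||_2 instead.
A_tall = rng.standard_normal((10, 3))
b_tall = rng.standard_normal(10)
x_lstsq, residuals, rank, s = np.linalg.lstsq(A_tall, b_tall, rcond=None)

# The pseudo-inverse yields the same least-squares solution, but forming it
# explicitly costs more than calling lstsq directly.
x_pinv = np.linalg.pinv(A_tall) @ b_tall

# At the least-squares optimum the residual is orthogonal to the columns
# of A_tall (the normal equations).
normal_eq_residual = A_tall.T @ (A_tall @ x_lstsq - b_tall)
```

In practice, forming `pinv(A)` is mainly worthwhile when the same matrix is applied to many right-hand sides.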

Norms and Properties

# Norms
fro_norm = la.norm(A, 'fro')  # Frobenius
spectral_norm = la.norm(A, 2)  # Spectral
nuclear_norm = la.norm(A, 'nuc')  # Nuclear

# Properties
rank = la.matrix_rank(A)
cond = la.cond(A)  # Condition number
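
All of these quantities can be read off the singular values of a matrix: the Frobenius norm is the root of the sum of their squares, the spectral norm is the largest one, the nuclear norm is their sum, the 2-norm condition number is the ratio of largest to smallest, and the rank is the number that are numerically nonzero. The sketch below checks these identities with NumPy as a CPU stand-in (an assumption; it does not call `cyxwiz.linalg`).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

# Singular values only, returned in descending order.
s = np.linalg.svd(A, compute_uv=False)

fro_from_s = np.sqrt(np.sum(s**2))   # Frobenius norm
spectral_from_s = s[0]               # spectral (2-) norm
nuclear_from_s = np.sum(s)           # nuclear norm
cond_from_s = s[0] / s[-1]           # 2-norm condition number

# Numerical rank: count singular values above a relative tolerance.
tol = s[0] * max(A.shape) * np.finfo(A.dtype).eps
rank_from_s = int(np.sum(s > tol))
```

If several of these values are needed for the same matrix, computing the singular values once and deriving the rest avoids repeating the SVD.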

Performance Comparison

Benchmarks for 1000x1000 matrix multiplication:

Backend	Time	Speedup
CPU (single-threaded)	2,500 ms	1x
CPU (multi-threaded)	450 ms	5.6x
OpenCL (Intel UHD)	120 ms	20.8x
CUDA (RTX 3060)	15 ms	167x
CUDA (RTX 4090)	5 ms	500x
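
The table does not state the precision used, whether transfer time is included, or how the timings were collected, so the figures are best treated as indicative. A minimal timing harness, sketched here with NumPy on the CPU, shows one reasonable methodology: a warm-up run, several repeats, and the best time reported. The function names `time_matmul` and `gflops` are hypothetical. On a GPU backend, the device would typically also need to be synchronized before the clock is stopped; how that is done in this library is not shown here.

```python
import time
import numpy as np

def time_matmul(n, repeats=5, dtype=np.float32):
    """Return the best wall-clock time in seconds for an n x n matmul."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n)).astype(dtype)
    B = rng.standard_normal((n, n)).astype(dtype)
    A @ B  # warm-up run, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        A @ B
        best = min(best, time.perf_counter() - start)
    return best

def gflops(n, seconds):
    """Approximate throughput, counting 2 * n**3 floating-point operations."""
    return 2 * n**3 / seconds / 1e9

elapsed = time_matmul(200)
```

As a worked check on the table, a 1000x1000 multiplication is about 2 x 10^9 floating-point operations, so `gflops(1000, 0.005)` gives roughly 400 GFLOP/s for the 5 ms entry and `gflops(1000, 2.5)` gives 0.8 GFLOP/s for the single-threaded CPU entry.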

Node Editor Integration

Node	Inputs	Outputs	GPU
MatMul	A, B	C = A @ B	Yes
Transpose	A	A^T	Yes
Inverse	A	A^-1	Yes
SVD	A	U, S, V	Yes
Solve	A, b	x	Yes

Best Practices

Numerical Stability

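Check the condition number before solving; a large value means small input errors are strongly amplified in the solution
Match the decomposition to the matrix structure (LU for general square systems, QR for least squares)
Prefer SVD for rank-deficient matrices
Use Cholesky for symmetric positive definite matrices (roughly half the floating-point work of LU, so about 2x faster)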
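
These guidelines can be combined into a single dispatch routine. The sketch below is written with NumPy as a CPU stand-in, and `robust_solve` is a hypothetical helper, not part of `cyxwiz.linalg`. The condition-number threshold is illustrative only: in float64, a condition number near 1e12 leaves roughly four reliable digits in the solution.

```python
import numpy as np

def robust_solve(A, b, cond_limit=1e12):
    """Solve A x = b, choosing a method from the matrix's properties.

    Hypothetical helper; cond_limit is an illustrative threshold, not tuned.
    Returns the solution and the name of the method used.
    """
    A = np.asarray(A, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)

    if np.linalg.cond(A) > cond_limit:
        # Numerically rank-deficient: fall back to the SVD-based
        # minimum-norm least-squares solution.
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x, "lstsq"

    if np.allclose(A, A.T):
        try:
            # Cholesky succeeds only for symmetric positive definite input.
            L = np.linalg.cholesky(A)
            # NumPy has no dedicated triangular solver, so the general
            # solver stands in for forward and back substitution here.
            y = np.linalg.solve(L, b)
            return np.linalg.solve(L.T, y), "cholesky"
        except np.linalg.LinAlgError:
            pass  # symmetric but not positive definite

    # General square system: LU-based direct solve.
    return np.linalg.solve(A, b), "lu"

A_spd = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, method = robust_solve(A_spd, b)
```

Computing the condition number costs an SVD of its own, so for large matrices a cheaper estimate or a check on the solve residual may be preferable.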

Memory Management

Use in-place operations when possible
Release intermediate results promptly to free device memory
Batch operations to minimize host-to-device transfers
Choose precision deliberately: float32 halves memory use relative to float64, at the cost of accuracy
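
The same habits can be illustrated on the CPU with NumPy, used here as a stand-in. Whether `cyxwiz.linalg` exposes an `out=` parameter or batched `matmul` is not confirmed, so treat the calls below as an illustration of the ideas rather than of this library's API.

```python
import numpy as np

n = 1000
A32 = np.ones((n, n), dtype=np.float32)
A64 = np.ones((n, n), dtype=np.float64)

# float32 halves the footprint: 4 bytes per element instead of 8.
bytes32 = A32.nbytes
bytes64 = A64.nbytes

# In-place updates reuse A32's buffer instead of allocating a new array.
buffer_before = A32.__array_interface__["data"][0]
np.multiply(A32, 2.0, out=A32)
A32 += 1.0
buffer_after = A32.__array_interface__["data"][0]

# Batched matmul: one call over a stack of matrices instead of a Python loop.
stack_a = np.ones((8, 16, 16), dtype=np.float32)
stack_b = np.ones((8, 16, 16), dtype=np.float32)
stack_c = np.matmul(stack_a, stack_b)

# Drop the reference to an intermediate so its memory can be reclaimed.
del A64
```

On a GPU the payoff from batching is usually larger than on the CPU, since each separate call may add launch and transfer overhead.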