# COMP SCI 557 Homework 5
## Compile Environment

- CPU: 11th Gen Intel(R) Core(TM) i5-11500 @ 2.70GHz
- Memory: 16 GB
- Compiler: icc (ICC) 19.0.5.281 20190815
- OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
- MKL: /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl
- Compile commands (see `Makefile` for details):

```sh
$ make -j
icc Laplacian.cpp main.cpp DirectSolver.cpp Utilities.cpp -Wall -O2 -qopenmp -Wl,--start-group /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_intel_lp64.a /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_core.a /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o main
```
## Runtime Environment

- CPU: AMD Ryzen 7 5800H with Radeon Graphics
- CPU Cache:
  - L1d: 256 KiB (8 instances)
  - L1i: 256 KiB (8 instances)
  - L2: 4 MiB (8 instances)
  - L3: 16 MiB (1 instance)
- Memory: Configured Memory Speed: 3200 MT/s, dual channel
- OS: Ubuntu 24.04.2 LTS (Noble Numbat)
- Required Libraries: libomp.so.5
  - Install command: `sudo apt install libomp-dev`
- Runtime Environment Selection Explanation: Since the CSL machines could not produce steady timing results, I chose to run the program on my own server. Since the only machines I have use AMD CPUs, the improvement from replacing the hand-coded kernel call with an MKL library call might be less pronounced than it would be on Intel hardware.
## Task A
What was the resulting sparsity of the L factor (PARDISO reports it as the "number of non-zeros in L")?

- Default (Nested Dissection): 92,983,874 non-zeros
- Minimum Degree: 187,686,049 non-zeros
- No Permutation: 901,928,763 non-zeros

How many operations are needed for the factorization itself (PARDISO reports it as the "gflops for the numerical factorization")?

- Default (Nested Dissection): 241.855850 GFLOP
- Minimum Degree: 981.144287 GFLOP
- No Permutation: 3449.176270 GFLOP

What was the bandwidth achieved during factorization (PARDISO reports it as the "gflops/s for the numerical factorization")?

- Default (Nested Dissection): 165.799622 GFLOP/s
- Minimum Degree: 187.779114 GFLOP/s
- No Permutation: 11.031159 GFLOP/s

What was the run-time (in seconds) for the solve step (forward/backward substitution)? PARDISO reports this as "Time spent in direct solver at solve step (solve)" in Phase 3.

- Default (Nested Dissection): 0.021083 seconds
- Minimum Degree: 0.059186 seconds
- No Permutation: 0.830340 seconds
| Matrix (64 × 64 × 64) | Sparsity of the L Factor (non-zeros) | Operations for Factorization (GFLOP) | Bandwidth during Factorization (GFLOP/s) | Runtime for the Solve Step (s) |
|---|---|---|---|---|
| Default (Nested Dissection) | 92,983,874 | 241.855850 | 165.799622 | 0.021083 |
| Minimum Degree Algorithm | 187,686,049 | 981.144287 | 187.779114 | 0.059186 |
| No Permutation | 901,928,763 | 3449.176270 | 11.031159 | 0.830340 |
## Commentary on Task A
As we can see from the table, nested dissection partitions the domain to keep each pivot block as localized as possible, so it dramatically reduces fill.
Minimum degree tries to pick pivots that minimally increase the degree (connectivity) of the remaining matrix, reducing fill over naive methods.
No permutation leaves the domain points in a naive order, so any pivot can “connect” distant rows/columns during factorization, causing large amounts of fill.
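For reference, the sketch below shows one way these three orderings could be selected through MKL PARDISO's `iparm` array. This is an illustrative fragment based on my reading of the MKL documentation, not the actual homework code; the helper name and `mode` flag are hypothetical.

```cpp
#include <mkl_pardiso.h>
#include <mkl_types.h>
#include <vector>

// Hypothetical helper: choose the fill-reducing ordering before the
// analysis phase (phase 11). iparm is 0-indexed here (C convention),
// so iparm[1] corresponds to iparm(2) in the Fortran documentation.
void selectOrdering(MKL_INT *iparm, std::vector<MKL_INT> &perm,
                    MKL_INT n, int mode)
{
    if (mode == 0)          // default: nested dissection (METIS)
        iparm[1] = 2;
    else if (mode == 1)     // minimum degree algorithm
        iparm[1] = 0;
    else {                  // "no permutation": pass the identity ordering
        iparm[4] = 1;       // iparm(5) = 1: use the user-supplied perm array
        perm.resize(n);
        for (MKL_INT i = 0; i < n; ++i)
            perm[i] = i + 1;  // 1-based indexing when iparm[34] == 0
    }
    // Setting msglvl = 1 in the pardiso() call prints the statistics quoted
    // above ("number of non-zeros in L", "gflops for the numerical
    // factorization", and so on).
}
```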
## Task B
| | Runtime (s) |
|---|---|
| k = 1 | 0.020741 |
| k = 10 | 0.109536 |
| k = 100 | 0.463952 |
## Commentary on Task B
In this example, the cost is NOT increasing sub-linearly; this is likely due to the overhead of starting the timer. However, if we increase k to 1000, we can easily see that the growth in cost is indeed sub-linear.
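For context, below is a minimal sketch of the kind of timing harness this measurement suggests, assuming Task B repeats PARDISO's solve phase (phase 33) k times against a single factorization. The function and variable names are illustrative, not the actual homework code; the setup from the analysis (phase 11) and factorization (phase 22) calls is assumed to already exist.

```cpp
#include <mkl_pardiso.h>
#include <mkl_types.h>
#include <chrono>
#include <cstdio>

// Hypothetical fragment: time k back-substitution solves that reuse one
// factorization. pt, iparm, mtype, n, a, ia, ja, perm, b, x are assumed
// to have been filled in by the earlier analysis/factorization phases.
void timeSolves(void **pt, MKL_INT *iparm, MKL_INT mtype, MKL_INT n,
                double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *perm,
                double *b, double *x, int k)
{
    MKL_INT maxfct = 1, mnum = 1, phase = 33, nrhs = 1, msglvl = 0, error = 0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < k; ++i)
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n,
                a, ia, ja, perm, &nrhs, iparm, &msglvl, b, x, &error);
    auto stop = std::chrono::steady_clock::now();

    std::printf("k = %d : %f s\n", k,
                std::chrono::duration<double>(stop - start).count());
}
```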
## Verifying Correctness

The correctness of my implementation is verified by checking that the output image `./x.0000.pgm` is identical to the one produced by the original version from the course repository.
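For example, a byte-wise comparison of the two images can be done with `cmp`; the `reference/` directory here is hypothetical, so substitute wherever the unmodified build writes its output:

```sh
$ cmp x.0000.pgm reference/x.0000.pgm && echo "outputs match"
```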