COMP SCI 557 Homework 5

Compile Environment

  • CPU: 11th Gen Intel(R) Core(TM) i5-11500 @ 2.70GHz

  • Memory: 16 GB

  • Compiler: icc (ICC) 19.0.5.281 20190815

  • OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)

  • MKL: /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl

  • Compile commands (See Makefile for details):

    $ make -j
      icc Laplacian.cpp main.cpp DirectSolver.cpp Utilities.cpp -Wall -O2 -qopenmp -Wl,--start-group /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_intel_lp64.a /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_core.a /s/intelcompilers-2019/amd64_rhel6/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o main
    

Runtime Environment

  • CPU: AMD Ryzen 7 5800H with Radeon Graphics
  • CPU Cache:
    • L1d: 256 KiB (8 instances)
    • L1i: 256 KiB (8 instances)
    • L2: 4 MiB (8 instances)
    • L3: 16 MiB (1 instance)
  • Memory: Configured Memory Speed: 3200 MT/s, dual channel
  • OS: Ubuntu 24.04.2 LTS (Noble Numbat)
  • Required Libraries: libomp.so.5
    • Install command: sudo apt install libomp-dev
  • Runtime Environment Selection Explanation: Since the CSL machines cannot produce steady timing results, I chose to run the program on my own server. Since I only have machines with AMD CPUs, the improvement from replacing the hand-coded kernel call with the MKL library call might not be very pronounced.

Task A

Matrix (64 * 64 * 64)       | Sparsity of the L Factor | Operations Needed for Factorization | Bandwidth Achieved during Factorization | Runtime for the Solve Step
Default (Nested Dissection) | 9298387                  | 4241.855850                          | 165.799622                               | 0.021083
Minimum Degree Algorithm    | 18768604                 | 9981.144287                          | 187.779114                               | 0.059186
No Permutation              | 90192876                 | 33449.176270                         | 11.031159                                | 0.830340

Commentary on Task A

As we can see from the table, nested dissection partitions the domain to keep each pivot block as localized as possible, so it dramatically reduces fill.

Minimum degree tries to pick pivots that minimally increase the degree (connectivity) of the remaining matrix, reducing fill over naive methods.

No permutation leaves the domain points in a naive order, so any pivot can “connect” distant rows/columns during factorization, causing large amounts of fill.
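
The three rows in the table correspond to different fill-reducing permutation settings of the MKL PARDISO solver. The sketch below shows one way these settings can be selected; it is a hypothetical helper, not the code in DirectSolver.cpp, and the rest of the PARDISO setup is omitted. iparm[1] picks the ordering algorithm, and supplying an identity permutation through iparm[4] is one way to obtain the "no permutation" case.

    // Hypothetical sketch: selecting the fill-reducing ordering for MKL PARDISO.
    // iparm[] uses 0-based C indexing here; values follow the MKL PARDISO documentation.
    #include <mkl_pardiso.h>
    #include <mkl_types.h>

    enum class Ordering { NestedDissection, MinimumDegree, NoPermutation };

    // 'perm' must hold n entries; it is only used for Ordering::NoPermutation.
    void configureOrdering(MKL_INT iparm[64], MKL_INT* perm, MKL_INT n, Ordering o)
    {
        iparm[0] = 1;                      // supply iparm values instead of PARDISO defaults
        switch (o) {
        case Ordering::NestedDissection:
            iparm[1] = 2;                  // METIS nested dissection (PARDISO's default ordering)
            break;
        case Ordering::MinimumDegree:
            iparm[1] = 0;                  // minimum degree algorithm
            break;
        case Ordering::NoPermutation:
            iparm[4] = 1;                  // take the fill-reducing permutation from perm[]
            for (MKL_INT i = 0; i < n; ++i)
                perm[i] = i;               // identity permutation (match the matrix index base
            break;                         // selected via iparm[34]), i.e. keep the natural order
        }
    }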


Task B

k       | Runtime (s)
k = 1   | 0.020741
k = 10  | 0.109536
k = 100 | 0.463952

Graph of runtime vs. k

Commentary on Task B

In this example, the cost is NOT increasing sub-linearly; this is likely due to the overhead of starting the timer.

However, if we increase k to 1000, we can easily see that the increase in cost is indeed sub-linear.
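
The reason the cost eventually grows sub-linearly in k is the structure of the solve: the factorization and the per-call overhead are paid once, and only the triangular solves scale with k. The dense LAPACK sketch below illustrates the same amortization pattern; it is a stand-in for illustration only (the homework solver uses MKL PARDISO on the sparse Laplacian), and the matrix, sizes, and timing harness here are made up.

    // Illustration only: factor once, then solve k right-hand sides with the existing factor.
    #include <mkl_lapacke.h>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const lapack_int n = 512, k = 100;
        std::vector<double> A(n * n, -1.0), B(n * k, 1.0);
        std::vector<lapack_int> ipiv(n);
        for (lapack_int i = 0; i < n; ++i)
            A[i * n + i] = 2.0 * n;                    // diagonally dominant test matrix

        auto t0 = std::chrono::steady_clock::now();
        LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, A.data(), n, ipiv.data());   // factor once
        auto t1 = std::chrono::steady_clock::now();
        LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', n, k, A.data(), n,
                       ipiv.data(), B.data(), k);      // forward/back substitution for all k RHS
        auto t2 = std::chrono::steady_clock::now();

        std::printf("factorize: %.6f s, solve %d rhs: %.6f s\n",
                    std::chrono::duration<double>(t1 - t0).count(), (int)k,
                    std::chrono::duration<double>(t2 - t1).count());
        return 0;
    }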

Verifying Correctness

The correctness of my implementation is verified by checking that the output image ./x.0000.pgm is identical to the one produced by the original version from the course repository.
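
As a concrete (hypothetical) way to perform that comparison programmatically, the snippet below does a byte-wise comparison of the produced image against a reference copy; the reference path is made up and should point at the image produced by the course-repository version.

    // Hypothetical helper: byte-wise comparison of the output image with a reference copy.
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <vector>

    int main()
    {
        auto slurp = [](const char* path) {            // read a whole file into memory
            std::ifstream f(path, std::ios::binary);
            return std::vector<char>(std::istreambuf_iterator<char>(f), {});
        };
        const bool same = slurp("x.0000.pgm") == slurp("reference/x.0000.pgm");
        std::cout << (same ? "images match\n" : "images differ\n");
        return same ? 0 : 1;
    }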
