Tuesday, July 30, 2013

GSoC Part 2

Now follows the second part of the Google Summer of Code. Here is what I planned to do in that time.

Goals for the Final Evaluation 2013-09-16 (Monday, "Suggested 'pencils down' date") :

  • Implementations for IC(0) and ICT completed
  • Wrapper file for MATLAB compatibility (ichol.m) finished
  • Small test cases for all incomplete-factorizations
  • Documentation of incomplete LU-factorization types
  • Comprehensive test cases for all types of incomplete-factorization types (also  proofing compatibility to MATLAB)
  • (optional) Complex version for the incomplete-factorizations

TODOs:

Monday, July 29, 2013

Midterm summary

Today it is time for the midterm and to summarize what has already been done. Here is again my favorite picture to show my progress and afterwards some text.


















Goals for the Midterm:

  • Integration of ITSOL completed (iluk.cc, ilut.cc, ilutp.cc, iluc.cc), callable from Octave
  • Wrapper file for MATLAB compatibility (ilu.m)
  • Small test cases for incomplete LU-factorization, showing their correctness
  • Documentation of incomplete LU-factorization types
  • Comprehensive test cases for all types of incomplete LU-factorization types (also  proofing compatibility to MATLAB)
  • (optional) Complex version for the incomplete-factorizations
All in all I'm satisfied with the first part of this GSoC. I wrote about 2374 SLOC (rough wc -l estimate), patched two libraries and modified some config files. The major part of my work except the integration into the Octave build system can be downloaded from here to try it out without needing a development version.

Going from the bottom up in the picture, I'm not an expert, but very familiar with ITSOL and ZITSOL now. I created patches for both libraries (also included into my archive above) that one can use them on a unix-like system, but so far they are only tested on my *buntu systems. A special part of this work is, that I patched a whole ZILUC function into ZITSOL, which should be used with care. All other complex version are mainly implemented by switching the data type. For small examples this approach works at ILUC as well, but some differences where observable.

With patched libraries one can go an level up, where I created interfaces to use the most relevant function from the preconditioner libraries. For mastering this task I needed to get to know the different matrix storage formats of Octave and (Z)ITSOL and had some trouble with the matrix conversion. Building upon Wei Jins initial work I successfully interfaced ILUK, ILUT and ILUC + their complex versions. They all deliver comparable results to Matlab. Sadly the authors of (Z)ITSOL seem to be busy at this moment, so I still don't have any clue how to get ILUTP to run. This will tried to be solved in the second part of GSoC, too.

When you have an interface, it still needs to be integrated in the big Octave code base. Therefore I got familiar with the Octave build system and made all the sufficient changes, including checking for the presence of ITSOL and ZITSOL and making sure that all necessary files are build by default.

On the last level there is "only" one m-file ensuring the compatibility to MATLABs ilu-function. As one can see from the picture, I couldn't make everything green because of two reasons. First ILUTP is missing and second I cannot include an efficient MILU (modified ILU, see http://www.mathworks.de/de/help/matlab/ref/ilu.html) without changing the code of ITSOL entirely. The following two examples work in Octave and MATLAB the same, especially the last example reveals the benefit of preconditioning.

Example 1

A = gallery('neumann', 1600) + speye(1600);
setup.type = 'nofill';
nnz(A)
ans = 7840

nnz(lu(A))
ans = 126478

nnz(ilu(A,setup))
ans = 7840

Example 2

A = gallery ('wathen', 10, 10);
b = sum (A,2);
tol = 1e-8;
maxit = 50;
opts.type = 'crout';
opts.droptol = 1e-4;
[L, U] = ilu (A, opts);
x = bicg (A, b, tol, maxit, L, U);
norm(A * x - b, inf)

All written functions contain much documentation, callable with doc ilu* and comprehensive test cases, which you can try out from the archive with
test ilu
test src/iluc
test src/iluk
test src/ilut
because *.oct file and source file are in separate directories.

Besides coding, I additionally had the pleasure to join this years OctConf in Milan. I was given the opportunity to know better the community and held a short presentation about my project.

Finally I want to thank the Octave community for supporting me a lot during the initial part of my project work and I really look forward for the second part and the time after this!

Sunday, July 21, 2013

A little spring-cleaning

In order not to introduce too many new files for the ILU preconditioners in libinterp/dldfcn I made heavy use of C++ templates and merged real and complex versions into one file. This decision has been supported by the fact, that nearly all steps for real and complex input are the same, just function names and data types have to be changed accordingly. I hope not to be the only one agreeing on this trade-off between code duplication and clarity :-/

The integration of ITSOL and ZITSOL into GNU/Octave so far can be summarized as follows:

         ITSOL         ZITSOL
ILUK*
yes
yes
ILUT
yes
yes
ILUC*
yes
 experimental** 
ILUTP*
no
no
* Function is important for the MATLAB compatibility wrapper ilu.m
** Own complex implementation analogous to real version

The current interface can be obtained from this archive (does not require a development version of GNU/Octave) or from my repository.

Friday, July 19, 2013

Becoming complex with ZTISOL

Two open tasks are remaining for me until the midterm next week. I want to write more comprehensive tests and I have to get ILUTP to work. For the latter I don't think to succeed without further support by the authors of ITSOL, which I already contacted.

While waiting for a response, Nir and I decided to focus on my optional goal of complex ILU versions which are realized in the ZITSOL library. It is straight forward to interface them analogous to the real versions. After tiding up my code, I will submit the changes to my development repository this weekend. As a first step I spend some time getting ZITSOL in a appropriate library format analog to the ITSOL library.

Build the ZITSOL-library from source

  1. Get the source tar-archive
  2. Extract the archive: tar -xvf ZITSOL_1.tar.gz
  3. Get my patch
  4. Apply the patch to the extracted archive:
    • cd ZITSOL_1
    • patch -p1 < itsol.patch
  5. Build the library using: make
    • Now the shared and static library are created in ZITSOL_1
  6. Copy the four header files from ZITSOL_1 and ZITSOL_1/LIB to /usr/include/zitsol (root rights may be required)
  7. Copy the two libraries from ZITSOL_1 to /usr/lib
  8. Make symbolic links
    • ln -s libzitsol.so.1.0.0 libzitsol.so.1
    • ln -s libzitsol.so.1 libzitsol.so

Sunday, July 14, 2013

A MATLAB wrapper

Now there exists a wrapper ilu.m which should guarantee compatibility to calls of the ilu function from MATLAB. After an input checking the types are assigned as follows:

  • 'nofill' --> iluk()
  • 'crout' --> iluc()
  • 'ilutp' --> ilut() // the real ilutp still doesn't work for me...
ilu.m still needs lots of test cases. To create them, I want to compare MATLABs behavior, which I have access to in my university. A little weird is, that the second test case taken from the MATLAB ilu description, which I posted earlier about, doesn't seem to work any longer. Maybe that has something to do with the conversion changes? I'll check this. Additionally the crout version example does not produce the same output, as MATLAB does. This might have the reason, that I cannot offer the modified ILU (MILU) option presented there.

To get an overview of my work, see the updated schedule.

Monday, July 8, 2013

The transposition "problem"

A little late, but here is an excerpt of what I did last week. During the interface development I was confronted with the problem of transposing the input and output matrices, as described in my previous blog entry.

With some help of my Master Thesis mentor we finally figured out how to use ITSOL without transpositions.

1. What does ITSOL internally?

ITSOL doesn't perform a "classical" LU-decomposition

$ \mathbf{A} = \mathbf{L}\mathbf{U} $,

it performs a memory efficient decomposition of the form:

$ \mathbf{L'} = \begin{pmatrix}
0 & & & \\
& 0 & & \\
& x & \dots & \\
x & x & & 0 \\
\end{pmatrix} $

$ \mathbf{U'} = \begin{pmatrix}
0 & & & x \\
& 0 & & x \\
&  & \dots & \\
 &  & & 0 \\
\end{pmatrix} $

$ \mathbf{D'} = \begin{pmatrix}
d_1 \\
d_2 \\
\dots \\
d_n \\
\end{pmatrix} $

Keep in mind, that only the non-zero elements $ \mathbf{x} \in \mathbb{R} \neq 0 $ are explicitly stored in the ITSOL compressed row storage format. To obtain the "classical" LU-decomposition, one has to add a diagonal of ones to L' and the diagonal D' to U'.

$ A = \underbrace{\begin{pmatrix}
1 & & & \\
& 1 & & \\
& x & \dots & \\
x & x & & 1 \\
\end{pmatrix}}_{\mathbf{L}}\cdot \underbrace{\begin{pmatrix}
d_1 & & & x \\
& d_2 & & x \\
&  & \dots & \\
 &  & & d_n \\
\end{pmatrix}}_{\mathbf{U}} $

That is what Wei Jin has already done in his first version of the ITSOL-interface. The drawback of his approach is, that one has to convert the input Matrix A first into the coordinate format (COO) and afterwards into the ITSOL compressed row storage format.

2. How to avoid format conversions and transpositions?

Now comes the tricky part. We found out, that the values of L' and U' are scaled with D' during ITSOLs factorization process. This is important to know, when we do an implicit transposition while converting from Octaves column storage format to ITSOLs row storage format and vice versa.

Using ITSOL standalone we would get a decomposition like

$ \left[\mathbf{L'}, \mathbf{D'},\mathbf{U'}\right] = iluk(\mathbf{A}) $

and I just described how to get from there to the "classical" LU-decomposition. With implicitly transposition of A by converting the input from Octaves format, we get

$ \left[\mathbf{L''}, \mathbf{D'},\mathbf{U''}\right] = iluk(\mathbf{A^{T}}) $.

Note that the diagonal D' is the same in both cases and L'' and U'' are scaled. While converting L'', D' and U'' back to Octaves format, we perform another implicit transposition, thus one will receive $ \mathbf{L''^{T}} $ and $ \mathbf{U''^{T}} $ back in the Octave workspace.

Now the final recipe:

  1. Ignore implicit transposition of input Matrix A (second decomposition case).
  2. Scale $\quad \mathbf{L'} = diag(\mathbf{D'}) \cdot \mathbf{L''} \quad$  and $\quad \mathbf{U'} = \mathbf{U''} \cdot diag(\mathbf{1/D'})  \quad$
  3. Swap $\quad \mathbf{L'} = \mathbf{U'} \quad$ and $\quad \mathbf{U'} = \mathbf{L'} \quad$. This has the effect, that with the implicit transposition back to Octave the triangular shape of both matrices is correct.
  4. Add the diagonal of ones and the diagonal D' like described above.

3. Implementation details

Now some final words where to find this in my code. In the file itsol_util.h one can find the methods for fast conversion from the Octave to ITSOL format itsol_convert_to_cs() itsol_util.h:71 that does nothing more than copying the values to the ITSOL data structure.

The conversion back to Octave resulted in the functions itsol_convert_upper_cs_to_octave() itsol_util.h:135 and itsol_convert_lower_cs_to_octave() itsol_util.h:205. The scaling is not performed as a full matrix multiplication. It is quickly done while converting to the Octave format in itsol_util.h:230 for example. So the overhead is minimal compared to a matrix transposition. The swapping is realized by just changing the names. For example iluk.cc:109:

SparseMatrix L = itsol_convert_lower_cs_to_octave (lu->U, scale, add_diag);

4. The results



more off
clc

A = sprand(4000,4000,1e-3) + speye(4000);
iter = 1000;
% For MATLAB only
%opt.type = 'nofill';
%opt.milu = 'off';

% Time for a transposition
time = cputime();
for i = 1:iter
B=A';
end
time = (cputime() - time) / iter;
fprintf(1, 'Time for transposition:\t%e\n', time);

% Time for the factorization
time = cputime();
for i = 1:iter
% For Octave only
[L, U] = iluk(A);
% For MATLAB only
%[L, U] = ilu(A, opt);
end
time = (cputime() - time) / iter;
fprintf(1, 'Time for iluk:\t\t%e\n', time);

GNU/Octave (3.7.5):

  • Time for transposition: 
    • 3.000190e-04 s
  • Time for iluk (with transposition):
    • 6.028377e-03 s
  • Time for iluk (without transposition):
    • 4.480280e-03 s

MATLAB R2012b:

  • Time for transposition: 
    • 3.200000e-04 s
  • Time for iluk (without transposition):
    • 6.200000e-04 s