Monday, July 8, 2013

The transposition "problem"

A little late, but here is an excerpt of what I did last week. During the interface development I was confronted with the problem of transposing the input and output matrices, as described in my previous blog entry.

With some help of my Master Thesis mentor we finally figured out how to use ITSOL without transpositions.

1. What does ITSOL internally?

ITSOL doesn't perform a "classical" LU-decomposition

$ \mathbf{A} = \mathbf{L}\mathbf{U} $,

it performs a memory efficient decomposition of the form:

$ \mathbf{L'} = \begin{pmatrix}
0 & & & \\
& 0 & & \\
& x & \dots & \\
x & x & & 0 \\
\end{pmatrix} $

$ \mathbf{U'} = \begin{pmatrix}
0 & & & x \\
& 0 & & x \\
&  & \dots & \\
 &  & & 0 \\
\end{pmatrix} $

$ \mathbf{D'} = \begin{pmatrix}
d_1 \\
d_2 \\
\dots \\
d_n \\
\end{pmatrix} $

Keep in mind, that only the non-zero elements $ \mathbf{x} \in \mathbb{R} \neq 0 $ are explicitly stored in the ITSOL compressed row storage format. To obtain the "classical" LU-decomposition, one has to add a diagonal of ones to L' and the diagonal D' to U'.

$ A = \underbrace{\begin{pmatrix}
1 & & & \\
& 1 & & \\
& x & \dots & \\
x & x & & 1 \\
\end{pmatrix}}_{\mathbf{L}}\cdot \underbrace{\begin{pmatrix}
d_1 & & & x \\
& d_2 & & x \\
&  & \dots & \\
 &  & & d_n \\
\end{pmatrix}}_{\mathbf{U}} $

That is what Wei Jin has already done in his first version of the ITSOL-interface. The drawback of his approach is, that one has to convert the input Matrix A first into the coordinate format (COO) and afterwards into the ITSOL compressed row storage format.

2. How to avoid format conversions and transpositions?

Now comes the tricky part. We found out, that the values of L' and U' are scaled with D' during ITSOLs factorization process. This is important to know, when we do an implicit transposition while converting from Octaves column storage format to ITSOLs row storage format and vice versa.

Using ITSOL standalone we would get a decomposition like

$ \left[\mathbf{L'}, \mathbf{D'},\mathbf{U'}\right] = iluk(\mathbf{A}) $

and I just described how to get from there to the "classical" LU-decomposition. With implicitly transposition of A by converting the input from Octaves format, we get

$ \left[\mathbf{L''}, \mathbf{D'},\mathbf{U''}\right] = iluk(\mathbf{A^{T}}) $.

Note that the diagonal D' is the same in both cases and L'' and U'' are scaled. While converting L'', D' and U'' back to Octaves format, we perform another implicit transposition, thus one will receive $ \mathbf{L''^{T}} $ and $ \mathbf{U''^{T}} $ back in the Octave workspace.

Now the final recipe:

  1. Ignore implicit transposition of input Matrix A (second decomposition case).
  2. Scale $\quad \mathbf{L'} = diag(\mathbf{D'}) \cdot \mathbf{L''} \quad$  and $\quad \mathbf{U'} = \mathbf{U''} \cdot diag(\mathbf{1/D'})  \quad$
  3. Swap $\quad \mathbf{L'} = \mathbf{U'} \quad$ and $\quad \mathbf{U'} = \mathbf{L'} \quad$. This has the effect, that with the implicit transposition back to Octave the triangular shape of both matrices is correct.
  4. Add the diagonal of ones and the diagonal D' like described above.

3. Implementation details

Now some final words where to find this in my code. In the file itsol_util.h one can find the methods for fast conversion from the Octave to ITSOL format itsol_convert_to_cs() itsol_util.h:71 that does nothing more than copying the values to the ITSOL data structure.

The conversion back to Octave resulted in the functions itsol_convert_upper_cs_to_octave() itsol_util.h:135 and itsol_convert_lower_cs_to_octave() itsol_util.h:205. The scaling is not performed as a full matrix multiplication. It is quickly done while converting to the Octave format in itsol_util.h:230 for example. So the overhead is minimal compared to a matrix transposition. The swapping is realized by just changing the names. For example iluk.cc:109:

SparseMatrix L = itsol_convert_lower_cs_to_octave (lu->U, scale, add_diag);

4. The results



more off
clc

A = sprand(4000,4000,1e-3) + speye(4000);
iter = 1000;
% For MATLAB only
%opt.type = 'nofill';
%opt.milu = 'off';

% Time for a transposition
time = cputime();
for i = 1:iter
B=A';
end
time = (cputime() - time) / iter;
fprintf(1, 'Time for transposition:\t%e\n', time);

% Time for the factorization
time = cputime();
for i = 1:iter
% For Octave only
[L, U] = iluk(A);
% For MATLAB only
%[L, U] = ilu(A, opt);
end
time = (cputime() - time) / iter;
fprintf(1, 'Time for iluk:\t\t%e\n', time);

GNU/Octave (3.7.5):

  • Time for transposition: 
    • 3.000190e-04 s
  • Time for iluk (with transposition):
    • 6.028377e-03 s
  • Time for iluk (without transposition):
    • 4.480280e-03 s

MATLAB R2012b:

  • Time for transposition: 
    • 3.200000e-04 s
  • Time for iluk (without transposition):
    • 6.200000e-04 s

No comments:

Post a Comment