1
00:00:00,500 --> 00:00:01,950
The following
content is provided
2
00:00:01,950 --> 00:00:06,090
by MIT OpenCourseWare under
a Creative Commons license.
3
00:00:06,090 --> 00:00:08,330
Additional information
about our license
4
00:00:08,330 --> 00:00:10,470
and MIT OpenCourseWare
in general,
5
00:00:10,470 --> 00:00:11,930
is available at ocw.mit.edu.
6
00:00:17,100 --> 00:00:20,310
PROFESSOR: Up on the web
now are lots of sections,
7
00:00:20,310 --> 00:00:22,280
including this
conjugate gradients,
8
00:00:22,280 --> 00:00:27,960
so today is the end of the
conjugate gradients discussion,
9
00:00:27,960 --> 00:00:33,000
and it's kind of a
difficult algorithm.
10
00:00:33,000 --> 00:00:37,590
It's beautiful and
slightly tricky,
11
00:00:37,590 --> 00:00:42,610
but if I just see what the
point is of conjugate gradients,
12
00:00:42,610 --> 00:00:47,590
we don't have to check all the
exact formulas that I've listed
13
00:00:47,590 --> 00:00:52,440
here -- so that's one step
of conjugate gradients:
14
00:00:52,440 --> 00:01:00,480
going from k minus 1 to k, and
we'll look at each of the five
15
00:01:00,480 --> 00:01:06,650
steps in the cycle, without
patiently checking all those
16
00:01:06,650 --> 00:01:07,590
formulas.
17
00:01:07,590 --> 00:01:09,780
We'll just, sort of, see
how much work is involved.
18
00:01:09,780 --> 00:01:17,040
But let me begin where we were
last time, which was Arnoldi.
19
00:01:17,040 --> 00:01:20,250
So Arnoldi was a
Gram-Schmidt-like algorithm,
20
00:01:20,250 --> 00:01:26,350
not exactly Gram-Schmidt, that
produced an orthogonal basis,
21
00:01:26,350 --> 00:01:30,430
and that basis is the basis
for the Krylov subspace.
22
00:01:30,430 --> 00:01:38,620
So each new basis vector
in the Krylov subspace
23
00:01:38,620 --> 00:01:42,170
is found by multiplying
by another power of A,
24
00:01:42,170 --> 00:01:44,620
but that's a terrible basis.
25
00:01:44,620 --> 00:01:50,920
So the basis that we get from
these guys has to be improved,
26
00:01:50,920 --> 00:01:54,320
orthogonalized, and
that's what Arnoldi does.
27
00:01:54,320 --> 00:01:58,790
And if we look to see
what it does matrix-wise,
28
00:01:58,790 --> 00:02:02,900
it turns out to be exactly
-- have that nice equation,
29
00:02:02,900 --> 00:02:08,490
that we're creating a matrix
Q with the orthogonal vectors,
30
00:02:08,490 --> 00:02:14,320
and a matrix H that tells
us how they were connected
31
00:02:14,320 --> 00:02:19,880
to the matrix A. OK.
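For reference, the central equation being described here can be written out; this is a reconstruction in symbols, not copied from the board:

\[
A\,Q = Q\,H, \qquad H = Q^{T} A\, Q \quad (\text{since } Q^{T} Q = I),
\]

where the columns of \(Q\) are the orthonormal basis vectors \(q_1, q_2, \dots\) of the Krylov subspaces, and \(H\) records how multiplication by \(A\) connects them.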
32
00:02:19,880 --> 00:02:33,270
So that's the central equation,
and the properties of H
33
00:02:33,270 --> 00:02:35,310
are important.
34
00:02:35,310 --> 00:02:42,900
So we saw that the key property
was, in general, we've got --
35
00:02:42,900 --> 00:02:45,620
it's full up in
that upper corner,
36
00:02:45,620 --> 00:02:49,320
just the way
Gram-Schmidt would be.
37
00:02:49,320 --> 00:02:53,830
But in case A is a
symmetric matrix,
38
00:02:53,830 --> 00:02:58,180
then we can check that H will
be symmetric from this formula.
39
00:02:58,180 --> 00:03:00,040
And the fact that
it's symmetric means
40
00:03:00,040 --> 00:03:03,290
that those are zeros
up there, because we
41
00:03:03,290 --> 00:03:05,750
know there are zeros down here.
42
00:03:05,750 --> 00:03:07,940
We know there's
only one diagonal
43
00:03:07,940 --> 00:03:09,570
below the main diagonal.
44
00:03:12,290 --> 00:03:14,050
And if it's symmetric,
then there's
45
00:03:14,050 --> 00:03:19,740
only one diagonal above,
and that's the great case,
46
00:03:19,740 --> 00:03:24,540
the symmetric case.
47
00:03:24,540 --> 00:03:29,120
So I'm going to take
two seconds just to say,
48
00:03:29,120 --> 00:03:31,820
here is the best way
to find eigenvalues.
49
00:03:31,820 --> 00:03:37,210
We haven't spoken one word
about computing eigenvalues,
50
00:03:37,210 --> 00:03:42,170
but this is the moment
to say that word.
51
00:03:42,170 --> 00:03:45,710
If I have a symmetric matrix,
eigenvalues are way, way easier
52
00:03:45,710 --> 00:03:48,320
for symmetric matrices.
53
00:03:48,320 --> 00:03:51,940
Linear systems are also
easier, but eigenvalues
54
00:03:51,940 --> 00:03:56,700
are an order of magnitude
easier for symmetric matrices.
55
00:03:56,700 --> 00:04:00,300
And here is a really good
way to find the eigenvalues
56
00:04:00,300 --> 00:04:02,420
of large matrices.
57
00:04:02,420 --> 00:04:05,440
I assume that you don't
want all the eigenvalues.
58
00:04:05,440 --> 00:04:12,160
I mean, that is very unusual
to need many, many eigenvalues
59
00:04:12,160 --> 00:04:14,090
of a big matrix.
60
00:04:14,090 --> 00:04:20,500
You're looking -- probably those
high eigenvalues are not really
61
00:04:20,500 --> 00:04:25,300
physically close, good
representations of the actual
62
00:04:25,300 --> 00:04:28,330
eigenvalues of the
continuous problem.
63
00:04:28,330 --> 00:04:30,480
So, we have continuous
and discrete
64
00:04:30,480 --> 00:04:34,650
that are close for
the low eigenvalues,
65
00:04:34,650 --> 00:04:40,810
because that's where
the error is smooth.
66
00:04:40,810 --> 00:04:45,690
High eigenvalues
are not so reliable.
67
00:04:45,690 --> 00:04:51,450
So for the low eigenvalues,
here is a good way to do them.
68
00:04:51,450 --> 00:04:59,540
Do Arnoldi, and it gets named
also, now, after Lanczos,
69
00:04:59,540 --> 00:05:03,210
because he had this
eigenvalue idea.
70
00:05:03,210 --> 00:05:07,560
Run that for a while.
71
00:05:07,560 --> 00:05:13,430
Say if you want 10 eigenvalues,
maybe go up to j equal to 20.
72
00:05:13,430 --> 00:05:16,490
So 20 Arnoldi steps,
quite reasonable.
73
00:05:16,490 --> 00:05:22,150
And then look at the 20
by 20 submatrix of H,
74
00:05:22,150 --> 00:05:25,910
so you're not going to
go to the end, to H_n.
75
00:05:25,910 --> 00:05:28,600
You're going to
stop at some point,
76
00:05:28,600 --> 00:05:31,180
and you have this nice matrix.
77
00:05:34,770 --> 00:05:38,060
It'll be tri-diagonal,
symmetric tri-diagonal,
78
00:05:38,060 --> 00:05:41,500
and that's the kind of
matrix we know how to find
79
00:05:41,500 --> 00:05:46,920
the eigenvalues of; so this
is symmetric and tri-diagonal,
80
00:05:46,920 --> 00:05:56,050
and eigenvalues of that class
of matrices are very nice,
81
00:05:56,050 --> 00:06:00,110
and they're very meaningful,
and they're very close to --
82
00:06:00,110 --> 00:06:04,800
in many cases; I won't try
to give a theory here --
83
00:06:04,800 --> 00:06:07,330
in many cases, you get
very good approximations.
84
00:06:07,330 --> 00:06:10,030
The eigenvalues
of this submatrix
85
00:06:10,030 --> 00:06:12,630
are sometimes
called Ritz values,
86
00:06:12,630 --> 00:06:18,550
and people use the eigenvalues
of this submatrix,
87
00:06:18,550 --> 00:06:24,430
and those are often
called Ritz values.
88
00:06:24,430 --> 00:06:29,170
And so the first 10 eigenvalues
of that 20 by 20 matrix,
89
00:06:29,170 --> 00:06:35,990
would be in many, many
problems very fast, very
90
00:06:35,990 --> 00:06:40,850
efficient approximations
to the low eigenvalues
91
00:06:40,850 --> 00:06:44,270
of the big matrix A
that you start with.
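Here, for reference, is a minimal sketch of that recipe in Python with NumPy -- not code from the lecture; the function name and the choice of a dense symmetric array A are just for illustration:

    import numpy as np

    def lanczos_ritz(A, b, j):
        # Run j steps of Lanczos (Arnoldi for symmetric A) starting from b,
        # then return the eigenvalues of the small tridiagonal H_j:
        # the Ritz values that approximate eigenvalues of the big matrix A.
        n = len(b)
        Q = np.zeros((n, j + 1))
        alpha = np.zeros(j)              # diagonal of H
        beta = np.zeros(j)               # subdiagonal of H
        Q[:, 0] = b / np.linalg.norm(b)
        for k in range(j):
            v = A @ Q[:, k]
            if k > 0:
                v = v - beta[k - 1] * Q[:, k - 1]
            alpha[k] = Q[:, k] @ v
            v = v - alpha[k] * Q[:, k]
            beta[k] = np.linalg.norm(v)
            if beta[k] < 1e-12:          # hit an exact invariant subspace early
                j = k + 1
                break
            Q[:, k + 1] = v / beta[k]
        H = (np.diag(alpha[:j]) + np.diag(beta[:j - 1], 1)
             + np.diag(beta[:j - 1], -1))
        return np.sort(np.linalg.eigvalsh(H))    # the Ritz values, ascending

    # e.g. the first 10 values of lanczos_ritz(A, b, 20) approximate
    # the low eigenvalues of A.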
92
00:06:44,270 --> 00:06:48,790
OK, the reason is
that eigenvalues --
93
00:06:48,790 --> 00:06:51,380
do you all recognize
that eigenvalues --
94
00:06:51,380 --> 00:06:55,940
that this is the operation, when
I bring Q inverse over here,
95
00:06:55,940 --> 00:07:00,890
that I would call these
matrices similar --
96
00:07:00,890 --> 00:07:08,200
two matrices are similar if
there's a Q that connects them
97
00:07:08,200 --> 00:07:09,270
this way.
98
00:07:09,270 --> 00:07:14,860
And similar matrices
have the same lambdas,
99
00:07:14,860 --> 00:07:16,120
the same eigenvalues.
100
00:07:18,710 --> 00:07:22,790
So the full matrix Q, if we
went all the way to the end,
101
00:07:22,790 --> 00:07:26,070
we could find its eigenvalues,
but the whole point
102
00:07:26,070 --> 00:07:30,760
of all this, of this
month, is algorithms
103
00:07:30,760 --> 00:07:32,900
that you can stop
partway, and you still
104
00:07:32,900 --> 00:07:35,180
have something really good.
105
00:07:35,180 --> 00:07:42,560
OK, so that's -- I think
we can't do everything,
106
00:07:42,560 --> 00:07:45,570
so I won't try to do more with
eigenvalues except to mention
107
00:07:45,570 --> 00:07:46,070
that.
108
00:07:46,070 --> 00:07:50,370
Someday you may want the
eigenvalues of a large matrix,
109
00:07:50,370 --> 00:08:00,340
and I guess ARPACK would
be the Arnoldi package, that
110
00:08:00,340 --> 00:08:06,030
would be the package to go
to for the Arnoldi iteration,
111
00:08:06,030 --> 00:08:09,640
and then to use for
the eigenvalues.
112
00:08:09,640 --> 00:08:17,930
OK, so that's a side point,
just since linear algebra is --
113
00:08:17,930 --> 00:08:20,990
half of it's linear systems
and half of it's eigenvalue
114
00:08:20,990 --> 00:08:26,490
problems, I didn't think I
could miss the chance to do that
115
00:08:26,490 --> 00:08:28,420
second half, the
eigenvalue part.
116
00:08:28,420 --> 00:08:33,750
OK, now I'm going to
tackle conjugate gradients,
117
00:08:33,750 --> 00:08:35,330
to get the idea.
118
00:08:35,330 --> 00:08:38,035
OK, so the idea,
well, of course, A
119
00:08:38,035 --> 00:08:42,900
is symmetric positive definite.
120
00:08:42,900 --> 00:08:47,460
If it's not, you could
try these formulas,
121
00:08:47,460 --> 00:08:52,780
but you'd be taking
a pretty big chance.
122
00:08:52,780 --> 00:08:56,530
If it is, these are just right.
123
00:08:56,530 --> 00:08:59,850
OK, so what's the
main point about --
124
00:08:59,850 --> 00:09:08,300
here's the rule that
decides what x_k will be.
125
00:09:08,300 --> 00:09:15,490
x_k will be the vector in
the Krylov -- of course,
126
00:09:15,490 --> 00:09:22,440
x_k is in this Krylov space K_k.
127
00:09:25,040 --> 00:09:32,490
So it'll be -- we can
create x_k recursively,
128
00:09:32,490 --> 00:09:35,290
and we should need,
just, if we do it right,
129
00:09:35,290 --> 00:09:38,710
should need just one
multiplication by A at every
130
00:09:38,710 --> 00:09:42,970
step, and then
some inner product.
131
00:09:42,970 --> 00:09:47,140
OK, so now, what do
we know from this?
132
00:09:47,140 --> 00:09:53,090
This vector, since it has
this, it starts with --
133
00:09:53,090 --> 00:09:56,900
it has an x_k that's in
the k-th Krylov subspace,
134
00:09:56,900 --> 00:10:01,950
but now it's multiplied by A, so
that's up to the k plus first.
135
00:10:01,950 --> 00:10:08,520
And b is in K_1, it's in
all the Krylov subspaces,
136
00:10:08,520 --> 00:10:18,980
so this is in -- and therefore
r_k is in -- K_(k+1), right?
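Written out, the reasoning is:

\[
x_k \in K_k \;\Rightarrow\; A x_k \in K_{k+1}, \qquad b \in K_1 \subset K_{k+1},
\qquad\text{so } r_k = b - A x_k \in K_{k+1}.
\]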
137
00:10:18,980 --> 00:10:22,070
This is in the next space
up, because there's a
138
00:10:22,070 --> 00:10:25,660
multiplication by
A. And therefore,
139
00:10:25,660 --> 00:10:30,580
if we choose to determine
it by this rule,
140
00:10:30,580 --> 00:10:36,120
we learn that this
vector -- I mean,
141
00:10:36,120 --> 00:10:44,340
we already know from Arnoldi
that q_(k+1) is orthogonal
142
00:10:44,340 --> 00:10:46,470
to this space.
143
00:10:46,470 --> 00:10:53,680
This is the new k plus first
vector that's in K_(k+1),
144
00:10:53,680 --> 00:10:56,710
so this is just where
this is to be found.
145
00:10:56,710 --> 00:11:02,160
So this vector r_k is found
in the next Krylov subspace;
146
00:11:02,160 --> 00:11:05,240
this one is in the
next Krylov subspace.
147
00:11:05,240 --> 00:11:09,060
They're both orthogonal to all
the previous Krylov subspaces,
148
00:11:09,060 --> 00:11:11,150
so one's a multiple
of the other.
149
00:11:14,180 --> 00:11:17,820
So, that will mean
that automatically,
150
00:11:17,820 --> 00:11:22,430
that the r's have the
same property that Arnoldi
151
00:11:22,430 --> 00:11:25,550
created for the q's:
they're orthogonal.
152
00:11:25,550 --> 00:11:29,920
So the residuals are orthogonal.
153
00:11:29,920 --> 00:11:32,540
So, orthogonal residuals.
154
00:11:32,540 --> 00:11:39,850
Orthogonal residuals, good.
155
00:11:43,970 --> 00:11:48,050
I better say, by the way, when
we do conjugate gradients,
156
00:11:48,050 --> 00:11:52,590
we don't also do Arnoldi, so
Arnoldi's finding these q's;
157
00:11:52,590 --> 00:11:55,740
we're leaving him
behind for a while,
158
00:11:55,740 --> 00:11:57,590
because we're going
to go directly
159
00:11:57,590 --> 00:12:00,430
to the x's and r's here.
160
00:12:00,430 --> 00:12:05,420
So, I won't know the q's, but
I know a lot about the q's,
161
00:12:05,420 --> 00:12:10,720
so it's natural to write this
simple fact: that each residual
162
00:12:10,720 --> 00:12:13,750
is a multiple of the new
q, and therefore, it's
163
00:12:13,750 --> 00:12:16,770
orthogonal to all the old ones.
164
00:12:16,770 --> 00:12:23,410
And now I want to -- this is the
other property that determines
165
00:12:23,410 --> 00:12:25,190
the new x.
166
00:12:25,190 --> 00:12:29,670
So the new x should be
-- what do I see here,
167
00:12:29,670 --> 00:12:36,620
I see a sort of delta x, the
change in x, the correction --
168
00:12:36,620 --> 00:12:40,620
which is what I want to
compute -- at step k,
169
00:12:40,620 --> 00:12:42,750
that's what I have to find.
170
00:12:42,750 --> 00:12:46,640
And I could look at the
corrections at the old steps,
171
00:12:46,640 --> 00:12:50,010
and notice there's
an A in the middle.
172
00:12:50,010 --> 00:12:52,270
So the corrections
to the x's are
173
00:12:52,270 --> 00:12:58,410
orthogonal in the
A inner products,
174
00:12:58,410 --> 00:13:01,680
and that's where the name
conjugate comes from.
175
00:13:01,680 --> 00:13:11,190
So this is conjugate directions.
176
00:13:11,190 --> 00:13:15,650
Why do I say directions,
and why do I say conjugate?
177
00:13:15,650 --> 00:13:17,650
Conjugate gradients,
even, I could say.
178
00:13:17,650 --> 00:13:19,790
Why did I --
conjugate directions,
179
00:13:19,790 --> 00:13:22,430
I maybe shouldn't
have erased that,
180
00:13:22,430 --> 00:13:28,390
but I'll also say
conjugate gradients.
181
00:13:28,390 --> 00:13:30,690
OK.
182
00:13:30,690 --> 00:13:34,920
So conjugate, all I
mean by conjugate,
183
00:13:34,920 --> 00:13:42,060
is orthogonal in the natural
inner product that goes with
184
00:13:42,060 --> 00:13:44,370
the problem, and
that's given by A.
185
00:13:44,370 --> 00:13:47,900
So the inner product given
by the problem is the inner
186
00:13:47,900 --> 00:13:54,070
product of x and y,
is x transposed A*y.
187
00:13:54,070 --> 00:14:00,080
And that's a good inner product,
and it's the one that comes
188
00:14:00,080 --> 00:14:04,750
with the problem and
so, natural to look
189
00:14:04,750 --> 00:14:09,570
at orthogonality in that
sense, and that's what we have.
190
00:14:09,570 --> 00:14:12,750
So these are orthogonal
in the usual sense,
191
00:14:12,750 --> 00:14:15,580
because they have an A
already built into them,
192
00:14:15,580 --> 00:14:19,160
and these are orthogonal
in the A inner product.
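So, written out, the two requirements on the conjugate gradient iterates are:

\[
r_k^{T}\, r_i = 0 \quad (i < k), \qquad
(\Delta x_k)^{T} A\, (\Delta x_i) = 0 \quad (i < k),
\qquad\text{with } (x, y)_A = x^{T} A\, y .
\]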
193
00:14:19,160 --> 00:14:21,510
OK.
194
00:14:21,510 --> 00:14:26,720
So I wrote down two
requirements on the new x.
195
00:14:26,720 --> 00:14:29,060
Well, actually you might
say, I wrote down a whole lot
196
00:14:29,060 --> 00:14:32,010
of requirements, because I'm
saying that this should hold
197
00:14:32,010 --> 00:14:33,960
for all the previous i's.
198
00:14:33,960 --> 00:14:37,420
And this should hold for
all the previous i's.
199
00:14:37,420 --> 00:14:39,200
And you see indices
are getting in here,
200
00:14:39,200 --> 00:14:45,310
and I'm asking
for your patience.
201
00:14:45,310 --> 00:14:47,890
What's the point that
I want to make here?
202
00:14:47,890 --> 00:14:52,470
It's this short recurrence
point, this point that we had,
203
00:14:52,470 --> 00:14:56,760
over here, when H was
just tri-diagonal,
204
00:14:56,760 --> 00:14:59,220
so we only needed
-- at the new step,
205
00:14:59,220 --> 00:15:06,080
the only things we had
to find were the --
206
00:15:06,080 --> 00:15:10,650
say at the second step, we just
had to find that h and that h,
207
00:15:10,650 --> 00:15:14,200
because this one was
the same as that.
208
00:15:14,200 --> 00:15:21,410
And then the next one, we --
maybe I didn't say that right.
209
00:15:21,410 --> 00:15:24,870
I guess when we --
let me start again.
210
00:15:24,870 --> 00:15:29,540
At the first step I need
to find two numbers.
211
00:15:29,540 --> 00:15:31,430
Now because H is
symmetric, that's
212
00:15:31,430 --> 00:15:32,900
already the same as that.
213
00:15:32,900 --> 00:15:35,560
So at the next step I
need these two numbers,
214
00:15:35,560 --> 00:15:37,600
and then that number
is already that.
215
00:15:37,600 --> 00:15:42,560
So, two numbers at every
step and big zeros up here.
216
00:15:42,560 --> 00:15:46,490
That's the key
point when A is symmetric.
217
00:15:46,490 --> 00:15:49,480
So the same will
be true over here.
218
00:15:49,480 --> 00:15:58,190
We have two numbers, and
they're called alpha and beta,
219
00:15:58,190 --> 00:15:59,820
so they're a little
different numbers,
220
00:15:59,820 --> 00:16:04,020
because we're working with the
x's now and not with the q's,
221
00:16:04,020 --> 00:16:09,330
so you don't see any q's written
down for conjugate gradients.
222
00:16:09,330 --> 00:16:11,060
OK.
223
00:16:11,060 --> 00:16:15,940
All right, let me just look
at the cycle of five steps.
224
00:16:18,660 --> 00:16:28,170
So I have to give it a start,
so I start with d_0 as b and x_0
225
00:16:28,170 --> 00:16:36,940
as 0, and I guess, r_0 which,
of course, is b minus A*x_0.
226
00:16:36,940 --> 00:16:40,300
If x_0 is 0, then r_0 is also b.
227
00:16:40,300 --> 00:16:41,460
So those are the starts.
228
00:16:44,630 --> 00:16:47,460
So I'm ready to take a step.
229
00:16:47,460 --> 00:16:51,570
I know the ones
with subscript zero,
230
00:16:51,570 --> 00:16:54,110
and I'm ready for the next step.
231
00:16:54,110 --> 00:16:56,540
I know the ones with
subscript k minus 1,
232
00:16:56,540 --> 00:17:01,050
and I'm ready for this
step, this cycle, k.
233
00:17:04,760 --> 00:17:06,370
So, here's a number.
234
00:17:09,610 --> 00:17:12,900
So I'm coming into this -- well
let me say maybe how I should
235
00:17:12,900 --> 00:17:13,490
say it.
236
00:17:13,490 --> 00:17:18,460
I'm coming into that cycle
with a new d, now what is a d?
237
00:17:18,460 --> 00:17:20,650
It's a search direction.
238
00:17:20,650 --> 00:17:22,570
It's the way I'm going to go.
239
00:17:22,570 --> 00:17:26,260
It's the direction in which
this delta x is going to go.
240
00:17:26,260 --> 00:17:29,640
This delta x is going to be,
probably I'll see it here.
241
00:17:29,640 --> 00:17:34,950
Yeah, there is delta x,
is a multiple of d, right?
242
00:17:34,950 --> 00:17:39,650
If I put this on this
side, that's delta x,
243
00:17:39,650 --> 00:17:42,520
the change in x is
a multiple of d,
244
00:17:42,520 --> 00:17:46,130
so d is the direction
in n-dimensional space
245
00:17:46,130 --> 00:17:49,840
that I'm looking
for the correction,
246
00:17:49,840 --> 00:17:53,210
and alpha is the
number, is how far
247
00:17:53,210 --> 00:17:58,650
I go along d for the best,
for the right, delta x.
248
00:17:58,650 --> 00:18:06,190
So alpha will be determined
by, probably by that.
249
00:18:06,190 --> 00:18:11,740
Probably by that, yeah, it
involves an inner product,
250
00:18:11,740 --> 00:18:13,950
picking off a component.
251
00:18:13,950 --> 00:18:14,730
OK?
252
00:18:14,730 --> 00:18:18,000
So the first step is
to find the number;
253
00:18:18,000 --> 00:18:22,170
the next step is to take
that update, delta x is
254
00:18:22,170 --> 00:18:24,740
the right multiple of
the search direction
255
00:18:24,740 --> 00:18:26,460
that I'm coming in with.
256
00:18:26,460 --> 00:18:30,020
Then correcting r is easy;
I'll come back to that;
257
00:18:30,020 --> 00:18:31,810
if I know the
change in x, I know
258
00:18:31,810 --> 00:18:35,390
the change in r right away.
259
00:18:35,390 --> 00:18:39,600
So that's given me,
I'm now updated.
260
00:18:39,600 --> 00:18:43,040
But now I have to look ahead.
261
00:18:43,040 --> 00:18:47,390
What direction am I
going to travel next?
262
00:18:47,390 --> 00:18:54,390
So the next steps are to find a
number beta and then the search
263
00:18:54,390 --> 00:19:01,620
direction, which isn't r; it's r
again with just one correction.
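Here is one common way that cycle is arranged in code -- a minimal sketch for reference, not the exact pseudocode from the notes, assuming A is a symmetric positive definite matrix (dense or sparse) that supports A @ d:

    import numpy as np

    def conjugate_gradient(A, b, steps):
        x = np.zeros_like(b, dtype=float)   # x_0 = 0
        r = b.astype(float).copy()          # r_0 = b - A x_0 = b
        d = r.copy()                        # d_0 = b, first search direction
        rr = r @ r
        for _ in range(steps):
            Ad = A @ d                  # the one multiplication by A per cycle
            alpha = rr / (d @ Ad)       # step 1: how far to go along d
            x = x + alpha * d           # step 2: update x (delta x = alpha * d)
            r = r - alpha * Ad          # step 3: update the residual cheaply
            rr_new = r @ r
            beta = rr_new / rr          # step 4: coefficient for the new direction
            d = r + beta * d            # step 5: next A-conjugate search direction
            rr = rr_new
        return x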
264
00:19:01,620 --> 00:19:05,310
So it's a beautiful set
of formulas, beautiful.
265
00:19:05,310 --> 00:19:09,610
And I could write -- you know,
because I know x's and r's, I
266
00:19:09,610 --> 00:19:16,250
could write the formulas --
there are different expressions
267
00:19:16,250 --> 00:19:20,895
for the same alpha and beta,
but this is a very satisfactory
268
00:19:20,895 --> 00:19:21,550
one.
269
00:19:21,550 --> 00:19:22,730
OK.
270
00:19:22,730 --> 00:19:26,820
Have a look at that
cycle just to see what
271
00:19:26,820 --> 00:19:28,840
how much work is involved.
272
00:19:28,840 --> 00:19:31,380
How much work is
involved in that cycle?
273
00:19:31,380 --> 00:19:35,490
OK, well, there's
a multiplication
274
00:19:35,490 --> 00:19:38,340
by A, no surprise.
275
00:19:38,340 --> 00:19:40,050
It's that
multiplication by A that
276
00:19:40,050 --> 00:19:43,480
takes us to the
next Krylov space,
277
00:19:43,480 --> 00:19:46,600
takes us to the next update.
278
00:19:46,600 --> 00:19:50,030
So I see a multiplication by
A, and also here, but it's
279
00:19:50,030 --> 00:19:51,940
the same multiplication.
280
00:19:51,940 --> 00:19:59,090
So I see one multiplication
by A. So, easy to list.
281
00:19:59,090 --> 00:20:05,500
So each cycle, there's
one multiplication A
282
00:20:05,500 --> 00:20:12,460
times the search direction, and
because A is a sparse matrix,
283
00:20:12,460 --> 00:20:18,450
the cost there is the number
of non-zeros in the matrix.
284
00:20:18,450 --> 00:20:20,260
OK.
285
00:20:20,260 --> 00:20:24,520
Then, what other computations
do you have to do?
286
00:20:24,520 --> 00:20:32,020
I see an inner product there,
and I'm going to use it again.
287
00:20:32,020 --> 00:20:35,560
I guess, and here it
is at the next step,
288
00:20:35,560 --> 00:20:42,040
so I guess per cycle, I have
to do one inner product of r
289
00:20:42,040 --> 00:20:47,030
with itself, and I have to do
one of these inner products.
290
00:20:47,030 --> 00:20:53,270
So I see two inner products,
and of course, these
291
00:20:53,270 --> 00:20:54,720
are vectors of length n.
292
00:20:54,720 --> 00:21:02,550
So that's 2n multiplications,
2n additions.
293
00:21:02,550 --> 00:21:08,930
OK, this was a multiple of
n, depending how sparse A is,
294
00:21:08,930 --> 00:21:14,326
this is 2n adds and 2n
multiplies for these two
295
00:21:14,326 --> 00:21:14,950
inner products.
296
00:21:14,950 --> 00:21:18,050
OK.
297
00:21:18,050 --> 00:21:18,550
Yeah.
298
00:21:18,550 --> 00:21:23,450
I guess you could
say take them there.
299
00:21:23,450 --> 00:21:25,400
And then what else
do we have to do?
300
00:21:25,400 --> 00:21:27,980
Oh, we have these
updates, so I have
301
00:21:27,980 --> 00:21:32,230
to multiply a vector
by that scalar and add.
302
00:21:32,230 --> 00:21:35,480
And again here, I multiply
that vector by that scalar
303
00:21:35,480 --> 00:21:37,610
and subtract.
304
00:21:37,610 --> 00:21:40,310
Oh, and do I have
to do it again here?
305
00:21:40,310 --> 00:21:41,550
Have I got three?
306
00:21:41,550 --> 00:21:44,560
Somehow, I thought I
might only have two.
307
00:21:44,560 --> 00:21:51,670
Oh, yeah, let me say two
or three vector updates.
308
00:21:58,410 --> 00:22:07,120
People are much better than I
am at spotting the right way
309
00:22:07,120 --> 00:22:09,890
to organize those calculations.
310
00:22:09,890 --> 00:22:17,470
But that's really a very
small price to pay per cycle,
311
00:22:17,470 --> 00:22:24,980
and so if it converges
quickly, as it normally does,
312
00:22:24,980 --> 00:22:26,630
this is going to be
a good algorithm.
313
00:22:26,630 --> 00:22:30,020
Maybe I'll -- maybe having
introduced the question of how
314
00:22:30,020 --> 00:22:33,130
quickly does it converge, I
should maybe say something
315
00:22:33,130 --> 00:22:34,950
about that.
316
00:22:34,950 --> 00:22:40,520
So what's the error,
after k steps?
317
00:22:40,520 --> 00:22:43,900
How is it related to
the error at the start?
318
00:22:46,830 --> 00:22:49,810
So, I have to first say, what
do I mean by this measure
319
00:22:49,810 --> 00:22:57,150
of error, and then I
have to say what's the --
320
00:22:57,150 --> 00:23:01,620
I hope I see a factor, less
than 1, to the k power.
321
00:23:01,620 --> 00:23:05,780
Yeah, so I'm hoping some factor
to the k power will be there,
322
00:23:05,780 --> 00:23:09,090
and I sure hope
it's less than 1.
323
00:23:09,090 --> 00:23:10,770
Let me say what it is.
324
00:23:10,770 --> 00:23:15,080
I think -- so this is maybe
not the very best estimate,
325
00:23:15,080 --> 00:23:17,080
but it shows the main point.
326
00:23:17,080 --> 00:23:21,060
It's the square
root of lambda_max
327
00:23:21,060 --> 00:23:24,920
minus the square
root of lambda_min,
328
00:23:24,920 --> 00:23:26,980
divided by the
square root of that
329
00:23:26,980 --> 00:23:30,110
plus the square root of that.
330
00:23:30,110 --> 00:23:34,260
And maybe there's a
factor 2 somewhere.
331
00:23:34,260 --> 00:23:39,130
So this is one estimate
for the convergence factor,
332
00:23:39,130 --> 00:23:44,060
and you see what it depends
on, I could divide everything
333
00:23:44,060 --> 00:23:48,940
by lambda_min there, so that I
could write that as the square
334
00:23:48,940 --> 00:23:54,120
root of the condition number --
I'll divide by that -- minus 1,
335
00:23:54,120 --> 00:23:58,100
divided by the square root of
the condition number plus 1,
336
00:23:58,100 --> 00:23:59,150
to the k power.
337
00:24:01,950 --> 00:24:08,660
So the condition number has a
part in this error estimate,
338
00:24:08,660 --> 00:24:14,620
and I won't derive that,
even in the notes, I won't.
339
00:24:14,620 --> 00:24:18,450
And other people work out
other estimates for the error,
340
00:24:18,450 --> 00:24:20,590
it's an interesting problem.
341
00:24:20,590 --> 00:24:24,680
And I do have to say
that this norm is
342
00:24:24,680 --> 00:24:33,380
the natural inner
product, e_k squared
343
00:24:33,380 --> 00:24:37,110
is the inner product
of e_k with itself,
344
00:24:37,110 --> 00:24:39,970
but again with A in the middle.
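Written out, the estimate being described is (the usual statement does carry a factor 2):

\[
\|e_k\|_A \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\!k}\|e_0\|_A,
\qquad \kappa = \frac{\lambda_{\max}}{\lambda_{\min}},
\qquad \|e\|_A^{2} = e^{T} A\, e .
\]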
345
00:24:39,970 --> 00:24:44,470
I suppose that a little
part of the message here,
346
00:24:44,470 --> 00:24:51,250
and on that middle board, is
that it's just natural to see
347
00:24:51,250 --> 00:24:54,620
the matrix A playing
a part in the --
348
00:24:54,620 --> 00:24:57,370
it just comes in naturally.
349
00:24:57,370 --> 00:25:00,810
Conjugate gradients
brings it in naturally.
350
00:25:00,810 --> 00:25:07,650
These formulas are just
created so that this will work.
351
00:25:07,650 --> 00:25:09,990
OK.
352
00:25:09,990 --> 00:25:13,150
So I suppose I'm
thinking there's
353
00:25:13,150 --> 00:25:17,160
one word I haven't
really justified,
354
00:25:17,160 --> 00:25:18,690
and that's the word gradient.
355
00:25:18,690 --> 00:25:21,590
Why do we call it
conjugate gradient?
356
00:25:21,590 --> 00:25:25,480
Here I erased --
"conjugate directions"
357
00:25:25,480 --> 00:25:28,720
I was quite happy
with, the directions
358
00:25:28,720 --> 00:25:38,450
or the d's, because this is
the direction in which we moved
359
00:25:38,450 --> 00:25:43,100
from x, so conjugate
directions would be just fine,
360
00:25:43,100 --> 00:25:49,830
and those words come into this
part of numerical mathematics,
361
00:25:49,830 --> 00:25:51,990
but where do gradients come in?
362
00:25:51,990 --> 00:25:53,430
OK.
363
00:25:53,430 --> 00:25:54,880
So, let me say,
gradient of what?
364
00:25:57,860 --> 00:26:02,570
Well, what is --
so first of all,
365
00:26:02,570 --> 00:26:15,580
gradient of what, in these
linear problems where A*x equals
366
00:26:15,580 --> 00:26:16,780
b?
367
00:26:16,780 --> 00:26:19,380
Those come from the
gradient of the energy.
368
00:26:19,380 --> 00:26:22,470
So can I say E of
x for the energy?
369
00:26:22,470 --> 00:26:27,940
As 1/2 x transpose A*x
minus b transpose x?
370
00:26:35,030 --> 00:26:36,700
You might say, where
did that come from?
371
00:26:41,240 --> 00:26:43,860
Normally, we've been talking
about linear systems,
372
00:26:43,860 --> 00:26:47,530
everything's been in terms of
A*x equal b, and now suddenly,
373
00:26:47,530 --> 00:26:52,220
I'm changing to a different
part of mathematics.
374
00:26:52,220 --> 00:26:56,520
I'm looking at minimizing
-- so optimization,
375
00:26:56,520 --> 00:27:02,440
the part that's coming in
April, is minimizing energy,
376
00:27:02,440 --> 00:27:07,040
minimizing function,
and what's the link?
377
00:27:07,040 --> 00:27:13,120
Well, if I minimize that,
the gradient of this is --
378
00:27:13,120 --> 00:27:18,560
how shall I write? grad
E, the gradient of E,
379
00:27:18,560 --> 00:27:28,030
the list of the derivatives
of E with respect to the x_i,
380
00:27:28,030 --> 00:27:33,650
and if we work that out -- well,
why not pretend it's scalar
381
00:27:33,650 --> 00:27:36,310
for the moment, so
these are just numbers.
382
00:27:36,310 --> 00:27:43,250
Then this ordinary derivative
of 1/2 A x squared is A*x,
383
00:27:43,250 --> 00:27:51,950
and the derivative of b*x is
b, and that rule continues
384
00:27:51,950 --> 00:27:53,740
into the vector case.
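In vector form, then:

\[
E(x) = \tfrac{1}{2}\, x^{T} A\, x - b^{T} x, \qquad
\nabla E(x) = A x - b,
\]

so \(\nabla E = 0\) exactly at the solution of \(Ax = b\), and the residual \(r = b - Ax\) is the negative gradient.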
385
00:27:53,740 --> 00:27:57,910
So what have we got here?
386
00:27:57,910 --> 00:28:02,190
We've computed, we've
introduced the natural energy,
387
00:28:02,190 --> 00:28:05,650
so this is a natural
energy for the problem,
388
00:28:05,650 --> 00:28:12,470
and the reason it's natural
is when we minimize it,
389
00:28:12,470 --> 00:28:19,240
set the derivatives to zero, we
got the equation A*x equal b,
390
00:28:19,240 --> 00:28:21,820
so that would be
zero at a minimum.
391
00:28:21,820 --> 00:28:27,190
This would equal zero at a min,
so what I'm doing right now
392
00:28:27,190 --> 00:28:29,350
is pretty serious.
393
00:28:29,350 --> 00:28:35,860
On the other hand, I'll repeat
it again in April when we do
394
00:28:35,860 --> 00:28:39,500
minimizations,
but this is the --
395
00:28:39,500 --> 00:28:45,160
this quadratic energy
has a linear gradient,
396
00:28:45,160 --> 00:28:49,440
and its minimum is
exactly A*x equal b.
397
00:28:49,440 --> 00:28:52,340
The minimum point
is A*x equal b.
398
00:28:52,340 --> 00:28:55,570
In other words, solving
the linear equation
399
00:28:55,570 --> 00:28:59,260
and minimizing the energy
are one and the same, one
400
00:28:59,260 --> 00:29:00,740
and the same.
401
00:29:00,740 --> 00:29:04,940
And I'm allowed to
use the word min,
402
00:29:04,940 --> 00:29:11,420
because A is positive definite,
so the positive definiteness
403
00:29:11,420 --> 00:29:15,930
of A is what means that
this is really a minimum
404
00:29:15,930 --> 00:29:18,790
and not a maximum
or a saddle point.
405
00:29:18,790 --> 00:29:19,560
OK.
406
00:29:19,560 --> 00:29:25,280
So I was just -- I wanted
just to think about how do you
407
00:29:25,280 --> 00:29:26,700
minimize a function?
408
00:29:29,250 --> 00:29:34,230
I guess any sensible
person, given a function
409
00:29:34,230 --> 00:29:39,310
and looking for the minimum,
would figure out the gradient
410
00:29:39,310 --> 00:29:43,380
and move that way, move,
well, not up the gradient,
411
00:29:43,380 --> 00:29:47,090
move down the gradient, so
I imagine this minimization
412
00:29:47,090 --> 00:29:54,170
problem, I imagine here I'm
in x, this is the x_1, x_2,
413
00:29:54,170 --> 00:30:00,920
the x's; here's the E, E of
x is something like that,
414
00:30:00,920 --> 00:30:03,240
maybe its minimum is there.
415
00:30:03,240 --> 00:30:09,410
This is the graph of E of x, and
I'm trying to find that point,
416
00:30:09,410 --> 00:30:14,600
and I make as first
guess, somewhere there.
417
00:30:14,600 --> 00:30:16,980
Somewhere on that surface.
418
00:30:16,980 --> 00:30:25,580
Maybe, so to help your
eye, this is like a bowl,
419
00:30:25,580 --> 00:30:28,610
and maybe there's
a point, so that's
420
00:30:28,610 --> 00:30:31,480
on the surface of the bowl.
421
00:30:31,480 --> 00:30:34,920
That's x_0, let's say.
422
00:30:34,920 --> 00:30:41,640
I'm trying to get to the bottom
of the bowl, and as I said,
423
00:30:41,640 --> 00:30:43,900
any sensible person
would figure out, well,
424
00:30:43,900 --> 00:30:46,340
what's the steepest way down.
425
00:30:46,340 --> 00:30:48,960
The steepest way down is the
direction of the gradient.
426
00:30:51,670 --> 00:30:59,470
So I could call this gradient
g, if I wanted, and then I
427
00:30:59,470 --> 00:31:01,380
would go in the
direction of minus g,
428
00:31:01,380 --> 00:31:03,420
because I want to
go down and not up.
429
00:31:03,420 --> 00:31:07,310
I mean, we're climbing,
I shouldn't say climbing,
430
00:31:07,310 --> 00:31:14,150
we're descending a mountain,
and trying to get to the --
431
00:31:14,150 --> 00:31:15,870
or descending a valley, sorry.
432
00:31:15,870 --> 00:31:17,790
We're descending into
a valley and trying
433
00:31:17,790 --> 00:31:20,400
to get to the bottom.
434
00:31:20,400 --> 00:31:22,430
And I'm assuming
that this valley
435
00:31:22,430 --> 00:31:33,310
is this nice shape given by
just a second-degree polynomial.
436
00:31:33,310 --> 00:31:38,220
Of course, when we
get to optimization,
437
00:31:38,220 --> 00:31:40,670
this will be the simplest
problem, but not the only one.
438
00:31:40,670 --> 00:31:42,560
Here it's our problem.
439
00:31:42,560 --> 00:31:47,400
Now, of course -- so
what happens if I move
440
00:31:47,400 --> 00:31:54,570
in the direction of -- the
direction water would flow.
441
00:31:54,570 --> 00:31:58,270
If I tip a flask of
water on the ground,
442
00:31:58,270 --> 00:32:01,730
it's going to flow in
the steepest direction,
443
00:32:01,730 --> 00:32:05,860
and that would be the
natural direction to go.
444
00:32:09,490 --> 00:32:12,430
But it's not the best direction.
445
00:32:12,430 --> 00:32:18,960
Maybe the first step is, but
-- so I'm talking now about
446
00:32:18,960 --> 00:32:23,940
conjugate gradient seen
as a minimization problem,
447
00:32:23,940 --> 00:32:26,780
because that's where the
word gradient is justified.
448
00:32:26,780 --> 00:32:31,310
This is the gradient,
and of course, this
449
00:32:31,310 --> 00:32:35,890
is just minus the
residual, so this is
450
00:32:35,890 --> 00:32:37,340
the direction of the residual.
451
00:32:37,340 --> 00:32:39,410
So that's the
question, should I just
452
00:32:39,410 --> 00:32:42,980
go in the direction of
the residual all the time?
453
00:32:42,980 --> 00:32:46,600
At every step -- and this is
what the method of steepest
454
00:32:46,600 --> 00:32:50,900
descent will do, so let
me make the contrast.
455
00:32:50,900 --> 00:32:55,790
Steepest descent
is the first thing
456
00:32:55,790 --> 00:33:01,300
you would think
of, direction is r.
457
00:33:05,630 --> 00:33:08,820
That's the gradient direction,
or the negative gradient
458
00:33:08,820 --> 00:33:09,720
direction.
459
00:33:09,720 --> 00:33:16,130
Follow r until, in that
direction, you've hit bottom.
460
00:33:16,130 --> 00:33:18,900
So you'll go down this,
you see what happens,
461
00:33:18,900 --> 00:33:22,920
you'll be going down one
side, and you'll hit bottom,
462
00:33:22,920 --> 00:33:28,040
and then you will come up
again on a plane that's
463
00:33:28,040 --> 00:33:30,370
cutting through the surface.
464
00:33:30,370 --> 00:33:34,000
But you're not going to
hit that, first shot.
465
00:33:34,000 --> 00:33:36,110
When you start here,
see water would --
466
00:33:36,110 --> 00:33:40,710
the actual steepest direction,
the gradient direction,
467
00:33:40,710 --> 00:33:44,210
changes as the water
flows down to the bottom,
468
00:33:44,210 --> 00:33:48,610
but here we've chosen
a fixed direction;
469
00:33:48,610 --> 00:33:52,170
we're following it as
long as we're going down,
470
00:33:52,170 --> 00:33:54,060
but then it'll start up again.
471
00:33:54,060 --> 00:34:02,920
So we stop there and find
the gradient again there.
472
00:34:02,920 --> 00:34:05,520
Follow that down,
until it goes up again;
473
00:34:05,520 --> 00:34:07,650
follow that down,
until it goes up again,
474
00:34:07,650 --> 00:34:14,260
and the trouble is
convergence isn't great.
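For contrast with the conjugate gradient sketch above, here is a minimal sketch of steepest descent with an exact line search -- again just for illustration, assuming a symmetric positive definite A:

    import numpy as np

    def steepest_descent(A, b, steps):
        x = np.zeros_like(b, dtype=float)
        for _ in range(steps):
            r = b - A @ x                     # negative gradient of the energy
            alpha = (r @ r) / (r @ (A @ r))   # follow r until you hit bottom
            x = x + alpha * r                 # each step is perpendicular to a contour
        return x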
475
00:34:14,260 --> 00:34:16,800
The trouble is that
we end up -- even
476
00:34:16,800 --> 00:34:19,400
if we're in
100-dimensional space,
477
00:34:19,400 --> 00:34:29,080
we end up sort of just -- our
repeated gradients are not
478
00:34:29,080 --> 00:34:31,260
really in good, new directions.
479
00:34:31,260 --> 00:34:38,110
We find ourselves doing a lot of
work to make a little progress,
480
00:34:38,110 --> 00:34:46,670
and the reason is that these
r's don't have the property
481
00:34:46,670 --> 00:34:52,300
that you're always looking for
in computations, orthogonality.
482
00:34:52,300 --> 00:34:56,010
Somehow orthogonality tells
you that you're really
483
00:34:56,010 --> 00:35:01,450
getting something new, if it's
orthogonal to all the old ones.
484
00:35:01,450 --> 00:35:02,360
OK.
485
00:35:02,360 --> 00:35:09,570
Now, so this, the
direction r is not great.
486
00:35:14,870 --> 00:35:17,780
The better one is to
have some orthogonality.
487
00:35:17,780 --> 00:35:31,410
Take a direction d that
-- well, here's the point.
488
00:35:34,590 --> 00:35:36,810
This is the right direction;
this is the direction
489
00:35:36,810 --> 00:35:38,420
conjugate gradients choose.
490
00:35:38,420 --> 00:35:41,550
This is where --
so it's a gradient,
491
00:35:41,550 --> 00:35:47,340
but then it removes the
component in the direction we
492
00:35:47,340 --> 00:35:50,010
just took.
493
00:35:50,010 --> 00:35:52,120
That's what that beta will be.
494
00:35:52,120 --> 00:35:58,170
When we compute that beta,
and I'm a little surprised
495
00:35:58,170 --> 00:36:00,840
that it isn't a minus sign.
496
00:36:00,840 --> 00:36:04,360
No, it's OK.
497
00:36:04,360 --> 00:36:07,780
It turns out that
way, let me just
498
00:36:07,780 --> 00:36:10,860
be sure that I wrote
the formula down right,
499
00:36:10,860 --> 00:36:16,600
at least wrote it down
as I have it here, yep.
500
00:36:16,600 --> 00:36:19,410
Again, I'm not
checking the formulas,
501
00:36:19,410 --> 00:36:22,910
but just describing the ideas.
502
00:36:22,910 --> 00:36:25,980
So the idea is that
this formula gives us
503
00:36:25,980 --> 00:36:33,400
a new direction, which has this
property two that we're after.
504
00:36:33,400 --> 00:36:36,440
That the direction, it's
not actually orthogonal,
505
00:36:36,440 --> 00:36:40,520
it's A-orthogonal to
the previous direction.
506
00:36:40,520 --> 00:36:44,230
OK, I'm going to draw one
picture of what -- whoops,
507
00:36:44,230 --> 00:36:47,740
not on that -- of
these directions,
508
00:36:47,740 --> 00:36:54,170
and then that's my best I can
do with conjugate gradients.
509
00:36:54,170 --> 00:37:00,940
I want to try to
draw this search
510
00:37:00,940 --> 00:37:05,330
for the bottom of the valley.
511
00:37:05,330 --> 00:37:08,000
So I'm going to draw
that search by drawing
512
00:37:08,000 --> 00:37:11,250
the level sets of the function.
513
00:37:11,250 --> 00:37:17,450
I'm going to -- yeah, somehow
the contours of my function
514
00:37:17,450 --> 00:37:26,320
are, since my function here is
quadratic, all the contours,
515
00:37:26,320 --> 00:37:28,660
the level sets will be ellipses.
516
00:37:28,660 --> 00:37:31,070
They'll all be, sort of,
like similar ellipses,
517
00:37:31,070 --> 00:37:33,360
and there is the one I want.
518
00:37:33,360 --> 00:37:35,530
That's the bottom of the valley.
519
00:37:35,530 --> 00:37:38,150
So this is the contour
where the energy is three,
520
00:37:38,150 --> 00:37:42,120
this is the contour for energy two,
this is the contour for energy one,
521
00:37:42,120 --> 00:37:45,650
and there's the point where
the energy is a minimum.
522
00:37:45,650 --> 00:37:50,280
Well, it might be
negative, whatever,
523
00:37:50,280 --> 00:37:51,960
but it's the smallest.
524
00:37:51,960 --> 00:37:52,810
OK.
525
00:37:52,810 --> 00:37:59,850
So what does -- can I compare
steepest descent with conjugate
526
00:37:59,850 --> 00:38:00,730
gradient?
527
00:38:00,730 --> 00:38:03,110
Let me draw one more contour,
just so I have enough
528
00:38:03,110 --> 00:38:06,880
to make this point.
529
00:38:06,880 --> 00:38:14,000
OK, so steepest descent starts
somewhere, OK, starts there,
530
00:38:14,000 --> 00:38:15,890
let's say.
531
00:38:15,890 --> 00:38:17,610
Suppose that's my
starting point.
532
00:38:17,610 --> 00:38:22,230
Steepest descent goes,
if I draw it right,
533
00:38:22,230 --> 00:38:25,640
it'll go perpendicular, right?
534
00:38:25,640 --> 00:38:30,350
It goes perpendicular
to the level contour.
535
00:38:30,350 --> 00:38:35,340
You carry along as long
as the contours are going down,
536
00:38:35,340 --> 00:38:40,510
and then if I continue here,
I'm climbing back up the valley.
537
00:38:40,510 --> 00:38:41,430
So I'll stop here.
538
00:38:41,430 --> 00:38:47,200
Let's suppose that that happened
to be tangent right there, OK.
539
00:38:47,200 --> 00:38:52,800
Now, that was the first
step of steepest descent.
540
00:38:52,800 --> 00:38:56,090
Now I'm at this point,
I look at this contour,
541
00:38:56,090 --> 00:39:00,130
and well, if I've drawn
it reasonably well,
542
00:39:00,130 --> 00:39:02,650
I'm following maybe
perpendicular,
543
00:39:02,650 --> 00:39:06,600
and now again I'm perpendicular,
steepest descent says
544
00:39:06,600 --> 00:39:11,650
go perpendicular to the contour,
until you've got to the bottom
545
00:39:11,650 --> 00:39:13,230
there.
546
00:39:13,230 --> 00:39:23,110
Then down, then up, then do
you see this frustrating crawl
547
00:39:23,110 --> 00:39:26,220
across the valley.
548
00:39:26,220 --> 00:39:29,380
It wouldn't happen if these
contours were circles.
549
00:39:29,380 --> 00:39:33,260
If these contours -- if A
was the identity matrix,
550
00:39:33,260 --> 00:39:37,000
if A was the identity matrix,
these would be perfect circles,
551
00:39:37,000 --> 00:39:39,450
and you'd go to the
center right away.
552
00:39:39,450 --> 00:39:44,800
Well that only says that A*x
equal b is easy to solve when A
553
00:39:44,800 --> 00:39:46,970
is the identity matrix.
554
00:39:46,970 --> 00:39:55,070
So this is steepest descent,
and its rate of convergence
555
00:39:55,070 --> 00:40:00,800
is slow, because the longer
and thinner the valley,
556
00:40:00,800 --> 00:40:06,260
the worse it is here, OK.
557
00:40:06,260 --> 00:40:08,280
What about conjugate gradients?
558
00:40:08,280 --> 00:40:18,800
Well, I'm just going
to draw magic --
559
00:40:18,800 --> 00:40:20,580
I'm going to make
it look like magic.
560
00:40:20,580 --> 00:40:25,640
I'm going to start at the
same place, the first step,
561
00:40:25,640 --> 00:40:29,090
because I'm starting with
zero, the best thing I can do
562
00:40:29,090 --> 00:40:33,180
is follow the
gradients to there,
563
00:40:33,180 --> 00:40:39,810
and now, so that was the
first step, and now what?
564
00:40:39,810 --> 00:40:46,230
Well, the next direction
-- so these are the search
565
00:40:46,230 --> 00:40:47,550
directions.
566
00:40:47,550 --> 00:40:51,790
Those are the search directions
d_1, d_2, d_3, d_4.
567
00:40:51,790 --> 00:40:56,060
Here is d_1, and what
does d_2 look like?
568
00:40:56,060 --> 00:40:59,580
May I cheat and go
straight to the center?
569
00:40:59,580 --> 00:41:03,690
It's a little bit of a cheat,
I'll go near the center.
570
00:41:03,690 --> 00:41:09,280
The next direction is
much more like that
571
00:41:09,280 --> 00:41:12,480
and then goes to
the bottom, which
572
00:41:12,480 --> 00:41:16,250
would be somewhere like
there inside that level set,
573
00:41:16,250 --> 00:41:20,490
and then the next direction
would probably be very close.
574
00:41:20,490 --> 00:41:27,520
So this is not 90 degrees unless
you're in the A inner product.
575
00:41:27,520 --> 00:41:37,350
So this is a 90 degree angle
in the A inner product.
576
00:41:37,350 --> 00:41:40,040
And what does the
A inner product do?
577
00:41:40,040 --> 00:41:49,080
In the A inner product,
in which level sets
578
00:41:49,080 --> 00:41:58,040
are circles, or spheres or
whatever, in whatever dimension
579
00:41:58,040 --> 00:41:58,540
we want.
580
00:41:58,540 --> 00:42:02,630
Well, I'm not going
to be, I don't want
581
00:42:02,630 --> 00:42:06,100
to be held to everything here.
582
00:42:06,100 --> 00:42:11,700
If you see this
picture, then we've
583
00:42:11,700 --> 00:42:15,090
not only made headway with
the conjugate gradient
584
00:42:15,090 --> 00:42:19,150
method, which is a big deal
for solving linear systems,
585
00:42:19,150 --> 00:42:22,170
but also we've made headway with
the conjugate gradient method
586
00:42:22,170 --> 00:42:25,290
for minimizing functions.
587
00:42:25,290 --> 00:42:28,590
And if the function
wasn't quadratic,
588
00:42:28,590 --> 00:42:31,330
and our equations
weren't linear,
589
00:42:31,330 --> 00:42:34,880
the conjugate gradient idea
would still be available.
590
00:42:34,880 --> 00:42:39,760
Non-linear conjugate
gradients would somehow
591
00:42:39,760 --> 00:42:48,310
follow the same idea of
A-orthogonal search directions.
592
00:42:48,310 --> 00:42:51,750
So these are not the gradients,
they're the search directions
593
00:42:51,750 --> 00:42:53,120
here.
594
00:42:53,120 --> 00:42:55,940
OK.
595
00:42:55,940 --> 00:43:03,530
That's sort of my speech about
the conjugate gradient method.
596
00:43:03,530 --> 00:43:07,460
Without having checked all
the formulas in the cycle,
597
00:43:07,460 --> 00:43:12,020
we know how much work is
involved, we know what we get,
598
00:43:12,020 --> 00:43:15,950
and we see a bit
of the geometry.
599
00:43:15,950 --> 00:43:17,130
OK.
600
00:43:17,130 --> 00:43:21,590
And we see it as a
linear systems solver
601
00:43:21,590 --> 00:43:26,660
and also as an energy
minimizer, and those
602
00:43:26,660 --> 00:43:31,330
are both important ways to
think of conjugate gradients.
603
00:43:31,330 --> 00:43:32,660
OK.
604
00:43:32,660 --> 00:43:40,340
I'll just take a few final
minutes to say a word about
605
00:43:40,340 --> 00:43:44,730
other -- what if our problem
is not symmetric positive
606
00:43:44,730 --> 00:43:46,460
definite?
607
00:43:46,460 --> 00:43:50,220
What if we have an
un-symmetric problem,
608
00:43:50,220 --> 00:43:53,460
or a symmetric, but not
positive definite problem?
609
00:43:53,460 --> 00:44:01,870
Well, in that case, these
denominators could be zero.
610
00:44:01,870 --> 00:44:05,500
I mean, if we don't know that
A is positive definite, that
611
00:44:05,500 --> 00:44:08,040
could be a zero, it
could be anything,
612
00:44:08,040 --> 00:44:11,500
and that method is broken.
613
00:44:11,500 --> 00:44:18,030
So I want to suggest a safer,
but more expensive method now
614
00:44:18,030 --> 00:44:24,850
for -- so this would be
MINRES, minimum residual.
615
00:44:24,850 --> 00:44:29,230
So the idea is
minimize, choose x_k
616
00:44:29,230 --> 00:44:35,080
to minimize the norm of
r_k, that's the residual.
617
00:44:35,080 --> 00:44:44,230
So minimize the norm
of b minus A*x_k.
618
00:44:44,230 --> 00:44:47,380
So that's my choice of
x_k, and the question is,
619
00:44:47,380 --> 00:44:51,830
how quickly can I do it?
620
00:44:51,830 --> 00:44:55,400
So this is MINRES, OK.
621
00:44:55,400 --> 00:44:58,690
And there are other
related methods,
622
00:44:58,690 --> 00:45:00,430
there's a whole
family of methods.
623
00:45:00,430 --> 00:45:03,750
What do I want to
say about MINRES?
624
00:45:03,750 --> 00:45:08,430
Let me just find MINRES.
625
00:45:08,430 --> 00:45:10,440
Well, yeah.
626
00:45:14,650 --> 00:45:18,840
I guess, for this I
will start with Arnoldi.
627
00:45:28,980 --> 00:45:34,180
So for this different algorithm,
which comes up with a different
628
00:45:34,180 --> 00:45:37,090
x_k, I'm going to
-- and of course,
629
00:45:37,090 --> 00:45:43,600
the x_k is going to be
in the Krylov space K_k,
630
00:45:43,600 --> 00:45:46,870
and I'm going to
use this good basis.
631
00:45:46,870 --> 00:45:52,890
This gives me the
great basis of q's.
632
00:45:52,890 --> 00:45:56,710
So I turn my problem,
I convert the problem.
633
00:45:56,710 --> 00:46:10,790
I convert this minimization --
I look for x as a combination
634
00:46:10,790 --> 00:46:15,190
of the q's, right?
635
00:46:15,190 --> 00:46:18,980
Is that how you
see x equals Q*y?
636
00:46:18,980 --> 00:46:22,200
So instead of looking
for the vector x,
637
00:46:22,200 --> 00:46:24,100
I'm looking for
the vector y, which
638
00:46:24,100 --> 00:46:30,110
tells me what combination of
these columns, the q's, x is.
639
00:46:30,110 --> 00:46:34,760
In other words, I'm working
in the q system, the q basis.
640
00:46:34,760 --> 00:46:36,330
So I'm working with y.
641
00:46:36,330 --> 00:46:43,890
So I would, naturally, write
this as a minimum of b minus
642
00:46:43,890 --> 00:46:44,610
A*Q*y_k.
643
00:46:47,360 --> 00:46:49,120
Same problem.
644
00:46:49,120 --> 00:46:51,620
Same problem, I'm
just deciding, OK, I'm
645
00:46:51,620 --> 00:46:55,930
going to find this x as
a combination of the q's,
646
00:46:55,930 --> 00:46:59,200
because those q's
are orthogonal,
647
00:46:59,200 --> 00:47:02,810
and that will make this
calculation a lot nicer.
648
00:47:02,810 --> 00:47:04,660
OK, what's going to happen here?
649
00:47:04,660 --> 00:47:10,680
You know that I'm going to use
the property of the q's that
650
00:47:10,680 --> 00:47:15,730
A*Q is Q*H.
651
00:47:15,730 --> 00:47:19,970
That was the key point
that Arnoldi achieved.
652
00:47:19,970 --> 00:47:24,070
And the net result is
going to be, of course,
653
00:47:24,070 --> 00:47:28,400
I have to do this at level
k, so I'm chopping it off,
654
00:47:28,400 --> 00:47:31,380
and I have to
patiently, and the notes
655
00:47:31,380 --> 00:47:33,940
do that, patiently
see, OK, what's
656
00:47:33,940 --> 00:47:38,580
happening when we chop it
off, chop off these matrices.
657
00:47:38,580 --> 00:47:41,340
We have to watch where
the non-zeros appear.
658
00:47:41,340 --> 00:47:45,930
In the end, I get to a
minimization problem for y,
659
00:47:45,930 --> 00:47:52,310
so we get to a minimum
problem for the matrix
660
00:47:52,310 --> 00:48:01,920
H that leads me to
the y that I'm after.
661
00:48:01,920 --> 00:48:05,810
So it's a minimum problem
for a piece, sorry,
662
00:48:05,810 --> 00:48:15,400
I should really say H_k, some
piece of it, like this piece.
663
00:48:15,400 --> 00:48:22,770
In fact, exactly that piece.
664
00:48:22,770 --> 00:48:27,960
So if I ask -- all right,
suppose I have a least squares
665
00:48:27,960 --> 00:48:29,520
problem.
666
00:48:29,520 --> 00:48:34,200
This is least squares because
my norm is just the square,
667
00:48:34,200 --> 00:48:36,660
the norm squared.
668
00:48:36,660 --> 00:48:39,420
I'm just working with l^2 norm.
669
00:48:39,420 --> 00:48:41,740
And here's my matrix.
670
00:48:45,820 --> 00:48:52,210
It's 3 by 2, so I'm looking
at least squares problems,
671
00:48:52,210 --> 00:48:57,900
in that -- matrices
of that type,
672
00:48:57,900 --> 00:49:00,650
and they're easy to solve.
673
00:49:00,650 --> 00:49:04,540
The notes will give the
algorithm that solves the least
674
00:49:04,540 --> 00:49:05,540
squares problem.
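For reference, here is a minimal sketch of that whole pipeline -- Arnoldi to build Q and H, then the small least-squares problem for y -- written the expensive, fully orthogonalized way; the function name is just for illustration, and this shows the idea only, not the notes' algorithm:

    import numpy as np

    def minres_via_arnoldi(A, b, k):
        n = len(b)
        Q = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))              # (k+1)-by-k: one extra row
        normb = np.linalg.norm(b)
        Q[:, 0] = b / normb
        for j in range(k):                    # Arnoldi: the hard work is here
            v = A @ Q[:, j]
            for i in range(j + 1):            # orthogonalize against all old q's
                H[i, j] = Q[:, i] @ v
                v = v - H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(v)
            Q[:, j + 1] = v / H[j + 1, j]     # (no safeguard against breakdown)
        rhs = np.zeros(k + 1)
        rhs[0] = normb                        # b = ||b|| q_1 in the Q basis
        y, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # small least squares for y
        return Q[:, :k] @ y                   # x_k = Q y minimizes ||b - A x|| over K_k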
675
00:49:05,540 --> 00:49:09,410
I guess what I'm interested in
here is just is it expensive
676
00:49:09,410 --> 00:49:10,780
or not?
677
00:49:10,780 --> 00:49:15,910
Well, the least squares
problem will not be expensive;
678
00:49:15,910 --> 00:49:18,480
it's only got one extra
row, compared to the number
679
00:49:18,480 --> 00:49:20,850
of columns, that's not bad.
680
00:49:20,850 --> 00:49:27,690
What's expensive is the fact
that our H is not tri-diagonal
681
00:49:27,690 --> 00:49:31,510
anymore, these are
not zeros anymore.
682
00:49:31,510 --> 00:49:37,050
I have to compute all
these H's, so the hard work
683
00:49:37,050 --> 00:49:40,330
is back in Arnoldi.
684
00:49:40,330 --> 00:49:46,650
So it's Arnoldi
leading to MINRES,
685
00:49:46,650 --> 00:49:50,640
and we have serious
work already in Arnoldi.
686
00:49:50,640 --> 00:49:54,170
We don't have short recurrences,
we've got all these h's.
687
00:49:54,170 --> 00:49:56,630
OK, enough.
688
00:49:56,630 --> 00:50:08,200
Because ARPACK would be a good
code to call to do MINRES.
689
00:50:08,200 --> 00:50:15,140
OK that's my shot
at the key Krylov
690
00:50:15,140 --> 00:50:17,440
algorithms of numerical
linear algebra,
691
00:50:17,440 --> 00:50:24,270
and you see that they're
more subtle than Jacobi.
692
00:50:24,270 --> 00:50:26,530
Jacobi and
Gauss-Seidel were just
693
00:50:26,530 --> 00:50:29,060
straight simple
iterations; these
694
00:50:29,060 --> 00:50:35,500
have recurrences that
produce orthogonality,
695
00:50:35,500 --> 00:50:39,510
and you pay very little,
and you get so much.
696
00:50:39,510 --> 00:50:45,570
OK, so that's Krylov methods,
and that section of notes
697
00:50:45,570 --> 00:50:51,640
is up on the web, and of
course, at some point,
698
00:50:51,640 --> 00:50:58,330
numerical comparisons of how
fast is conjugate gradients,
699
00:50:58,330 --> 00:51:02,970
and is it faster than
direct elimination?
700
00:51:02,970 --> 00:51:06,000
You know, because that's a
big choice you have to make.
701
00:51:06,000 --> 00:51:09,820
Suppose you do have a symmetric
positive definite matrix.
702
00:51:09,820 --> 00:51:17,150
Do you do Gauss elimination with
re-ordering, minimum degree,
703
00:51:17,150 --> 00:51:20,440
or do you make the
completely different choice
704
00:51:20,440 --> 00:51:24,600
of iterative methods
like conjugate gradients?
705
00:51:24,600 --> 00:51:29,210
And that's a decision you
have to make when you're
706
00:51:29,210 --> 00:51:31,540
faced with a large
matrix problem,
707
00:51:31,540 --> 00:51:39,510
and only experiments can compare
two such widely different
708
00:51:39,510 --> 00:51:41,290
directions to take.
709
00:51:41,290 --> 00:51:50,890
So I like to end up by
emphasizing that there's still
710
00:51:50,890 --> 00:51:53,640
a big question.
711
00:51:53,640 --> 00:51:58,010
Do this or do minimum degree?
712
00:51:58,010 --> 00:52:04,860
And only some experiments
will show which is the better.
713
00:52:04,860 --> 00:52:08,780
OK, so that's good for
today, and next time
714
00:52:08,780 --> 00:52:13,620
will be direct
solution, using the FFT.
715
00:52:13,620 --> 00:52:16,310
Good, thanks.