Rohan Verma

Learning Machine Learning and other programming stuff


Linear Algebra

Linear algebra is a branch of mathematics, and for our purposes it is the mathematics of data: matrices and vectors are the language of data. Linear algebra is about linear combinations, that is, using arithmetic on columns of numbers called vectors and arrays of numbers called matrices to create new columns and arrays of numbers. It is the study of lines and planes, vector spaces, and the mappings required for linear transforms.

Matrices

In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns. In simpler terms, a matrix is a 2-dimensional array. Here are two examples, matrices A and B:

A
1 2 3
4 5 6
7 8 9

B
1 5
2 6
3 7
4 8

Dimension of a Matrix

The size of a matrix is defined by the number of rows and columns that it contains. A matrix with m rows and n columns is called an m × n matrix, or m-by-n matrix, while m and n are called its dimensions. For example, the matrix A above is a 3 × 3 matrix and B is a 4 × 2 matrix.
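
As a rough sketch, the dimensions of A and B can be read off in code if each matrix is stored as a list of rows. The helper name `shape` is an assumption made for this illustration, and it assumes every row has the same length.

```python
# Matrices A and B from the examples above, stored as lists of rows.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[1, 5],
     [2, 6],
     [3, 7],
     [4, 8]]

def shape(matrix):
    """Return (rows, columns) of a rectangular list-of-rows matrix."""
    return len(matrix), len(matrix[0])

print(shape(A))  # (3, 3)
print(shape(B))  # (4, 2)
```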

Vectors

A vector is a special matrix with exactly 1 column and any number of rows. So vectors are a special case of matrices, and the dimension of a vector is always n × 1.

y
1
4
7

So the dimension of the above vector is 3 × 1 .

Some Notations

  1. A_ij refers to the element in the ith row and jth column of matrix A. So for the matrix A above, A_12 = 2.
  2. A vector with n rows is referred to as an n-dimensional vector.
  3. y_k refers to the element in the kth row of vector y.
  4. In general, matrices and vectors are 1-indexed (programming languages often use 0-indexing instead).
  5. Generally, matrices are denoted by uppercase letters and vectors by lowercase letters.
  6. A scalar is a single value, not a vector or matrix.
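
The 1-indexed notation above needs a small adjustment in a language such as Python, where lists are 0-indexed. The sketch below shows one way to bridge the two; the helper name `element` is hypothetical and exists only for this illustration.

```python
# Matrix A and vector y from the examples above.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
y = [1, 4, 7]

def element(matrix, i, j):
    """Return A_ij, where i and j are 1-based as in the math notation."""
    return matrix[i - 1][j - 1]

print(element(A, 1, 2))  # 2, matching A_12 = 2
print(y[2 - 1])          # 4, the element y_2
```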

Addition and Scalar Multiplication

Addition and subtraction are element-wise, so we simply add or subtract each pair of corresponding elements. For example:

A
a b
c d

B
w x
y z

A + B
a+w b+x
c+y d+z

A - B
a-w b-x
c-y d-z

To add or subtract two matrices, their dimensions must be the same.
In scalar multiplication we simply multiply every element by the scalar value.

A x g
(a x g) (b x g)
(c x g) (d x g)

In scalar division we simply divide every element by the scalar value.

A / g
(a / g) (b / g)
(c / g) (d / g)

For example, let matrix A be:

1 0
2 5
3 1

Then 3 x A will be:

1x3 0x3 = 3 0
2x3 5x3 = 6 15
3x3 1x3 = 9 3
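
The element-wise rules above can be sketched in a few lines of plain Python. The function names `add` and `scale` are assumptions chosen for this example, and matrices are stored as lists of rows.

```python
def add(A, B):
    """Element-wise sum of two matrices with identical dimensions."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

def scale(A, g):
    """Multiply every element of A by the scalar g."""
    return [[g * a for a in row] for row in A]

A = [[1, 0],
     [2, 5],
     [3, 1]]

print(scale(A, 3))  # [[3, 0], [6, 15], [9, 3]]
print(add(A, A))    # [[2, 0], [4, 10], [6, 2]]
```

Subtraction and scalar division follow the same pattern with `-` and `/` in place of `+` and `*`.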

Matrix Vector Multiplication

We map the column of the vector onto each row of the matrix, multiplying element by element and summing the results. Suppose we want to multiply matrix A by vector x to produce vector y. To get y_i, multiply the elements of A's ith row by the corresponding elements of x and add them up. In general:

A
a b
c d
e f

x
x1
x2

A * x = y
a x1 + b x2
c x1 + d x2
e x1 + f x2

The result is a vector. Keep in mind that the number of columns of the matrix must equal the number of rows of the vector. If this condition is not satisfied, the matrix-vector product is not defined.

[m x n] x [n x 1] = [m x 1]

An example :

A
1 3
4 0
2 1

x
1
5

A * x = y
1x1 + 3x5 = 16
4x1 + 0x5 = 4
2x1 + 1x5 = 7
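
The worked example above can be checked with a short sketch. The function name `matvec` is an assumption for illustration; the vector is stored as a flat list rather than as an n x 1 matrix.

```python
def matvec(A, x):
    """Multiply an m x n matrix A by an n-element vector x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("columns of A must equal rows of x")
    # Each output element is one row of A multiplied element-wise by x, then summed.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 3],
     [4, 0],
     [2, 1]]
x = [1, 5]

print(matvec(A, x))  # [16, 4, 7]
```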

We can now use matrix-vector multiplication to evaluate our hypothesis function. Recall the house price prediction problem, where the hypothesis is:

h(x) = θ0 + θ1x

Let's assume we have found some values for θ0 and θ1, so our hypothesis function looks like:

h(x) = -40 + 0.25x

Now suppose we want to make predictions for houses with the following values of x:

2104, 1416, 1534, 852

We would have to write a for loop to compute the hypothesis function for each value. We can do it more simply if our programming language supports matrix-vector multiplication. Let matrix A hold a column of 1's (which multiplies θ0) next to the column of x values:

A
1 2104
1 1416
1 1534
1 852

We can put the coefficients of the hypothesis function into a vector x:

x
-40
0.25

Now if we multiply A by x, each row of the result is one prediction:

-40 + 0.25x2104
-40 + 0.25x1416
-40 + 0.25x1534
-40 + 0.25x852

So with a single matrix-vector multiplication we get all our predictions very easily.
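
As a minimal sketch of the idea above, the code below compares the explicit loop with one matrix-vector product. The names `sizes`, `theta` and `matvec` are assumptions made for this example; a numerical library would normally supply the multiplication.

```python
def matvec(A, x):
    """Multiply an m x n matrix A by an n-element vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

sizes = [2104, 1416, 1534, 852]
theta = [-40, 0.25]  # theta0 and theta1 from h(x) = -40 + 0.25x

# Loop version: evaluate the hypothesis once per house.
loop_predictions = [theta[0] + theta[1] * s for s in sizes]

# Matrix version: a column of 1's next to the sizes, then one product.
A = [[1, s] for s in sizes]
predictions = matvec(A, theta)

print(predictions)                      # [486.0, 314.0, 343.5, 173.0]
print(predictions == loop_predictions)  # True
```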

Matrix Matrix Multiplication

We multiply two matrices by breaking the second matrix into its column vectors, doing one matrix-vector multiplication per column, and concatenating the results. Suppose we want to multiply matrix A by matrix B to produce matrix C. To get C_ij, multiply the elements of A's ith row by the corresponding elements of B's jth column and add them up. In general:

A
a b
c d
e f

B
w x
y z

A * B = C
(aw + by) (ax + bz)
(cw + dy) (cx + dz)
(ew + fy) (ex + fz)

The result is a matrix. Keep in mind that the number of columns of the first matrix must equal the number of rows of the second matrix. If this condition is not satisfied, the matrix-matrix product is not defined.

[m x n] x [n x o] = [m x o]

An example :

A
1 3
2 5

B
0 1
3 2

A * B = C
(1x0 + 3x3) (1x1 + 3x2) = 9 7
(2x0 + 5x3) (2x1 + 5x2) = 15 12
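
The same example can be reproduced with a short sketch in plain Python. The function name `matmul` is an assumption for this illustration; `zip(*B)` is used to walk over the columns of B.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving m x o."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    columns = list(zip(*B))  # the columns of B, one tuple per column
    # Entry (i, j) is row i of A times column j of B, summed.
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

A = [[1, 3],
     [2, 5]]
B = [[0, 1],
     [3, 2]]

print(matmul(A, B))  # [[9, 7], [15, 12]]
```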

We can now use matrix-matrix multiplication to evaluate several hypothesis functions at once. Recall the house price prediction problem:

h(x) = θ0 + θ1x

Let's assume we have three competing hypotheses, each with its own values of θ0 and θ1:

h(x) = -40 + 0.25x
h(x) = 200 + 0.1x
h(x) = -150 + 0.4x

Now suppose we want to make predictions for houses with the following values of x:

2104, 1416, 1534, 852

We would have to write two nested for loops to compute every hypothesis function for every value. We can do it more simply if our programming language supports matrix-matrix multiplication. Let matrix A again hold a column of 1's next to the column of x values:

A
1 2104
1 1416
1 1534
1 852

We can put the coefficients of the hypothesis functions into a matrix B, one column per hypothesis:

B
-40 200 -150
0.25 0.1 0.4

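Now if we multiply A by B, each row of the result corresponds to one house and each column to one hypothesis:

(-40 + 0.25x2104) (200 + 0.1x2104) (-150 + 0.4x2104)
(-40 + 0.25x1416) (200 + 0.1x1416) (-150 + 0.4x1416)
(-40 + 0.25x1534) (200 + 0.1x1534) (-150 + 0.4x1534)
(-40 + 0.25x852) (200 + 0.1x852) (-150 + 0.4x852)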

So with a single matrix-matrix multiplication we get the predictions of all the hypothesis functions very easily.
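
Here is a rough sketch of that computation in plain Python. The names `sizes` and `matmul` are assumptions for the example, and the results are rounded to one decimal place before printing because 0.1 and 0.4 are not exactly representable as floating-point numbers.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving m x o."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

sizes = [2104, 1416, 1534, 852]
A = [[1, s] for s in sizes]   # column of 1's next to the x values
B = [[-40, 200, -150],        # theta0 of each hypothesis
     [0.25, 0.1, 0.4]]        # theta1 of each hypothesis

predictions = matmul(A, B)    # 4 houses x 3 hypotheses

for row in predictions:
    print([round(v, 1) for v in row])
# [486.0, 410.4, 691.6]
# [314.0, 341.6, 416.4]
# [343.5, 353.4, 463.6]
# [173.0, 285.2, 190.8]
```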

Some Properties

  1. In general, matrix-matrix multiplication is not commutative. That is, A x B is not equal to B x A; the two products might not even have the same dimensions.
  2. Matrix-matrix multiplication is associative. That is, A x (B x C) = (A x B) x C.
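
Both properties can be checked numerically with a small sketch. A and B are the matrices from the earlier matrix-matrix example; the matrix C and the helper `matmul` are assumptions introduced only for this illustration.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving m x o."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

A = [[1, 3], [2, 5]]
B = [[0, 1], [3, 2]]
C = [[1, 0], [2, 1]]  # hypothetical third matrix for the associativity check

# Not commutative: the two products differ.
print(matmul(A, B))  # [[9, 7], [15, 12]]
print(matmul(B, A))  # [[2, 5], [7, 19]]

# Associative: the grouping does not change the result.
left = matmul(matmul(A, B), C)
right = matmul(A, matmul(B, C))
print(left == right)  # True
```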

Identity Matrix

An identity matrix is a square matrix (the number of rows equals the number of columns) with 1's on the diagonal and 0's everywhere else. Example (the 3 x 3 identity matrix):

1 0 0
0 1 0
0 0 1

If we multiply any matrix by an identity matrix of compatible size, the result is always equal to the original matrix. Moreover, multiplying by the identity is commutative: for a square matrix A, A x I = I x A. Example:

A
1 3
2 5

I
1 0
0 1

A x I
1 3 x 1 0
2 5 x 0 1

Ans
1 3
2 5

So as you can see I x A = A x I = A
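
A short sketch can confirm this, including the case of a non-square matrix where the identity on the left and on the right have different sizes. The helper names `identity` and `matmul` are assumptions for the example; B is the 4 x 2 matrix from the start of the article.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving m x o."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

def identity(n):
    """Return the n x n identity matrix: 1's on the diagonal, 0's elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 3], [2, 5]]
I = identity(2)
print(matmul(A, I) == A and matmul(I, A) == A)  # True

# For a 4 x 2 matrix, the left identity is 4 x 4 and the right one is 2 x 2.
B = [[1, 5], [2, 6], [3, 7], [4, 8]]
print(matmul(identity(4), B) == B and matmul(B, identity(2)) == B)  # True
```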

Matrix Inverse

If A is an m x m matrix and it has an inverse A^-1, then:

A x A^-1 = A^-1 x A = I

In other words, if multiplying A by some matrix B produces the identity matrix, then A has an inverse and B is the inverse of A. Some pointers about the inverse of a matrix:

  1. Only a square matrix can have an inverse.
  2. Not every square matrix has an inverse.
  3. Matrices that don't have an inverse are called singular or degenerate matrices.
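
To make the pointers above concrete, here is a minimal sketch for the 2 x 2 case only. It uses the standard determinant formula for a 2 x 2 inverse, which is not covered in this article and is brought in purely for illustration; the function name `inverse_2x2` is hypothetical, and integer entries are assumed so that `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving m x o."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

def inverse_2x2(M):
    """Return the inverse of a 2 x 2 integer matrix, or None if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None  # singular (degenerate) matrix: no inverse exists
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 3], [2, 5]]
A_inv = inverse_2x2(A)

# Multiplying in either order gives the identity matrix.
print(matmul(A, A_inv) == [[1, 0], [0, 1]])  # True
print(matmul(A_inv, A) == [[1, 0], [0, 1]])  # True

# A square matrix whose second row is a multiple of the first has no inverse.
print(inverse_2x2([[1, 2], [2, 4]]))  # None
```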

Matrix Transpose

Let A be an m x n matrix. Then B is called the transpose of A if B = A^T. B is an n x m matrix where B_ij = A_ji, i.e. rows become columns and columns become rows. Example:

A
0 3
1 4

A transpose
0 1
3 4
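
The transpose can be sketched in one line of plain Python, since `zip(*A)` yields the columns of a list-of-rows matrix. The function name `transpose` is an assumption for this example, and the 4 x 2 matrix B is reused from the start of the article to show the dimensions swapping.

```python
def transpose(A):
    """Return the transpose of A: entry (i, j) of the result is A[j][i]."""
    return [list(column) for column in zip(*A)]

A = [[0, 3],
     [1, 4]]
print(transpose(A))  # [[0, 1], [3, 4]]

# A 4 x 2 matrix becomes a 2 x 4 matrix.
B = [[1, 5], [2, 6], [3, 7], [4, 8]]
print(transpose(B))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```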