Fact 1. An elementary row operation on a matrix A produces a row equivalent matrix B.
Here is why: This is clear if we simply permute the rows, or multiply a row by a nonzero scalar. If we replace a row R by R+cS where S is a different row and c is a scalar then simply note that R = (R + cS) + (-c)S.
Fact 2. If A, B, and C are three matrices of the same size such that A is row equivalent to B and B to C, then A is row equivalent to C.
Here is why: Given a row of C, write it as a linear combination of rows of B: then replace each row of B by a formula for it as a linear combination of rows of A. Similarly, every row of A can be written as a linear combination of rows of C.
Fact 3. If B is obtained from the matrix A by a sequence of elementary row operations, then they are row equivalent.
Here is why: Apply Facts 1 and 2 repeatedly.
Fact 4. If A is in reduced row echelon form (RREF) and a vector V can be written as a linear combination of the nonzero rows of A, this can be done in only one way. In fact, the coefficient of the k th nonzero row R_k of A must be the same as the entry that V has in the j_k spot, where j_k is the number of the column in which R_k has its leading 1.
Here is why: All of the rows of A except R_k have a 0 in the j_k spot, while R_k has a 1. Thus, if c_k is the coefficient of R_k in the linear combination, one calculates that the entry of V in the j_k spot is c_k(1) plus a sum of other terms all of which are 0.
Fact 5. If two matrices are row equivalent then they remain row equivalent when their last h columns are omitted.
Here is why: Call the original matrices A, B and the truncated forms A', B'. To write a row of A' as a linear combination of rows of B' simply use exactly the same coefficients that you would for writing the corresponding row of A as a linear combination of rows of B. One can write the rows of B' as linear combinations of the rows of A' for the same reason.
Fact 6. If a matrix is in RREF it remains so when one omits one or more of its consecutive columns on the right. This is also true if one omits one or more of its consecutive rows on the bottom (or any set of row, but we do not need to use the more general fact).
Here is why: This is immediate from the definition of RREF.
Fact 7. If two matrices of the same size in RREF are row equivalent, then they are equal. Hence, there is only one matrix in RREF that is row equivalent to a given matrix, and so only one matrix in RREF that can be obtained from it by a sequence of elementary row operations.
Here is why: We need to go through several steps. Call the matrices A and B.
Step 1. Assuming that the result is false, choose a counterexample in which the matrices have as few columns as possible. If either matrix has all 0 entries, the other must be have all 0 entries as well, and we are done. Thus, we may assume that both matrices have nonzero entries, and so there are some leading 1's.
Step 2. Let s denote the number of the rightmost column in which A has a leading 1, and let t be the number of the rightmost column in which B has a leading 1. We shall show that s = t. The situation is symmetric. Suppose, say, that s < t and let R be the row of B which has a leading 1 in the t th column. Then R can be expressed as a linear combination of rows of A. But the entries of R in the columns where leading 1's of A occur are all 0, and so by Fact 4 the coefficients used must all be 0. This is a contradiction. One gets a contradiction similarly if t < s.
Step 3. By Step 2, all leading 1's in both matrices occur at or before the s = t column. Omit that column and all the ones after it from both matrices. This is no longer a counterexample, by Step 1. By Fact 5 the matrices are still row equivalent and by Fact 6 they are still both in RREF. Thus, they must be identical, or we would have a counterexample with fewer columns. It follows that the original matrices have the same numbers of leading 1's (exactly one more than for the truncated matrices) and that these occur in the same columns. They therefore have the same number of nonzero rows, which occur consecutively at the top.
Step 4. Consider the k th nonzero row S_k of B. This is a linear combination of the nonzero rows R_i of A. By Fact 4, the coefficient used on R_i is the entry of S_k in the column where the leading 1 of R_i occurs, which is the same as the column in which the leading 1 of S_i appears. This will be 0 if i is different from k, and 1 if i = k. Thus, S_k = R_k, and the argument is complete.