should row_starts be [0, 1, 2, ...] here?
+1 on the question about row_starts - looks like it should be [0, 1, 2, 3, ...]?
I think row_starts operates on a flattened version of values. So in this case, we have
values = [3, 1, 2, 4, ...]
cols = [0, 2, 1, 2, ...]
row_starts = [0, 2, 3, 4, ...]
so that each integer in row_starts corresponds to the index in values (and cols) at which a new row begins. Would appreciate clarification here though!
why is row_starts needed when we already have cols ? we could infer row_starts from the first value of each array in cols, right?
@woo how would we accomplish that? As far as I understand, no row information is stored in values or cols.
if values is a seq[seq[]], then for each i in values[i], row = i? I think we need a separate rows_start flags mainly because of the below: what happens if the entire row is zero? Say row 1 is all zeros. Row 1, then, has to be skipped as it wont be stored in a sparse matrix format. Then the 2nd sequence within the values sequence would be for row 2 and not row 1. In this case row != i. I think that's why we need a rows_start flag otherwise we wouldnt need it.
+1 on the question above - Kayvon seemed to have said that a row_start index indicates the index for that row's values within the values array - I could see how this might be a flattened version of values as suggested above, but doesn't that defeat the purpose of organizing the structure in this way? Clarification appreciated :D
@juliob the main benefit of storing a sparse matrix in this way is a considerable reduction in the total amount of space needed (because we're doing away with storing lots of useless zeros).
This is a very similar idea to how you can represent graphs in two ways via an adjacency list or an adjacency matrix, where the list is much more space-efficient for sparse graphs but lookups and operations can be less runtime-efficient.
I think that this case would require a lot of preprocessing to ensure that work is completely balanced. For example, one would have to find the count of all nonzero entries in the matrix, then perform balancing based on this count, and then perform the calculations accordingly.
Row_starts - indicates the index in values array, where a new row starts. Cols - indicates for each row, in which column of the matrix the value from values array should go in. This method of representation is useful because it is not worthy to allocate a lot of memory to just store 0s in the sparse matrix.
Please log in to leave a comment.
Here -- values should be flattened into one array (like data array on previous slide) right? Otherwise, row starts says the second row begins at index 2 in the values array, which is [4], but that's incorrect?