Chủ Nhật, 10 tháng 1, 2016

The Imposing INDEX

From my perspective, the Excel INDEX function is the single most important in the roster of Microsoft Excel functions.
Now that might be surprising considering the function's humdrum name, but please pay close attention, because INDEX is one of the magical secrets of how to use Excel! So what's so great about the INDEX function? It's nonvolatile, sprightly, agile, and versatile. Excel INDEX can return one value or an array of values; it can return a reference to one cell or to a range of cells. INDEX works well on either side of the three Reference Operators - the colon, the space, and the comma.

index_function_parameters_excelhero.png
 
The idea with INDEX is that you give it a range (or an array) and then specify an element to return. 
So,
=INDEX(A1:A5,2)
...returns a reference to cell A2.
And,
=INDEX({95,96,97,98,99},5)
...returns the value of 99.


Note:  If you use an international version of Excel, some of the array constants may not work as presented.  Please see the instructions at the end of this article.


Nothing extraordinary thus far. But it is crucial to understand and so I'll reiterate that the first example above returns a REFERENCE, while the second returns a VALUE. 
Also, note that both of these examples demonstrate how the Excel INDEX function behaves when passed a vector. A vector is a one-dimensional array or range. When passed a vector, INDEX does not care whether that vector is vertical or horizontal. The second parameter of INDEX identifies itself as row number. But this is not correct, when a vector is passed in. Instead of being the row number, the second input becomes the element number for a vector - and as such, horizontal vectors work just fine with this notation. In the second example above, five is not the row number, it is the element number!

But we can force INDEX to behave in its normal two-dimensional fashion:

=INDEX({95,96,97,98,99},1,5)

...also returns the value of 99.

And it is with this two-dimensional behavior that INDEX begins to really shine. Obviously,
=INDEX(A1:C5,1,3)
...returns a reference to C1. But what is surprising to most users is that both of the following do as well:
{=INDEX(A1:C5,0,3)}
{=INDEX(A1:C5,,3)}


Return Whole Rows or Columns
In fact, until the user understands what is happening here, they might think that the function has a bug because the overriding theme with INDEX is that unlike OFFSET for example, it must return an element from within the range or array passed to it - and row zero (or column zero) is outside. So what does ZERO or a null value mean in this setting?
If the row_num parameter is zero or missing, this directs INDEX to return the entire column specified by the column_num parameter!
{=INDEX(A1:C5,,2)}
...returns a reference to the range, B1:B5.
And the converse is also true - specifying a zero or missing column_num will return an entire row. The catch here is that the missing column_num MUST include the comma, like so:
{=INDEX(A1:C5,2,)}
...which returns a reference to the range, A2:C2.

But remember that when a particular cell is given a reference to a multi-cell range or that cell is set equal to an array, just the first element shows in the output cell - so most of the row or column returned by the two above INDEX formulas will be hidden. But you can array-enter one of these Microsoft Excel formulas over a range to see the entire output. You do this by selecting the output range first, typing the formula in the formula bar, and entering it by Control-Shift-Enter.
So yes the technique works with returning entire rows or columns of a two-dimensional array as well:
=SUM(INDEX({1,2,3,4,5;6,7,8,9,10;11,12,13,14,15},3,))
...results in 65, which is the summation of the entire third row of the input array.

Lookups
This ability to return entire columns or rows from a larger range or array is extraordinarily useful.  
Suppose we have a table or list of metrics for countries in the range of A1:M200. We can create a named formula that refers to this range - let's call it simply, d, for data. Then we can name individual columns of the data painlessly. If the country names are in the first column, we can create a named formula:
Country:   =INDEX(d,,1)
And if population is in the 4th column, we can create a named formula:
Population:   =INDEX(d,,4)
Now suppose we wanted to lookup the population of Scotland, all we need to do is:
=INDEX(Population,MATCH("Scotland",Country,0))
The Excel MATCH function simply produces a row number for INDEX.
This method of lookup is on par in terms of speed as doing a VLOOKUP, if you are doing just one lookup. But it has advantages over VLOOKUP. It is not restricted to looking up a column to the right like VLOOKUP is. Excel INDEX MATCH is significantly quicker than VLOOKUP if we are doing lookups for a list of countries, and we array-enter the one formula over the entire output column.
If we need to return multiple columns of metrics for a list of countries, the speed benefit is even greater. Suppose that GDP is the 2nd column of the data and that Capital was the 11th column.
If on a new worksheet, we wanted to output three columns of data for each country, Population, GDP, and Capital, the most efficient way to do so is to dedicate one column to create a common index, and then array-enter the INDEX formula over the entire three columns of output.
In this scenario, column B would be the list of countries. Column A would be dedicated to the common index. In A2, we would enter:
=MATCH(B2,Country,0)
...and then copy this formula down as far as the countries are listed in column B.
And then in columns, C, D, and E we would array-enter over the entire range that extends as far down as the countries:
{=INDEX(d,A2:A100,{4,2,11})}
To be clear, this means that if we had 99 countries in column B, we would select C2:E100, and then in the formula bar we would type the above formula, and then enter it by Control-Shift-Enter.
The end result is that the one formula is executed just once (in array fashion) and is brutally fast leaving a very light, non-volatile footprint in your model.

Sums
This usage of INDEX is a great way to power SUMPROPDUCT as well. For example:
=SUMPRODUCT( (Left(Country,1)="U")*Population )
...which returns the total population for all countries that start with the letter, U.
And while named formulas are self-documenting, this works just as well:
=SUMPRODUCT( (Left(INDEX(d,,1),1)="U")*INDEX(d,,4) )

The Dynamic Range
Building on this idea, we can alter the named formula, d, so that it results in a dynamic range instead of a fixed range. And here INDEX reigns supreme. 
A Dynamic range is often constructed with OFFSET or INDIRECT. Unfortunately, both of these Microsoft Excel functions are volatile, which simply means that all formulas that include these functions will recalculate every single time anything on the worksheet changes. For example, adding a value to any blank cell will cause all volatile formulas (and any other formulas that depend on them) to recalculate. Depending on your model design and size, this can have a major impact on the fluidity and responsiveness of the model.
INDEX can be used to create a dynamic range, and not only is it nonvolatile, it is way faster than either OFFSET or INDIRECT. In fact, the improvement in performance is so great that INDEX should be the foundation of all dynamic ranges in professional models.
The approach to creating the dynamic range is different than the approach used for OFFSET or INDIRECT. For those functions, the range is created within the function. With INDEX on the other hand, the dynamic range is produced by using INDEX on one side (or sometimes both) of the Range Operator, which is the colon.
For example, consider this normal looking range reference:
=A2:A100
By itself, this reference could be a hard reference to the country names in our list of countries. Assuming no blanks in our list and that there is a column header in A1, perhaps "Country", the following formula would return the name of the last country in the list:
=INDEX(A:A,COUNTA(A:A))
But the INDEX function here is really returning the reference to the last cell in column A with a country name, and then the equal sign forces the value of that cell to be returned. In the scenario that we have painted, this INDEX formula is really returning the reference, A100.
So the following two formulas point to the exact same range:
=A2:A100
=A2:INDEX(A:A,COUNTA(A:A))
But there is a significant difference. The first is a hard coded, static reference. The second is a nonvolatile, dynamic range that will expand or contract as the number of countries in the list changes. Please note that before we replace the Refers To value of our named formula, Country, with this INDEX based dynamic range, we need to make the references absolute. It would then look like this:
Country:   =$A$2:INDEX($A:$A,COUNTA($A:$A))
We can use the same techniques to create a two-dimensional dynamic range so that the number of columns is also dynamic:
d:   =$A$2:INDEX($1:$65535,COUNTA($A:$A),COUNTA($1:$1))
And with d defined in this dynamic manner, we can still do all of the wonderful row and column referencing from above, such as:
{=INDEX(d,$A2,{4,2,11})}

As a footnote to this section on dynamic ranges, I want to point out that when you use a function on either side (or both sides) of any of the three Reference Operators, the resulting formula always recalculates when the workbook is opened. So while INDEX is nonvolatile, it becomes what I term quasi-volatile when used for dynamic ranges - but this is orders of magnitude better than volatile, and so it remains the best dynamic range foundation by far.
And as a post script to this section on the dynamic range, you may very well ask should I not just use the new Structured Table References available since Excel 2007? While STR is a robust option, it is heavy. If you need speed, nothing beats the alacrity of INDEX.


Noncontiguous Areas
When working with ranges, INDEX offers a fourth parameter to select the AREA to work with from a noncontiguous input range. The areas are referenced by integer in the order that they appear in the input range. For example,
=INDEX((data1,data2,data3),,,2)
...returns a reference to data2. And
{=INDEX((data1,data2,data3),,1,3)}
...returns the entire first column of data3. With a little ingenuity, this can be extraordinarily useful for charting... and many other activities. And you should note that the number of areas specified has no hard limit, and those noncontiguous areas need not be of the same size!

More
All of this just scratches the surface of what is possible with this essential member of the Microsoft Excel functions. It can be used in countless situations, even to compare Excel worksheets, etc.
Some of the synonyms for the word, "imposing", in my thesaurus are impressive, august, commanding, effective, exciting, magnificent, and mind-blowing. The INDEX worksheet function is truly imposing!


International Versions of Excel
Throughout this article, I have used array constants within formulas.  This is a powerful technique.  In the English Language version of Excel, the symbol used to separate the columns in an array constant is the comma.  The symbol used to separate rows is the semicolon.
Other language versions of Excel use different symbols for column and row separation.  However it is easy to discover what symbols are used in your version of Excel.
Open the VBEditor (ALT-F11), and place the cursor in the Immediate Pane (CTRL-G).
Enter the following two lines in the the Immediate Pane (it's OK to use the clipboard):

?Application.International(xlColumnSeparator)
?Application.International(xlRowSeparator)

Click on each in turn and press ENTER on the keyboard.
Uses the printed characters to separate columns and rows in array constants.

Không có nhận xét nào:

Đăng nhận xét