Thứ Ba, 23 tháng 2, 2016

VLOOKUP In VBA – With Examples

In my earlier post, I had written about VLookUp in Excel. It was a massive post of around 2500 words, it explains most of the things about vertical look up function in excel. Today’s post is an extension to that post and here we will understand how to apply a VLookUp in VBA.
If you haven’t read that post then I would strongly recommend you to read that post before going any further. [Read Here]
Assuming that you have basic knowledge of VLOOKUP function we will move further.
Note: To perform these programs yourself, you may need to enable macros in excel. Read this post to know how to do this.

Syntax of VBA VLOOKUP:

You can use VLookUp in macros by following any of the below ways:
1. Application.VLOOKUP(lookup_value, table_array, column_index, range_lookup)
Or
2. Application.WorksheetFunction.VLOOKUP(lookup_value, table_array, column_index, range_lookup)
Note: If you are searching for something similar to VLookUp function for Access then probably you should use DLOOKUP.

5 Examples of Using VLOOKUP in VBA:

Now let’s move to some practical examples of using VLookUp in VBA codes.
Example 1:Using VLookUp find the monthly salary of “Justin Jones” from the below table. Display the salary using a dialog box.
VLOOKUP in VBA Example 1
Below is the code for this:
  1. Sub FINDSAL()  
  2. Dim E_name As String  
  3. E_name = "Justin Jones"  
  4. Sal = Application.WorksheetFunction.VLookup(E_name, Sheet1.Range("B3:D13"), 3, False)  
  5. MsgBox "Salary is : $ " & Sal  
  6. End Sub  
Example 1 Message Box
Explanation: In this code we have used a variable ‘E_name’ to store the employee name whose salary is to be fetched. After this we have simply supplied the employee name and other required arguments to the VLOOKUP and it returns the salary of the corresponding Employee.
Example 2: Now make the above program a little customisable by accepting the Employee name from user. If user enters any Employee name that is not present in the table then the program should be able to convey this clearly to the user.
To accomplish this we can use the below code:
  1. Sub FINDSAL()  
  2. On Error GoTo MyErrorHandler:  
  3. Dim E_name As String  
  4. E_name = InputBox("Enter the Employee Name :")  
  5. If Len(E_name) > 0 Then  
  6.   Sal = Application.WorksheetFunction.VLookup(E_name, Sheet1.Range("B3:D13"), 3, False)  
  7.   MsgBox "Salary is : $ " & Sal  
  8. Else  
  9.   MsgBox ("You entered an invalid value")  
  10. End If  
  11. Exit Sub  
  12. MyErrorHandler:  
  13. If Err.Number = 1004 Then  
  14.   MsgBox "Employee Not Present in the table."  
  15. End If  
  16. End Sub  
Explanation: In this code we are accepting the user input using an InputBox function. If the Employee name entered by the user is found, then VLookUp returns its corresponding salary. However if the employee name is not present in the table then VLOOKUP throws a “1004 Error”.
And, we have created an error handler to catch such cases for conveying the user that entered employee name doesn’t exist.
Example 3:In this example we will try to write a code that adds the Department field from the Employee Table 1 to our old Employee Table.
VLookup Employee Tables Example 3
As you can see that in both these tables there is only one common column i.e. Employee_ID. So, in this case we will have to apply the VLookUp based on the Employee ID.
Below is the code to do this:
  1. Sub ADDCLM()  
  2. On Error Resume Next  
  3. Dim Dept_Row As Long  
  4. Dim Dept_Clm As Long  
  5. Table1 = Sheet1.Range("A3:A13"' Employee_ID Column from Employee table  
  6. Table2 = Sheet1.Range("H3:I13"' Range of Employee Table 1  
  7. Dept_Row = Sheet1.Range("E3").Row ' Change E3 with the cell from where you need to start populating the Department  
  8. Dept_Clm = Sheet1.Range("E3").Column  
  9. For Each cl In Table1  
  10.   Sheet1.Cells(Dept_Row, Dept_Clm) = Application.WorksheetFunction.VLookup(cl, Table2, 2, False)  
  11.   Dept_Row = Dept_Row + 1  
  12. Next cl  
  13. MsgBox "Done"  
  14. End Sub  
VBA VLOOKUP Example 4
Explanation: This code takes each ‘lookup_value’ from the Employee ID field (one at a time), looks up its corresponding Department and then populates the corresponding department value at appropriate place.
Please note that in this code we have just pasted the result of VLookUp formula, and not the VLookUp formula itself (Refer Example 5).
Example 4:In this example we will try to write a code that displays all the details of an Employee from the Employee table (as shown below) when its Employee ID is entered.
VLookup For Example 4
Below is the code that can accomplish this:
  1. Sub FETCH_EMP_DETAILS()  
  2. On Error GoTo MyErrorHandler:  
  3. Dim E_id As Long  
  4. E_id = InputBox("Enter the Employee ID :")  
  5. Det = "Employee ID : " & Application.WorksheetFunction.VLookup(E_id, Sheet1.Range("A3:E13"), 1, False)  
  6. Det = Det & vbNewLine & "Employee Name : " & Application.WorksheetFunction.VLookup(E_id, Sheet1.Range("A3:E13"), 2, False)  
  7. Det = Det & vbNewLine & "Employee SSN : " & Application.WorksheetFunction.VLookup(E_id, Sheet1.Range("A3:E13"), 3, False)  
  8. Det = Det & vbNewLine & "Monthly Salary : " & Application.WorksheetFunction.VLookup(E_id, Sheet1.Range("A3:E13"), 4, False)  
  9. Det = Det & vbNewLine & "Department : " & Application.WorksheetFunction.VLookup(E_id, Sheet1.Range("A3:E13"), 5, False)  
  10. MsgBox "Employee Details : " & vbNewLine & Det  
  11. Exit Sub  
  12. MyErrorHandler:  
  13. If Err.Number = 1004 Then  
  14.   MsgBox "Employee Not Present in the table."  
  15. ElseIf Err.Number = 13 Then  
  16.   MsgBox "You have entered an invalid value."  
  17. End If  
  18. End Sub  
Message Box Output Example 4
Explanation:In this example we have asked user to enter the Employee Id and then we have used multiple VLookUp Statements and concatenated their outputs to show all the details in a single message box.
Example 5:Redo the example 3 but this time paste the whole VLookUp formula instead of pasting only the result.
VLookup Employee Tables Example 3
Below is the code for doing this:
  1. Sub ADDCLM()  
  2. On Error Resume Next  
  3. Dim Dept_Row As Long  
  4. Dim Dept_Clm As Long  
  5. ctr = 0  
  6. Table1 = Sheet1.Range("A3:A13"' Employee_ID Column from Employee table  
  7. Table2 = Sheet1.Range("H3:I13"' Range of Employee Table 1  
  8. Dept_Row = Sheet1.Range("E3").Row ' Change E3 with the cell from where you need to start populating the Department  
  9. Dept_Clm = Sheet1.Range("E3").Column  
  10. For Each cl In Table1  
  11.   Sheet1.Cells(Dept_Row, Dept_Clm).FormulaR1C1 = "=VLOOKUP(RC[-4], R3C8:R13C9, 2, False)"  
  12.   Dept_Row = Dept_Row + 1  
  13.   ctr = ctr + 1  
  14. Next cl  
  15. MsgBox "Done"  
  16. End Sub  
VLOOKUP in VBA paste formula
Explanation:This code is very similar with the one that we have discussed in Example 3, the only difference between these formulas is that here we are copying the VLookUp formula directly in the cells.
In this code we have applied the VLOOKUP in R1C1 form. So, the formula =VLOOKUP(RC[-4], R3C8:R13C9, 2, False) means =VLOOKUP(<4 cells to the left of current cell>, <Range of Employee Table 1>, <column to be fetched>, <exact match>).
One thing that is worth to note here is : the square brackets ( [ ] ) in your R1C1 formula indicate that you are specifying a relative range. If you want to specify an absolute range, you need to specify the R1C1 cells without brackets; e.g. R3C8:R13C9.
So, this was all about VBA VLookUp.

Chủ Nhật, 10 tháng 1, 2016

Introduction To The DATEDIF Function

The DATEDIF function computes the difference between two dates in a variety of different intervals, such as the number of years, months, or days between the dates. This function is available in all versions of Excel since at least version 5/95, but is documented in the help file only for Excel 2000. For some reason, Microsoft has decided not to document this function in any other versions. DATEDIF is treated as the drunk cousin of the Formula family. Excel knows it lives a happy and useful life, but will not speak of it in polite conversation. Do not confuse the DATEDIF worksheet function with the DateDiff VBA function.
The syntax for DATEDIF is as follows:

=DATEDIF(Date1, Date2, Interval)

Where:
Date1 is the first date,
Date2 is the second date,
Interval is the interval type to return.

If Date1 is later than Date2, DATEDIF will return a #NUM! error. If either Date1 or Date2 is not a valid date, DATEDIF will return a #VALUE error.
The Interval value should be one of

Interval Meaning Description
m Months Complete calendar months between the dates.
d Days Number of days between the dates.
y Years Complete calendar years between the dates.
ym Months Excluding Years Complete calendar months between the dates as if they were of the same year.
yd Days Excluding Years Complete calendar days between the dates as if they were of the same year.
md Days Excluding Years And Months Complete calendar days between the dates as if they were of the same month and same year.

If Interval is not one of the items listed in above, DATEDIF will return a #NUM error.

If you are including the Interval string directly within the formula, you must enclose it in double quotes:

=DATEDIF(Date1,Date2,"m")

If you have the interval in another cell referenced by the formula, that cell should not have quotes around the interval string. For example, with the formula

=DATEDIF(Date1,Date2,A1)

cell A1 should contain m not "m".
Supressing Zero Value Components
In typical usage to calculate the number of years, months, and days between two dates, you would use a formula such as the following, where A1 is the start date and B1 is the end date:

=DATEDIF(A1,B1,"y")&" years "&DATEDIF(A1,B1,"ym")&" months "&DATEDIF(A1,B1,"md")
&" days"


This will return a string such as

12 years 8 months 14 days

However, if the number of years and/or months is 0, you'll get a string like

0 years 0 months 14 days

0 years 3 months 14 days

If you want to suppress the 0 values, and return a result such as

8 months 14 days
or
14 days

where the 0-valued components are not displayed, use a formula like the following:

=IF(DATEDIF(A1,B1,"y")=0,"",DATEDIF(A1,B1,"y")&" years ")&IF(DATEDIF(A1,B1,"ym")=0,"",
DATEDIF(A1,B1,"ym")&" months ")&DATEDIF(A1,B1,"md")&" days"


This will display only the date components whose value is greater than 0. The day value will always be displayed, but the year and/or the month value may be suppresed.
SectionBreak
Examples
The follow are some examples of the DATEDIF function.

Date1: 1-Jan-2007
Date2: 10-Jan-2007
Interval: d
Result: 9
Explanation:
There are 9, not 10, calendar days between these two dates.

Date1: 1-Jan-2007
Date2: 31-Jan-2007
Interval: m
Result: 0
Explanation:
There 0 complete calendar months between the two dates.

Date1: 1-Jan-2007
Date2: 1-Feb-2007
Interval: m
Result: 1
Explanation:
There are 1 complete months between the two dates.

Date1: 1-Jan-2007
Date2: 28-Feb-2007
Interval: m
Result: 1
Explanation:
There are 1 complete months between the two dates.

Date1: 1-Jan-2007
Date2: 31-Dec-2007
Interval: d
Result: 364
Explanation:
There are 364 days between the two dates.

Date1: 1-Jan-2007
Date2: 31-Jan-2007
Interval: y
Result: 0
Explanation:
There are 0 complete years between the dates

Date1: 1-Jan-2007
Date2: 1-July-2008
Interval: d
Result: 547
Explanation:
There are 547 days between the two dates.

Date1: 1-Jan-2007
Date2: 1-July-2008
Interval: m
Result: 18
Explanation:
There are 18 months between the two dates.

Date1: 1-Jan-2007
Date2: 1-July-2008
Interval: ym
Result: 6
Explanation:
There are 6 months between the two dates if the dates are considered to have the same year. The year it taken from Date1, not Date2. This makes a difference when one year is a leap year. Since 2007 is not a leap year, 29-Feb is not counted. See DATEDIF And Leap Years below.

Date1: 1-Jan-2007
Date2: 1-July-2008
Interval: yd
Result: 181
Explanation:
There are 181 days between the dates if the dates are considered to have the same year. The year it taken from Date1, not Date2. This makes a difference when one year is a leap year. Since 2007 is not a leap year, 29-Feb is not counted. See DATEDIF And Leap Years below.

Date1: 1-Jan-2008
Date2: 1-July-2009
Interval: yd
Result: 182
Explanation:
There are 182 days between the dates if the dates are considered to have the same year. This result is 182 not 181 since Date1 is a leap year and thus 29-Feb is counted.

Date1: 1-Jan-2007
Date2: 31-Jan-2007
Interval: md
Result: 30
Explanation:
There are 30 days between the 1st and 31st of the dates when both dates' month and years are considered to be the same.
SectionBreak
DATEDIF And Leap Years
When calculating date intervals, DATEDIF uses the year of Date1, not Date2 when calculating the yd, ym and md intervals. For example,

=DATEDIF(Date1,Date2,"md")

returns 28 for Date1 = 1-Feb-2007 and Date2 = 1-March-2009. Since Date1 is not a leap year, the date 29-Feb is not counted. But the same formula with Date1 = 1-Feb-2008 returns 29, since Date1 is a leap year and therefore the date 29-Feb is counted.
SectionBreak
Calculating Age
You can use the DATEDIF to calculate a person's age. For example, the formula below will calculate the age of a person as of the current date, where BirthDate is the person's date of birth.

=DATEDIF(BirthDate,TODAY(),"y")&" years "&DATEDIF(BirthDate,TODAY(),"ym")&" months "&DATEDIF(BirthDate,TODAY(),"md")&" days"

The Imposing INDEX

From my perspective, the Excel INDEX function is the single most important in the roster of Microsoft Excel functions.
Now that might be surprising considering the function's humdrum name, but please pay close attention, because INDEX is one of the magical secrets of how to use Excel! So what's so great about the INDEX function? It's nonvolatile, sprightly, agile, and versatile. Excel INDEX can return one value or an array of values; it can return a reference to one cell or to a range of cells. INDEX works well on either side of the three Reference Operators - the colon, the space, and the comma.

index_function_parameters_excelhero.png
 
The idea with INDEX is that you give it a range (or an array) and then specify an element to return. 
So,
=INDEX(A1:A5,2)
...returns a reference to cell A2.
And,
=INDEX({95,96,97,98,99},5)
...returns the value of 99.


Note:  If you use an international version of Excel, some of the array constants may not work as presented.  Please see the instructions at the end of this article.


Nothing extraordinary thus far. But it is crucial to understand and so I'll reiterate that the first example above returns a REFERENCE, while the second returns a VALUE. 
Also, note that both of these examples demonstrate how the Excel INDEX function behaves when passed a vector. A vector is a one-dimensional array or range. When passed a vector, INDEX does not care whether that vector is vertical or horizontal. The second parameter of INDEX identifies itself as row number. But this is not correct, when a vector is passed in. Instead of being the row number, the second input becomes the element number for a vector - and as such, horizontal vectors work just fine with this notation. In the second example above, five is not the row number, it is the element number!

But we can force INDEX to behave in its normal two-dimensional fashion:

=INDEX({95,96,97,98,99},1,5)

...also returns the value of 99.

And it is with this two-dimensional behavior that INDEX begins to really shine. Obviously,
=INDEX(A1:C5,1,3)
...returns a reference to C1. But what is surprising to most users is that both of the following do as well:
{=INDEX(A1:C5,0,3)}
{=INDEX(A1:C5,,3)}


Return Whole Rows or Columns
In fact, until the user understands what is happening here, they might think that the function has a bug because the overriding theme with INDEX is that unlike OFFSET for example, it must return an element from within the range or array passed to it - and row zero (or column zero) is outside. So what does ZERO or a null value mean in this setting?
If the row_num parameter is zero or missing, this directs INDEX to return the entire column specified by the column_num parameter!
{=INDEX(A1:C5,,2)}
...returns a reference to the range, B1:B5.
And the converse is also true - specifying a zero or missing column_num will return an entire row. The catch here is that the missing column_num MUST include the comma, like so:
{=INDEX(A1:C5,2,)}
...which returns a reference to the range, A2:C2.

But remember that when a particular cell is given a reference to a multi-cell range or that cell is set equal to an array, just the first element shows in the output cell - so most of the row or column returned by the two above INDEX formulas will be hidden. But you can array-enter one of these Microsoft Excel formulas over a range to see the entire output. You do this by selecting the output range first, typing the formula in the formula bar, and entering it by Control-Shift-Enter.
So yes the technique works with returning entire rows or columns of a two-dimensional array as well:
=SUM(INDEX({1,2,3,4,5;6,7,8,9,10;11,12,13,14,15},3,))
...results in 65, which is the summation of the entire third row of the input array.

Lookups
This ability to return entire columns or rows from a larger range or array is extraordinarily useful.  
Suppose we have a table or list of metrics for countries in the range of A1:M200. We can create a named formula that refers to this range - let's call it simply, d, for data. Then we can name individual columns of the data painlessly. If the country names are in the first column, we can create a named formula:
Country:   =INDEX(d,,1)
And if population is in the 4th column, we can create a named formula:
Population:   =INDEX(d,,4)
Now suppose we wanted to lookup the population of Scotland, all we need to do is:
=INDEX(Population,MATCH("Scotland",Country,0))
The Excel MATCH function simply produces a row number for INDEX.
This method of lookup is on par in terms of speed as doing a VLOOKUP, if you are doing just one lookup. But it has advantages over VLOOKUP. It is not restricted to looking up a column to the right like VLOOKUP is. Excel INDEX MATCH is significantly quicker than VLOOKUP if we are doing lookups for a list of countries, and we array-enter the one formula over the entire output column.
If we need to return multiple columns of metrics for a list of countries, the speed benefit is even greater. Suppose that GDP is the 2nd column of the data and that Capital was the 11th column.
If on a new worksheet, we wanted to output three columns of data for each country, Population, GDP, and Capital, the most efficient way to do so is to dedicate one column to create a common index, and then array-enter the INDEX formula over the entire three columns of output.
In this scenario, column B would be the list of countries. Column A would be dedicated to the common index. In A2, we would enter:
=MATCH(B2,Country,0)
...and then copy this formula down as far as the countries are listed in column B.
And then in columns, C, D, and E we would array-enter over the entire range that extends as far down as the countries:
{=INDEX(d,A2:A100,{4,2,11})}
To be clear, this means that if we had 99 countries in column B, we would select C2:E100, and then in the formula bar we would type the above formula, and then enter it by Control-Shift-Enter.
The end result is that the one formula is executed just once (in array fashion) and is brutally fast leaving a very light, non-volatile footprint in your model.

Sums
This usage of INDEX is a great way to power SUMPROPDUCT as well. For example:
=SUMPRODUCT( (Left(Country,1)="U")*Population )
...which returns the total population for all countries that start with the letter, U.
And while named formulas are self-documenting, this works just as well:
=SUMPRODUCT( (Left(INDEX(d,,1),1)="U")*INDEX(d,,4) )

The Dynamic Range
Building on this idea, we can alter the named formula, d, so that it results in a dynamic range instead of a fixed range. And here INDEX reigns supreme. 
A Dynamic range is often constructed with OFFSET or INDIRECT. Unfortunately, both of these Microsoft Excel functions are volatile, which simply means that all formulas that include these functions will recalculate every single time anything on the worksheet changes. For example, adding a value to any blank cell will cause all volatile formulas (and any other formulas that depend on them) to recalculate. Depending on your model design and size, this can have a major impact on the fluidity and responsiveness of the model.
INDEX can be used to create a dynamic range, and not only is it nonvolatile, it is way faster than either OFFSET or INDIRECT. In fact, the improvement in performance is so great that INDEX should be the foundation of all dynamic ranges in professional models.
The approach to creating the dynamic range is different than the approach used for OFFSET or INDIRECT. For those functions, the range is created within the function. With INDEX on the other hand, the dynamic range is produced by using INDEX on one side (or sometimes both) of the Range Operator, which is the colon.
For example, consider this normal looking range reference:
=A2:A100
By itself, this reference could be a hard reference to the country names in our list of countries. Assuming no blanks in our list and that there is a column header in A1, perhaps "Country", the following formula would return the name of the last country in the list:
=INDEX(A:A,COUNTA(A:A))
But the INDEX function here is really returning the reference to the last cell in column A with a country name, and then the equal sign forces the value of that cell to be returned. In the scenario that we have painted, this INDEX formula is really returning the reference, A100.
So the following two formulas point to the exact same range:
=A2:A100
=A2:INDEX(A:A,COUNTA(A:A))
But there is a significant difference. The first is a hard coded, static reference. The second is a nonvolatile, dynamic range that will expand or contract as the number of countries in the list changes. Please note that before we replace the Refers To value of our named formula, Country, with this INDEX based dynamic range, we need to make the references absolute. It would then look like this:
Country:   =$A$2:INDEX($A:$A,COUNTA($A:$A))
We can use the same techniques to create a two-dimensional dynamic range so that the number of columns is also dynamic:
d:   =$A$2:INDEX($1:$65535,COUNTA($A:$A),COUNTA($1:$1))
And with d defined in this dynamic manner, we can still do all of the wonderful row and column referencing from above, such as:
{=INDEX(d,$A2,{4,2,11})}

As a footnote to this section on dynamic ranges, I want to point out that when you use a function on either side (or both sides) of any of the three Reference Operators, the resulting formula always recalculates when the workbook is opened. So while INDEX is nonvolatile, it becomes what I term quasi-volatile when used for dynamic ranges - but this is orders of magnitude better than volatile, and so it remains the best dynamic range foundation by far.
And as a post script to this section on the dynamic range, you may very well ask should I not just use the new Structured Table References available since Excel 2007? While STR is a robust option, it is heavy. If you need speed, nothing beats the alacrity of INDEX.


Noncontiguous Areas
When working with ranges, INDEX offers a fourth parameter to select the AREA to work with from a noncontiguous input range. The areas are referenced by integer in the order that they appear in the input range. For example,
=INDEX((data1,data2,data3),,,2)
...returns a reference to data2. And
{=INDEX((data1,data2,data3),,1,3)}
...returns the entire first column of data3. With a little ingenuity, this can be extraordinarily useful for charting... and many other activities. And you should note that the number of areas specified has no hard limit, and those noncontiguous areas need not be of the same size!

More
All of this just scratches the surface of what is possible with this essential member of the Microsoft Excel functions. It can be used in countless situations, even to compare Excel worksheets, etc.
Some of the synonyms for the word, "imposing", in my thesaurus are impressive, august, commanding, effective, exciting, magnificent, and mind-blowing. The INDEX worksheet function is truly imposing!


International Versions of Excel
Throughout this article, I have used array constants within formulas.  This is a powerful technique.  In the English Language version of Excel, the symbol used to separate the columns in an array constant is the comma.  The symbol used to separate rows is the semicolon.
Other language versions of Excel use different symbols for column and row separation.  However it is easy to discover what symbols are used in your version of Excel.
Open the VBEditor (ALT-F11), and place the cursor in the Immediate Pane (CTRL-G).
Enter the following two lines in the the Immediate Pane (it's OK to use the clipboard):

?Application.International(xlColumnSeparator)
?Application.International(xlRowSeparator)

Click on each in turn and press ENTER on the keyboard.
Uses the printed characters to separate columns and rows in array constants.

The Venerable SUMPRODUCT

Mastering the venerable SUMPRODUCT function is a requirement for any Excel Hero.

It's one of the most useful in the roster of Microsoft Excel functions.

From my perspective, Excel would be a very different application if the SUMPRODUCT function had never been developed to the degree it is today. It's an amazing function that can be marshaled to solve a dizzying array of problems. It's an Array Function in that it operates on arrays, but it does not need to be entered with the Control-Shift-Enter key combination, and so it does not need the fancy parenthesis that adorn normal Array Formulas. However, it is a function that works on arrays, and is actually slightly quicker than an equivalent CSE Array Function.

One would never guess from the innocuous name the shear power and versatility embedded within the SUMPRODUCT function. Not only is it well worth your time to examine this function in detail, it is quite literally a prerequisite to honing your Excel abilities to the Excel Hero level. Rarely do I build a model that does not make extensive use of the SUMPRODUCT function.

From a maths perspective, SUMPRODUCT calculates the dot product of two vectors (arrays), but realize that it can be used with many more than two arrays.  A better name might have been the SumOfProducts function, as this in a nutshell is what it does.  Consider the following image:

sumproduct.png

Notice that Column D is the product of the three columns to its left. Cell D2 contains the formula =A2*B2*C2 and that formula is copied down as far as the columns go (Row 30 in this case). The entire purpose for Column D is to get intermediary values so that we can later sum them. Cell F4 shows this final calculation. For some spreadsheets this intermediary column approach is wanted for presentation purposes. However, sometimes all we want is the ultimate total, the sum of all those products. And that's exactly what the SUMPRODUCT function does. Notice the formula in the Formula Bar. It results in the same total.

There are some points to note here. 

Firstly, notice that the arrays I've used are the entire columns (A:A, etc.). I've done this on purpose to point out that doing so is a bad idea.  This was not even possible prior to Excel 2007 and would result in an error. Excel 2007 is happy to accommodate the request, but it puts a massive hit on the instance of Excel that workbook is running in. Always try to limit the number of cells that Microsoft Excel functions or formulas must process. A better formula would have been:

=SUMPRODUCT(A2:A30,B2:B30,C2:C30)

The SUMPRODUCT function multiplies elements of different arrays that hold the same position in those arrays and when done, sums all those products. This trivial example made this obvious by showing the function at work on three columns. But understand that those arrays could have just as easily been rows instead of columns, or they could have been rectangular ranges of any arbitrary dimension. They could have been array constants, i.e. {2,3,4,1,9,8} or Named Ranges, or the results of a boolean comparison (simple or complex), a custom mixture of many native Microsoft Excel functions, an Array Function, or even a User Defined Function. The one overriding requirement is that the arrays must be rectangular, contiguous, and of the same dimension. So if Array 1 is a column of 29 elements (as in the above image) then so must be all of the other ranges.

The next point is that this simple example had just three columns (arrays). The SUMPRODUCT function is just as happy with 1 array or up to 30 arrays, each separated by a comma. Thirty is the maximum number of arguments that Microsoft Excel functions can support. 

Ok, so SUMPRODUCT is adept at summing the products of equivalently placed elements in arrays. Why is that so powerful? Because it can be used in many non-obvious ways to calculate totals and mine data.

One way that it shines is calculating sums, counts, averages, and other metrics of list data based on criteria. The result here is basically analogous to a SQL database query that produces a total with a WHERE clause. For example the following SQL query: 

SELECT SUM(OrderPrice) FROM Orders WHERE Customer='Hansen'

could be emulated in Excel with any one of these SUMPRODUCT formulas:

=SUMPRODUCT((OrderPrice),--(Customer="Hansen"))
=SUMPRODUCT((OrderPrice),1*(Customer="Hansen"))
=SUMPRODUCT((OrderPrice),(Customer="Hansen")*1)
=SUMPRODUCT((OrderPrice),(Customer="Hansen")+0)
=SUMPRODUCT((OrderPrice),(Customer="Hansen")^1)
=SUMPRODUCT((OrderPrice),SIGN(Customer="Hansen"))
=SUMPRODUCT((OrderPrice),N(Customer="Hansen"))
=SUMPRODUCT((OrderPrice),(Customer="Hansen")*True)
=SUMPRODUCT((OrderPrice)*(Customer="Hansen"))

Some notes are in order here. 

The terms OrderPrice and Customer could be Named Ranges (they would basically be defined as columns in a list or table and would each be of the same length). Those Named Ranges (I actually prefer to refer to them as Named Formulas) could be dynamic ranges. Or conversely the terms could be replaced with direct cell references, such as C2:C1000 for OderPrice and B2:B1000 for Customer. Or the terms could be replaced by almost any combination of Microsoft Excel functions that returns an array. Please note that for some reason, SUMPRODUCT always fails when using an IF function on any of the terms inside the SUMPRODUCT, which is OK because we can use boolean logic to eliminate the need for branching. Please read I Heart IF for details.

The order of the terms is irrelevant. In any of the above examples, the answer is the same, whether the OrderPrice term or the Customer term comes first.

The first eight variations above all have two things in common. They use a comma to separate the terms, and each is employing a slightly different technique to coerce a lone boolean value (True or False) into their numeric equivalents, 1 or 0. If we fail to do this coercion on LONE boolean values, the SUMPRODUCT function always returns the value of 0. Of the eight different coercion methods, the double unary method (--) is the quickest. 

The ninth variation (the last one) above is the syntax I personally use most often. The entire calculation is done in the first argument so no commas are required and since the boolean term is being multiplied by the OderPrice term, there is no need to further coerce the boolean into a number. The multiplication has already done. 

This method is shortest in punctuation and therefore more concise. But it is imperceptibly slower. However, the trade off is greater flexibility. Since all of the work is being done in the first argument to the SUMPRODUCT function, there is no longer a limit of 30 terms. Using this syntax you can combine as many terms as you need. And most important of all, this method allows for more robust logic in the criteria. Specifically, we can specify OR clauses. For example the following SQL query: 

SELECT SUM(OrderPrice) FROM Orders WHERE Customer='Hansen' OR Customer='Jones'

could be emulated in Excel with this SUMPRODUCT formula:

=SUMPRODUCT((OrderPrice) * ((Customer="Hansen")+(Customer="Jones")))

When combining the boolean terms (the ones that evaluate to True or False), they are easy to read if you remember that the plus symbol means OR, while the multiplication symbol means AND. Using this technique, you can tabulate very specific records indeed. For example, consider the following SQL query:

SELECT SUM(OrderPrice) FROM Orders WHERE (Customer='Hansen' OR Customer='Jones') AND State='CA' AND SalesDate>=#1/1/2010# And SalesDate<=#1/31/2010#

This query just says to total all of the orders in January of 2010 from California for the customers Hansen or Jones. It could be achieved in Excel using this SUMPRODUCT formula:

=SUMPRODUCT((OrderPrice) * ((Customer="Hansen")+(Customer="Jones")) * (State="CA")*(SalesDate>="1/1/2010") * (SalesDate<="1/31/2010"))

For me this is even more "readable" than the SQL database query. 

No matter how precise your criteria, which basically means no matter how many criteria terms you need to combine, the SUMPRODUCT function stands ready to tally the specified records, when used with my preferred syntax of combining everything into the first argument.

Another advantage (in my opinion) to using this syntax style is that it acts differently than the comma separated form when there is text in the summing array (i.e. OrderPrice in our examples) in any record returned by the criteria. Sometimes erroneous text can be in a column of data. The comma separated method will treat such text as a 0, combining everything in the first term will result in an error when the text is found in criteria specific records. I prefer to be notified with the error rather than silently calculating an incorrect result.


You'll notice that in all of these examples, we are summing the OrderPrice column. If instead we wanted to count the number of records that satisfied the criteria, all we need to do is drop the OrderPrice term from the formula. So if we wanted to know how many orders there were in January of 2010 from California for the customer Hansen or Jones, we would use this formula:

=SUMPRODUCT(((Customer="Hansen")+(Customer="Jones")) * (State="CA") * (SalesDate>="1/1/2010") * (SalesDate<="1/31/2010"))

Notice that the only difference here is that the OrderPrice term is gone. And finally if we wanted to know the Average amount of those orders we would just divide the total amount by the number of orders. This makes for a long formula, but now that you understand the syntax, it's easy to follow. This very specific average would be calculated with this formula:

=SUMPRODUCT((OrderPrice) * ((Customer="Hansen")+(Customer="Jones")) * (State="CA") * (SalesDate>="1/1/2010") * (SalesDate<="1/31/2010"))    /    SUMPRODUCT(((Customer="Hansen")+(Customer="Jones")) * (State="CA") * (SalesDate>="1/1/2010") * (SalesDate<="1/31/2010"))

It should be noted here that sometimes when the formulas grow this large, it can be helpful to name them (or parts of them) in the Name Manager.

These type of SUMPRODUCT calcs with multiple criteria can become very interesting when the individual terms are used with OFFSET or INDEX or INDIRECT or MOD/ROW. These enhancements go beyond the scope of this post, but they enable such refinements as considering only every nth record in the tabulations, changing the scope of the array, and many other fascinating feats.


Experienced users may point out that most of the above can be done by using the SUMIF or COUNTIF (or in Excel 2007 SUMIFS and COUNTIFS) functions. This is only partially true. The SUMIF and COUNTIF functions can only work with one criterion. The SUMIFS, COUNTIFS, and AVERAGEIFS function of Excel 2007 can work with up to 29 criteria, but they are limited to AND Logic between the criteria with no way to specify OR Logic.

Really experienced users might point out that all of the above can be done with Array Formulas. While this is true, the SUMPRODUCT function is optimized and is roughly 10% faster than an equivalent Array Formula, and there is no need for the Control-Shift-Enter formula entry.

All of this just scratches the surface of what the SUMPRODUCT function can accomplish. It can be used in countless situations that have nothing to do with database calculations. For example, if you wanted to know if the number in A1 was a prime number, you could use this non-Array Formula:

=SUMPRODUCT(1*(MOD(A1,ROW(INDIRECT("1:"&INT(A1^0.5))))=0))=1

SUMPRODUCT can be used to do a multi-column sort by formula.

When a SUMPRODUCT function is combined with Conditional Formatting some very interesting results can be achieved.

Another extremely important use of the SUMPRODUCT function which opens an entire universe of new possibilities is to use it to conduct bitwise logical operations such as Logical AND, OR, XOR, NOT, IMP, or EQ. This is the direct manipulation of the bits inside of an integer. There are countless uses of bit manipulation and this post can get you started.

Finally, amongst Excel gurus there is considerable debate about the best way to do database type calculations in Excel. While SUMPRODUCT as demonstrated in this post is very capable, there are alternatives. No one solution is the perfect answer to every situation. I have been conducting timing trials and will soon share detailed results for many different techniques. In the meantime it should be enough to know that SUMPRODUCT is perfectly fine when it is scanning several thousand records, even tens of thousands. However, if you are using it for hundreds of thousands of records than an alternative is likely in order.