Thursday, August 14, 2008
Add an uninstall and him shortcut to .NET deployment project in VS2005
Thanks!
1. Create simple project UninstallApp.
2. Add in
Main()
{
string[] arguments = Environment.GetCommandLineArgs();
foreach(string argument in arguments)
{
string[] parameters = argument.Split('=');
if (parameters[0].ToLower() == "/u")
{
string productCode = parameters[1];
string path = Environment.GetFolderPath(Environment.SpecialFolder.System);
Process proc = new Process();
proc.StartInfo.FileName = string.Concat(path,file://msiexec.exe/);
proc.StartInfo.Arguments = string.Concat(" /i ", productCode);
proc.Start();
}
}
3. Create new setup project
4. Add UninstallApp.exe to "Application Folder" in 'File System' part
5. In "User's Program menu" create shortcut to UninstallApp.exe and in properties of this shortcut in parameter 'arguments' insert value "/u=[ProductCode]".
6.Rebuild deployment project.Good luck ;)
Tuesday, August 5, 2008
Mapping USB to LPT1
net use lpt1: \\server\printer /Persistent:yes
Friday, July 11, 2008
Control SQL Server locking with hints
Takeaway: For clients dependent upon SQL, the number of concurrent users accessing a database could cause havoc on data integrity if left unchecked. Using these locking hints, you can ensure data changes are handled appropriately.
Over the past few years, SQL Server has blossomed from a small office data store to an enterprise-level database server. The number of users concurrently accessing a database also increased with this upgrade. SQL Server 2000's standard approach to locking resources often seems inefficient, but thankfully it provides features to override the standard locking. Locking hints may be used to tell the server how to lock resources, but let's examine locking before covering them.
What is a lock?
Relational database systems like SQL Server use locks to prevent users from stepping on each other's toes. That is, locks prevent users from making conflicting data changes. When one user has a particular piece of data locked, no other user may modify it. In addition, a lock prevents users from viewing uncommitted data changes. Users must wait for the changes to be saved before viewing. Data may be locked using various methods. SQL Server 2000 uses locks to implement pessimistic concurrency control among multiple users performing modifications in a database at the same time.
Deadlocks
A database deadlock can occur when there is a dependency between two or more database sessions for some set of resources. A deadlock is a condition that can occur on any system with multiple threads, not just on a relational database management system. A thread in a multithreaded system may acquire one or more resources (for example, locks). If the resource being acquired is currently owned by another thread, the first thread may have to wait for the owning thread to release the target resource. The waiting thread is said to have a dependency on the owning thread for that particular resource. The following listing shows the text of an exception where a deadlock occurred:
System.Data.SqlClient.SqlException: Transaction (Process ID 12) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
This exception was thrown when one SQL Server call conflicted with another resource that held a lock on the necessary resource. Consequently, one of the processes was terminated. This is a common error message for deadlocks with the process ID being unique to the system.
Types of locks
A database system may lock data items at one of many possible levels within the system hierarchy. The possibilities include:
- Rows—an entire row from a database table
- Pages—a collection of rows (usually a few kilobytes)
- Extents—usually a collection of a few pages
- Table—an entire database table
- Database—the entire database table is locked
Unless otherwise specified, the database uses its own judgment to determine the best locking approach based upon the scenario. Locking is a resource-intensive activity (with respect to memory), so this is not always the best approach. Thankfully, SQL Server does provide a way to circumvent the default behavior. This is accomplished with locking hints.
Hints
There are times when you need to override SQL Server's locking scheme and force a particular range of locks on a table. Transact-SQL provides a set of table-level locking hints that you can use with SELECT, INSERT, UPDATE, and DELETE statements to tell SQL Server how you want it to lock the table by overriding any other system-wide or transactional isolation levels. The available hints include the following:
- FASTFIRSTROW—The query is optimized to get the first row of the result set.
- HOLDLOCK—Hold a shared lock until the transaction has been completed.
- NOLOCK—Do not issue shared locks or recognize exclusive locks. This may result in data being returned that has been rolled back or has not been committed; therefore, working with dirty data is possible. This may only be used with the SELECT statement.
- PAGLOCK—Locks the table.
- READCOMMITTED—Read only data from transactions that have been committed. This is SQL Server's default behavior.
- READPAST—Rows locked by other processes are skipped, so the returned data may be missing rows. This may only be used with the SELECT statement.
- READUNCOMMITTED—Equivalent to NOLOCK.
- REPEATABLEREAD—Locks are placed on all data used in queries. This prevents other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction.
- ROWLOCK—Locks the data at row level. SQL Server often locks at the page or table level to modify a row, so developers often override this setting when working with single rows.
- SERIALIZABLE—Equivalent to HOLDLOCK.
- TABLOCK—Lock at the table level. You may want to use this when performing many operations on table-level data.
- UPDLOCK—Use update locks instead of shared locks while reading a table, and hold locks until the end of the transaction. This has the advantage of allowing you to read data without locking and to update that data later knowing the data has not changed.
- XLOCK—Uses an exclusive lock on all resources until the end of the transaction.
Microsoft has two categories for the hints: granularity and isolation-level. Granularity hints include PAGLOCK, NOLOCK, ROWLOCK, and TABLOCK. On the other hand, isolation-level hints include HOLDLOCK, NOLOCK, READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE. A maximum of one from each group may be used.
These hints allow the consultant to control the locking used by SQL Server, and they are included in the Transact-SQL statement. They are placed in the FROM portion of the statement preceded by the WITH statement. The WITH statement is an option with SQL Server 2000, but Microsoft strongly urges its inclusion. This leads many to believe that it may be mandatory in future SQL Server releases. Here is the hint syntax as it applies to the FROM clause:
[ FROM { <> } [ ,...n ] ]
<> ::=
table_name [ [ AS ] table_alias ] [ WITH ( <> [ ,...n ] ) ]
<> ::=
{ INDEX ( index_val [ ,...n ] )
| FASTFIRSTROW
| HOLDLOCK
| NOLOCK
| PAGLOCK
| READCOMMITTED
| READPAST
| READUNCOMMITTED
| REPEATABLEREAD
| ROWLOCK
| SERIALIZABLE
| TABLOCK
| TABLOCKX
| UPDLOCK
| XLOCK }
While this syntax does show its usage, it's easier to show a real example. The following Transact-SQL statement selects all data from the Employees table of the Northwind database:
SELECT *
FROM Employees WITH (nolock)
This gives me all data regardless of what other processes are currently doing with it, so the data may be dirty, but this is not important to my task. Another example updates all rows in a table, setting a field to a certain value:
UPDATE
Employees WITH (tablock)
SET Title='Test'
This example is updating every row in the table, so a table lock is utilized.
Alternate
At this point, I must stress the fact that even though a table-level hint is specified in code, the query optimizer may ignore the hint. Table-level hints are ignored if the query optimizer does not choose the table and used in the subsequent query plan. Also, the query optimizer will often choose an indexed view over a table. Lastly, a hint may be ignored if the table contains computed columns.
Use your discretion
Using table hints in your applications depends upon what is required. Whether you use them at all will depend upon your needs. For example, many consultants love to use the FASTFIRSTROW hint to return the first row quickly. This gives them something to work with while the rest of the query completes. When the data is unlikely to change (e.g., archived data), the NOLOCK hint is a good choice since the data is basically static. On the other hand, this approach would not be good when doing financial applications, when accuracy is a must.
Thursday, July 10, 2008
Database Modeling versus Database Design
Database modeling is generally focused on logical and physical database models. A logical database model is composed of entities and attributes, and it includes relationships between the different entities, which can be mandatory or not. The logical model consists of a normalized model that is generally set to third normal form. It includes many elements that make up a database, but it is not specific to any software or database implementation. Performance factors are not a major consideration at this point nor are the applications that will be using the database. The main concern is building a model of what the database would look like when capturing the data needed by the users.
The denormalization process begins with the physical database model. The database team takes the work performed in the logical model and starts to optimize it for queries, specific database implementations, and applications that may talk to the database. More specific items, like views, are added to make working with the database easier for users. The physical database model also needs to be mapped back to the logical database model; they can be very different, and each has its own purpose. As changes occur, the team must update one model from the other and maintain that mapping between them.
Database Design
While database modeling focuses mostly on depicting the database, database design encompasses the entire process from the creation of requirements, business processes, logical analysis, and physical database constructs to the deployment of the database. For example, in database design, physical data modeling includes the modeling of not only tables and columns but also tablespaces, partitions, hardware, and the entire makeup of the database system. Database design includes uncovering and modeling requirements, the business processes (as they are today and where they are going in the future), the activities of the business, the logical models, and the physical database models, as well as addressing issues of what information is needed, how the different parts relate, how applications communicate with the data, and how the entire system is implemented.
Sql Server 2005 - Powerful paging using WITH statement
Sql server 2005 formerly named "Yukon" , is bundled with loads of features that can rock a database developer's mind.Among many of features, one thing that is truly mind blowing is the mighty WITH statement. Previously , it is a real tedious task to do paging from database. Some of the option were creating temp table, using cursors,etc. But, now is the time to forget those tricky methodologies.
Lets consider a simple table called employee( emp_id, name, salary). Now, suppose that we need to create a gridview with paging. One option is to bring all the data from database and do the paging in client-side and another one is to do selective fetch. In both cases we can use a stored procedure that takes some parameter and returns a resultset.
Let's see the following...
CREATE PROCEDURE GetEmployees
@Status int,
@StartIndex int,
@PageSize int
AS
WITH FilteredList( [emp_id],[name], [salary], [RowNumber])
AS
(
SELECT
[emp_id],
[name],
[salary],
ROW_NUMBER() OVER ( ORDER BY [ID] DESC) AS [RowNumber]
FROM
Employee
)
SELECT
*
FROM
FilteredList
WHERE
RowNumber BETWEEN (@StartIndex + 1) AND (@StartIndex + @PageSize)
Actually , the WITH creates a in-memory data structure based on ROWNUMBER or any other identifier that we might give. Finally, we separate the result based on that identifier. But anyway in the end it makes our life more simple at the end of the day.
Here in Pageflakes, i use it almost everywhere possible, which surely saves my time and nothing else.
Tuesday, July 1, 2008
TSQL Query Deep Look
(11)
(1) FROM
(3)
(5) GROUP BY
(6) WITH {CUBE ROLLUP}
(7) HAVING
(10) ORDER BY
In SQL, the first clause that is processed is the FROM clause, while the SELECT clause which appears at first is processed almost last.Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller.
If a certain clause is not specified in a query, the corresponding step is simply skipped. Following is a brief description of the different logical steps applied in both SQL Server 2000 and SQL Server 2005.
Brief Description of Logical Query Processing Phases
1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
2. ON: The ON filter is applied to VT1. Only rows for which the
3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
4. WHERE: The WHERE filter is applied to VT3. Only rows for which the
5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
6. CUBE ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
7. HAVING: The HAVING filter is applied to VT6. Only groups for which the
8. SELECT: The SELECT list is processed, generating VT8.
9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
How-To: SQL query IIS Logs with Log Parser
Face it: the built-in reports in Community Server are not good enough if you want detailed information about your traffic. Google Analytics is great, but you can't export the data in order to use it in your own queries. A better way is to analyze the IIS log files directly and join the information with the CS database tables. But how can you find useful data in these huge files?
First of all: what do you really want to know?
I wanted to answer the following question: How many request do we get per day and blog post ?
I'm only interested in "real people" requests. So we need to eliminate all requests from bots and feed readers.
I only want to count page requests, so we need to eliminate all requests to images and other files.
I only want to count blog page requests, no navigation on tags, or other pages.
I only want to count successful requests.
I want to use the data to join it with CS Database data, in order to query additional meta-information (Author, Blog Name, Post creation date etc.) so some transformation of the standard IISW3C fields has to take place.
Since we are database specialists here at Trivadis, it's quite natural that SQL is a perfect language to query data. But how do you query an IIS log file which is in a special format (no CSV)?
No, don't even try do do it with Excel or Access. Even SQL Server can't import dozens of files in a proprietary format easily. You could try to play with SSIS, but it would kill you, believe me :-). The files can have breaks where the header is repeated somewhere in the middle of a file, and none of the tools I mentioned can deal with that. Also, the size is just too much for office tools.
Use Log Parser! This tool is just great, simple to use and extremely powerful. It's only purpose is to query log files (not only IIS). And it does it very well: with a SQL-like query language you can run queries against complete folders of files, and the performance is just amazing. It takes some seconds to query gigabytes of date. But it is only a command line tool. The cream topping is the UI that the fine people at SerialCoder wrote: Visual Log Parser GUI . It's quite simple, but has all the features you need if you are a GUI spoiled developer like me :-). Just think of it as a SQL Server Management Studio, but for files.
A query to provide data for the use case I mentioned above could look like this:
select
ViewDate, PostName,
KlickCount,
Blognameusing
date as ViewDate, coalesce( replace_str(extract_prefix(extract_token(cs-uri-stem, 7,'/'),0,'.'),'-',' '),'default') as PostName
, to_uppercase( extract_token(cs-uri-stem, 2 ,'/')) as BlogName
, count(*) as KlickCount
from 'E:\Blog\*.log'
where cs-uri-stem like '/blogs/%'
and cs-uri-stem not like '%.png'
and cs-uri-stem not like '%.jpg'
and cs-uri-stem not like '%.gif'
and cs-uri-stem not like '%rss%'
and cs-uri-stem not like '%/tags/%'
and cs-uri-stem not like '%MainFeed.aspx'
and cs(User-Agent) not like '%bot%'
and cs(User-Agent) not like '%slurp%'
and cs(User-Agent) not like '%crawl%'
and [date] between to_timestamp('2007-12-01' ,'yyyy-MM-dd') and to_timestamp('2007-12-17','yyyy-MM-dd')
and extract_token(cs-uri-stem, 3 ,'/') in ('archive', 'default.aspx')
and sc-status = 200
group by date, PostName, BlogName
order by KlickCount desc
The output should look somewhat like this (one example row) :
12.09.2007 00:00:00;mvc castle subsonic bye bye webforms;111;MICHAELKOENINGS
For Community Server admins:
The PostName column matches the column PostName on the cs_Posts Table. The Blogname column matches the ApplicationKey column on the sections table. I used MS-Access to bring the data together. This is the perfect starting point to provide excel pivot tables about the performance of your service that makes every manager happy :-)
Some notes:
The hits on the default.aspx page cannot be matched with a particular post. I will try to find a solution to match it by date. The last post that was saved before the page was displayed should be count.
More cool stuff to come: there is a free regions DB available on the internet. With that you can resolve the country, region and city of your visitors IP address. Since it cannot be queried with SQL (it is a proprietary binary format that can only be read via a API), we will use LINQ to help us with that. That's right: a real world LINQ example! So stay tuned!
How many days left on Terminal Services without Licensing Server
The quote from Microsoft's article makes that pretty clear.
As far as the remaining grace period is concerned, you can try to check the date of the %Systemroot%\System32\Lserver folder on the TS license server; this should be the date you installed the TS licensing.
You could try lsreport.exe from the Resource Kit as described here:HOW TO: Use the Terminal Services Licensing Reporter Tool (Lsreport.exe)http://support.microsoft.com/?kbid=317592
Don't forget that the licensing server might have a grace period of 120 days, but the temporary licenses it's issuing are only valid for 90 days; from then on, the client will require a valid license to connect.
Tuesday, April 1, 2008
How does Add/Remove Programs get the size and other information?
This has saved me trying to uninstall an application from an XP embedded machine (has no Add/Remove Applet).
Thank Mr. Chen
If the program doesn't provide this information itself, Add/Remove Programs is forced to guess.
The problem is that there is no "obvious" way to map an entry in the Add/Remove Programs list to an actual program. Each entry in the list, for those who care about such things, comes from the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Uninstall registry key.
The only mandatory properties for an uninstallable program are the DisplayName and the UninstallPath. Everything else is optional.
Let's suppose Add/Remove Programs is given a program registration like this:
HKEY_LOCAL_MACHINE\
Software\
Microsoft\
Windows\
CurrentVersion\
Uninstall\
SomeProgram
DisplayName=REG_SZ:"Awesome Program for Windows"
UninstallPath=REG_SZ:"C:\WINDOWS\uninstall.exe -SomeParameters"
In order to get the "Last Used" and "Frequency" values, Add/Remove Programs needs to know the name of the EXE so it can ask the Start menu "Hey, how often did the user run this program, and when was the last time it happened?"
Notice that there are no clues in the registration above as to the identity of this EXE file.
So Add/Remove Programs starts guessing. It goes through all the programs on your Start menu and compares their names with the display name of the uninstallable item. It looks for Start menu items which share at least two words with the words in the DisplayName.
For example, if there were a Start menu item called "Pretty Decent Windows Program", this would count as a two-word match ("Windows" and "Program").
It then takes the one with the most matches and decides, "Okay, I guess this is it." Suppose for the sake of illustration that the best match is indeed "Pretty Decent Windows Program.lnk", which is a shortcut to "C:\Program Files\LitWare\Decent Program\Decent.exe". Add/Remove Programs would decide that "Awesome Program for Windows" should get the icon for "Pretty Decent Windows Program.lnk", that the frequency of use and most-recently-used information for "C:\Program Files\LitWare\Decent Program\Decent.exe" will be displayed for "Awesome Program for Windows".
But wait, there's more. There's also the program size. Add/Remove Programs looks in your "Program Files" directory for directories whose names share at least two words in common with the DisplayName. The best match is assumed to be the directory that the program files are installed into. The sizes are added together and reported as the size of "Awesome Program for Windows".
A program can add some properties to its registration to avoid a lot of this guessing. It can set an EstimatedSize property to avoid making Add/Remove Programs guess how big the program is. It can also set a DisplayIcon property to specify which icon to show for the program in the list.
But if a program omits all of these hints, the guess that Add/Remove Programs ends up making can often be ridiculously wide of the mark due to coincidental word matches. In my experience, Spanish suffers particularly badly from this algorithm, due to that language's heavy use of prepositions and articles (which result in a lot of false matches).
Yes, this is all lame, but when you are forced to operate with inadequate information, lame is the best you can do.
Friday, March 28, 2008
http://www.iridescence.no/Posts/A-Set-of-Useful-Extension-Methods-for-DateTime.aspx
I was just reading Rich Strahl's post today on formatting dates in JavaScript, in which he mentions that the .NET framework has a fairly rich API for working with dates. Compared to JavaScript, he's definitively right - but that doesn't mean that the DateTime API is anything near perfect. Disregarding all the minute detail problems regarding time zones and such, or the fact that we're forced to deal with date AND time together and no separating the two, there are some things that are just plain cumbersome and tedious to get done with it. But lo and behold, with the new extensions method feature in C# 3.0 we can do something about that - so in this post I'd like to present 11 time-saving (oh, the pun!) extensions for the DateTime class.
The Extentions
First()This method returns a DateTime representing the first day of the month represented by the instance it is called on. For example:
DateTime firstOfMarch = DateTime.Now.First();
First(DayOfWeek dayOfWeek)
This method returns the first occurrence of the specified day in the month represented by the instance it is called on. For example, the first Monday of this month was the 3rd of March:
DateTime thirdOfMarch = DateTime.Now.First(DayOfWeek.Monday);
Last()
This method returns a DateTime representing the last day of the month represented by the instance it is called on. For example:
DateTime lastDayOfMarch = DateTime.Now.Last();
Last(DayOfWeek dayOfWeek)
This method returns a DateTime representing the last occurence of the specified day in the month represented by the instance it is called on. For example, the last Tuesday of this month will be the 25th of March:
DateTime lastTuesdayOfMarch = DateTime.Now.Last(DayOfWeek.Tuesday);
Midnight()
This method returns a DateTime representing midnight on the day represented by the instance it is called on:
DateTime midnightToday = DateTime.Now.Midnight();
Noon()This method returns a DateTime representing noon on the day represented by the instance it is called on:
DateTime noonToday = DateTime.Now.Noon();
Next(DayOfWeek dayOfWeek)
This method returns the first date falling on the specified day following the one represented by the instance it is called on. For example, to get a DateTime representing next monday:
DateTime nextMonday = DateTime.Now.Next(DayOfWeek.Monday);
SetTime(hour)
SetTime(hour, minute)
SetTime(hour, minute, second)
SetTime(hour, minute, second, millisecond)
These four overloads can be used to get a DateTime representing a different time of day on the same date as the instance it is called on.
For instance, if we want to get a DateTime representing 11am today:
DateTime lunchTime = DateTime.Now.SetTime(11);
Almost Fluent
Since each of these methods in turn return a new DateTime instance, they can be chained together in a way resembling fluent interfaces. For example, say I wanted to represent half past two PM, next Tuesday:
DateTime lateLunchWithCustomer = DateTime.Now.Next(DayOfWeek.Tuesday).SetTime(14,30);
Or maybe I want to represent midnight on the second Monday of the month:
DateTime secondMondayOfMonth = DateTime.Now.First(DayOfWeek.Monday).Next(DayOfWeek.Monday).Midnight();
Go Get It!
I've created a project and uploaded the source code for the extension methods and accompanying unit tests to Codeplex - as much for having an excuse to test Codeplex, as for the fact that the methods I've knocked together this evening could just be the beginnings of a larger and more useful set of extensions for the DateTime class.
Friday, March 21, 2008
Indexes in SQL Server
Indexes are used by relational databases like SQL Server to find data quickly when a query is processed. One doesn’t need to modify the application’s code while manipulating indexes. Indexes operate “behind the scenes” in support of the database engine. However, creating the proper index can drastically increase the performance of an application.
The SQL Server engine uses an index in much the same way a reader uses a book index. For example, one way to find all references to INSERT statements in a SQL book would be to begin on page one and scan each page of the book. We could mark each time we find the word INSERT until we reach the end of the book. This approach is pretty time consuming and laborious. Alternately, we can also use the index in the back of the book to find a page number for each occurrence of the INSERT statements. This approach produces the same results as above, but with tremendous savings in time.
When a SQL Server has no index to use for searching, the result is similar to the reader who looks at every page in a book to find a word: the SQL engine needs to visit every row in a table. In database terminology we call this behavior a table scan, or just scan.
A table scan is not always a problem, and is sometimes unavoidable. However, as a table grows to thousands of rows and then millions of rows and beyond, scans become correspondingly slower and more expensive.
Create an Index
We can create the index using the following SQL. The command specifies the name of the index (IDX_UnitPrice), the table name (Products), and the column to index (UnitPrice).
Let’s create an Index on the Products table of the Northwind database.
CREATE INDEX [IDX_UnitPrice] ON Products (UnitPrice)
To verify that the index is created, use the following stored procedure to see a list of all indexes on the Products table:
EXEC sp_helpindex Customers
How It Works
The database takes the columns specified in a CREATE INDEX command and sorts the values into a special data structure known as a B-tree. A B-tree structure supports fast searches with a minimum amount of disk reads, allowing the database engine to quickly find the starting and stopping points for the query we are using.
Each index entry contains the index key (UnitPrice). Each entry also includes a reference (which points) to the table rows which share that particular value and from which we can retrieve the required information.
Much like the index in the back of a book helps us to find keywords quickly, so the database is able to quickly narrow the number of records it must examine to a minimum by using the sorted list of UnitPrice values stored in the index. We have avoided a table scan to fetch the query results. Given this sketch of how indexes work, lets examine some of the scenarios where indexes offer a benefit.
Taking Advantage of Indexes
The database engine can use indexes to boost performance in a number of different queries. Sometimes these performance improvements are dramatic. An important feature of SQL Server 2000 is a component known as the query optimizer. The query optimizer’s job is to find the fastest and least resource intensive means of executing incoming queries. An important part of this job is selecting the best index or indexes to perform the task. In the following sections we will examine the types of queries with the best chance of benefiting from an index.
Searching For Records
The most obvious use for an index is in finding a record or set of records matching a WHERE clause. Indexes can aid queries looking for values inside of a range (as we demonstrated earlier), as well as queries looking for a specific value. By way of example, the following queries can all benefit from an index on UnitPrice:
DELETE FROM Products WHERE UnitPrice = 1 UPDATE Products SET Discontinued = 1 WHERE UnitPrice > 15
SELECT * FROM PRODUCTS WHERE UnitPrice BETWEEN 14 AND 16
Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements.
Sorting Records
When we ask for a sorted dataset, the database will try to find an index and avoid sorting the results during execution of the query. We control sorting of a dataset by specifying a field, or fields, in an ORDER BY clause, with the sort order as ASC (ascending) or DESC (descending). For example, the following query returns all products sorted by price:
SELECT * FROM Products ORDER BY UnitPrice ASC
With no index, the database will scan the Products table and sort the rows to process the query. However, the index we created on UnitPrice (IDX_UnitPrice) earlier provides the database with a presorted list of prices. The database can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.
The same index works equally well with the following query, simply by scanning the index in reverse.
SELECT * FROM Products ORDER BY UnitPrice DESC
Grouping Records
We can use a GROUP BY clause to group records and aggregate values, for example, counting the number of orders placed by a customer. To process a query with a GROUP BY clause, the database will often sort the results on the columns included in the GROUP BY. The following query counts the number of products at each price by grouping together records with the same UnitPrice value.
SELECT Count(*), UnitPrice FROM Products GROUP BY UnitPrice
The database can use the IDX_UnitPrice index to retrieve the prices in order. Since matching prices appear in consecutive index entries, the database is able count the number of products at each price quickly. Indexing a field used in a GROUP BY clause can often speed up a query.
Maintaining a Unique Column
Columns requiring unique values (such as primary key columns) must have a unique index applied. There are several methods available to create a unique index. Marking a column as a primary key will automatically create a unique index on the column. We can also create a unique index by checking the Create UNIQUE checkbox in the dialog shown earlier. The screen shot of the dialog displayed the index used to enforce the primary key of the Products table. In this case, the Create UNIQUE checkbox is disabled, since an index to enforce a primary key must be a unique index. However, creating new indexes not used to enforce primary keys will allow us to select the Create UNIQUE checkbox. We can also create a unique index using SQL with the following command:
CREATE UNIQUE INDEX IDX_ProductName On Products (ProductName)
The above SQL command will not allow any duplicate values in the ProductName column, and an index is the best tool for the database to use to enforce this rule. Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of values in the new data duplicate existing values. Indexes, as we should know by now, will improve this search time.
Index Drawbacks
There are trade-offs to almost any feature in computer programming, and indexes are no exception. While indexes provide a substantial performance benefit to searches, there is also a downside to indexing. Let’s talk about some of those drawbacks now.
Indexes and Disk Space
Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users. To see the space required for a table, use the sp_spaceused system stored procedure in a query window.
EXEC sp_spaceused Orders
Given a table name (Orders), the procedure will return the amount of space used by the data and all indexes associated with the table, like so:
Name rows reserved data index_size unused——- ——– ———– —— ———- ——-Orders 830 504 KB 160 KB 320 KB 24 KB
According to the output above, the table data uses 160 kilobytes, while the table indexes use twice as much, or 320 kilobytes. The ratio of index size to table size can vary greatly, depending on the columns, data types, and number of indexes on a table.
Indexes and Data Modification
Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. As we discussed earlier, indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, we now caveat the discussion with the understanding that providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.
In decision support systems and data warehouses, where information is stored for reporting purposes, data remains relatively static and report generating queries outnumber data modification queries. In these types of environments, heavy indexing is commonplace in order to optimize the reports generated. In contrast, a database used for transaction processing will see many records added and updated. These types of databases will use fewer indexes to allow for higher throughput on inserts and updates.
Every application is unique, and finding the best indexes to use for a specific application usually requires some help from the optimization tools offered by many database vendors. SQL Server 2000 and Access include the Profiler and Index Tuning Wizard tools to help tweak performance.
Now we have enough information to understand why indexes are useful and where indexes are best applied. It is time now to look at the different options available when creating an index and then address some common rules of thumb to use when planning the indexes for your database.
Types of Indexes
Clustered Indexes
Earlier in the article we made an analogy between a database index and the index of a book. A book index stores words in order with a reference to the page numbers where the word is located. This type of index for a database is a non-clustered index; only the index key and a reference are stored. In contrast, a common analogy for a clustered index is a phone book. A phone book still sorts entries into alphabetical order. The difference is, once we find a name in a phone book, we have immediate access to the rest of the data for the name, such as the phone number and address.
For a clustered index, the database will sort the table’s records according to the column (or columns) specified by the index. A clustered index contains all of the data for a table in the index, sorted by the index key, just like a phone book is sorted by name and contains all of the information for the person inline. The non-clustered indexes created earlier in the chapter contain only the index key and a reference to find the data, which is more like a book index. You can only create one clustered index on each table.
A clustered index is the most important index you can apply to a table. If the database engine can use a clustered index during a query, the database does not need to follow references back to the rest of the data, as happens with a non-clustered index. The result is less work for the database, and consequently, better performance for a query using a clustered index.
To create a clustered index, simply select the Create As CLUSTERED checkbox in the dialog box we used at the beginning of the chapter. The SQL syntax for a clustered index simply adds a new keyword to the CREATE INDEX command, as shown below:
CREATE CLUSTERED INDEX IDX_SupplierID ON Products(SupplierID)
Most of the tables in the Northwind database already have a clustered index defined on a table. Since we can only have one clustered index per table, and the Products table already has a clustered index (PK_Products) on the primary key (ProductId), the above command should generate the following error:
Cannot create more than one clustered index on table ‘Products’.Drop the existing clustered index ‘PK_Products’ before creating another.
As a general rule of thumb, every table should have a clustered index. If you create only one index for a table, use a clustered index. Not only is a clustered index more efficient than other indexes for retrieval operations, a clustered index also helps the database efficiently manage the space required to store the table. In SQL Server, creating a primary key constraint will automatically create a clustered index (if none exists) using the primary key column as the index key.
Sometimes it is better to use a unique non-clustered index on the primary key column, and place the clustered index on a column used by more queries. For example, if the majority of searches are for the price of a product instead of the primary key of a product, the clustered index could be more effective if used on the price field. A clustered index can also be a UNIQUE index.
A Disadvantage to Clustered Indexes
If we update a record and change the value of an indexed column in a clustered index, the database might need to move the entire row into a new position to keep the rows in sorted order. This behavior essentially turns an update query into a DELETE followed by an INSERT, with an obvious decrease in performance. A table’s clustered index can often be found on the primary key or a foreign key column, because key values generally do not change once a record is inserted into the database.
Composite Indexes
A composite index is an index on two or more columns. Both clustered and non-clustered indexes can be composite indexes. Composite indexes are especially useful in two different circumstances. First, you can use a composite index to cover a query. Secondly, you can use a composite index to help match the search criteria of specific queries. We will go onto more detail and give examples of these two areas in the following sections.
Covering Queries with an Index
Earlier in the article we discussed how an index, specifically a non-clustered index, contains only the key values and a reference to find the associated row of data. However, if the key value contains all of the information needed to process a query, the database never has to follow the reference and find the row; it can simply retrieve the information from the index and save processing time. This is always a benefit for clustered indexes.
As an example, consider the index we created on the Products table for UnitPrice. The database copied the values from the UnitPrice column and sorted them into an index. If we execute the following query, the database can retrieve all of the information for the query from the index itself.
SELECT UnitPrice FROM Products ORDER BY UnitPrice
We call these types of queries covered queries, because all of the columns requested in the output are contained in the index itself. A clustered index, if selected for use by the query optimizer, always covers a query, since it contains all of the data in a table.
For the following query, there are no covering indexes on the Products table.
SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice
This is because although the database will use the index on UnitPrice to avoid sorting records, it will need to follow the reference in each index entry to find the associated row and retrieve the product name. By creating a composite index on two columns (ProductName and UnitPrice), we can cover this query with the new index.
Matching Complex Search Criteria
For another way to use composite indexes, let’s take a look at the OrderDetails table of Northwind. There are two key values in the table (OrderID and ProductID); these are foreign keys, referencing the Orders and Products tables respectively. There is no column dedicated for use as a primary key; instead, the primary key is the combination of the columns OrderID and ProductID.
The primary key constraint on these columns will generate a composite index, which is unique of course. The command the database would use to create the index looks something like the following:
CREATE UNIQUE CLUSTERED INDEX PK_Order_Details ON [Order Details] (OrderID, ProductID)
The order in which columns appear in a CREATE INDEX statement is significant. The primary sort order for this index is OrderID. When the OrderID is the same for two or more records, the database will sort this subset of records on ProductID.
The order of columns determines how useful the index is for a query. Consider the phone book sorted by last name then first name. The phone book makes it easy to find all of the listings with a last name of Smith, or all of the listings with a last name of Jones and a first name of Lisa, but it is difficult to find all listings with a first name of Gary without scanning the book page by page.
Likewise, the composite index on Order Details is useful in the following two queries:
SELECT * FROM [Order Details] WHERE OrderID = 11077
SELECT * FROM [Order Details] WHERE OrderID = 11077 AND ProductID = 13
However, the following query cannot take advantage of the index we created since ProductID is the second part of the index key, just like the first name field in a phone book.
SELECT * FROM [Order Details] WHERE ProductID = 13
In this case, ProductID is a primary key, however, so an index does exist on the ProductID column for the database to use for this query.
Suppose the following query is the most popular query executed by our application, and we decided we needed to tune the database to support it.
SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice
We could create the following index to cover the query. Notice we have specified two columns for the index: UnitPrice and ProductName (making the index a composite index):
CREATE INDEX IX_UnitPrice_ProductName ON Products(UnitPrice, ProductName)
While covered queries can provide a performance benefit, remember there is a price to pay for each index we add to a table, and we can also never cover every query in a non-trivial application.
Additional Index Guidelines
Choosing the correct columns and types for an index is another important step in creating an effective index. In this section, we will talk about two main points, namely short index keys and selective indexes (we’ll explain what selective indexes are in just a moment).
Keep Index Keys Short
The larger an index key is, the harder a database has to work to use the index. For instance, an integer key is smaller in size then a character field for holding 100 characters. In particular, keep clustered indexes as short as possible.
There are several approaches to keeping an index key short. First, try to limit the index to as few columns as possible. While composite indexes are useful and can sometimes optimize a query, they are also larger and cause more disk reads for the database. Secondly, try to choose a compact data type for an index column, based on the number of bytes required for each data type. Integer keys are small and easy for the database to compare. In contrast, strings require a character-by-character comparison.
As a rule of thumb, try to avoid using character columns in an index, particularly primary key indexes. Integer columns will always have an advantage over character fields in ability to boost the performance of a query.
Distinct Index Keys
The most effective indexes are the indexes with a small percentage of duplicated values. Think of having a phone book for a city where 75% of the population has the last name of Smith. A phone book in this area might be easier to use if the entries were sorted by the resident’s first names instead. A good index will allow the database to disregard as many records as possible during a search.
An index with a high percentage of unique values is a selective index. Obviously, a unique index is the most selective index of all, because there are no duplicate values. SQL Server will track statistics for indexes and will know how selective each index is. The query optimizer utilizes these statistics when selecting the best index to use for a query.
Maintaining Indexes
In addition to creating an index, we’ll need to view existing indexes, and sometimes delete or rename them. This is part of the ongoing maintenance cycle of a database as the schema changes, or even naming conventions change.
View Existing Indexes
A list of all indexes on a table is available in the dialog box we used to create an index. Click on the Selected index drop down control and scroll through the available indexes.
There is also a stored procedure named sp_helpindex. This stored procedure gives all of the indexes for a table, along with all of the relevant attributes. The only input parameter to the procedure is the name of the table, as shown below.
EXEC sp_helpindex Customers
Rename an Index
We can also rename any user created object with the sp_rename stored procedure, including indexes. The sp_rename procedure takes, at a minimum, the current name of the object and the new name for the object. For indexes, the current name must include the name of the table, a dot separator, and the name of the index, as shown below:
EXEC sp_rename 'Products.IX_UnitPrice', 'IX_Price'
This will change the name of the IX_UnitPrice index to IX_Price.
Delete an Index
It is a good idea to remove an index from the database if the index is not providing any benefit. For instance, if we know the queries in an application are no longer searching for records on a particular column, we can remove the index. Unneeded indexes only take up storage space and diminish the performance of modifications. You can remove most indexes with the Delete button on the index dialog box, which we saw earlier. The equivalent SQL command is shown below.
DROP Index Products.IX_Price
Again, we need to use the name of the table and the name of the index, with a dot separator. Some indexes are not so easy to drop, namely any index supporting a unique or primary key constraint. For example, the following command tries to drop the PK_Products index of the Products table.
DROP INDEX Products.PK_Products
Since the database uses PK_Products to enforce a primary key constraint on the Products table, the above command should produce the following error.
An explicit DROP INDEX is not allowed on index ‘Products.PK_Products’.It is being used for PRIMARY KEY constraint enforcement.
Removing a primary key constraint from a table is a redesign of the table, and requires careful thought. It makes sense to know the only way to achieve this task is to either drop the table and use a CREATE TABLE command to recreate the table without the index, or to use the ALTER TABLE command.
TSQL Execution Order
IMHO, this is the main difficulty in understanding complicated queries.According to ANSI standard the query should be written in the following way:
SELECT [...]
FROM [table1]
JOIN [table2]ON [condition]
WHERE [...]
GROUP BY [...]
HAVING [...]
ORDER BY [...]
However the query will be processed by MSSQL in the following order like so:
FROM [table1]ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
Please note, that ORDER BY is the last to be processed, only TOP clause would be processed after ORDER BY. DISTINCT will be processed after SELECT and before ORDER BY.