Friday, July 11, 2008

Control SQL Server locking with hints

Takeaway: For clients dependent upon SQL Server, the number of concurrent users accessing a database can wreak havoc on data integrity if left unchecked. Using these locking hints, you can ensure data changes are handled appropriately.


Over the past few years, SQL Server has blossomed from a small office data store into an enterprise-level database server, and the number of users concurrently accessing a database has grown with it. SQL Server 2000's standard approach to locking resources often seems inefficient, but thankfully it provides features to override the standard locking. Locking hints may be used to tell the server how to lock resources, but before covering them, let's examine locking itself.

What is a lock?
Relational database systems like SQL Server use locks to prevent users from stepping on each other's toes. That is, locks prevent users from making conflicting data changes. When one user has a particular piece of data locked, no other user may modify it. In addition, a lock prevents users from viewing uncommitted data changes; they must wait for the changes to be committed before viewing them. Data may be locked at various levels of granularity. SQL Server 2000 uses locks to implement pessimistic concurrency control among multiple users performing modifications in a database at the same time.
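
As a quick illustration (a minimal sketch using the Northwind Employees table from the examples later in this article), an uncommitted change in one session blocks readers in another under the default locking behavior:

-- Session 1: modify a row inside a transaction, but do not commit yet.
BEGIN TRANSACTION
UPDATE Employees SET Title = 'Manager' WHERE EmployeeID = 1
-- The exclusive lock is held until COMMIT or ROLLBACK.

-- Session 2: this read waits until Session 1 commits or rolls back.
SELECT Title FROM Employees WHERE EmployeeID = 1

-- Session 1: release the lock.
ROLLBACK TRANSACTION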

Deadlocks
A database deadlock can occur when there is a dependency between two or more database sessions for some set of resources. A deadlock is a condition that can occur on any system with multiple threads, not just on a relational database management system. A thread in a multithreaded system may acquire one or more resources (for example, locks). If the resource being acquired is currently owned by another thread, the first thread may have to wait for the owning thread to release the target resource. The waiting thread is said to have a dependency on the owning thread for that particular resource. The following listing shows the text of an exception where a deadlock occurred:
System.Data.SqlClient.SqlException: Transaction (Process ID 12) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This exception was thrown when one SQL Server process conflicted with another process that held a lock on a resource the first one needed. Consequently, one of the processes was terminated. This is the standard error message for deadlocks, with the process ID being unique to the system.
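
To see how such a deadlock arises, picture two sessions that lock the same two tables in opposite order (a sketch against the Northwind tables; the statements must interleave as the comments describe):

-- Session 1
BEGIN TRANSACTION
UPDATE Employees SET Title = 'X' WHERE EmployeeID = 1        -- locks Employees
-- ...Session 2 now runs its first UPDATE...
UPDATE Customers SET Region = 'X' WHERE CustomerID = 'ALFKI' -- waits on Session 2

-- Session 2
BEGIN TRANSACTION
UPDATE Customers SET Region = 'Y' WHERE CustomerID = 'ALFKI' -- locks Customers
UPDATE Employees SET Title = 'Y' WHERE EmployeeID = 1        -- waits on Session 1

-- Neither session can proceed; SQL Server detects the cycle, rolls back one
-- session (the "victim") with error 1205, and lets the other continue.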

Types of locks
A database system may lock data items at one of many possible levels within the system hierarchy. The possibilities include:
  • Rows—an entire row from a database table
  • Pages—a collection of rows (usually a few kilobytes)
  • Extents—usually a collection of a few pages
  • Table—an entire database table
  • Database—the entire database is locked

Unless otherwise specified, the database uses its own judgment to determine the best locking approach based upon the scenario. Locking is a resource-intensive activity (with respect to memory), so this is not always the best approach. Thankfully, SQL Server does provide a way to circumvent the default behavior. This is accomplished with locking hints.
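
If you want to see which locks the server actually takes, the sp_lock system procedure reports the locks held by a session (a small sketch; sp_lock expects the session ID through a variable):

BEGIN TRANSACTION
UPDATE Employees SET Title = 'Test' WHERE EmployeeID = 1

DECLARE @spid int
SET @spid = @@SPID
EXEC sp_lock @spid   -- lists the key, page, and table locks this session holds

ROLLBACK TRANSACTION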

Hints
There are times when you need to override SQL Server's locking scheme and force a particular range of locks on a table. Transact-SQL provides a set of table-level locking hints that you can use with SELECT, INSERT, UPDATE, and DELETE statements to tell SQL Server how you want it to lock the table by overriding any other system-wide or transactional isolation levels. The available hints include the following:
  • FASTFIRSTROW—The query is optimized to get the first row of the result set.
  • HOLDLOCK—Hold a shared lock until the transaction has been completed.
  • NOLOCK—Do not issue shared locks or recognize exclusive locks. This may result in data being returned that has been rolled back or has not been committed; therefore, working with dirty data is possible. This may only be used with the SELECT statement.
  • PAGLOCK—Uses page locks where a single table lock would normally be taken.
  • READCOMMITTED—Read only data from transactions that have been committed. This is SQL Server's default behavior.
  • READPAST—Rows locked by other processes are skipped, so the returned data may be missing rows. This may only be used with the SELECT statement.
  • READUNCOMMITTED—Equivalent to NOLOCK.
  • REPEATABLEREAD—Locks are placed on all data used in queries. This prevents other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction.
  • ROWLOCK—Locks the data at row level. SQL Server often locks at the page or table level to modify a row, so developers often override this setting when working with single rows.
  • SERIALIZABLE—Equivalent to HOLDLOCK.
  • TABLOCK—Lock at the table level. You may want to use this when performing many operations on table-level data.
  • UPDLOCK—Use update locks instead of shared locks while reading a table, and hold the locks until the end of the statement or transaction. This has the advantage of letting you read data without blocking other readers and update it later with the assurance that the data has not changed since you last read it.
  • XLOCK—Uses an exclusive lock on all resources until the end of the transaction.

Microsoft groups the hints into two categories: granularity and isolation level. Granularity hints include PAGLOCK, NOLOCK, ROWLOCK, TABLOCK, and TABLOCKX, while isolation-level hints include HOLDLOCK, NOLOCK, READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE. A maximum of one hint from each group may be used.
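
For example, one hint from each category can be combined in a single WITH clause; the following sketch takes row-level locks and holds them for the duration of the transaction:

BEGIN TRANSACTION
SELECT * FROM Employees WITH (ROWLOCK, HOLDLOCK)
WHERE EmployeeID = 1
-- ...the selected row stays locked until the transaction ends...
COMMIT TRANSACTION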

These hints allow you to control the locking used by SQL Server, and they are included directly in the Transact-SQL statement. They are placed in the FROM portion of the statement, preceded by the WITH keyword. The WITH keyword is optional in SQL Server 2000, but Microsoft strongly urges its inclusion, which leads many to believe it may become mandatory in future SQL Server releases. Here is the hint syntax as it applies to the FROM clause:
[ FROM { <table_source> } [ ,...n ] ]
<table_source> ::=
table_name [ [ AS ] table_alias ] [ WITH ( <table_hint> [ ,...n ] ) ]
<table_hint> ::=
{ INDEX ( index_val [ ,...n ] )
| FASTFIRSTROW
| HOLDLOCK
| NOLOCK
| PAGLOCK
| READCOMMITTED
| READPAST
| READUNCOMMITTED
| REPEATABLEREAD
| ROWLOCK
| SERIALIZABLE
| TABLOCK
| TABLOCKX
| UPDLOCK
| XLOCK }


While this syntax does show its usage, it's easier to show a real example. The following Transact-SQL statement selects all data from the Employees table of the Northwind database:
SELECT *
FROM Employees WITH (nolock)


This gives me all data regardless of what other processes are currently doing with it, so the data may be dirty, but this is not important to my task. Another example updates all rows in a table, setting a field to a certain value:
UPDATE
Employees WITH (tablock)
SET Title='Test'


This example is updating every row in the table, so a table lock is utilized.

Alternate
At this point, I must stress that even though a table-level hint is specified in code, the query optimizer may ignore it. Table-level hints are ignored if the table is not accessed by the query plan the optimizer chooses. Also, the query optimizer will often choose an indexed view over a table. Lastly, a hint may be ignored if the table contains computed columns.

Use your discretion
Using table hints in your applications depends upon what is required; whether you use them at all will depend upon your needs. For example, many consultants love to use the FASTFIRSTROW hint to return the first row quickly, which gives them something to work with while the rest of the query completes. When the data is unlikely to change (e.g., archived data), the NOLOCK hint is a good choice since the data is basically static. On the other hand, that approach would be a poor fit for financial applications, where accuracy is a must.
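
A sketch of that FASTFIRSTROW case, again against Northwind (the hint only changes the optimization goal, not the result set):

-- Optimize for returning the first row quickly rather than total throughput
SELECT OrderID, OrderDate
FROM Orders WITH (FASTFIRSTROW)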

Thursday, July 10, 2008

Database Modeling versus Database Design

Database Modeling

Database modeling is generally focused on logical and physical database models. A logical database model is composed of entities and attributes, and it includes relationships between the different entities, which can be mandatory or not. The logical model consists of a normalized model that is generally set to third normal form. It includes many elements that make up a database, but it is not specific to any software or database implementation. Performance factors are not a major consideration at this point nor are the applications that will be using the database. The main concern is building a model of what the database would look like when capturing the data needed by the users.

The denormalization process begins with the physical database model. The database team takes the work performed in the logical model and starts to optimize it for queries, specific database implementations, and applications that may talk to the database. More specific items, like views, are added to make working with the database easier for users. The physical database model also needs to be mapped back to the logical database model; they can be very different, and each has its own purpose. As changes occur, the team must update one model from the other and maintain that mapping between them.

Database Design

While database modeling focuses mostly on depicting the database, database design encompasses the entire process from the creation of requirements, business processes, logical analysis, and physical database constructs to the deployment of the database. For example, in database design, physical data modeling includes the modeling of not only tables and columns but also tablespaces, partitions, hardware, and the entire makeup of the database system. Database design includes uncovering and modeling requirements, the business processes (as they are today and where they are going in the future), the activities of the business, the logical models, and the physical database models, as well as addressing issues of what information is needed, how the different parts relate, how applications communicate with the data, and how the entire system is implemented.

SQL Server 2005 - Powerful paging using the WITH statement

Source : mehfuz blog

SQL Server 2005, formerly code-named "Yukon", is bundled with loads of features that can rock a database developer's mind. Among its many features, one thing that is truly mind-blowing is the mighty WITH statement. Previously, paging from the database was a really tedious task; some of the options were creating temp tables, using cursors, etc. But now is the time to forget those tricky methodologies.

Let's consider a simple table called Employee (emp_id, name, salary). Now, suppose that we need to create a GridView with paging. One option is to bring all the data from the database and do the paging client-side; another is to do a selective fetch. In both cases we can use a stored procedure that takes some parameters and returns a result set.

Let's see the following...

CREATE PROCEDURE GetEmployees
@Status int,       -- accepted by the original procedure; not used as a filter in this sketch
@StartIndex int,
@PageSize int
AS

-- Number every employee row so we can page by a range of row numbers
WITH FilteredList ([emp_id], [name], [salary], [RowNumber])
AS
(
SELECT
[emp_id],
[name],
[salary],
ROW_NUMBER() OVER ( ORDER BY [emp_id] DESC ) AS [RowNumber]

FROM
Employee
)

SELECT
*
FROM
FilteredList
WHERE
RowNumber BETWEEN (@StartIndex + 1) AND (@StartIndex + @PageSize)
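
Calling the procedure is then straightforward; for instance, a sketch that fetches the third page of ten rows (rows 21 through 30):

-- Page 3 with a page size of 10
EXEC GetEmployees @Status = 1, @StartIndex = 20, @PageSize = 10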


Actually, WITH defines a common table expression: a temporary, named result set in which each row is tagged with ROW_NUMBER() or any other identifier we might give. Finally, we filter the result based on that identifier. Either way, it makes our lives much simpler at the end of the day.

Here at Pageflakes, I use it almost everywhere possible, and it surely saves my time.

Tuesday, July 1, 2008

TSQL Query Deep Look

(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>

In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last. Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query); only the table generated by the final step is returned to the caller.

If a certain clause is not specified in a query, the corresponding step is simply skipped. Following is a brief description of the different logical steps applied in both SQL Server 2000 and SQL Server 2005.

Brief Description of Logical Query Processing Phases

1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.

2. ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted into VT2.

3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.

4. WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted into VT4.

5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.

6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.

7. HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted into VT7.

8. SELECT: The SELECT list is processed, generating VT8.

9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.

10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).

11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
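
To make the order concrete, here is a small annotated query (a sketch assuming Northwind-style Customers and Orders tables); the comment on each clause names the logical step that processes it:

SELECT DISTINCT TOP 5                     -- (8) SELECT, (9) DISTINCT, (11) TOP
    C.CustomerID, COUNT(O.OrderID) AS NumOrders
FROM Customers AS C                       -- (1) Cartesian product
    LEFT OUTER JOIN Orders AS O           -- (3) outer rows added back
    ON C.CustomerID = O.CustomerID        -- (2) ON filter
WHERE C.Country = 'USA'                   -- (4) WHERE filter
GROUP BY C.CustomerID                     -- (5) GROUP BY
HAVING COUNT(O.OrderID) > 0               -- (7) HAVING filter
ORDER BY NumOrders DESC                   -- (10) ORDER BY
-- Step (6) is skipped because no CUBE or ROLLUP is specified.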

How-To: SQL query IIS Logs with Log Parser

Face it: the built-in reports in Community Server are not good enough if you want detailed information about your traffic. Google Analytics is great, but you can't export the data in order to use it in your own queries. A better way is to analyze the IIS log files directly and join the information with the CS database tables. But how can you find useful data in these huge files?
First of all: what do you really want to know?


I wanted to answer the following question: How many requests do we get per day and per blog post?
  • I'm only interested in "real people" requests, so we need to eliminate all requests from bots and feed readers.
  • I only want to count page requests, so we need to eliminate all requests to images and other files.
  • I only want to count blog page requests; no navigation on tags or other pages.
  • I only want to count successful requests.
  • I want to use the data to join it with CS database data, in order to query additional meta-information (author, blog name, post creation date, etc.), so some transformation of the standard IISW3C fields has to take place.

Since we are database specialists here at Trivadis, it's quite natural for us that SQL is the perfect language to query data. But how do you query an IIS log file, which is in a special format (not CSV)?

No, don't even try to do it with Excel or Access. Even SQL Server can't import dozens of files in a proprietary format easily. You could try to play with SSIS, but it would kill you, believe me :-). The files can have breaks where the header is repeated somewhere in the middle of a file, and none of the tools I mentioned can deal with that. Also, the size is just too much for office tools.
Use Log Parser! This tool is just great: simple to use and extremely powerful. Its only purpose is to query log files (not only IIS), and it does it very well. With a SQL-like query language you can run queries against complete folders of files, and the performance is just amazing; it takes seconds to query gigabytes of data. But it is only a command-line tool. The cream topping is the UI that the fine people at SerialCoder wrote: Visual Log Parser GUI. It's quite simple, but it has all the features you need if you are a GUI-spoiled developer like me :-). Just think of it as a SQL Server Management Studio, but for files.

A query to provide data for the use case I mentioned above could look like this:

select
ViewDate, PostName, KlickCount, BlogName
using
date as ViewDate
, coalesce( replace_str(extract_prefix(extract_token(cs-uri-stem, 7,'/'),0,'.'),'-',' '),'default') as PostName
, to_uppercase( extract_token(cs-uri-stem, 2 ,'/')) as BlogName
, count(*) as KlickCount
from 'E:\Blog\*.log'
where cs-uri-stem like '/blogs/%'
and cs-uri-stem not like '%.png'
and cs-uri-stem not like '%.jpg'
and cs-uri-stem not like '%.gif'
and cs-uri-stem not like '%rss%'
and cs-uri-stem not like '%/tags/%'
and cs-uri-stem not like '%MainFeed.aspx'
and cs(User-Agent) not like '%bot%'
and cs(User-Agent) not like '%slurp%'
and cs(User-Agent) not like '%crawl%'
and [date] between to_timestamp('2007-12-01' ,'yyyy-MM-dd') and to_timestamp('2007-12-17','yyyy-MM-dd')
and extract_token(cs-uri-stem, 3 ,'/') in ('archive', 'default.aspx')
and sc-status = 200
group by date, PostName, BlogName
order by KlickCount desc
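
To run it, save the query to a file and point the Log Parser command line at it, along the lines of LogParser.exe file:blogstats.sql -i:IISW3C -o:CSV (a sketch; blogstats.sql is a hypothetical file name, -i:IISW3C tells the tool to parse IIS W3C logs, and -o:CSV writes the result as CSV).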

The output should look somewhat like this (one example row):
12.09.2007 00:00:00;mvc castle subsonic bye bye webforms;111;MICHAELKOENINGS


For Community Server admins:
The PostName column matches the PostName column on the cs_Posts table. The BlogName column matches the ApplicationKey column on the sections table. I used MS Access to bring the data together. This is the perfect starting point for Excel pivot tables about the performance of your service that make every manager happy :-)


Some notes:


The hits on the default.aspx page cannot be matched to a particular post. I will try to find a solution to match them by date: the last post saved before the page was displayed should be the one counted.


More cool stuff to come: there is a free regions DB available on the Internet. With it you can resolve the country, region, and city of your visitor's IP address. Since it cannot be queried with SQL (it is a proprietary binary format that can only be read via an API), we will use LINQ to help us with that. That's right: a real-world LINQ example! So stay tuned!

How many days left on Terminal Services without Licensing Server

'The time limit has been over since the first client connected' does not mean that the grace period has ended, but that the legal use of your terminal server without TS licenses has ended; there is no "free trial" period.

The quote from Microsoft's article makes that pretty clear.
As far as the remaining grace period is concerned, you can try checking the date of the %Systemroot%\System32\Lserver folder on the TS license server; this should be the date you installed TS Licensing.

You could try lsreport.exe from the Resource Kit, as described here: HOW TO: Use the Terminal Services Licensing Reporter Tool (Lsreport.exe), http://support.microsoft.com/?kbid=317592

Don't forget that the licensing server might have a grace period of 120 days, but the temporary licenses it issues are only valid for 90 days; from then on, the client will require a valid license to connect.