In Oracle, a database refers to the entire Oracle RDBMS environment and includes the following components:
• Oracle database processes and buffers (instance).
• SYSTEM tablespace containing one centralized system catalog, which is made up of one or more datafiles.
• Other optional tablespaces as defined by the database administrator (DBA), each made up of one or more datafiles.
• Two or more online Redo Logs.
• Archived Redo Logs (optional).
• Miscellaneous files (control file, Init.ora, Config.ora, and so on).
A database in Microsoft SQL Server 2005 refers to a physical grouping of set of schema objects of a database into one or more physical files. Databases are classified into system-defined database and user-defined database within SQL Server 2005. The system databases consist of the system data and also controls the temporary storage required for application data. The application data is available in a SQL Server user-defined database.
An instance of SQL Server can support multiple databases. Applications built using SQL Server can use databases to logically divide business functionality. There can be multiple instances of SQL Server on a single computer. Each instance of SQL Server can contain multiple databases.
Each SQL Server database can support filegroups, which provide the ability to physically distribute the placement of the data. A SQL Server filegroup categorizes the operating-system files containing data from a single SQL Server database to simplify database administration tasks, such as backing up. A filegroup is a property of a SQL Server database and cannot contain the operating-system files of more than one database, although a single database can contain more than one filegroup. After database creation, filegroups can be added to the database.
Figure 1: Oracle tablespaces and SQL Server databases compared.
Microsoft SQL Server also installs the following databases by default:
• The model database—a template for all newly created user databases and tempdb.
• The tempdb database—is similar to an Oracle temporary tablespace in that it is used for temporary working storage and sort operations. Unlike the Oracle temporary tablespace, users can create temporary tables that are automatically dropped when the user logs off.
• The msdb database—which supports the SQL Server Agent and its scheduled jobs, alerts, and replication information.
• The AdventureWorks and AdventureWorksDW databases—are sample databases for testing and training purposes that can be optionally installed.
Physical and Logical Storage Structures
The Oracle RDBMS is comprised of tablespaces, which in turn are comprised of datafiles. Tablespace datafiles are formatted into internal units called blocks. The block size is set by the DBA when the Oracle database is first created. When an object is created in an Oracle tablespace, the user can specify its space in units called extents (initial extent, next extent, min extents, and max extents). If no extent size is explicitly defined, a default extent is created. An Oracle extent varies in size and must contain a chain of at least five contiguous blocks.
SQL Server utilizes a fixed 8 KB page size which is the basic unit of IO. To better manage pages, SQL Server organizes these pages into groups of eight pages which are physically contiguous to each other, called extents. Although IO is performed a page at a time, space is managed in terms of extents. Each page belongs to one object and is of one type (data, index, GAM, IAM, etc), but an extent may belong to multiple objects (up to eight).
An extent that has all eight pages belonging to the same object is referred to as a Uniform Extent; otherwise, it is referred to as a mixed extent. Although a row may not span more than one page (8060 bytes – page minus header space), SQL Server 2005 may move data to ROW_OVERFLOW_ALLOCATION unit. Extents live within a filegroup. The following diagram provides a graphical comparison of how blocks and extents in Oracle map to pages and extents in SQL Server.
Figure 2: Comparing Oracle and SQL Server pages and extents.
Database System Catalogs
Each Oracle database runs on one centralized system catalog, or data dictionary, which resides in the SYSTEM tablespace. Each Microsoft SQL Server 2005 database maintains its own system catalog, which contains information related to:
• Database objects (tables, indexes, stored procedures, views, triggers, and so on).
• Users and permissions.
• User-defined data types.
• Schema Objects
• Files used by the database.
• Replication definition
• Snapshot definitions
The system level information for an instance of SQL Server 2005 is recorded by the master database. The recorded information includes the following:
• Database names and the primary file location for each database.
• SQL Server login accounts.
• System messages.
• Database configuration values.
• Remote and/or linked servers—a feature in SQL Server 2005 which enables you to execute commands against OLEDB Data sources on remote servers (including Oracle).
• Current activity information.
• Endpoint information—endpoints are means of implementing TCP protocol-based interfaces for implementation of certain features including HTTP web services, Service Broker, Database mirroring.
• System stored procedures.
In SQL Server 2005, the system objects are not stored in the master database but stored in a hidden database, called the resource database.
The system catalog is used in SQL Server 2005 in order to retrieve metadata information about objects in a database. The system catalogs are accessible to the user as catalog views. In order to monitor health of a server instance, diagnose problems and tune performance, the SQL Server 2005 Dynamic Management views can be used. The dynamic management views in SQL Server are similar to the V$_ views in Oracle which are used for performance monitoring.
Like the SYSTEM tablespace in Oracle, the SQL Server master database must be available to access any other database. As such, it is important to protect against failures by backing up the master database after any significant changes are made to the database. Database administrators can also mirror the files that make up the master database.
Data Files on Disk
Oracle-type segments are not needed for most Microsoft SQL Server installations. Instead, SQL Server can distribute, or stripe, data more efficiently with hardware-based RAID or with software–based RAID solutions available through the Windows disk administration infrastructure or from third party software. Though software RAID is supported, hardware RAID solutions are strongly recommended for their superior performance and resilience.
The recommended RAID configuration for SQL Server is RAID 1+0 (mirroring of a striped set) or RAID 1 (mirroring) for the data files. RAID 1+0 is considerably more expensive to implement but it offers the greatest performance and protection from failures. RAID 5 (stripe sets with parity) may be implemented for SQL Server data files that are used mostly for read operations.
Transaction log files operations are sequential in nature so RAID 1 is the typical recommendation. Not many systems will require striping log files for performance unless the system has very high rates of write operations. As such, mirroring the log files for protection against failure is usually adequate so long as the disk can support the number of IO operations required by the application. Further details on SQL Server’s transaction log architecture are discussed in the section below on Transaction Logs and Automatic Recovery.
SQL Server 2005 also uses several different background processes to efficiently manage computer resources. One example is checkpoints. SQL Server periodically generates automatic checkpoints in each database. Checkpoints flush dirty data and log pages from the buffer cache of the current database, minimizing the number of modifications that have to be rolled forward during a recovery.
Another background process is called Lazywriter which is unique to each instance. The lazywriter process sleeps for an interval of time then wakes to scan through the buffer cache where it checks the size of the free buffer list. If the free buffer list is below a certain point (dependent on the size of the cache) the lazywriter process scans the buffer cache to reclaim unused pages and write dirty pages that have not been recently referenced, while frequently referenced pages remain in memory. If Lazywriter is used on a NUMA-capable server then a lazywriter process will be spawned for each NUMA node for an instance.
While SQL Server 2005 made LazyWriter to control only data pages in the buffer memory, another thread called ResourceMonitor was introduced to control behavior the rest of the memory caches and clerks. Its role is to react to internal and external memory pressures dynamically. The ResourceMonitor background thread won’t allow overall cache memory to exceede 70% of Buffer Pool. If SQL Server’s prediction mechanism detects such a possibility, the ResourceMonitor starts running and shrinks the caches. In addition, ResourceMonitor won’t allow any single cache to be larger than 50% of the Buffer Pool.
Transaction Logs and Automatic Recovery
The Oracle RDBMS performs automatic recovery each time it is started. Oracle verifies that the contents of the tablespace files are coordinated with the contents of the online redo log files. If they are not, Oracle applies the contents of the online redo log files to the tablespace files (roll forward), and then removes any uncommitted transactions that are found in the rollback segments (roll back). If Oracle cannot obtain the information it requires from the online redo log files, it consults the archived redo log files.
Microsoft SQL Server 2005 also performs automatic data recovery by checking each database in the system each time it is started. SQL Server first checks the master database and then launches threads to recover all of the other databases in the system. For each SQL Server database, the automatic recovery mechanism checks the transaction log. If the transaction log contains any uncommitted transactions, the transactions are rolled back. The recovery mechanism then checks the transaction log for committed transactions that have not yet been written out to the database. If it finds any, it performs those transactions again, rolling forward.
Each SQL Server transaction log has the combined functionality of an Oracle rollback segment and an Oracle online redo log. Each database has its own transaction log that records all changes to the database and is shared by all users of that database. When a transaction begins and a data modification occurs, a BEGIN TRANSACTION event (as well as the modification event) is recorded in the log. This event is used during automatic recovery to determine the starting point of a transaction. As each data modification statement is received, the changes are written to the transaction log prior to being written to the database itself.
SQL Server has an automatic checkpoint mechanism that ensures completed transactions are regularly written from the SQL Server disk cache to the transaction log file. A checkpoint writes any cached page that has been modified since the last checkpoint to the database. Checkpointing these cached pages, known as dirty pages, onto the database, ensures that all completed transactions are written out to disk. This process shortens the time that it takes to recover from a system failure, such as a power outage. This setting can be changed by modifying the recovery interval setting by using SQL Server Management Studio or the Transact-SQL sp_configure system stored procedure.
Virtual Log Files
Internally, transaction logs are broken down into smaller granular chunks called virtual log files (or VLFs). VLFs are the unit of truncation for the transaction log. The purpose of VLFs is to allow for easier and automated management of the transaction log. When a VLF no longer contains log records for active transactions, it can be truncated and the space becomes available to log new transactions.
The smallest size for a virtual log file is 256 KB. The minimum size for a transaction log is 512 KB, which provides two 256 KB VLFs. The number and size of the virtual log files in a transaction log increase as the size of the log file increases.
As records are written to the log, the end of the log grows from one virtual log file to the next. If there is more than one physical log file for a database, the end of the log grows through each virtual log file in each physical file before circling back to the first virtual log file in the first physical file. Once all log files are full, the log will begin to grow automatically (if configured to allow auto-grow). The size and number of VLFs you’ll have depends largely on the size that the chunk is when it’s added to the transaction log.
If you add a new chunk to the transaction log which is 20 MB (through auto growth or through manual growth) then the number of VLFs that are added is 4.
• If you add a chunk which is greater than 64 MB but less than or equal to 1 GB, you’ll add 8 VLFs.
• If you add more than 1 GB then you’ll add 16 VLFs.
In general, most transaction logs will only have 20 or 30 VLFs – even 50 could be reasonable depending on the total size of the transaction log. However, in many cases what happens is that excessive auto-grow occurrences can cause an excessive number of VLFs to be added – sometimes resulting in hundreds of VLFs. Excessive VLFs adds overhead to log related activities such as transaction logging, log backups, replication log reader performance, and triggers. The general rule of thumb is to avoid frequent auto growth by pre-allocating the transaction log size such that auto growth is not likely to occur. SQL Server’s auto-grow feature (available for both data and log files) is intended to be a “safety-net” in the event of significant unplanned storage use. It is not intended to replace storage capacity planning and deployment.
Backing up Data in SQL Server 2005
Like Oracle, Microsoft SQL Server 2005 offers several options for backing up data, including:
• Full Backups
• Full Differential Backups
• Partial Backups
• Partial Differential Backups
• File and Filegroup Backups
• Transaction Log Backups
All backups can be performed while the database is in use, allowing backups to be made on high availability databases. The backup processing and internal data structures of SQL Server maximize their rate of data transfer with minimal effect on transaction throughput.
Both Oracle and SQL Server require a specific format for backup files. In SQL Server, these files are called backup devices and are created using SQL Server Management Studio or the Transact-SQL sp_addumpdevice stored procedure.
Although backups can be performed manually, it is recommended that you use SQL Server Management Studio and/or the Maintenance Plan Wizard to schedule periodic backups or backups based on database activity.
Each backup option is discussed in detail below.
A full backup (also known as a database backup) backs up the entire database, including part of the transaction log (so that the full backup can be recovered). A full back up is a representation of the database at the time of the backup being successfully completed. By developing a backup strategy that combines full backups with partial transaction log backups allows the recovery of the database to a specific point of time.
A full backup can be executed from the Backup task in the SQL Server Management Studio or by using the BACKUP DATABASE statement. Like other backup methods, full backup can also be scheduled using SQL Server Agent or via the Database Maintenance Wizard.
Full Differential backup
After a full backup, backing up incremental changes to the database can be performed using a full differential backup. The full differential backup can be implemented using the SQL Server Management Studio back up task and also using the BACK UP DATABASE WITH DIFFERENTIAL statement. The Differential option backs up only incremental changes to the database. As such, the backup occupies less space and takes less time to complete than a full backup. However, a full backup is required for recovery along with a differential backup.
Partial backup is an enhanced feature in SQL Server 2005. A partial backup consists of each file in the default Primary Filegroup and every read/write Filegroup and specific read-only files that are specified using the Backup command. The partial backup is a new feature introduced in SQL Server 2005 in order to add more flexibility to the backup procedure in SQL Server databases. It is an ideal model to back up write intensive portions of a database.
Partial Differential Backup
Partial differential backups are used only with partial backups. A partial differential contains only those extents changed in the primary filegroup and read-write filegroups as of the time of backup. If only some of the data captured by a partial backup has changed, using a partial differential backup allows a database administrator to enable creation of smaller and faster backups. However, restoring from partial differential backups will take longer than restoring from a partial backup and be more complex, since two separate backup files are restored.
File backup, Filegroup backup
A file or filegroup backup copies one or more files of a specified database individually, allowing a database to be backed up in smaller units: at the file or filegroup level. This can increase the speed of recovery in some instances by allowing users to restore only damaged files without restoring the rest of the database. For example, a database that has several data files located on several disks and one of the disks fails, only the failed disk needs to be restored.
Transaction log backup
Transaction logs in SQL Server 2005 are associated with individual databases. The default configuration of SQL Server 2005 allows the transaction log to grow automatically as needed. The log will grow until it uses all available disk space or it reaches the maximum configured size. It is best to avoid filling the transaction log in a SQL Server environment. If the transaction log is full, data modifications within in that database can be impacted. Other databases are not affected. Transaction logs can be backed up using the BACKUP LOG statement or by using the Backup task in SQL Server Management Studio. Upon successful completion of a transaction log backup, the portion that has been backed up is truncated and available for use by new transactions.
A database can be restored to a certain point in time by applying transaction log backups or differential backups to a full database backup. A database restore overwrites the data with the information contained in the backups. Restores can be performed using SQL Server Management Studio or the RESTORE DATABASE statement. Online restores are supported in SQL Server 2005.
Network Configuration Components
Oracle Net Services supports networked connections between Oracle database servers and their clients. It communicates with the Transparent Network Substrate (TNS) data stream protocol, and allows users to run many different network protocols without writing specialized code.
With Microsoft SQL Server, client communication with a server supports primarily two components which include the following:
• Network Protocol on the Client and Server
• SQL Server Native Client (SNAC)—This is a new technology introduced in SQL Server 2005 which combines the functions of OLEDB and ODBC for SQL Server into one native library. The native client supports Net-libraries and TDS (Tabular Data stream) endpoints for multiple protocols in SQL Server. TDS is the internal client/server data transfer protocol in SQL Server.
SNAC is loaded with the installation of SQL Server 2005 and does not require a separate install. It can be installed separately on machines which do not have SQL Server installed, but requires SQL Server access.
Figure 3: Comparing the network components of Oracle and SQL Server
SNAC configuration can also be performed after the installation of SQL Server using the SQL Server Configuration Manager.
SQL Server Security Model
Whether you are trying to map your Oracle skills and knowledge to SQL Server or trying to migrate Oracle applications to Microsoft SQL Server 2005, it is important to understand how SQL Server implements database security and roles. This section describes SQL Server’s security architecture and features and maps them to Oracle equivalents where appropriate.
Microsoft SQL Server 2005 can use Secure Sockets Layer (SSL) to encrypt data transmitted across a network between an instance of SQL Server and a client application. The SSL encryption is performed within the SQL Native Client Net-Library and applies to all inter-computer protocols supported by SQL Server 2005.
SSL encryption works with instances of SQL Server running on a computer that has been assigned a certificate from a public certification authority. The computer on which the application is running must be configured to trust the certificate’s root authority. (Encryption with a self-signed certificate is possible and described in the next section, but a self-signed certificate offers only limited protection.)
The Net-Library encryption is implemented using the Secure Sockets Layer API. The level of encryption, 40-bit or 128-bit, depends on the version of the Microsoft Windows operating system that is running on the application and database computers.
A login account allows a user to access a SQL Server Instance and its administrative options. It allows connection to the instance but does not carry data access privileges. A login account is typically mapped to a database user name. Accessing data and/or administration functions are controlled via user names.
There is a guest login account that allows users to log in to SQL Server and only view databases that allow guest access. However, inline with security best practices, the guest account is not set up by default and must be created manually if needed.
Database User Name
Each Windows account or SQL Server login must be associated with a user name in each database that the user is authorized to access, or the database must have guest access enabled. Database user names are defined by members of the db_owner (database owner) or db_accessadmin (databases access administrator) fixed database role, and are stored in the sys.database_principals table found in each database.
SQL Server 2005 supports two types of login names which are managed and authenticated using different methods; Windows authentication and SQL Server authentication.
A DBA can specify which Windows login accounts can be used to connect to an instance of SQL Server 2005. Users logged into Windows using these accounts can connect to SQL Server 2005 without having to specify a separate database login and password. When using Windows Authentication, SQL Server 2005 uses the security mechanisms of NTLM or Kerberos to validate login connections, and relies on a user’s Windows security credentials. Users do not need to enter login IDs or passwords for SQL Server 2005 because their login information is taken directly from the trusted network connection. Windows authentication is similar in functionality to the IDENTIFIED EXTERNALLY option associated with Oracle user accounts.
Groups, Roles, Schemas and Permissions
SQL Server and Oracle use permissions to enforce database security. SQL Server statement-level permissions are used to restrict the ability to create new database objects (similar to the Oracle system-level permissions).
SQL Server also offers object-level permissions. As in Oracle, object-level ownership is assigned to the creator of the object and cannot be transferred. Object-level permissions must be granted to other database users before they can access the object. Members of the sysadmin fixed server role, db_owner fixed database role, or db_securityadmin fixed database role can also grant permissions on one user’s objects to other users.
Permissions can be granted, denied or revoked on any SQL Server securable. Granting specific permissions to a principal (logins, users or roles) allows the principal to perform specific tasks. Revoking that permission takes away the ability of the principal to perform the tasks. Deny places an explicit blocker on a securable that prevents a principal(s) from performing specific actions against the securable and always takes precedence over all other permissions.
For example, if user JOE belongs to the SUPERUSER role and the SUPERUSER role has been granted select, insert, update and delete permissions on the contacts table, JOE will be able to perform those four tasks against the contacts table because JOE inherited the permissions from the SUPERUSER role. However, if a DENY is explicitly set against user JOE on the contacts table for all four actions, JOE will not be able to perform any of the actions since DENY always takes precedence.
SQL Server statement-level and object-level permissions can be granted directly to database user accounts. However, it is often easier to administer permissions to database roles. SQL Server roles are used for granting and revoking privileges to groups of database users (much like Oracle roles). Roles are database objects associated with a specific database. There are a few fixed server roles associated with each installation, which apply across all databases. An example of a fixed server role is sysadmin. Windows groups can also be added as SQL Server logins, as well as database users. Permissions can be granted to a Windows group or a Windows user.
A database can have any number of roles or Windows groups. The default role public is always found in every database and cannot be removed. The public role functions much like the PUBLIC account in Oracle. Each database user is always a member of the public role. A database user can be a member of any number of roles in addition to the public role. A Windows user or group can also be a member of any number of roles, and is also always in the public role.
SQL Server 2005 supports a new feature called schemas. A schema is a collection of database entities that form a single namespace. A namespace is a set in which every element has a unique name. This helps in creating a logical separation between the application user and the database objects.
The abstraction of database users from schemas provides several benefits to administrators and developers as listed below:
• Multiple users can own a single schema through membership in roles or Windows groups. This extends the familiar functionality allowing roles and groups to own objects.
• Dropping database users is greatly simplified as it does not require the renaming of objects contained within that user’s schema. Schema owners can also be changed with no impact to existing application code. Thus it is no longer necessary to revise and test applications that refer explicitly to schema-contained objects after dropping the user that created them.
• Multiple users can share a single default schema for uniform name resolution.
• Shared default schemas allow developers to store shared objects in a schema created specifically for a specific application, rather than in the DBO schema.
• Permissions on schemas and schema-contained objects can be managed with a higher degree of granularity than in earlier releases.
• Schemas increases the level of granularity at which permissions are managed. Permissions can be given for a specific user to manage and own objects within a schema as compared to the entire database. For instance, User 1 can be given permissions to modify the objects in the “Sales” schema within a database but the user will be restricted from modifying objects in the “inventory” schema within the same database.
SQL Server 2005 natively supports all encryption infrastructure requirements natively within the database and are fully integrated with a key management infrastructure. SQL Server secures data with a hierarchical encryption and key management infrastructure. Each layer secures the layer below it, using a combination of certificates, asymmetric keys, or symmetric keys. This feature works with Windows Server 2003 or later.
Figure 4: SQL Server data encryption
Each SQL Server instance has a Service Master Key, created automatically during setup, which is encrypted with a Data Protection API (using the credentials of the SQL Server service account) provided by the underlying Windows Server 2003 operating system. Its main purpose is to secure system data, such as passwords used in instance-level settings such as linked servers or connection strings).
The Service Master Key is also used to secure each of the Database Master Keys (protected additionally with a password supplied when a Database Master Key is created). Within each database, its master key serves as the basis for creating certificates or asymmetric keys, which subsequently can be applied to protect data directly or to further extend the encryption hierarchy (i.e., by creating symmetric keys). Creation, storage, and other certificate and key management tasks can be handled internally within SQL Server, without resorting to features of the operating system or third party products. Encryption and decryption is provided by pairs of functions complementing each other such as EncryptByCert() and DecryptByCert().
Data Movement and Tools
Data Export/Import tools
SQL Server 2005 also has a collection of data movement and loading tools. The BULK INSERT and bcp commands (similar to Oracle’s SQL*Loader) have a long history of performing insert operations. DTS was often the tool of choice in SQL Server 2000 when transformations were required on the data despite is relatively basic set of features.
SQL Server 2005 now offers an exciting new tool called SQL Server Integration Services (SSIS). SSIS allows the integration of managed code in the data and control flow tasks to get the precise results needed for transformations and the ultimate destination of the data. It is a true ETL tool that efficiently gets data where it is needed by eliminating the need for multiple staging tables. SSIS uses a “pipeline” architecture and massages the data on its way from the source to its destination. The pipeline may also be a data source for Data Mining so that “real-time” data mining may be implemented.
With SSIS, users now have an industrial strength ETL tool bundled with SQL Server at no additional cost. Oracle does have an ETL tool but it is part of the Oracle Warehouse Builder which is sold as part of the Internet Development Suite.
Similar to Oracle’s transportable tablespaces feature, SQL Server provides a simple way to move databases from one instance to another. The move can be a logical and/or physical move and the difference in execution is minimal and the operations can be performed via SQL Server Management Studio or command line using T-SQL. To move a SQL Server database from one instance to another, you only need to detach the database from one instance and attach it to another. If physical movement is necessary, simply move the datafiles and log files associated with that database to the desired new location before attaching to the new instance.
Moving to different platforms is irrelevant in SQL Server since it only runs on the Windows platform. However, the process to move from one edition or version of Windows to another is done the same way as moving within the same edition or version. Do note that different versions/editions of Windows have different levels of functionality so please be aware of your database’s requirements prior to moving.
Replication is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency. Using replication, you can distribute data to different locations and to remote or mobile users over local and wide area networks, dial-up connections, wireless connections, and the Internet. SQL Server 2005 supports all popular types of replication including transactional, peer-to-peer transactional (aka multi-master), merge and snapshot. SQL Server also supports replication with non SQL Server databases including Oracle.
Implementation methods and requirements will differ depending on the type of replication needed but for all practical purposes, these models deliver on most of the combined capabilities provided by Oracle replication and Oracle Streams. Setting up, monitoring and managing replication can be done through SQL Server Management Studio or through T-SQL and some replication system stored procedures.
Microsoft SQL Server 2005 is a comprehensive, integrated end-to-end data management solution that empowers users across the organization by providing them with a secure, reliable, and productive platform for enterprise data and business intelligence (BI) applications. SQL Server 2005 delivers powerful and familiar tools to information technology professionals and information workers.
SQL Server 2005 is designed with goals to increase business value through reduced TCO, ease of use and extensive integration capabilities for information management. SQL Server 2005 also offers various services and infrastructure to deploy Business Intelligence systems with significantly reduced development and integration effort. Further capabilities offered by features like Service Broker and Query Notification allows today’s IT Professionals to design and deploy both classic client/server and web applications, and more advanced web services based applications with service oriented architectures.