09/04/2025

DBeaver Series - Part 1 - Installing and Connecting to Postgres

One of the common complaints when you move to an open-source database is not having a proper database management tool like SSMS for SQL Server.

These days I'm trying a few tools with Postgres databases. One of them is DBeaver.

DBeaver is an open-source universal database management tool (as they call it). Besides Postgres, it can handle other databases such as MySQL and MariaDB, and even commercial products like SQL Server and Oracle.

It has two versions: a free Community edition and a Pro edition. The Pro edition has many enterprise-level features, such as better security and AI-related features, and it also comes with technical support. However, the Pro edition is paid, at around $25 per month, or around $500 per year if you choose the Ultimate edition.

You can compare editions here -> https://dbeaver.com/edition/

DBeaver is a desktop application, but it also has a web version called CloudBeaver.

CloudBeaver is a web server which you need to install and configure using the source code they provide. However, a demo can be found here -> https://demo.cloudbeaver.io/#/

DBeaver is currently on version 25.0.2.

DBeaver can be downloaded from here -> https://dbeaver.io/download/

For my practice, I'm OK with the free Community edition.

On the download page, you have the option of choosing a Windows installer, a zip file or installing via the Microsoft Store. Though the installation file is only 121 MB, I had a tough time downloading it and it was showing over 1 hour of download time. So I opted for the Microsoft Store option, which was much faster; it downloaded and installed within 2 minutes.

Once installed, you are presented with a UI like below:


The first question it asked me was whether I want to create a sample database to explore the features of DBeaver. Why not? So I said yes.

When I pressed yes, I wasn't sure where or what type of database it created. I was then presented with the connection selection dialog box.


I clicked on PostgreSQL and then clicked "Next".

I kept all the default values in the next dialog box (I had already installed a Postgres database on this server with all default values):


Then I clicked "Finish".

After clicking "Finish", I could see two database connections in the "Database Navigator" pane.


That's when I realized that the sample database was created using SQLite. When I clicked on the "postgres" node for the first time, it asked me to download driver files.


I clicked on "Download" and it started downloading and installing the drivers for Postgres. It took about 12 minutes to complete the driver download and installation.

After installation, I was presented with the following error:


That's when I realized I hadn't entered the password when creating the connection (with all defaults). So I edited the connection (right click -> "Edit Connection") and entered the password for the Postgres server in the box shown below:



Then it was all OK. I could see the postgres database, but I couldn't see the custom database I had already created on that server. I'm not quite sure why; I will have to investigate this.
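In the meantime, one quick way to confirm which databases actually exist on the server is to run a standard Postgres catalog query from DBeaver's SQL editor:

SELECT datname FROM pg_database WHERE datistemplate = false;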


Same as the postgres node, if you click on the SQLite node, it will also ask you to download the required driver files (first time only).

I will continue to explore this and will blog about it if there is anything interesting.

27/03/2025

Introduction to SQL Server Indexes

What are SQL Server Indexes

Indexes in SQL Server serve a similar purpose to the index in a book we read.

Say we have a large book on technology, around 300 pages, and we want to read the pages written about "Databases". If the book doesn't have an index, we would have to read the entire book to find those pages. But if the book contains an index, we can simply look up "Databases" in the index, find the page numbers and turn directly to them.

In a similar way, indexes in SQL Server provide a way to retrieve data much faster and serve your query more quickly. For example, consider a scenario where SQL Server is looking for all the employees whose names start with the letter "D" in the Employee table. If the Employee table doesn't have an index, like our book without one, SQL Server has to read the entire table to find employees whose names start with "D". But if it has an index on the employee name, it can just read the index, jump straight to the employees it requires and present the result set quickly.

An index is a data structure linked to a table. In a way, you can call it a small copy of the table, with the data arranged in a different way than the main table. You can have multiple indexes on a single table. Each of these indexes arranges its copy of the table data differently, so they can satisfy different queries.


Types of Indexes

There are several types of indexes.

You can have tables without indexes. When a table doesn't have a clustered index, it is called a "heap". There is no order to the data; rows are stored as they come.
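For illustration, a table created like this (a hypothetical EmployeeHeap table) stays a heap until a clustered index is added:

-- No clustered index (and no primary key), so rows are stored in no particular order
CREATE TABLE EmployeeHeap (
    EmpId INT,
    Name VARCHAR(100)
);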

Clustered Index

You can create a clustered index on a table. When you create a clustered index, the table (or the heap) is re-arranged in the order of the index key. So basically the table becomes the clustered index, because the data in the table is physically re-arranged to match the index.

For example, say we have employee data stored in a table without an index (i.e. a heap). Then we create a clustered index on that table, say on the Employee Id column (EmpId).

CREATE CLUSTERED INDEX IX_EMPLOYEE_ID ON Employee(EmpId)

Then we get a table which is physically sorted on the employee Id. This is the clustered index. You can only have one clustered index per table, because the table data can only be physically ordered one way.
When you specify a primary key for a table, SQL Server automatically creates a clustered index on that column (or columns) by default.

Non-Clustered Index

Say we have a lot of queries based on employee age, so we need to access age very frequently. For example, we need employees who are younger than 25. But as it stands now, we need to read the entire table to find young employees.
To ease that pain, we can create an index on the Age column but leave the physical structure of the table as it is. To do this we use a non-clustered index.

CREATE NONCLUSTERED INDEX IX_EMPLOYEE_AGE ON Employee(Age)

This creates another structure linked to the table. It is a kind of small copy of the table, arranged in a different manner. Though we only specify the "Age" column in the index, SQL Server always stores the clustered index key (the primary key in our example) in the index as well, so that it can relate each index row back to the row in the original table.

You can have as many non-clustered indexes as you like on a table. However, the more indexes you have, the more data SQL Server has to add/update/delete when you modify data in the table, because SQL Server has to maintain the non-clustered indexes as well as the table data (if they are affected by the change).

Unique Index

A unique index is a special type of index; it tells SQL Server that the specified column(s) cannot have duplicate values. You can have both clustered and non-clustered unique indexes. By default, a primary key constraint creates a unique clustered index and a unique constraint creates a unique non-clustered index.

CREATE UNIQUE NONCLUSTERED INDEX IX_EMPLOYEE_EMAIL ON Employee(Email)

This tells the query optimizer that the data in the Email column is unique (i.e. no duplicates). The optimizer uses this information when it comes up with execution plans.

Filtered Index

A filtered index is another special type of non-clustered index, where we only store the subset of rows we query frequently.
For example, if we have a "Location" column on the Employee table and most queries look for Location = 'UK' employees, we can create a non-clustered index filtered to only the employees who are in the UK.

CREATE NONCLUSTERED INDEX IX_EMPLOYEE_LOC_FILTERED ON Employee(Location) WHERE Location = 'UK'

A filtered index is efficient because you don't need to store every row's value for that column in the index. This makes the index smaller and also cheaper for the optimizer to use.
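A query shaped to match the filter can then be served by this much smaller index, for example:

SELECT EmpId, Location
FROM Employee
WHERE Location = 'UK';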


Other Types of Indexes

There are many other types of indexes in SQL Server. For example:
• Full Text Index
• XML Index
• Spatial Index
• Columnstore Index
• Hash Index (in In-Memory OLTP)
We are not going to talk about these in this module.


Index Storage and How Indexes Work

As mentioned before, we are mainly focused on rowstore indexes. Rowstore indexes are organized into a data structure called a B+ tree.

As we learned in other modules, data in SQL Server is organized into 8 KB pages.
Let's first look at the organisation of a clustered index.
Each page in the B+ tree structure is called a node.
The top node of the index is called the "root node".
The very bottom nodes of the index are called "leaf nodes".
There can be multiple levels of nodes between the root node and the leaf nodes; these are called intermediate levels, and the nodes in these levels are called "intermediate nodes".
The nodes at each level are doubly linked, which means we can go backwards and forwards from one node to another within the same level.

Clustered Index Example:

Let's consider an example index on the following Employee table. Let's also assume EmpId is the primary key of the table. When you define a primary key on a table, SQL Server automatically creates a CLUSTERED index on it.

Or you can create one like below:

CREATE CLUSTERED INDEX IX_EMPLOYEE_ID ON Employee(EmpId)


Let's assume that in one 8 KB page we can only fit 5 records. So at the leaf level, the nodes are arranged in the following way.


Note that because the nodes are doubly linked (indicated by the arrows), you can traverse (read) from one node to another both forwards and backwards.

Then we add the intermediate level of nodes:

Note that each record in the intermediate level is a pointer to a leaf node (data page), and the intermediate pages are also doubly linked, making traversal easier.
In a large table you can have multiple levels of intermediate nodes.

Finally, we add the root level to complete the picture:
Note that the root node has pointers to the intermediate level.


Index Seek Operation:

Let's say you want to seek to record 26 - Ethan Hunt.

SELECT EmpId, Name FROM Employee WHERE EmpId = 26

The following picture shows the seek path:

As you can see, we first go to the root, look for 26 and find a pointer to the first intermediate node, go to the first intermediate node, find a pointer to the second leaf page, go to the second leaf page and finally reach record 26.


Range Scan Operation:

Say we want to get a range of records, for example records 20 to 40.

SELECT EmpId, Name FROM Employee WHERE EmpId BETWEEN 20 AND 40

We traverse the index in the following way:

We start from the root, see that the 20s are under the first intermediate node pointer, go to the first intermediate node, see that the 20s are under the second leaf node pointer, go to the second leaf node and find record 21 (because there is no 20). We start reading from there until the end of the page, then use the link to the next leaf page to move to the next leaf node and keep reading until record 38.

Non-clustered Index

A non-clustered index uses the same B+ tree structure, but its leaf nodes don't hold the full data rows; instead they point back to the rows in the heap or in the clustered index. Let's consider a scenario where we want to create a non-clustered index on the same Employee table as above. We will create a non-clustered index on the "Name" field.

CREATE NONCLUSTERED INDEX IX_EMPLOYEE_NAME ON Employee(Name)

The leaf level of the non-clustered index will look like below:


Note that the index rows are now sorted on the Name column and each carries the clustered index key (i.e. EmpId) as a pointer back to the table. In addition, a non-clustered index can have included columns. For example, you can include the "Age" column in the index; if you do, the age value will be stored in the non-clustered index as well. Since the non-clustered index carries fewer columns than the table, one page can hold more rows than a clustered index page. This is illustrated in the picture above by showing 6 rows in a single page.

The diagram above shows how the first two rows of the non-clustered index point to their related rows in the clustered index (the table). For clarity we have only drawn the first two, but every row in the non-clustered index points to its related row in the clustered index.

A non-clustered index also has intermediate and root nodes.


Index Seek Operation with Non-clustered Index

For example, let's say we want to retrieve the record for "Oscar White". Without a suitable index, we would have to scan the entire table. But if we use the non-clustered index on the "Name" column, we can go directly to Oscar White, as shown in the picture below:

To seek to "Oscar White", SQL Server starts from the root node, where it figures out that records starting with 'O' are under the second intermediate node, so it traverses to the second intermediate node. From there, SQL Server finds that the 'O' names are in the third leaf node, so it goes there and fetches the record. If the query requires only the name and Id, the non-clustered index itself can satisfy the query and the operation completes. However, if the query requires additional fields which are not in the non-clustered index, SQL Server uses the pointer in the non-clustered index to go to the clustered index and fetch the additional columns (this is called a key lookup operation).


Covering Index

A covering index is an index that has all the columns required to satisfy a query without referring to the base table. Because a covering index satisfies the query by itself, the IO required to serve the query is lower. If all the fields needed by the query are not in the non-clustered index, SQL Server has to do key lookups against the clustered index to pick up the missing fields.

For example let's consider a query like below:

SELECT EmpId, Name, Age 
FROM Employee
WHERE Name = 'Hanna Lee'

This query looks for the employee named "Hanna Lee" and needs to return her age as well.

If we have a non-clustered index like the one shown in the Non-Clustered Index section above, we can find the record easily by seeking to "Hanna Lee" using the index. But since the index doesn't have the "Age" field, SQL Server needs to do a key lookup (following the pointer from the non-clustered index to the clustered index) to get the "Age" value from the clustered index.

What if we include the Age column in the index like below:

CREATE NONCLUSTERED INDEX IX_EMPLOYEE_NAME ON Employee(Name)
INCLUDE (Age)

This is a non-clustered index on the Name column, but additionally it includes the "Age" column. Note that "Age" is not part of the index key, so the index is not sorted on Age; it is just stored in the index leaf rows.

When the query runs, SQL Server will use our IX_EMPLOYEE_NAME index. It will seek to Hanna Lee's record as shown above. The query needs the Age value as well, and luckily "Age" is also in the non-clustered index, so SQL Server can return the result without even touching the clustered index.

This module only teaches the very basics of indexes. We hope to look into more advanced index-related topics in upcoming modules.


















09/03/2025

Restoring Encrypted SQL Server DB Backup on different Server

Last week, when I was tasked with moving databases from an old server to a new server, I was faced with an error I wasn't familiar with. I got this error while trying to restore a database backup from the old SQL Server onto the new SQL Server. The error message told me that the database is encrypted and therefore cannot be restored on the new server.

Further analysis showed me that on the old server the database was encrypted using TDE (Transparent Data Encryption).
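If you want to confirm which databases on a server are TDE-encrypted, a quick check is to query the encryption DMV (an encryption_state of 3 means the database is encrypted):

SELECT db_name(database_id) AS DatabaseName, encryption_state
FROM sys.dm_database_encryption_keys;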

As a side note, TDE is a security feature in SQL Server that encrypts the database at rest (when it is saved on disk). It protects the data files by encrypting them on disk, ensuring that even if someone gains access to the files, they cannot read the data without the proper decryption keys. When a database uses TDE, its backup files are also encrypted.



The illustration above, from the Microsoft Learn website, shows the architecture of TDE.

Therefore, the first thing we need when restoring this encrypted database on the new server is a Database Master Key (DMK).

You can create it by running the following T-SQL on the new server:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword';

Next, we need the certificate. However, creating a new certificate on the new server won't allow us to decrypt the backup from the old server; we need the certificate from the old server. To be precise, you need a backup of the certificate that encrypted the Database Encryption Key (DEK) of the database, so you can restore it on the new server.

That is why it is vital to back up your certificates to a safe place (e.g. an online data store or vault) right after you create them. If your server crashes, you will never be able to restore your TDE-encrypted databases without the certificate, even if you have backups of them.

Assume you don't have the certificate backed up (or you don't know where the backups are because someone else created them and that knowledge is no longer available to you). In that case, if you still have access to the old server (like in my case above), you can still generate the certificate backup.

First, you need to know the name of the certificate. To get it, you can run the following T-SQL on the old server:

SELECT db_name(database_id) AS DatabaseName, c.name AS CertificateName
FROM sys.dm_database_encryption_keys dek
JOIN sys.certificates c ON dek.encryptor_thumbprint = c.thumbprint;

Now that you know the name, you can back it up:

USE master;
BACKUP CERTIFICATE MyBackupCert 
TO FILE = 'C:\Backup\MyBackupCert.cer'  
WITH PRIVATE KEY (
    FILE = 'C:\Backup\MyBackupCert.pvk',
    ENCRYPTION BY PASSWORD = 'StrongPassword123'
);

Note that the password you used to create the master key and the password used here don't need to match.

The above command will generate two files:
  • Certificate file (with cer extension)
  • Private Key file (with pvk extension)
Move them to the new server, and also make sure you have noted down the password you used.

Then, create the certificate on the new server:

USE master;
CREATE CERTIFICATE MyBackupCert
FROM FILE = 'C:\Backup\MyBackupCert.cer'
WITH PRIVATE KEY (
    FILE = 'C:\Backup\MyBackupCert.pvk',
    DECRYPTION BY PASSWORD = 'StrongPassword123'
);

Note that you need to use the same password you used to back up the certificate and private key file.

When you restore the certificate with the above T-SQL, its private key is protected by the database master key of the new server, thereby completing the chain shown in the TDE architecture diagram.

Now you are ready to restore your database to the new server.
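For completeness, the restore itself is then a standard RESTORE DATABASE. The database name, backup path and file locations below are placeholders for illustration only:

USE master;
RESTORE DATABASE MyDatabase
FROM DISK = 'C:\Backup\MyDatabase.bak'
WITH MOVE 'MyDatabase' TO 'D:\Data\MyDatabase.mdf',
     MOVE 'MyDatabase_log' TO 'D:\Logs\MyDatabase_log.ldf';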





28/02/2025

Question on C# version on VS Projects

I have recently inherited a Visual Studio 2022 project; it is a C# console application on .NET Framework 4.8. So I thought: not a very old project, what can go wrong?

Well, only a few hours into working on the project, I faced the following error:

Feature 'nullable reference types' is not available in C# 7.3. Please use language version 8.0 or greater.

I got this error because I tried to use a nullable reference type in the new code I was developing. But I was wondering: why is this project using C# 7.3? At the time of writing, the latest version of C# was version 13, which was released with .NET 9.0 in 2024. So how come a relatively new project is on version 7.3?

To find the answer I turned to ChatGPT and the web. This is what I found:

The default version of C# depends on the target framework of your project. Here are the defaults for .NET Framework 4.6.2 and 4.8:

.NET Framework Version    Default C# Version    Maximum Supported Version
.NET Framework 4.6.2      C# 7.0                C# 7.3
.NET Framework 4.8        C# 7.3                Latest version as of now


That explained a lot. This project targets .NET Framework 4.8, so it has defaulted to C# 7.3.

The next question that popped up was: how can I upgrade the C# version on this project to a later version, so I can use newer language features in my code?

The answer is that you need to manually edit the .csproj file (i.e. the Visual Studio project file) and insert something similar to this:

<PropertyGroup>
  <LangVersion>8.0</LangVersion>
</PropertyGroup>

If you always want the latest version of the language, you can do the following:
<PropertyGroup>
  <LangVersion>latest</LangVersion>
</PropertyGroup>
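Once the language version is bumped to 8.0 or above, the original 'nullable reference types' error goes away. A minimal sketch of what then compiles (the Customer class here is just for illustration):

#nullable enable

public class Customer
{
    // Non-nullable reference: the compiler warns if this could be left null
    public string Name { get; set; } = string.Empty;

    // Nullable reference type: requires C# 8.0 or later
    public string? MiddleName { get; set; }
}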

However, one thing to consider is that although you can set the latest language version while targeting .NET Framework 4.8, some language features require runtime support that .NET Framework doesn't have. For example, async streams and default interface methods require the runtime support of .NET Core / .NET 5+.

Unfortunately, there is no setting in the Visual Studio project UI that tells you which C# language version is in use. So if your .csproj file doesn't have the above property, you have to infer that your project is using the default language version for its target framework.

If you are using more modern frameworks, here are the default C# versions that go with them:
  • .NET 8 => default to C# 12
  • .NET 7 => default to C# 11
  • .NET 6 => default to C# 10

IIS Application Pool Recycling Settings

Recently I had a chance to look into IIS's capability to auto-recycle application pools, because one of our web apps was hogging the whole server, claiming all the memory it could get when working at its peak.

When hosting applications on IIS (Internet Information Services), managing application pool recycling is crucial for ensuring stability and performance, not only for the target app but for the whole server. Recycling refreshes the application environment by periodically restarting the worker process, preventing memory leaks, releasing unwanted memory and resource locks, and avoiding other unexpected issues.

Recycling Settings in IIS App Pools

Recycling settings are per application pool, therefore you can configure each pool differently. These settings can be found in the "Advanced Settings" dialog: right-click on the desired application pool and choose "Advanced Settings".

There are a lot of advanced settings, so you might have to scroll down to see the recycling settings; they are towards the bottom of the dialog.


"Disable Overlapped Recycle", "Disable Recycling for Configuration Changes" (I have blogged about this one separately) and "Generate Recycle Event Log Entry" are settings related to recycling, but they don't let us control when recycling happens, so we will skip them for now.


1. Regular Time Interval (Fixed Interval Recycling)

  • Setting Name: Regular Time Interval (in Minutes)

  • Default: 1740 minutes (29 hours) in IIS; a value of 0 means the application pool will not recycle on a regular time interval

  • Description: Automatically recycles the worker process after a specified time interval.

  • Use Case: Useful for applications that need periodic refreshing to prevent performance degradation. However, setting this too frequently can disrupt user sessions. For example, you can set your application pool to recycle every 12 hours (720 minutes) if you know those recycle points fall outside its peak times.


2. Specific Time(s) (Scheduled Recycling)

  • Setting Name: Specific Times

  • Description: Allows recycling at predefined times during the day.


  • Use Case: Ideal for scheduled maintenance windows. For example, in the screenshot above we recycle the application pool at 00:05 (just after midnight), then at 3:05 AM (probably after the maintenance job), 6:30 AM (just before users start to access it), 12:45 PM (midday, when traffic is low while everyone is at lunch), 5:30 PM (when users are logging off) and 7 PM (before we kick off maintenance tasks).

3. Memory-Based Recycling

a) Private Memory Limit

  • Setting Name: Private Memory Limit (KB)

  • Description: Recycles the worker process when its private memory consumption exceeds a specified threshold.

  • Use Case: Helps prevent excessive memory usage due to memory leaks in applications. For example, if an app is expected to use a maximum of 2 GB, setting a limit slightly above (e.g., 2.5 GB) ensures it gets recycled before causing server slowdowns.

b) Virtual Memory Limit

  • Setting Name: Virtual Memory Limit (KB)

  • Description: Recycles the worker process when its virtual memory usage exceeds a specified limit.

  • Use Case: Less commonly used, but can help control applications with excessive virtual memory allocations. Similar to the private memory limit, but it includes shared memory space as well.


4. Request-Based Recycling

a) Request Limit Recycling

  • Setting Name: Request Limit

  • Description: Recycles the worker process after processing a specified number of requests.

  • Use Case: Useful for high-traffic applications where a certain number of requests may cause the application to degrade. For example, if an API handles millions of requests daily, recycling it after every 500,000 requests can help maintain performance.
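All of the recycling thresholds above can also be set from a script instead of the GUI. A rough sketch using the WebAdministration PowerShell module (the pool name "MyAppPool" and the values are placeholders, so adjust them for your environment):

Import-Module WebAdministration

# Recycle on a regular time interval of 12 hours
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name recycling.periodicRestart.time -Value "12:00:00"

# Recycle when private memory exceeds ~2.5 GB (value is in KB)
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name recycling.periodicRestart.privateMemory -Value 2621440

# Recycle after 500,000 requests
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name recycling.periodicRestart.requests -Value 500000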


There are other settings, such as "Idle Time-out" (which causes the app pool worker process to terminate or suspend when there are no requests) and "CPU Limit", which can also affect the application pool's recycling/termination behaviour, but we will discuss those in a separate blog.



Best Practices for Application Pool Recycling

Although the settings above help you claim back unused or over-used memory, recycling needs to be done with care, because frequent recycling can disrupt user sessions and degrade performance.

Always try to recycle during off-peak hours. This will minimize disruption to user sessions.

The settings in the "Generate Recycle Event Log Entry" section allow you to control which recycle actions are logged to the Windows Event Log. Use this to your advantage: track when recycling is actually happening by monitoring the logs for a period of time, then use that insight to adjust the recycling settings.






30/01/2025

Azure Map Routing

Azure Maps replaces the earlier mapping technology provided by Microsoft, which was called "Bing Maps".

Recently I had a chance to look into the Azure Maps routing feature (calculating a route between two points) for my work. Therefore, I thought I should write a blog about this new mapping technology from Microsoft.

My requirement was to calculate the routed distance between two postcodes in the UK. To do this I had to use the Azure Maps routing APIs.

Before you start, the first thing you must do is create an Azure Maps account in the Azure portal.

In the Azure portal marketplace, search for "Azure Maps".


Then click on the "Create" button. Fill in the basic information for the account as shown in the following screenshot.


If you wish you can fill in the other options as well, but for simplicity I will keep the defaults.

To authenticate against Azure Maps, you can use either Microsoft Entra ID (Active Directory) permissions or a shared key. For simplicity I will be using shared key authentication.


Once the account is created, go into the resource and then to the Settings >> Authentication section.

You will see a section like the one above. Copy the primary key (or the secondary key; either will work) and store it in a safe place.


Now that we have an Azure Maps account and a shared key to use it, we can go and build the demo app for routing.

Console APP:

To keep it simple, I'm going to create a .NET Core console application:


We are going to use Azure Maps routing client library for .NET.
https://www.nuget.org/packages/Azure.Maps.Routing

Therefore, install the above NuGet package into the newly created project.
NuGet\Install-Package Azure.Maps.Routing

Note that the package is in pre-release as of now (25/01/2025), so you need to tick the "Include prerelease" box in the Package Manager window.

Instantiate the routing client object:

//Create a MapsRoutingClient that will authenticate through Subscription Key (Shared key)
AzureKeyCredential credential = new AzureKeyCredential("<My Subscription Key>");
MapsRoutingClient client = new MapsRoutingClient(credential);

Replace the "<My Subscription Key>" part with your key from the Azure Maps account. It is a good idea to put this in a configuration file and read it from there.

Note that I'm using shared key authentication, which is the simplest form, but you can also use AAD (Azure Active Directory / Entra ID) authentication. When using AAD authentication, you configure permissions on the Azure Maps account for each AAD user.

Now I want the user to input two postcodes (source and destination), so let's write the code for that:

            Console.WriteLine("Enter the start postcode:");
            string startPostcode = Console.ReadLine();

            Console.WriteLine("Enter the end postcode:");
            string endPostcode = Console.ReadLine();

Now we need to convert the postcodes to geo-coordinates (latitude and longitude) before sending them to the Azure Maps API.

The simplest way to get geo-coordinates from a postcode is to use a service like Postcodes.io, a free, open-source API specifically designed for UK postcodes.

Let's create a helper class to fetch the geo-coordinates. Let's call it "PostcodeGeocoder".




The first part of the class sets the base address for the Postcodes.io API.

        private static readonly HttpClient _httpClient = new HttpClient
        {
            BaseAddress = new Uri("https://api.postcodes.io/")
        };

Then we have two inner classes to hold the latitude and longitude:

        public class PostcodeResult
        {
            [JsonPropertyName("result")]
            public ResultData? Result { get; set; }
        }

        public class ResultData
        {
            [JsonPropertyName("latitude")]
            public double Latitude { get; set; }

            [JsonPropertyName("longitude")]
            public double Longitude { get; set; }
        }

(Note that they have JsonPropertyName attributes so they deserialize correctly.)

The next bit of code is the heart of the class, which fetches the coordinates from Postcodes.io:

        public static async Task<(double Latitude, double Longitude)> GetCoordinatesAsync(string postcode)
        {
            try
            {
                var response = await _httpClient.GetAsync($"postcodes/{postcode}");
                response.EnsureSuccessStatusCode();

                var content = await response.Content.ReadAsStringAsync();
                var result = JsonSerializer.Deserialize<PostcodeResult>(content);

                return (result.Result.Latitude, result.Result.Longitude);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error fetching geocode for postcode {postcode}: {ex.Message}");
                throw;
            }
        }

The last part calls the above method for the start and end postcodes:

        // Fetch coordinates for two postcodes
        public static async Task<(double StartLat, double StartLng, double EndLat, double EndLng)> GetCoordinatesForTwoPostcodesAsync(
            string startPostcode, string endPostcode)
        {
            var startCoords = await GetCoordinatesAsync(startPostcode);
            var endCoords = await GetCoordinatesAsync(endPostcode);

            return (startCoords.Latitude, startCoords.Longitude, endCoords.Latitude, endCoords.Longitude);
        }

Now let's use our helper class in the main program:

                var result = await PostcodeGeocoder.GetCoordinatesForTwoPostcodesAsync(startPostcode, endPostcode);

                Console.WriteLine($"Start Coordinates: Latitude = {result.StartLat}, Longitude = {result.StartLng}");
                Console.WriteLine($"End Coordinates: Latitude = {result.EndLat}, Longitude = {result.EndLng}");

OK, now we have all our preliminaries in place.

A route matrix is, as it sounds, a 2D matrix (array).
You can enter as many origins (starting points) and as many destinations (end points) as you like. For example, if you enter 2 origins and 2 destinations, you will get 4 resulting routes.

For simplicity we will use one origin point and one destination:

                // prepare our geolocations for origins and destinations (we just use one, but can use many as you like)
                List<GeoPosition> origins = new List<GeoPosition>() { new GeoPosition(result.StartLng, result.StartLat) };
                List<GeoPosition> destination = new List<GeoPosition>() { new GeoPosition(result.EndLng, result.EndLat) };

Then you can instantiate the route matrix query:

                // Instantiate route matrix query
                RouteMatrixQuery routeMatrixQuery = new RouteMatrixQuery
                {
                    Origins = origins,
                    Destinations = destination
                };

If you prefer, you can override the default options for the route matrix by using the RouteMatrixOptions class:

            // Instantiate route matrix options
            RouteMatrixOptions routeMatrixOptions = new RouteMatrixOptions(routeMatrixQuery)
            {
                RouteType = RouteType.Economy
            };

The RouteMatrixOptions class allows you to configure the route, how you want to travel, and the route output. For example, "TravelMode" defines which travel mode (car, van, bus, walking, etc.) you will be using.

See class reference for more detail.

You can call the route matrix request synchronously or asynchronously.

If you are using synchronous mode, the matrix element count should be no more than 100.
I.e. origins count * destinations count should be <= 100.

Calling synchronously:
Response<RouteMatrixResult> reqResult = client.GetImmediateRouteMatrix(routeMatrixQuery);

Calling synchronously (with options):
Response<RouteMatrixResult> reqResult = client.GetImmediateRouteMatrix(routeMatrixOptions);

You can use asynchronous calling if the matrix has more than 100 elements. However, the asynchronous method also has a limit, of 700 elements maximum.

Calling Asynchronously:

// Invoke a long-running route matrix operation and wait for completion
GetRouteMatrixOperation reqResult = client.GetRouteMatrix(WaitUntil.Completed, routeMatrixOptions);

var routeResults = reqResult.Value;

The above waits for completion and then returns the results.

Printing results:

                // Route matrix result summary
                Console.WriteLine($"Total request routes: {routeResults.Summary.TotalRoutes}, Successful routes: {routeResults.Summary.SuccessfulRoutes}");

                // Route matrix result
                foreach (IList<RouteMatrix> routeResult in routeResults.Matrix)
                {
                    Console.WriteLine("Route result:");
                    foreach (RouteMatrix route in routeResult)
                    {
                        RouteLegSummary summary = route.Summary;
                        Console.WriteLine($"Travel time: {summary.TravelTimeInSeconds} seconds");
                        Console.WriteLine($"Travel length: {summary.LengthInMeters} meters");
                        Console.WriteLine($"Departure at: {summary.DepartureTime}");
                        Console.WriteLine($"Arrive at: {summary.ArrivalTime}");
                    }
                }

Getting Asynchronous results later:

Results are saved in Azure for 14 days, therefore you can access them later using the operation ID. You need to save this operation ID if you are going to use this method.

// Invoke an async route matrix request and get the result later via assigning `WaitUntil.Started`
GetRouteMatrixOperation operation = client.GetRouteMatrix(WaitUntil.Started, routeMatrixOptions);

// Get the operation ID and store somewhere
string operationId = operation.Id;

Then you can use the operation id to retrieve results.

// Within 14 days, users can retrieve the cached result with operation ID
// The `endpoint` argument in `client` should be the same!
GetRouteMatrixOperation newRouteMatrixOperation = new GetRouteMatrixOperation(client, operationId);
Response<RouteMatrixResult> reqResult= newRouteMatrixOperation.WaitForCompletion();

Results:
You will get a result like below when you run the application:


Note that the travel length is in meters; if you want miles, you will have to do the conversion yourself. Unlike Bing Maps, there is no option to request results in specific units.
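The conversion itself is a one-liner (1 mile = 1609.344 meters); the meters value below is just a placeholder for the LengthInMeters figure from the route summary:

// 1 mile = 1609.344 meters
double lengthInMeters = 12345;   // e.g. the value reported in the route summary
double lengthInMiles = lengthInMeters / 1609.344;
Console.WriteLine($"Travel length: {lengthInMiles:F2} miles");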


Used following as a reference: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/maps/Azure.Maps.Routing/samples/RouteMatrixSamples.md


10/01/2025

Introduction to SQL Server Statistics - Tutorial

Wishing you all, my loving readers, Happy New Year 2025!

This is the first blog for the year 2025. As a new initiative, I'm going to (try to) write some tutorials on selected features of SQL Server. This is my own take on these features, written using the knowledge I have acquired over the years. I hope it gives something back to the community. I'm also hoping to create an associated YouTube video; when it's done, I will update this page with the link.

Introduction to SQL Server Statistics

What are SQL Server Statistics

SQL Server statistics are mainly used by the query optimizer, so before we talk about statistics, we need to know a little about how the SQL Server query optimizer works.

When running a query, there can be many ways to execute it (different ways to get the data, different ways to join it, etc.). Therefore SQL Server needs a plan before executing a query. We call it a query plan.

Since there can be many combinations of ways to execute a query, there can be a lot of plans the optimizer could come up with for one query.

Therefore, it needs to find an optimized plan, the best plan (or rather, a good-enough plan...). The optimizer only has limited time to come up with a plan; it has to finish and present the plan as quickly as possible. In this process the optimizer tries to find a good-enough plan as quickly as possible. Hence, it needs all the help it can get.


In order to create the optimal plan, the query optimizer analyses the query, and then it needs to know metadata about the data the query is looking for.

For example, a query could be looking for the total number of orders placed by one particular customer during a date range. To create a plan for this query, the optimizer needs to look at the Orders table and the Customer table.

It needs to know roughly how many orders were placed during the given time period, because depending on the number of rows, it might need to do different things to aggregate the data. For example, if we find 100 rows we can aggregate one way, but if we find 1M rows we might not be able to use the same approach.

It also needs to allocate memory for the query to run. In order to estimate the amount of memory to allocate, it needs to estimate how many rows it is dealing with.

It cannot count the rows, because that would take a lot of time; the optimizer needs to report back to SQL Server as quickly as possible.

So it needs to find out how the data is distributed in the Orders table. We call this the cardinality estimate. This is where statistics come into play: statistics describe how data is distributed in a table/column.

Creation of SQL Server Statistics

Statistics are created on indexes and columns. They can be on a single column or on a combination of columns.
There are two types of statistics:
• Statistics created for an index (can be multi-column)
• Statistics created for a column (only ever on a single column)

Statistics have 3 components:
• Header -> meta information about the statistics
• Density -> a mathematical measure of the selectivity of a column or combination of columns
• Histogram -> shows how the data is distributed

You can see statistics information by running the following command:

DBCC SHOW_STATISTICS(MyTable, MyIndex);


As an example, let's create an "Employee" table with Employee Id, Age, Salary and Gender columns:
CREATE TABLE Employee (
    EmpId INT PRIMARY KEY, -- Primary Key column
    Age INT,    -- Integer column for age
    Salary NUMERIC(18,2),  -- Numeric column for salary with precision 18 and scale 2
    Gender VARCHAR(10) -- Gender of the employee
);

We will populate the table with some random data:
-- Insert 1000 random rows into the Employee table with salary rounded to the nearest 100
SET NOCOUNT ON;

DECLARE @i INT = 1;
DECLARE @g FLOAT;

WHILE @i <= 1000
BEGIN
    SET @g = RAND(CHECKSUM(NEWID()));  -- single random value per row so the gender split matches the comments

    INSERT INTO Employee (EmpId, Age, Salary, Gender)
    VALUES (
        @i,  -- EmpId (using sequential numbers for simplicity)
        FLOOR(RAND(CHECKSUM(NEWID())) * (70 - 18 + 1)) + 18,  -- Age: random number between 18 and 70
        ROUND(CAST((RAND(CHECKSUM(NEWID())) * (100000 - 15000) + 15000) AS NUMERIC(18,2)), -2),  -- Salary: rounded to nearest 100
        CASE 
            WHEN @g < 0.4 THEN 'MALE'        -- 40% chance for MALE
            WHEN @g < 0.8 THEN 'FEMALE'      -- 40% chance for FEMALE
            WHEN @g < 0.9 THEN 'DUAL'        -- 10% chance for DUAL
            ELSE 'UNKNOWN'                   -- 10% chance for UNKNOWN
        END -- Gender
    );

    SET @i = @i + 1;
END;

Create indexes:
CREATE INDEX IX_Employee_Age ON Employee(Age);

-- Create an index on the 'Salary' column
CREATE INDEX IX_Employee_Salary ON Employee(Salary);

-- Create an index on the 'Gender' column
CREATE INDEX IX_Employee_Gender ON Employee(Gender);

Let's check out stats now:
DBCC SHOW_STATISTICS(Employee, IX_Employee_Age);



Let's dive into each section of the statistics data structure:

Header
The header contains the name of the statistics, when they were last updated, and the row count AT THE TIME the statistics were created/updated:
• Name -> name of the statistics (for index statistics, this is the index name)
• Updated -> when the statistics were last updated
• Rows -> number of rows in the table at the time the statistics were created/updated
• Rows Sampled -> number of rows actually sampled to build the statistics

Density
Density indicates how much variety (selectivity) there is; a unique index has very high selectivity.
Density is a measure that provides insight into the selectivity of a column or combination of columns in an index:

Density = 1 / Number of Distinct Values

Let's have a look at the density data for the Age index:



There are 53 distinct age values in the Employee table (in the example data I have; this might vary slightly for yours). So Density = 1/53 => 0.0188679245283019.
But if you add the primary key EmpId to that column combination, the density becomes 1/1000 (because EmpId is unique in that table and we have 1000 rows).
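You can verify the density figure yourself with a simple query that mirrors the formula above:

SELECT COUNT(DISTINCT Age) AS DistinctAges,
       1.0 / COUNT(DISTINCT Age) AS Density
FROM dbo.Employee;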

If we look at the density data for the Gender index (where there are only 4 possible values):



So high density means low selectivity; the optimizer uses this data to choose the correct indexes.
The "Average Length" column in the density vector shows the average total length of the fields in consideration. E.g. in the above, the Gender column averages 5 characters in length.

Histogram

Let's take a look at the histogram for the Salary index:

DBCC SHOW_STATISTICS(Employee, IX_Employee_Salary);

SELECT * FROM Employee WHERE Salary BETWEEN 80600.00 AND 81500.00
ORDER BY Salary

Let's take a look at the data for the 81500 bucket from the table:



The histogram is the most used piece of information in the statistics.
A histogram in SQL Server statistics is a data structure which describes the distribution of data in a column or index:
• The histogram's main structure contains up to 200 steps, also called buckets
• If NULL values are allowed, there is a special bucket for NULLs, so up to 201 buckets
• If the number of rows is below 200, the number of steps will be lower than 200

• RANGE_HI_KEY -> the highest value in the range (81500)
• EQ_ROWS -> number of rows equal to the high value (1 in the above example for the 81500 bucket, but 2 for the 80600 bucket)
• RANGE_ROWS -> number of rows with values between the previous high key (i.e. 80600) and this one, exclusive -> 7 in the above example (covering the 5 distinct values 80800, 80900, 81100, 81300, 81400)
• DISTINCT_RANGE_ROWS -> number of distinct values within the range rows
• AVG_RANGE_ROWS -> RANGE_ROWS / DISTINCT_RANGE_ROWS (i.e. 7/5 = 1.4)

Note that each bucket is anchored on its high key: the range covers the values between the previous bucket's high key and this bucket's RANGE_HI_KEY.
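You can see the optimizer using these numbers by looking at the estimated number of rows in an execution plan. For example, with an equality predicate on a bucket's high key, the estimate should line up with the EQ_ROWS value for that bucket (81500.00 comes from my sample data, so yours may differ):

-- Estimated number of rows for this predicate comes from EQ_ROWS of the 81500.00 bucket
SELECT EmpId, Salary
FROM dbo.Employee
WHERE Salary = 81500.00;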

Creation of Statistics

Indexed Columns (Rowstore):
Creation of statistics on indexed columns is automatic and cannot be turned off.

Non-Indexed Columns
By default, SQL Server creates statistics for non-indexed columns if they are used as filtering columns in queries.
But you can turn this off.

To see whether auto-create statistics is turned on or off:

SELECT DATABASEPROPERTYEX('<<YOUR DB>>', 'IsAutoCreateStatistics')

OR

SELECT is_auto_create_stats_on 
FROM sys.databases 
WHERE name = 'YourDatabaseName'

In order to enable or disable auto create stats:
ALTER DATABASE <<your database>> SET AUTO_CREATE_STATISTICS ON/OFF

These auto-created statistics are single-column only and are named in the format: _WA_Sys_columnname_hexadecimal

To see created stats:

SELECT * FROM sys.stats 
WHERE object_id = OBJECT_ID('<<your table name>>')

SELECT * FROM sys.stats 
WHERE object_id = OBJECT_ID('dbo.Employee')



In this screenshot, "PK__Employee_..." is the statistics for the primary key index, which is created automatically and cannot be turned off. The same goes for all the IX_ stats; those are statistics for the non-clustered indexes.

The _WA_Sys one is the one created automatically.
Why are they prefixed with _WA? There is no official statement from Microsoft, but some believe it stands for Washington, where Microsoft's headquarters are.

Currently there are no user-created stats in there.

If you want to create statistics on a column (which is not normally recommended):
CREATE STATISTICS <<statistics name>>
ON <<table name>> (<<column name>>)
WITH [Options];

CREATE STATISTICS _mpa_Employee_Age
ON dbo.Employee (Age)
WITH FULLSCAN;

If we check the stats for the Employee table now:



Note that the new manually created statistics have the "user_created" field set to true.

Update Statistics

SQL Server updates stats automatically. However, there might be occasions where updating statistics manually gives the optimizer an advantage.

You can update stats using the following T-SQL:
UPDATE STATISTICS <<table name>> <<stats name>>
[WITH [FULLSCAN | SAMPLE n PERCENT | SAMPLE n ROWS]];

Examples:
This updates all statistics on a table:
UPDATE STATISTICS dbo.Employee

Alternatively, you can create a maintenance task to keep statistics up to date.



You can update statistics for all tables in the database using the following SQL:
EXEC sp_updatestats;

Remove Statistics

Though we don't normally want to remove stats, in case it is required:

DROP STATISTICS <<table name>>.<<statistics name>>;

DROP STATISTICS dbo.Employee._mpa_Employee_Age;
