SharePoint 2010: Storing Documents on the File System with Remote Blob Storage
19 April 2011
by Damon Armstrong
Remote BLOB Storage (RBS) is a means of storing the large unstructured data outside of the SharePoint database, by using a BLOB provider. It can provide some very useful options if the size of your SharePoint database is becoming unmanageable, but shouldn't be seen as a general storage panacea.
You may have heard rumors that SharePoint 2010 has the ability to store files on the file system, instead of the content database. If the prospect of moving files out of the database intrigues you, it is probably because your SharePoint content database is beginning to get so large that backups are becoming difficult, and SharePoint is giving you warnings about exceeding recommended content database sizes. Rest assured, moving files out of the database is possible in SharePoint 2010, using a SQL Server 2008 technology known as Remote Blob Storage (RBS).
I recently had to configure RBS on a SharePoint 2010 server, and the most difficult part of the process was piecing together a full understanding of Remote Blob Storage from scattered blog posts and TechNet articles. At the time, I could not find a complete overview of RBS that outlined what it was, how it worked, whether or not it was right for my situation, or how you to get it running on a SharePoint 2010 server. I hope this article will address these issues and save you a lot of time if and when you come to implement RBS in your environment.
Why you might need RBS
If RBS is the solution, let's talk about the problem. SharePoint is a document management system, but not just any ordinary one. Microsoft would also like you to know that SharePoint can make your organization highly collaborative, searchable, fix every problem you have, and bake you cookies.
Microsoft hyperbole aside, however, SharePoint is basically a document management system and one of its core features is that it stores document, lots of documents, and it stores all of those documents and document versions in an ever-expanding content database.
Databases are optimized for structured data that is relatively small – about 8KB per row. At that rate, you can put 130,000 packed-out rows into a database and still have a database under 1 gigabyte. Generally speaking, smaller databases are easier to backup, restore, and maintain, so keeping them small is good. Documents, however, tend to be orders of magnitude larger than the average database row size and can quickly bloat the database; a content database upwards of 100 gigabytes is by no means uncommon and that size is comprised mostly by the physical files that reside inside the database.
When your content database size starts to expand to the point of unmanageability, one of the first questions that may come to mind is why the heck are all those files in there in the first place? It makes perfect sense to store the structured document metadata like the author, location, keywords, document type, and file size directly in the database, for querying purposes, but the physical file itself is just taking up room until someone goes to download it. So the option to move files outside of the database and back into the file system, where they have traditionally resided, seems very sensible.
What is a Blob?
As much as you want your documents outside of the database, the folks who make databases really don't like them being in there either. Documents tend to represent unstructured data that makes it difficult to use them in querying, defining relationships, indexing, or any other database operation that normally makes putting data into a database useful. As far as the database is concerned, a document is just a bunch of bits that it has to hold onto until someone asks for them back.
Logically, storing a document in a database row looks something like as shown in Figure 1.
Figure 1: Logical Representation of Document Storage in a Database
Databases performance is optimized for rows that are under 8KB in size. If the actual implementation looked as shown in Figure 1, that size would easily be breached and performance would lag.
Fortunately, database developers came up with the concept of a Binary Large Object (BLOB). In essence, the BLOB is a pointer that stores the location of binary data. The pointer, which is relatively small, resides with the normal database record. The binary data, which is comparatively huge, resides in some dark, magical location in the database called BLOB storage.
Figure 2: Actual Representation of Document Storage in a Database
With the binary data offloaded to another location in the database, the database record becomes much smaller and more optimal for index and query performance. If we actually need the binary data associated with the row, the database uses the pointer in the row to acquire the binary data from BLOB storage. The process of going to BLOB storage is completely transparent, so it appears as though the data is coming directly from the row itself. There is a slight performance cost for going to BLOB storage for raw data retrieval, but the query performance benefits tend to outweigh the retrieval cost.
What is Remote Blob Storage?
BLOB storage is effectively abstracted from database users. You ask for it, the database goes out and gets it. This brings up a new question: why does the BLOB storage have to be in the database? In SQL Server 2005 it had to be in the database simply because that's how they built it. But in SQL 2008, you've got a few more options because Microsoft implemented a FileStream provider model for the BLOB storage mechanism that opens up the possibility of storing it elsewhere, as shown in Figure 3.
Figure 3: Remote BLOB Storage Provider Mechanism
Remote BLOB Storage is the term used to denote that a BLOB provider stores BLOB data outside of the database (i.e. remotely).
Where Can I Store Remote Blob Data?
A marketing guy would eagerly tell you that you can store BLOBs anywhere you want, but the reality is that you can store it anywhere you want as long there is a provider available capable of storing it there. If you're really ambitious you can build your own (not recommended), but the majority of us are just going to use what comes out of the box, or whatever is available from third party providers. Out of the box you get two providers:
· Native SQL BLOB Provider (Default)
Stores BLOB data directly in the database, just like SQL Server 2005. This is the default BLOB provider whenever you setup a new instance of SQL server.
· FileStream BLOB Provider
This is the only Remote BLOB provider that ships with SQL Server 2008 and it allows you to store BLOB data on any local hard drive on the SQL server.
Please note that you cannot use the FileStream BLOB provider to store data on a network share or a mapped drive. It has to be a physical disk on the server. Many people are disappointed when they hear this because they're running out of space and want to offload the file data to a file server. The reason behind this is because file shares are not very conducive to ACID operations that make transactions in database "safe".
There is also a rumor that they are working on a feature inside the FileStream provider that will allow you to store BLOBs from one database in another remote database, though this will only be available in the Enterprise edition of SQL Server 2008 R2 and has yet to be released (at least at the time of this writing).
If you have your heart set on storing files on a SAN, network attached storage (NAS), or even the cloud then you'll have to look to a third-party provider to make it happen. Here is a brief list of third party providers and links to their websites:
· AvePoint – RBS provider for network shares, FTP, and cloud storage. This is also a FREE provider!
· EMC2 – RBS provider for Centera storage devices.
· NetApp – RBS provider for the NetApp SnapManager 6.0 product line.
· OpenText – RBS provider that supports Hitachi Data Systems, StorageTek, Network Appliance, EMC Centera, HP, IBM, and Sun Microsystems.
· Storage Point – RBS provider for SAN, NAS, and cloud storage.
What are the drawbacks of Remote BLOB Storage?
One of the major side effects of RBS is that you cannot use database mirroring when RBS is enabled. Clustering and log shipping are still supported, but if you currently use or plan to use mirroring then RBS is probably not an approach you want to consider.
You should be aware that individual RBS providers may or may not support native SQL backups. Fortunately, the FileStream RBS provider supports native SQL backups, but others may not, so you need to read the documentation of your RBS provider to ensure that your backup strategy is getting everything you need. SharePoint backups, however, will include any files stored in BLOB storage regardless of whether the RBS provider supports native SQL backups.
You should also be very wary of the fact that data is being stored in a location outside of the database and any implications that may have for security and reliability. With the FileStream provider, for example, BLOBS are stored directly on the file system. You can actually navigate right to them using Explorer and delete them, which of course would be bad. So with the FileStream RBS provider you need to make sure that the appropriate security is in place to protect your blob data from accidental deletion.
Another issue you may encounter is that BLOB retrieval performance may be negatively impacted if you move the BLOB store to a slower hard drive. This can also be true if the BLOB store is hosted on a networked location where bandwidth is limited or latency is an issue. On the other hand, BLOB retrieval performance may improve if the hard drive you place it on is faster than the SQL drive. So don't skimp on the hardware!
It is also worth noting that RBS is enabled at the content database level, not the site collection level. This means that if you have multiple site collections in a content database, enabling RBS on that content database enables RBS on all of those site collections. If this is not the desired behavior, then you may need to move site collections to other content databases.
Finally, Microsoft has stated that it is not guaranteeing that RBS will seamlessly migrate to the next version of SharePoint. You can always "undo" RBS and pull everything back into the database, so you won't be completely out of luck during the next upgrade.
What are the benefits of Remote BLOB Storage?
Following is a brief rundown of some of the benefits associated with RBS:
· Database size is reduced. Having a smaller database can be extremely beneficial if you are running out of disk space on your SQL server and don't have the ability to add additional disks.
· Prolongs use of SQL Express. If you are using SQL Express in your SharePoint environment but you are bumping up against the 4 gigabyte database size limit (10 gigabytes in R2) associated with the free edition of SQL Server, then you can use RBS to offload data from the database. Since the data is not in directly in the database it does not count against the size quota.
· Cheaper file storage. Database hard drives tend to be fast and expensive, so archiving a large number of files that are rarely accessed on those disks is not always the best use of fiscal resources. In as much as RBS allows you to move BLOB storage to fast hard drives and see a performance benefit, you can also move them to slower, cheaper hard drives for a budgetary boost when performance is not as much of a concern.
· Database backup time reduced. As database size increases, so does the time to back up the database. If you have a specific maintenance window in which the backup has to occur, the database can grow so large that it becomes difficult or impossible to complete in that time frame. By offloading the BLOB store, the database size is reduced and the database backup time can be reduced as well. You will still need to make a backup of the BLOB store, but that can be done using a separate parallel process.
· Large BLOB performance benefits. Streaming data from a file stream can outperform streaming data directly from a database. However, setting up SQL server for use of a remote file stream is an expensive process. For small files (less than 60KB), it may not be beneficial to bother with a file stream at all. For larger files it can be very beneficial. Fortunately, RBS allows you to configure certain file size thresholds to store smaller files directly in the database and larger files using RBS, so they reside in a location where performance is optimal.
In case you were hoping to find it in this list, I just want to clarify that remote BLOB storage will not get around the 2 gigabyte limit for files in SharePoint. Sorry to dash your hopes if you have really enormous files with which you're trying to deal.
Backing up a Database Without BLOBs
One of the reasons people tend to remove BLOBs from their database is to reduce backup time, but if your RBS provider supports native SQL backups then the BLOBs end up in the backup anyway so you don't really save much time. How do you avoid this? BLOBs are all placed in a specific file group, so you'll have to employ a backup strategy that explicitly excludes the BLOB file group from the backup, or else you're backup is probably going to be larger than you want. Also remember, if you exclude the BLOBs from your SQL backup, then you will want to implement a separate backup process for the BLOBs.
What is the RBS Maintainer?
RBS Maintainer is a utility application that is responsible for performing garbage collection on the remote BLOB store. You will need to setup a scheduled task to run RBS maintainer with settings that work for your database and file store backup strategy. You can find information on configuring the RBS Maintainer in the TechNet article Maintain Remote BLOB Storage (RBS) (SharePoint Foundation 2010).
Are There Alternatives to RBS?
If you are encountering size issues with a single content database and do not want to implement RBS in your environment, then one option to consider is moving some of your site collections into a separate content database. If you only have one site collection in your content database, then you may want to consider identifying some of your larger sub-webs and turning them into their own site collections. Although this will not reduce the overall size of the databases on your system, having two smaller databases can help performance and opens up the possibility of running parallel tasks (e.g. database backups) if you are pushing the limit with your maintenance windows.
How to Enable the RBS FileStream Provider for SharePoint 2010
Most of the information you'll find from here on out can be found on TechNet from the links provided below, but I've put it all together in one place and added some commentary on parts that I found were confusing, or not described in detail. I'm going to walk through how to enable the out-of-the-box FileStream provider for SharePoint 2010. If you want to use a third-party RBS provider, then you'll need to use their documentation.
In the sections that follow, any items in italicized red will need to be updated with appropriate values for your environment. For example, any place you see [ContentDbName], will need to be replaced with the name of the content database for the SharePoint site for which you wish to enable RBS.
Step 1 – Enable Filestream Access on the SQL Server
You'll need to complete the following steps to enable Filestream access on the SQL Server Box:
· Click on the Start menu
· Type SQL Server Configuration Manager in the 'Search Programs and Files' textbox
• In the list of results you should see the SQL Server Configuration Manager icon appear
· Right click on the icon and choose Run as administrator from the context menu
· In the left pane, click on the SQL Server Services icon
• You may need to double click or right click on this icon and select Open
• The right pane should populate with SQL Server services running on the server
· In the right pane, right click on the instance of SQL server for which you wish to enable RBS and choose Properties from the context menu
• The SQL Server Properties dialog should appear
· Click on the FILESTREAM tab
· Check the box named Enable FILESTEAM for Transact-SQL access
· Check the box named Enable FILESTREAM for file I/O streaming access
· Leave the default value for the Window share name (it should be set to MSSQLSERVER)
· Check the box named Allow remote clients to have streaming access to FILESTREAM data
· Click OK
Step 2 – Configure the filestream_access_level Setting
Next, you will need to open SQL Management studio and execute the following command:
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
The sp_configure command is responsible for displaying or changing global configuration settings for the database server. In this case, we need to set thefilestream_access_level value which has three possible settings:
· 0 – FILESTREAM support is disabled (default value)
· 1 – FILESTREAM support for T-SQL is enabled
· 2 – FILESTREAM support for T-SQL and Win32 streaming is enabled
So, the command listed above enables FILESTREAM for T-SQL and Win32 streaming. You can read more about this command here: sp_configure (Transact-SQL).
Step 3 – Create the master key if it does not already exist
Execute the following command in SQL Management Studio:
USE [ContentDbName]
IF NOT EXISTS ( SELECT *
FROM [sys.symmetric_keys]
WHERE [name] = N'##MS_DatabaseMasterKey##' )
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'Admin Key Password !2#4'
This command determines if the master encryption key exists, and if not, creates it using the specified password. The master key is used to encrypt other keys stored in the database. You can read more about this command here: CREATE MASTER KEY (T-SQL). Make sure you remember the password you used in this command.
Step 4 – Create a File Group for the FileStream Provider
Execute the following command in SQL Management Studio:
USE [ContentDbName]
IF NOT EXISTS ( SELECT [groupname]
FROM [sysfilegroups]
WHERE [groupname] = N'RBSFilestreamProvider' )
ALTER DATABASE [ContentDbName]
ADD FILEGROUP RBSFILESTREAMPROVIDER CONTAINS FILESTREAM
This command creates a filegroup for the FileStream provider named RBSFileStreamProvider. If you want, you can name it something different. You can read more about this command here: ALTER DATABASE File and Filegroup Options (Transact-SQL).
Step 5 – Define the physical location where BLOB data is stored
Execute the following command in SQL Management Studio:
USE [ContentDbName]
ALTER DATABASE [ContentDbName]
ADD FILE (name = RBSFilestreamFile, filename = 'c:\Blobstore')
TO FILEGROUP RBSFilestreamProvider
This command adds a file named RBSFileStreamFile to the RBSFileStreamProvider file group you created in Step 2. You will need to replace 'c:\Blobstore' with the location where you wish to store BLOB data. This value MUST be a location on a physical hard drive directly on the SQL server. No UNC paths or networked drives are allowed. You can read more about this command here: ALTER DATABASE File and Filegroup Options (Transact-SQL).
Step 6 – Install RBS on the SQL Server Box
First, you will need to download RBS_X64.msi, but DO NOT INSTALL IT by double clicking on the icon once it is downloaded. Instead, do the following:
· Click on the Start menu
· Type cmd in the 'Search Programs and Files' textbox
• In the list of results you should see the cmd.exe icon appear
· Right click on the cmd.exe icon and choose Run as administrator from the context menu
· Navigate to directory where you downloaded the RBS_X64.msi file
· Type the following command:
msiexec /qn /lvx* rbs_install_log.txt /i RBS_X64.msi TRUSTSERVERCERTIFICATE=true FILEGROUP=PRIMARY
DBNAME="<ContentDbName>" DBINSTANCE="<SQLServerInstance>"
FILESTREAMFILEGROUP=RBSFilestreamProvider FILESTREAMSTORENAME=FilestreamProvider_1
You will need to type this on a single line and replace <SQLServerInstance> with the server name of your SQL machine. If you have a named instance you will need to use the named instance syntax for the server name (e.g. server/InstanceName). The parameter rbs_install_log.txt defines the location where the installation log is written, and theFILESTREAMFILEGROUP parameter should be equal to the FILEGROUP name created in Step 2.
When you run this command from the command prompt, it runs asynchronously, so it appears to complete almost instantly. In reality, it takes about a minute to complete. You can determine whether or not the installation succeeded by looking at the installation log file (rbs_install_log.txt). There should be a message near the end of the file (you may have to scroll up a bit from the end to find it) that says
Product: SQL Remote Blob Storage – Installation completed successfully
If you do not find this message, read through the log looking for information about what went wrong and try to remedy those issues.
Step 7 – Install RBS on All Other SharePoint Farm Servers
To work correctly, RBS must be installed on the SQL Server and every other machine in the SharePoint farm – so this step needs to be repeated on each additional server in the farm. Please note, however, that the RBS installation is slightly different after you've done the initial installation outlined in Step 4. For each additional server in the farm, you'll need to open the command prompt with run as administrator in the same way outlined in Step 4, but you'll need to execute the following command to install RBS instead of the one from Step 4:
msiexec /qn /lvx* rbs_install_log.txt /i RBS_X64.msi
DBNAME="<ContentDbName>" DBINSTANCE="<SQLServerInstance>"
ADDLOCAL="Client,Docs,Maintainer,ServerScript,FilestreamClient,FilestreamServer"
The installation works the same way – it should take about a minute and you'll be able to determine if installation was successful by looking at the installation log.
Note that if you are running SQL server directly on one of the machines in your SharePoint farm then there is no need to run the RBS installation a second time on the same box. Just move on to the next server in the farm.
Step 8 – Enable RBS on the Content Database
On one of the web servers in your SharePoint farm, complete the following steps:
· Click on the Start menu
· Type SharePoint 2010 Management Shell in the 'Search Programs and Files' textbox
• In the list of results you should see the SharePoint 2010 Management Shell icon appear
· Right click on the icon and choose Run as administrator from the context menu
· Type the following PowerShell commands
• Each of these commands should be typed on their own line
$contentDB = Get-SPContentDatabase –WebApplication <http://WebAppUrl>
$rbsSettings = $contentDB.RemoteBlobStorageSettings
$rbsSettings.Enable()
$rbsSettings.SetActiveProviderName($rbsSettings.GetProviderNames()[0])
The first line of the PowerShell script creates a variable called $contentDB that is populated with the content database object for the specified web application; you will need to provide the URL of the web application for which you wish to enable RBS. The second line creates a variable called $rbsSettings and populates it with the remote blob storage configuration settings for the content database. The last two lines enable RBS for the content database and set the provider name to the FILESTREAM provider. Once these commands have been completed, RBS should be enabled on your SharePoint site.
Step 9 – Verify RBS is Working
You can test whether RBS is working by completing the following:
· Use Windows Explorer to browse to the folder containing the blob storage folder specified in step 5
• For example, If you specified c:\Blobstore as the blob store location, then you would browse to c:\ so you can see the Blobstore folder
· Right click on the blob storage folder and choose Properties from the context menu
· Note the size of the folder
· Upload a file that is larger than 100 KB to a document library in the SharePoint site where RBS is enabled
· Verify that the upload succeeded by downloading the file from the document library and making sure it downloads successfully
· Right click on the blob storage folder and choose Properties from the context menu
· Verify that the size of the folder has grown by the size of the file you just uploaded
• If you uploaded a 100 KB file then the folder size should be 100 KB larger than before.
• If you uploaded a 5 MB file then the folder size should be 5 MB larger than before, etc.
Step 10 – Migrate Existing Documents from the Database to the Remote Blob Store
Once you have RBS working, migrating documents is really easy. You just have to complete the following steps:
· Click on the Start menu
· Type SharePoint 2010 Management Shell in the 'Search Programs and Files' textbox
• In the list of results you should see the SharePoint 2010 Management Shell icon appear
· Right click on the icon and choose Run as administrator from the context menu
· Type the following PowerShell commands
• Each of these commands should be typed on their own line
$contentDB = Get-SPContentDatabase –WebApplication <http://WebAppUrl>
$rbsSettings = $contentDB.RemoteBlobStorageSettings
$rbsSettings.Migrate()
Please note that when you run this command the server begins copying data from the database to the file system. If you've got a lot of data in the database, it may take a while.
How Do I Disable RBS?
If you decide that you do not want to use RBS to store your BLOB data remotely, you can always disable RBS. To do so you will need to complete the following steps:
· Click on the Start menu
· Type SharePoint 2010 Management Shell in the 'Search Programs and Files' textbox
• In the list of results you should see the SharePoint 2010 Management Shell icon appear
· Right click on the icon and choose Run as administrator from the context menu
· Type the following PowerShell commands
• Each of these commands should be typed on their own line
$contentDB = Get-SPContentDatabase –WebApplication <http://WebAppUrl>
$rbsSettings = $contentDB.RemoteBlobStorageSettings
$rbsSettings.SetActiveProviderName("")
After you execute this command, any new BLOB data is written directly to the database. Existing BLOB data will remain in the remote BLOB store until it is migrated back to the database. To migrate it back to the database, just follow the instructions from Step 10 and the data will be migrated back to the database (once again, this can take some time). If you are wondering why it will go back to the database this time, it's because the Migrate() method moves data to the current provider, and in this step you've specified a new provider– the default SQL provider that stores BLOB data directly in the database.
Please note that Microsoft highly recommends that you do not try to uninstall RBS! Just disable it.
No comments:
Post a Comment