Thursday, March 14, 2013

How to Find the True Size of SharePoint Lists and Libraries




Overview
I wanted to get size of list and libraries for my portal in a reporting fashion. For Publishing Portal site collection you can access the list and library size from site actions> under the Site Collection Administration group, you have access to “Storage space allocation”. This is great info. But not something I can consume the same information programmatically.
[Screenshot: the Storage space allocation page]
Notice that this page is served through /_layouts/storman.aspx. Digging further into this page leads to the class “Microsoft.SharePoint.ApplicationPages.StorMan”, which is obfuscated, so it is not obvious how the data above is retrieved.
 

Further exploration

Still hoping to find the little nugget that produces this information, I started digging into the content database stored procedures.
There I came across two interesting ones: proc_GetListSizes and proc_GetDocLibrarySizes.
Both stored procedures take the site collection GUID as a parameter. When I test-ran them, the results matched the Storage space allocation page above.
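If you want to see the raw output yourself, a quick sketch of invoking the procedures from SQL Server Management Studio against the content database (read-only, and, as noted in the warning below, not a supported operation; the GUID is a placeholder for your own site collection ID):

```sql
-- Run against the content database that hosts the site collection.
-- Replace the GUID below with your site collection ID (SPSite.ID).
DECLARE @SiteId uniqueidentifier;
SET @SiteId = '00000000-0000-0000-0000-000000000000';

EXEC proc_GetDocLibrarySizes @SiteId;
EXEC proc_GetListSizes @SiteId;
```

Each procedure returns one row per list or library, including the tp_id and TotalSize columns the code below relies on.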
 

So what did I do with these stored procedures?

I ran these two stored procedures against the content database of my portal site collection, merged their result sets, and used some rudimentary caching to retain the combined results. With that in place, you can get the size of any given SPList object in a human-readable unit.

Warning

Querying a content database directly is not a Microsoft-supported operation, so this is not recommended against a production system. Also be aware that the result set could be large and may impact your system.

Well, here is what you have been waiting for: the code.

I have tested this against MOSS 2007 and have not had a chance to validate it against other SharePoint versions yet.
  • GetWebSizeWithUnitMeasure() returns the combined size of all lists in a web.
  • GetListSizeInBytes() returns the size of a single list.
  • DefineSizeWithUnitMeasure() determines the best unit to represent a size: Bytes, KB, MB, GB, or TB.

Example calls

To get the size of a web:
string webSizeWithUnitMeasure;
double webSizeInBytes = GetWebSizeWithUnitMeasure(web, out webSizeWithUnitMeasure);
//Use webSizeWithUnitMeasure to print the size with its unit.

To get the size of a list:
string listSizeWithMeasure;
double listSizeInBytes = GetListSizeWithUnit(list, out listSizeWithMeasure);
//Use listSizeWithMeasure to print the size with its unit.
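Putting the calls together, here is a minimal sketch of how I would drive these helpers from a farm-side console application (the site URL is a placeholder, and the SPSite/SPWeb objects are disposed via using):

```csharp
using System;
using Microsoft.SharePoint;

class SizeReport
{
    static void Main()
    {
        // Placeholder URL: point this at your own site collection.
        using (SPSite site = new SPSite("http://portal"))
        using (SPWeb web = site.RootWeb)
        {
            string webSizeWithUnitMeasure;
            GetWebSizeWithUnitMeasure(web, out webSizeWithUnitMeasure);
            Console.WriteLine("Web '{0}': {1}", web.Title, webSizeWithUnitMeasure);

            foreach (SPList list in web.Lists)
            {
                string listSizeWithMeasure;
                GetListSizeWithUnit(list, out listSizeWithMeasure);
                Console.WriteLine("  {0}: {1}", list.Title, listSizeWithMeasure);
            }
        }
    }

    // GetWebSizeWithUnitMeasure and GetListSizeWithUnit are the common
    // functions listed below; paste them into this class to compile.
}
```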



Common functions

// Converts a size in bytes into the most readable unit (Bytes, KB, MB, GB or TB).
public static double DefineSizeWithUnitMeasure(double sizeInBytes, out string unitMeasure)
       {
           unitMeasure = "Bytes";
           double size = sizeInBytes;

           if (size > 1024)
           {
               size = sizeInBytes / 1024d;//KB
               unitMeasure = "KB";
           }
           if (size > 1024)
           {
               size = size / 1024d;//MB
               unitMeasure = "MB";
           }
           if (size > 1024)
           {
               size = size / 1024d; //GB
               unitMeasure = "GB";
           }

           if (size > 1024)
           {
               size = size / 1024d; //TB
               unitMeasure = "TB";
           }

           return size;
       }

       // Sums the sizes of all lists in a web. Returns the total in bytes;
       // the out parameter carries the size formatted with its unit measure.
       public static double GetWebSizeWithUnitMeasure(SPWeb web, out string withUnitMeasure)
       {
           double storageUsage = 0d;

           foreach (SPList list in web.Lists)
           {
               storageUsage += (double) GetListSizeInBytes(list);
           }

           string unitMeasure = "";
           double webSize = DefineSizeWithUnitMeasure(storageUsage, out unitMeasure);

           withUnitMeasure = string.Format("{0} {1}", webSize.ToString("f"), unitMeasure);

           return storageUsage;
       }
       

       


       // Returns a list's size in bytes; the out parameter carries the
       // size formatted with its unit measure.
       public static double GetListSizeWithUnit(SPList list, out string withUnitMeasure)
       {
           double listSizeInBytes = (double) GetListSizeInBytes(list);
           string unitMeasure = "";
           double listSize = DefineSizeWithUnitMeasure(listSizeInBytes, out unitMeasure);

           withUnitMeasure = string.Format("{0} {1}", listSize.ToString("f"), unitMeasure);

           return listSizeInBytes;
       }



         // Looks up a list's total size (in bytes) in the cached result set
         // returned by the stored procedures.
         public static long GetListSizeInBytes(SPList list)
        {
            long listSize = 0;

            string filter = string.Format("tp_id='{0}'", list.ID);

            DataTable myDataTable = GetCachedSiteCollectionListSizes(list.ParentWeb.Site);
            DataRow[] dataRows = myDataTable.Select(filter);

            if (dataRows.Length > 0)
            {
                listSize = (long)dataRows[0]["TotalSize"];
            }

            return listSize;
        }



        private static DataTable m_SiteCollectionListSizes;
        private static Guid m_SiteCollectionListSizesSiteID;

        // Caches the merged stored-procedure results per site collection.
        private static DataTable GetCachedSiteCollectionListSizes(SPSite site)
        {
            if (m_SiteCollectionListSizes == null || m_SiteCollectionListSizesSiteID != site.ID)
            {
                m_SiteCollectionListSizes = GetSiteCollectionListSizes(site);
                m_SiteCollectionListSizesSiteID = site.ID;
            }

            return m_SiteCollectionListSizes;

        }

        private static DataTable GetSiteCollectionListSizes(SPSite site)
        {

            DataTable dataTable = GetDocLibSizes(site);
            //Combine both list and doc lib size results
            dataTable.Merge(GetListSizes(site));
            
            return dataTable;

        }

        // Runs proc_GetDocLibrarySizes against the site collection's content database.
        private static DataTable GetDocLibSizes(SPSite site)
        {
           
            string connectionString = site.WebApplication.ContentDatabases[site.ContentDatabase.Id].DatabaseConnectionString;


           string storedProcName = "proc_GetDocLibrarySizes";
            
            System.Data.SqlClient.SqlConnection connection = null;
            System.Data.SqlClient.SqlDataReader reader = null;
            DataTable dataTable = null;
            
            try
            {
                connection = new System.Data.SqlClient.SqlConnection(connectionString);
                connection.Open();

                System.Data.SqlClient.SqlCommand command = new System.Data.SqlClient.SqlCommand(storedProcName, connection);
                command.CommandType = CommandType.StoredProcedure;

                command.Parameters.Add(new System.Data.SqlClient.SqlParameter("@SiteId", site.ID.ToString()));

                reader = command.ExecuteReader();

                dataTable = new DataTable();
                dataTable.Load(reader);

            }
            finally
            {
                if (reader != null)
                    reader.Close();
                if (connection != null)
                    connection.Close();
            }
            return dataTable;
        }

        // Runs proc_GetListSizes against the site collection's content database.
        private static DataTable GetListSizes(SPSite site)
        {

            string connectionString = site.WebApplication.ContentDatabases[site.ContentDatabase.Id].DatabaseConnectionString;
            string storedProcName = "proc_GetListSizes";

            System.Data.SqlClient.SqlConnection connection = null;
            System.Data.SqlClient.SqlDataReader reader = null;
            DataTable dataTable = null;

            try
            {
                connection = new System.Data.SqlClient.SqlConnection(connectionString);
                connection.Open();

                System.Data.SqlClient.SqlCommand command = new System.Data.SqlClient.SqlCommand(storedProcName, connection);
                command.CommandType = CommandType.StoredProcedure;

                command.Parameters.Add(new System.Data.SqlClient.SqlParameter("@SiteId", site.ID.ToString()));

                reader = command.ExecuteReader();

                dataTable = new DataTable();
                dataTable.Load(reader);
               
            }
            finally
            {
                if (reader != null)
                    reader.Close();
                if (connection != null)
                    connection.Close();
            }
            return dataTable;
        }

Friday, March 8, 2013

Search Configuration Best Practices in SharePoint 2010


These are the best practices I have found for configuring search in SharePoint 2010:
    1. Add a crawl component to a Search Service Application
1)    In Central Administration, in the Application Management section, click Manage service applications.
2)    On the Service Applications page, click the name of the Search Service Application to which you want to add a crawl component.
3)    On the Search Administration page, in the Search Application Topology section, click the Modify button.
Note: The SharePoint Search topology cannot be changed in standalone installations.
4)    On the Manage Search Topology page, click New, and then click Crawl Component.
5)    In the Add Crawl Component dialog box, in the Server list, click the farm server to which you want to add the crawl component.
6)    In the Associated Crawl Database list, click the crawl database you want to associate with the new crawl component.
7)    In the Temporary Location of Index field, you can optionally enter the location on the server that will be used for creating the index files before propagating them to the query components. To accept the default location, leave this field unchanged.
8)    Click OK to add the new crawl component to the job queue.
9)    On the Manage Search Topology page, click the Apply Topology Changes button to start the SharePoint timer job that will add the new crawl component to the farm on the specified server.
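The steps above can also be scripted with the SharePoint 2010 search cmdlets. A hedged sketch (the server name is a placeholder, the crawl database is simply the first one found, and you should verify the exact parameter sets with Get-Help in your own farm before running anything):

```powershell
# Outside the SharePoint 2010 Management Shell, load the snap-in first.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa      = Get-SPEnterpriseSearchServiceApplication
# "SERVER2" is a placeholder for the farm server that will host the component.
$instance = Get-SPEnterpriseSearchServiceInstance -Identity "SERVER2"
$crawlDb  = ([array]($ssa | Get-SPEnterpriseSearchCrawlDatabase))[0]

# Build a new crawl topology, add the component, then activate the topology.
$topology = $ssa | New-SPEnterpriseSearchCrawlTopology
New-SPEnterpriseSearchCrawlComponent -CrawlTopology $topology `
    -CrawlDatabase $crawlDb -SearchServiceInstance $instance
$topology | Set-SPEnterpriseSearchCrawlTopology -Active
```

Activation runs the same timer job that the Apply Topology Changes button starts, so expect it to take a few minutes.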
    2. Create or edit a content source.
From the Search Administration page, in the Crawling section of the quick navigation bar, click Content Sources.

To create a content source

1.    On the Manage Content Sources page, click New Content Source.
2.    On the Add Content Source page, in the Name section, in the Name box, type a name for the new content source.
3.    In the Content Source Type section, select the type of content that you want to crawl.
4.    In the Start Addresses section, in the Type start addresses below (one per line) box, type the URLs from which the crawler should begin crawling.
5.    In the Crawl Settings section, select the crawling behavior that you want.
6.    In the Crawl Schedules section, to specify a schedule for full crawls, select a defined schedule from the Full Crawl list. A full crawl crawls all content that is specified by the content source, regardless of whether the content has changed. To define a full crawl schedule, click Create schedule.
7.    To specify a schedule for incremental crawls, select a defined schedule from the Incremental Crawl list. An incremental crawl crawls content that is specified by the content source that has changed since the last crawl. To define a schedule, click Create schedule. You can change a defined schedule by clicking Edit schedule.
8.    To prioritize this content source, in the Content Source Priority section, on the Priority list, select Normal or High.
9.    To immediately begin a full crawl, in the Start Full Crawl section, select the Start full crawl of this content source check box, and then click OK.

 

To edit a content source

1.    You can edit a content source to change the crawl schedule, the crawl start addresses, the content source priority, or the name of the content source. Crawl settings and the content source type cannot be changed when editing a content source.
2.    On the Manage Content Sources page, in the list of content sources, point to the name of the content source that you want to edit, click the arrow that appears, and then click Edit.
3.    After you have made the changes that you want, select the Start full crawl of this content source check box, and then click OK.
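Creating a content source can be scripted as well. A minimal sketch with the SharePoint 2010 cmdlets (the name and start address are placeholders; the type shown is SharePoint, pick the type you actually need):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication
# "Portal Content" and http://portal are placeholders for your own values.
$cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
        -Type SharePoint -Name "Portal Content" `
        -StartAddresses "http://portal"

# Equivalent of the "Start full crawl of this content source" check box.
$cs.StartFullCrawl()
```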
    3. Add crawl rules to include or exclude paths from crawling.
Add rules to exclude the following paths:
    • http://*/_catalogs/*
    • http://*/_layouts/*
    • http://*/Lists/*
    • http://*/Documents/*
    • http://*/Forms/*
    • http://.*?/DocLib[0-9]*/.*?
Note: for the last rule only, you must check the option “Use regular expression syntax for matching this rule”.
You can add a rule to exclude a URL as in the following screenshot:
[Screenshot: Adding a rule to exclude a URL]


Add the following rule to forcibly include pages as normal HTTP pages. Select the option “Include all items in this path” and check the two options “Crawl complex URLs” and “Crawl SharePoint content as normal http pages”:
    • http://*/pages/*.*
(Note: if you are using a non-publishing site, use “http://*/sitepages/*.*” instead, and the previous exclusion rules may also have to be modified.)
You can add the previous inclusion rule as in this screenshot:
[Screenshot: Adding a rule to include all files that have extensions]
           

Also, add the following rule to give the crawler the chance to browse the directories without including those directories themselves in the search results. Note: this rule works in combination with the previous rule that includes all files in the search.
You can add this inclusion rule as in this screenshot:
[Screenshot: Adding a rule to allow searching inside directories without including the directories themselves]


Unfortunately, the previous configuration will only work well for sites that do not have a redirection page at the root. An example of a site with a redirection page at the root is a multilingual site with a variation redirection page. If this is your case, simply do two things. First, replace the rule "http://*/Forms/*" with exclusion rules for these three paths: "http://*/Forms/Thumbnails.aspx", "http://*/Forms/AllItems.aspx" and "http://*/Forms/DispForm.aspx". Second, do not use any inclusion rules. This makes your rules list the following:
  • http://*/_catalogs/*
  • http://*/_layouts/*
  • http://*/Lists/*
  • http://*/Documents/*
  • http://*/Forms/Thumbnails.aspx
  • http://*/Forms/AllItems.aspx
  • http://*/Forms/DispForm.aspx
  • http://.*?/DocLib[0-9]*/.*?

 Note: for the last rule only, you must check the option “Use regular expression syntax for matching this rule”.
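The rules list above can be created in one go with New-SPEnterpriseSearchCrawlRule. A hedged sketch (I am writing the regular-expression flag from memory, so verify it with Get-Help New-SPEnterpriseSearchCrawlRule on your farm):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication

# Plain wildcard exclusions from the list above.
$excludePaths = @(
    "http://*/_catalogs/*",
    "http://*/_layouts/*",
    "http://*/Lists/*",
    "http://*/Documents/*",
    "http://*/Forms/Thumbnails.aspx",
    "http://*/Forms/AllItems.aspx",
    "http://*/Forms/DispForm.aspx"
)
foreach ($path in $excludePaths) {
    New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Path $path -Type ExclusionRule
}

# The last rule uses regular expression syntax, matching the
# "Use regular expression syntax for matching this rule" check box.
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://.*?/DocLib[0-9]*/.*?" -Type ExclusionRule `
    -IsAdvancedRegularExpression:$true
```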
Enjoy!