SharePoint and the FAST Enterprise Search Platform
A note from Mark Miller: This is our first article by Microsoft SharePoint MVP Natalya Voskresenskaya. Her expertise is in developing portal solutions, now almost exclusively in SharePoint. In this article, Natalya gives an overview of the current state of search in SharePoint and how it will be affected when FAST is added to the mix. Welcome, Natalya.
Since the early 2000s, companies have come to realize how important it is to collect and store information, whether their own or their customers’. Fast forward a few years, and they are now obligated to store this information in accordance with government-imposed rules and regulations.
The data that companies have accumulated over the years can provide valuable insight into business metrics, but only if the company knows the information exists and can find it. The value of the information is greatly obscured by its sheer volume and lack of structure.
Microsoft has been promoting the SharePoint Search engine as an Enterprise Search solution for the past few years, and it did solve some of the problems that companies had been facing in the information findability arena.
Nowadays, individuals want instant access to information. Spending time searching for information in multiple places, or re-creating data that already exists but cannot be found, is counterproductive, to say the least. If intellectual capital such as documents, Excel files, presentations, videos, podcasts, blog posts and wiki entries cannot be located quickly, people will either re-create the data or make misinformed, information-starved decisions.
Almost every company is facing this problem, but larger companies are hurt the most.
The Current Status of Search in SharePoint
Microsoft SharePoint Search provides the ability to search Exchange public folders, file shares, external web resources, and people profiles, but a great deal of corporate information is unstructured or structured only within the confines of its respective application. This information resides in documents, PowerPoint presentations, spreadsheets, and other media. It lives in different places such as desktops, laptops, shared network folders and databases.
SharePoint allows people to organize documentation through metadata and Content Types. The search crawl stores this metadata in the search database alongside the document it relates to. The metadata provides some level of insight into the type of information a document contains and the category of the document, and it aids in custom content source creation.
With the volume of information quickly growing within an enterprise, imposing any type of information taxonomy through metadata is a resource-intensive process and impractical for large companies. Subject matter experts must load individual documents into SharePoint and tag them one by one.
Proper information classification cannot be achieved by hiring an army of temporary workers to simply upload documents into SharePoint, since they do not have the requisite expert knowledge. The true value of search lies in the accuracy and relevance of the information brought back in response to the submitted query.
The FAST Enterprise Search Solution
FAST Enterprise Search Platform (ESP) allows multiple layers of Information Architecture to be created on top of the existing data.
FAST ESP is an enterprise search engine that provides extensive content processing and indexing functionality. FAST Search for SharePoint is going to deliver high-end search capabilities to enterprises. The ease of use of SharePoint Server, combined with the high-end filtering and navigation features of FAST Search, creates a new enterprise search, collaboration, and publishing solution.
FAST Search is a highly customizable and configurable product that allows indexing of structured as well as unstructured content and imposes multiple layers of Information Architecture. The accompanying diagram represents a high-level overview of the processes that take place within FAST ESP.
Content that is fed to FAST Connectors can be of any binary or text format, as long as that format can be handled by the document processing pipelines. When content is fed into FAST, it is converted into documents, and each document is associated with the right “Collection”. A “Collection” serves as a logical grouping for content sources to be indexed. Based upon the collection that the document belongs to, a content distributor passes the document to the “document processing pipeline” associated with that collection.
Note: Only one pipeline can be associated with any given collection. One pipeline can be used by multiple collections.
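The relationship between collections and pipelines can be illustrated with a small sketch. This is not the FAST ESP API: the `Distributor` class, its method names, and the collection and pipeline names are all hypothetical, chosen only to show that each collection maps to exactly one pipeline while one pipeline may serve several collections.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Distributor:
    """Toy content distributor that routes a document by its collection.

    Hypothetical names throughout; this is not the FAST ESP API.
    """

    # collection name -> pipeline name (exactly one pipeline per collection)
    pipeline_for: dict[str, str] = field(default_factory=dict)

    def assign(self, collection: str, pipeline: str) -> None:
        # Re-assigning replaces the previous pipeline, so a collection
        # can never be associated with two pipelines at once.
        self.pipeline_for[collection] = pipeline

    def route(self, document: dict) -> str:
        # The document's collection decides which pipeline processes it.
        return self.pipeline_for[document["collection"]]

    def collections_using(self, pipeline: str) -> list[str]:
        # One pipeline may be shared by any number of collections.
        return sorted(c for c, p in self.pipeline_for.items() if p == pipeline)


distributor = Distributor()
distributor.assign("hr-policies", "office-docs")
distributor.assign("legal-contracts", "office-docs")  # same pipeline, second collection
distributor.assign("product-wiki", "web-pages")

doc = {"id": "doc-1", "collection": "legal-contracts", "body": "..."}
pipeline_name = distributor.route(doc)
```

Keeping the mapping keyed by collection is what enforces the "one pipeline per collection" rule in this sketch; nothing stops two keys from holding the same pipeline name.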
When a document gets distributed to the right pipeline, it goes through “stages” that can be aggregated into three groups:
- Pre-processing
- Document manipulation
- Post-processing
In the pre-processing stages, pipeline-based document information such as document ID, collection, and language is extracted and associated with the document, and character normalization is applied.
The majority of data analysis and attribute assignment takes place within the stages of the document manipulation group. These stages define what content should be sent to the index and what content can be eliminated.
Stages within this group perform entity extraction using dictionaries, generate teasers, and apply business logic to manipulate ranking, relevance, etc. They can also perform content analysis based upon external business applications.
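To make dictionary-based entity extraction and teaser generation more concrete, here is a minimal sketch. The dictionary contents, the function names `extract_entities` and `make_teaser`, and the 60-character teaser limit are assumptions made for illustration; the actual FAST stages are configured differently and are far more capable.

```python
import re

# Hypothetical dictionary: surface form -> entity type.
ENTITY_DICTIONARY = {
    "microsoft": "company",
    "sharepoint": "product",
    "fast esp": "product",
}


def extract_entities(text: str) -> list:
    """Return sorted (entity, type) pairs for dictionary terms found in the text."""
    lowered = text.lower()
    found = []
    for term, entity_type in ENTITY_DICTIONARY.items():
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, entity_type))
    return sorted(found)


def make_teaser(text: str, max_chars: int = 60) -> str:
    """Cut the text at the last whole word that fits within max_chars."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= max_chars:
        return collapsed
    cut = collapsed[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."


body = (
    "FAST ESP will be integrated with SharePoint to improve "
    "enterprise search for large companies."
)
entities = extract_entities(body)
teaser = make_teaser(body)
```

The extracted entities would typically be stored as additional attributes on the document, which is what later makes filtering and navigation by entity possible.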
Post-processing stages include linguistics and scope tagging, and send the document to the index. When the document is sent to be indexed, FAST also generates an FXML (FastXML) version of the document.
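The three stage groups described above can be sketched as an ordered list of functions that each receive a document and pass it on. Everything here is an assumption made for illustration: the stage names, the dictionary used as a document, and the list standing in for the index are hypothetical and do not reflect how FAST ESP pipelines are actually defined.

```python
import unicodedata
from typing import Callable, Optional

Document = dict
# A stage returns the (possibly modified) document, or None to drop it.
Stage = Callable[[Document], Optional[Document]]


def normalize_characters(doc: Document) -> Document:
    # Pre-processing: fold compatibility characters such as full-width letters.
    doc["body"] = unicodedata.normalize("NFKC", doc["body"])
    return doc


def drop_empty(doc: Document) -> Optional[Document]:
    # Document manipulation: decide what is worth sending to the index.
    return doc if doc["body"].strip() else None


def add_teaser(doc: Document) -> Document:
    # Document manipulation: attach a derived attribute.
    doc["teaser"] = doc["body"][:40]
    return doc


def tag_scope(doc: Document) -> Document:
    # Post-processing: record which fields should be searchable.
    doc["scopes"] = ["body", "teaser"]
    return doc


PIPELINE = [
    ("pre-processing", [normalize_characters]),
    ("document manipulation", [drop_empty, add_teaser]),
    ("post-processing", [tag_scope]),
]


def run_pipeline(doc: Document, index: list) -> bool:
    """Run every stage in order; append the document to the index if it survives."""
    for _group, stages in PIPELINE:
        for stage in stages:
            result = stage(doc)
            if result is None:
                return False  # eliminated before indexing
            doc = result
    index.append(doc)
    return True


index = []
# "\uff26\uff21\uff33\uff34" is "FAST" written in full-width characters.
kept = run_pipeline({"id": "doc-1", "body": "\uff26\uff21\uff33\uff34 search overview"}, index)
dropped = run_pipeline({"id": "doc-2", "body": "   "}, index)
```

Modeling each stage as a plain function keeps the order of the groups explicit and makes it easy to add, remove, or reorder stages for a given collection.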
Once the document is indexed, it can be found by the search engine and served to the end user in the query result set. But before the information is returned to the end user, it goes through a query-side document processing pipeline, which further refines and ranks the content to be returned.
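As a rough illustration of query-side refinement and ranking, the sketch below removes duplicate hits and boosts results whose title contains a query term. The function name `refine_and_rank`, the hit fields, and the boost of 0.5 per matching term are all hypothetical; FAST applies its own, much richer relevance model.

```python
def refine_and_rank(hits: list, query: str, limit: int = 10) -> list:
    """Drop duplicate hits, boost title matches, and return the top results."""
    terms = set(query.lower().split())
    seen_ids = set()
    ranked = []
    for hit in hits:
        if hit["id"] in seen_ids:
            continue  # refinement: skip a hit that was already returned
        seen_ids.add(hit["id"])
        title_terms = set(hit["title"].lower().split())
        # Ranking: assumed boost of 0.5 for each query term found in the title.
        boost = 0.5 * len(terms & title_terms)
        ranked.append({**hit, "score": hit["score"] + boost})
    ranked.sort(key=lambda h: h["score"], reverse=True)
    return ranked[:limit]


raw_hits = [
    {"id": "a", "title": "Expense policy", "score": 1.0},
    {"id": "b", "title": "Travel expense report template", "score": 0.8},
    {"id": "a", "title": "Expense policy", "score": 1.0},
    {"id": "c", "title": "Holiday calendar", "score": 0.9},
]
results = refine_and_rank(raw_hits, "travel expense")
result_ids = [hit["id"] for hit in results]
```

In this example the travel expense template overtakes the expense policy because both query terms appear in its title, even though its original score was lower.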
All pipelines are highly customizable to fit specific business needs, and many of them ship with FAST out of the box (OOTB).
Conclusion
With FAST being integrated into SharePoint, the concept of dynamic portal navigation being driven by advanced search keyword analysis becomes a reality.
Guest Author: Natalya Voskresenskaya, Microsoft SharePoint MVP
Natalya has been working in the IT field for over 10 years. With experience in design, architecture, development and deployment of Web-based applications, Natalya has been developing and implementing portal solutions since 2000 and working with SharePoint since version 2003.
How can a developer get hold of FAST? I can’t download it via my MSDN subscription. I think the fact that devs can’t get it to play with might hold it back.
By any chance, is FAST integrated in the SharePoint 2010 Alpha release?
Microsoft marketing says they will be integrating FAST with SharePoint 2010.
@aswath: AFAIK, Microsoft is positioning two flavors of FAST ESP: one integrated with SharePoint 2010 (and tuned for intranet), and another that will be a full-fledged FAST ESP for Internet business.