Content Migration

Content migration is the process of moving information stored in a Web content management system (CMS), digital asset management (DAM) system, document management system (DMS), or flat HTML-based system to a new system. Flat HTML content can include HTML files, Active Server Pages (ASP), JavaServer Pages (JSP), PHP, or content stored in some other HTML/JavaScript-based system, and it can be either static or dynamic.

Content migration can address a number of needs, including:

  • Consolidating one or more CMSs into a single system to allow for more centralized control, better governance of content, and better knowledge management and sharing.
  • Reorganizing content after mergers and acquisitions, assimilating as much content as possible from the source systems under a unified look and feel.
  • Converting content that has grown organically, either in a CMS or in flat HTML, and standardizing its formatting so that consistent branding can be applied.

There are several ways to access the content stored in a CMS. Depending on the vendor, a CMS may offer an application programming interface (API), Web services, or XML exports. Otherwise, content can be retrieved by writing SQL queries that rebuild each record, or through the web interface.

  1. The API[1] requires a developer to learn how to interact with the source CMS's API layer and then develop an application that extracts the content and stores it in a database, an XML file, or an Excel spreadsheet. Once the content is extracted, the developer must learn the target CMS's API and write code to push the content into the new system. The same applies to Web services.
  2. Most CMSs use a database to store and associate content, so if no API exists, a SQL programmer must reverse engineer the table structure. Once the structure is understood, complex SQL queries are written to pull the content from multiple tables into an intermediate table or into a comma-separated values (CSV) or XML file. The developer must then learn the target CMS's API and write code to push the content into the new system. The same applies to Web services.
  3. XML export creates XML files of the content stored in a CMS, but after the files are exported they must be altered to fit the schema of the target CMS. This is typically done by a developer who writes code to perform the transformation.
  4. HTML files, JSP, ASP, PHP, and other application server file formats are the most difficult. The structure of flat HTML files is based on a combination of folder structure, HTML file structure, and image locations. In the early days of content migration, developers had to use programming languages to parse the HTML files and save the results as a structured database, XML, or CSV. Typically Perl, Java, C++, or C# was used because of their regular expression handling capabilities. JSP, ASP, PHP, ColdFusion, and other application server technologies usually rely on server-side includes to simplify development, but this makes content migration very difficult because the content is not assembled until a user requests the page in a web browser. As a result, it is very difficult to extract the content by inspecting the files and their structure.
  5. Web scraping lets users access most of the content directly from the web user interface. Since a web interface is visual (which is the point of a CMS), some web scrapers leverage the UI to extract content and place it into a structure such as a database, XML, or CSV. Because CMSs, DAMs, and DMSs generally expose web interfaces, extracting content from one or many source sites is basically the same process. In some cases it is possible to push the content into the new CMS through the web interface, but some CMSs use Java applets or ActiveX controls, which most web scrapers do not support. In that case the developer must learn the target CMS's API and write code to push the content into the new system. The same applies to Web services.
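
As an illustration of the flat HTML case described above, the following is a minimal sketch of parsing pages into a structured CSV. It uses the HTML parser from the Python standard library rather than the regular expressions mentioned above. The names PageExtractor, extract_page, and pages_to_csv, and the choice of fields (path, title, links, images), are assumptions made for this example and do not come from any particular CMS or migration tool.

```python
import csv
import io
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> text, link targets, and image sources of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept.
        if self._in_title:
            self.title += data


def extract_page(path, html):
    """Return one structured record for a page (hypothetical record layout)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "path": path,
        "title": parser.title.strip(),
        "links": ";".join(parser.links),
        "images": ";".join(parser.images),
    }


def pages_to_csv(pages):
    """pages maps a file path to its HTML text; returns the records as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["path", "title", "links", "images"])
    writer.writeheader()
    for path, html in pages.items():
        writer.writerow(extract_page(path, html))
    return out.getvalue()
```

A sketch like this only sees what is in the file itself. Pages assembled from server-side includes would have to be fetched in their rendered form first, which is the difficulty noted in item 4.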

The basic content migration flow

1. Obtain an inventory of the content.
2. Obtain an inventory of binary content such as images, PDFs, CSS files, Office documents, Flash, and any other binary objects.
3. Find any broken links in the content or content resources.
4. Determine the menu structure of the content.
5. Identify the parent and sibling relationships of the content so that links to other content and resources are not broken when it is moved.
6. Extract the resources from the pages and store them in a database or file structure. Store the references in a database or a file.
7. Extract the HTML content from the site and store it locally.
8. Upload the resources to the new CMS, using either the API or the web interface, and store the new locations in a database or XML file.
9. Transform the HTML to meet the new CMS's standards and reconnect any resources.
10. Upload the transformed content into the new system.
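
The reconnection of resources in steps 8 and 9 can be sketched as a lookup from old resource locations to the locations recorded after upload. This is a minimal sketch under stated assumptions: the function name reconnect_resources and the location_map dictionary are hypothetical, only src and href attributes are rewritten, and a simple regular expression stands in for whatever transformation a real migration would use.

```python
import re

# Matches src="..." or href="..." with either quote style.
# The lookbehind skips attributes such as data-src.
ATTR_PATTERN = re.compile(
    r'''(?<![\w-])(?P<attr>src|href)\s*=\s*(?P<quote>["'])(?P<url>.*?)(?P=quote)''',
    re.IGNORECASE,
)


def reconnect_resources(html, location_map):
    """Rewrite resource references using a mapping of old URL -> new URL.

    Returns the rewritten HTML and a list of references that had no entry
    in the mapping, so they can be reviewed instead of silently broken.
    """
    missing = []

    def replace(match):
        url = match.group("url")
        new_url = location_map.get(url)
        if new_url is None:
            missing.append(url)
            return match.group(0)  # leave unknown references untouched
        quote = match.group("quote")
        return f'{match.group("attr")}={quote}{new_url}{quote}'

    return ATTR_PATTERN.sub(replace, html), missing
```

For example, with location_map holding the new locations stored in step 8, each page extracted in step 7 could be passed through reconnect_resources before it is uploaded in step 10. The list of missing references then doubles as a check for links that would otherwise break.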

References

  1. What the Content Migration APIs Are Not

Wikimedia Foundation. 2010.
