Digital Libraries/Web Publishing

Module name
Web Publishing

Scope
This module covers the general principles of Web publishing and the various paradigms that can be used for storing and retrieving content within digital libraries. This module introduces various techniques to publish information in digital libraries and compares and contrasts them. It also discusses how the various paradigms can be used in varied scenarios for different applications.

Note: This module includes an exercise of creating a wiki (Web-based publishing system) to help coordinate and conduct the course activities. The instructor uses the wiki to upload class documents and the students use the wiki to upload their assignments. Our recommendation is to teach this module early in the course, as the tools used and developed during this class will be useful throughout the course.

Learning objectives:
By the end of this module, the student will be able to:
 * a. Identify the fundamental concepts, theoretical models and the process of generation and maintenance of online publication in digital libraries.
 * b. Explain why choosing the right Web publishing paradigm matters, depending on the purpose of the digital archive.
 * c. Successfully create and manage a small scale personal or collaborative online environment for publishing or sharing content over the Web.

5S characteristics of the module:

 * a. Stream: Web publishing generates a stream of digital data that enters the Web based digital archive. The data that is displayed over the Web also accounts for another stream of data outputted by the Web.
 * b. Structure: In Web publishing, structure refers to the different Web-based architectures that can be used to publish information online. Document formats such as XML, HTML, Channel Definition Format (a definition of a website's content and structure) and PDF are also structures.
 * c. Spaces: The space required for storing data over the Web and the Web servers involved in publishing it would be pertaining to the space domain. The medium used for displaying information over the Web such as a monitor or a PDA screen can be classified as a space.
 * d. Scenario: Scenarios include the steps involved in the process of Web publishing such as Submission, Acquisition, Quality Control, Production and Delivery.
 * e. Society: Content published over the Internet can involve multiple users who modify or moderate it until it reaches eventual consistency. This collaboration relates to society.

Level of effort required:

 * a. In-class time: 1 1/2 hours.
 * b. Out-of-class time:
 * i. Preparation / Reading: 2-3 hours
 * ii. Assignment: 2 hours

Relationships with other modules:

 * a. 3-b: Digitization
 * Digitization encompasses the procedure for digitizing information and the various technical standards that are involved in different types of digital data sets such as images, text, etc. Web publishing includes the publication of digitized books or articles online. Therefore, the 3-b Digitization module can be taught before the Web publishing module.
 * b. 3-d Document and e-publishing / presentation markup
 * Document and e-publishing is very similar to Web publishing as it covers the process of creating documents and publishing them. Document and e-publishing would also include tagging within documents. However, Web publishing also deals with the various paradigms (Wiki, RSS) used to publish content and the differences amongst them.
 * c. 6-c Sharing, networking, interchange.
 * Sharing, networking and interchange deals with the human perspective of collaborating together on a common / virtual space to interact with other users and share information. Web publishing focuses on the aspect of publishing data online, which may involve collaboration with other users depending upon the publishing paradigm used. The materials published on a wiki or a blog are shared by visitors, who often have similar interests. Blogs and wikis could be networked with others that provide similar or complementary content.  Web publishing is more information centric whereas this module on sharing, networking and interchange is more user-focused and revolves around collaborative interaction.

Prerequisite knowledge required:

 * a. None

Introductory remedial instruction:

 * a. None

Basic Concepts and Definitions

 * 1. What is Web publishing?
 * a. Digital content vs. Web content
 * b. Web publishing - "Web publishing is an activity of collecting Web pages, images, videos and other digital assets and hosting them on a particular domain on the World Wide Web"
 * c. Web service - A service that automates information services conducted over the Internet, using standardized technologies and formats/protocols that simplify the exchange and integration of large amounts of data.
 * d. Traditional publishing vs. Web publishing - Traditional publishing is an expensive and time-consuming procedure. It assigns distinct roles to specific persons or organizations and is not an iterative process. Web publishing, by contrast, is a flatter and more collaborative approach in which each contributor plays multiple roles in the publishing procedure rather than following a specific pre-assigned process, as shown in the figure below [3].
 * e. Wikis, RSS and blogs are the focus of this module. Wikis allow users to collaboratively create, edit, link and organize content over the Web. RSS (Really Simple Syndication or Rich Site Summary) lets people keep up with dynamic Web content in an automated manner; the content can be piped into special programs or filtered displays. Blogs are either personal diaries or Websites providing news or commentary arranged in reverse chronological order.


 * 2. Social Infrastructure of publication
 * a. Understand the roles of people involved with the publication and the social infrastructure that is involved in the process of Web publishing.
 * b. Those involved with publishing can be categorized using the following list:
 * i. Authors - The one who owns the content and creates it originally is named as the author of the content.
 * ii. Publishers - Publishers are brokers between authors who wish to disseminate their thoughts, ideas and knowledge and the readers / consumers of the content published.
 * iii. Third Party Institutions - Institutions include schools, professional organizations, research labs and companies that have affiliation with the author or the publishers.
 * iv. Consumers - Interested parties such as learners and readers can be classified as consumers of the published material.
 * c. Each entity can be involved in multiple operations in the entire process. For example, a publisher also can be a consumer of the published material.


 * 3. Intellectual Property Rights
 * a. Copyright Laws bestow ownership to the author of the published work.
 * b. Authors possess the powers to modify, update and manage the published content.
 * c. The information owned by the author cannot be printed, reproduced, or otherwise communicated, either directly or with the aid of a medium, without the author's consent. (US Copyright Laws)
 * d. Copyright can also be transferred to other people or third party organizations.
 * e. In an instance where a third party or a consumer uses the work published by an author to derive a hypothesis or to state a fact, a mandatory reference has to be made to the author's work in order to acknowledge credit to the author for his work.


 * 4. Access to material
 * a. Open access (OA)
 * i. Open access is free, immediate, permanent, full-text, online access, for any user, Web-wide, to digital scientific or scholarly material.
 * ii. Open access means that any user, anywhere may link, read, download, store or data mine the digital content of that article.
 * b. Open source offers practical access to a product's source, including its knowledge and goods. In this context, it implies free access to all the resources categorized as open source.
 * c. Material is often accessible through license agreements. A license specifies permission to access the material; it is granted by the licensor to the licensee as part of an agreement between them.
 * d. Some material is accessible only to a community or group, which keeps the material private. The Web's security systems can enforce such restricted access.
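
The three kinds of access listed above (open, licensed, community-restricted) can be pictured as a simple permission check. The following is a minimal sketch only; the names `AccessLevel`, `Item`, `can_read` and the allow-list design are assumptions made for illustration and do not describe any particular digital library system.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"            # open access: any user may read
    LICENSED = "licensed"    # readable only by licensees
    COMMUNITY = "community"  # readable only by members of a group


@dataclass
class Item:
    title: str
    level: AccessLevel
    # Hypothetical allow-list: licensees or group members, depending on level.
    allowed_users: set = field(default_factory=set)


def can_read(user: str, item: Item) -> bool:
    """Return True if `user` may read `item` under its access level."""
    if item.level is AccessLevel.OPEN:
        return True
    # Licensed and community material both require explicit permission.
    return user in item.allowed_users


article = Item("OA article", AccessLevel.OPEN)
ebook = Item("Licensed e-book", AccessLevel.LICENSED, {"alice"})
notes = Item("Lab notes", AccessLevel.COMMUNITY, {"alice", "bob"})
```

A real system would also authenticate the user before consulting such a list; the sketch covers only the authorization decision.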

Benefits of Web publishing

 * 1. Makes it easier to conduct work across organizations regardless of the types of operating systems, hardware/software, and databases that are being used. Web browsers make this possible since they form an interface between the system and the online repository.
 * 2. Easy to update, modify and restore information.
 * 3. Enabling remote access to resources.
 * 4. Enabling concurrent access of resources by multiple users.
 * 5. Collaboration with multiple users to maintain/manage correct information.
 * 6. Preservation of online material is easier due to the digital nature of data. Creating back-up copies of published data for circulation or storage is comparatively easier, faster and more precise than copying traditionally published hard copies of the material.
 * 7. As compared to traditional publishing, Web publishing saves on cost and effort.

Generate, Publish and Maintain Web Content

 * 1. Issues before publishing
 * a. Selection of material - The topic and the content that may undergo the publishing process.
 * b. Enhancing content value before publishing - The structure and flow of information can be modified for a more logical understanding of concepts.
 * c. Copyright for the protection of resources.
 * d. Copyright check for not infringing/overlapping others' information.
 * e. Authentication for preservation and restricted access for updating / maintenance.


 * 2. Publishing Procedure
 * A general model of publishing can be categorized into the following steps.
 * 2.1 Submission
 * a. Collection of material from the author.
 * b. Submission of final content for publishing.
 * c. For instance, if one wishes to publish an e-book for a particular topic, the initial course of action would be to gather all the material written or collected by the person. Creating a final version of the content one would like to publish as an author will be the following step.


 * 2.2 Acquisition
 * a. Content passes through a reception process on receiving from a source. This may include activities such as management / segregation of content (creating folders or directories to hold them) and registering the item within the publication records. In the example, the digital data should be separated depending upon their relevance into different chapters.
 * b. Acknowledging the submission and notification to author.


 * 2.3 Quality Control
 * a. Correctness and authenticity of data. In our example scenario, the data to be published should be verified with previously certified papers or journals. If a contradicting fact is proved then it should be backed by appropriate results, evaluation and proof.
 * b. Check formatting and grammatical mistakes. Some journals (e.g., IEEE and ACM journals) have a specific pre-determined format. If the article or journal being published has a particular format, then that format should be used for the material being published.
 * c. Link check (ensure the validity of all outward Web links, if applicable). Most Web articles or books contain links to other published content that provides more information about a specific topic. These links should be checked to confirm that they work.
 * d. Peer review in case of official journals. Certain journals require an official team to review the content before publishing. The team could consist of professionals in that field of study, and they need to approve the authenticity of the data being published.
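
The link check in item c can be partly automated. The sketch below, which uses only the Python standard library, collects the outward links in an HTML page and reports those that a checker function rejects. The function names are assumptions made for illustration; the default checker `url_is_reachable` needs network access, so the `is_reachable` parameter lets a different checker be substituted.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points to an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)


def url_is_reachable(url, timeout=10):
    """Default checker: True if a HEAD request succeeds (needs network)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def find_broken_links(html, is_reachable=url_is_reachable):
    """Return the outward links in `html` that the checker rejects."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in collector.links if not is_reachable(url)]
```

Some servers refuse HEAD requests, so a link reported here as broken should still be confirmed by hand before it is removed.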


 * 2.4 Production
 * a. Ready for publication. In this phase, the data has undergone a series of changes and is now ready for being published.
 * b. Depending upon the technology being used such as RSS or Wiki or Blogs, the author needs to decide the layout of how the data will be displayed online. In the example, the author could make a tabbed interface for all the chapters of the book or have hyperlinks to all the chapters in the index.


 * 2.5 Delivery
 * a. Online release for public use. Once the data has been published over the Web, the author makes an official release of the content and makes it accessible to other Web users.
 * b. Periodic maintenance. Once published, the content can be updated with new findings or with the latest updates pertaining to that field of study.
 * c. Announcement for advertisement and awareness. This task involves publicizing the published data to spread awareness. In the example, hosting a press release or having advertisements about the book online on other Web pages would help convey the message.
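
The five steps of the general model above (Submission, Acquisition, Quality Control, Production, Delivery) form a fixed sequence. One way to picture it is as a small state machine; this is only a sketch, and the class names `Stage` and `Manuscript` are hypothetical rather than taken from any publishing system.

```python
from enum import Enum


class Stage(Enum):
    SUBMISSION = 1
    ACQUISITION = 2
    QUALITY_CONTROL = 3
    PRODUCTION = 4
    DELIVERY = 5


class Manuscript:
    """Tracks which stage of the publishing procedure an item has reached."""

    def __init__(self, title):
        self.title = title
        self.stage = Stage.SUBMISSION
        self.history = [Stage.SUBMISSION]

    def advance(self):
        """Move to the next stage; Delivery is the last one."""
        if self.stage is Stage.DELIVERY:
            raise ValueError("already delivered")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage
```

The sketch only moves forward. In practice, quality control may send content back to the author, and periodic maintenance after delivery may repeat earlier steps.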

Web 2.0

 * 1. Web 2.0 is a term describing the trend in the use of World Wide Web technology and Web design that aim to enhance creativity, information sharing, and, most notably, collaboration among users.
 * 2. These concepts have led to the development and evolution of Web-based communities and hosted services such as social networking sites, wikis and blogs.

Web Publishing Paradigms

 * 1. RSS
 * a. RSS (Really Simple Syndication) is a Web-based format used to publish frequently updated Web content such as news headlines, event updates and podcasts.
 * b. It generates an RSS document called a "feed" or "Web feed" that contains a summary or the full text of the content being updated by the associated Web site.
 * c. This RSS content can be viewed using an "RSS Reader".
 * d. The process involves the following steps:
 * i. The user subscribes to a feed by entering the feed's link into the reader. The user can also click on the RSS icon on the Web page and subscribe to the feed.
 * ii. New feed items automatically populate the user's RSS Reader, which notifies the user of any new content.
 * iii. The user can then view or download any updates.
 * e. RSS Readers are of two types
 * i. Web-based - All the feeds come to a Web reader, which the user accesses online by visiting the Web reader's home page and logging into it.
 * ii. Program-based - In this case the user can directly run the RSS reader program to download the feeds onto one's computer.


 * f. Web Publishing Tool for RSS - Google Reader
 * i. Several Web-based tools exist for RSS. Google Reader is one of the more popular ones. Many program-based tools also exist and can be downloaded for free from NewsGator (www.newsgator.com).
 * ii. Google Reader constantly checks your favorite Web sites for new content and updates their status immediately.
 * iii. It simplifies your reading experience by showing all your favorite Websites in one convenient place. It is analogous to your personalized mail box for the entire internet.
 * iv. Google Reader also allows you to collaborate with your friends by recommending feeds to them and helps you better manage your content.
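
The subscription steps described above (a reader fetches a feed and shows the user only what is new) can be sketched with Python's standard XML parser. The feed text, the URLs and the function name `new_items` are made up for illustration, and the sketch assumes an RSS 2.0 feed whose items each carry a `link`. It does not describe how Google Reader or any other reader is implemented.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed, inlined so the example runs without a network.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed used for illustration.</description>
    <item>
      <title>New collection online</title>
      <link>http://example.org/news/2</link>
    </item>
    <item>
      <title>Opening hours changed</title>
      <link>http://example.org/news/1</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not seen before and mark them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link not in seen_links:
            fresh.append((item.findtext("title"), link))
            seen_links.add(link)
    return fresh
```

A reader would call `new_items` each time it re-downloads the feed, passing the same `seen_links` set, so that only unseen items are reported to the user.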


 * 2. Wiki
 * a. Wiki is software that allows users to collaboratively create, edit, link and organize content online.
 * b. The term comes from Hawaiian, in which wiki means 'quick', and refers to the "fast" speed of publishing.
 * c. Popular systems like Wikipedia, Wikibooks or Wikiversity are built atop wiki technology.
 * d. This online content is generally for reference purposes but wikis are also used by educational groups to collaborate on work.
 * e. Wikis are also commonly used in business to provide an affordable and effective intranet and for knowledge management.
 * f. Wiki makes it possible for all Web users to edit any Webpage or create new pages using any normal browser. (e.g., Mozilla Firefox, Internet Explorer, Safari, etc.)
 * g. Wiki promotes meaningful associations between Web pages by easily linking them together.
 * h. Wiki also provides a common collaborative environment where information is built and modified by several users and comes to an eventual consistency.


 * i. Web Publishing Tools for Wiki
 * i. There are many wiki tools, including MediaWiki.
 * ii. PBwiki allows registered users to create wikis for free for educational and business purposes.
 * iii. It offers several features to search your content and add tags to it for better content management. The sidebar navigation view also makes it easy to browse through all your Web pages.
 * iv. It allows you to invite others by providing them a unique Web URL (link).
 * v. PBwiki has a comments section on every wiki where users can leave their views about your content and suggest changes to enhance it.
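
Item g above notes that a wiki makes linking pages easy. Many wikis do this with a lightweight link markup that the software turns into HTML. The sketch below assumes a simplified double-bracket syntax, `[[Page Name]]` or `[[Page Name|label]]`, similar in spirit to MediaWiki's but far less complete. The `/wiki/` URL prefix and the function name `render_links` are also assumptions.

```python
import html
import re
from urllib.parse import quote

# [[Target]] or [[Target|label]]; brackets may not nest in this simplified form.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_links(text, base="/wiki/"):
    """Replace wiki-style links in `text` with HTML anchors."""

    def to_anchor(match):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Page titles become URL paths: spaces turn into underscores.
        href = base + quote(target.replace(" ", "_"))
        return f'<a href="{html.escape(href)}">{html.escape(label)}</a>'

    return WIKI_LINK.sub(to_anchor, text)
```

Only the link markup is converted here; the surrounding text is passed through unchanged, so a real wiki engine would also need to escape and format the rest of the page.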


 * 3. Blogs
 * a. The name 'blog' came from 'Web log' (Web + log -> We + blog -> blog).
 * b. It is a Web site which is used in a diverse and interactive way. Examples include a personal diary, corporate blogs, products and services promotion, showcase tutorials, political soap box, news outlet, or space to express your opinions.  Readers of a blog can respond to the posts by leaving comments.  Blogs can be connected to other blogs.  This type of asynchronous interaction allows communication between the blog author and the readers.
 * c. The blog posts are usually displayed in a reverse chronological order so that the most up-to-date posts can show up at the top. Although most blogs are primarily textual, multimedia such as video clips, music and images are also included as its content.   There are content-focused blogs such as MP3 blogs, photoblogs (images), vlogs (video clips), podcasting blogs, etc.  Blogs can be classified by genre such as travel blogs, project blogs, classical music blogs, education blogs, etc.
 * d. Blog Publishing Systems
 * i. WordPress
 * First released in May 2003; evolved from its precursor, b2/cafelog.
 * Popular blog publishing application & content management system, which incorporates PHP that talks with MySQL database
 * Templates and pre-defined themes allow easy customization of the blog
 * Support for tagging, clean permalink, and plug-ins to extend its capabilities
 * Content can be sent from mobile phones
 * WordPress MU (multi-user) supports running of several blogs in one installation
 * http://wordpress.org/
 * ii. Blogger
 * First launched in 1999
 * Provides easy creation and customization of blogs - drag-and-drop page elements, templates and custom color/fonts managements, etc.
 * Content such as images can be sent from mobile phones
 * Team blog feature supports a group of people to develop a single blog collaboratively
 * https://www.blogger.com/
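
The reverse chronological ordering described in item c is the core of a blog's front page, and tagging is one of the features listed for WordPress above. Both can be sketched in a few lines. This is an illustration only: the `Post` class and the functions are hypothetical and do not reflect how WordPress or Blogger store or query posts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    published: date
    tags: tuple = ()


def front_page(posts, limit=10):
    """Return the newest posts first, as on a typical blog front page."""
    return sorted(posts, key=lambda p: p.published, reverse=True)[:limit]


def with_tag(posts, tag):
    """Return the posts carrying `tag`, newest first."""
    return front_page([p for p in posts if tag in p.tags], limit=len(posts))


posts = [
    Post("Trip to the archive", date(2009, 8, 30), ("travel",)),
    Post("Project kickoff", date(2009, 9, 1), ("project",)),
    Post("Project update", date(2009, 9, 15), ("project",)),
]
```

With these sample posts, `front_page(posts)` lists "Project update" first because it has the latest publication date.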


 * e. Blog search engines: They search blogs in the Blogosphere, which is a social network of interconnected blogs.
 * i. Technorati
 * http://technorati.com/
 * It provides keyword, URL and tag search.
 * ii. BlogScope
 * http://www.blogscope.net/
 * Analysis and visualization tool, developed as part of a project at the University of Toronto. It tracks 37.93 million blogs with 905.74 million posts (as of September 2009).
 * Visualization of
 * iii. BlogPulse
 * http://www.blogpulse.com/
 * Browse interface organized based on Top Videos, Top Blogs, Top News Stories, Top Key People/Posts, Top Phrases, etc.
 * Statistics on Blogosphere (as of September 2009)
 * a. Total identified blogs: 107,076,044
 * b. New blogs in last 24 hours: 70,663
 * c. Blog posts indexed in last 24 hours: 825,338


 * f. Video clips
 * i. Blogs in Plain English
 * http://www.youtube.com/watch?v=NN2I1pWXjXI&feature=channel
 * ii. What is a blog
 * http://video.google.com/videoplay?docid=1162704503530698690&hl=en#
 * iii. Technorati - blog search engine
 * http://www.youtube.com/watch?v=oECM8PeQMwo
 * http://www.youtube.com/watch?v=orcOJ96xqLA

Recommended reading

 * i. Cunningham, W. and Leuf, B. (2002). What Is Wiki. Retrieved September 1, 2009, from http://www.wiki.org/wiki.cgi?WhatIsWiki
 * ii. Wiki. (n.d.). Retrieved September 1, 2009, from Wikipedia: http://en.wikipedia.org/wiki/Wiki
 * iii. Nottingham, M. RSS Tutorial for Content Publishers and Webmasters. (2005). Retrieved September 1, 2009, from mnot's Web log: http://www.mnot.net/rss/tutorial/
 * iv. Blog. (n.d.). Retrieved September 1, 2009, from Wikipedia: http://en.wikipedia.org/wiki/Blog
 * v. Geneva Henry. (2003). Online Publishing in the 21st century: Challenges and Opportunities. D-Lib Magazine, 9(10). http://www.dlib.org/dlib/october03/henry/10henry.html
 * vi. Tony Hammond, Timo Hannay and Ben Lund. (2004). The Role of RSS in Science Publishing: Syndication and Annotation on the Web. D-Lib Magazine, 10(12). http://www.dlib.org/dlib/december04/hammond/12hammond.html

Video clips for Exercises

 * i. Wiki in Plain English by Lee LeFever at http://www.youtube.com/watch?v=-dnL00TdmLY
 * ii. RSS in Plain English by Lee LeFever at http://www.youtube.com/watch?v=0klgLsSxGsU
 * iii. Blogs in Plain English by Lee LeFever at http://www.youtube.com/watch?v=NN2I1pWXjXI&feature=channel

Concept map
Note: IHMC Cmap Tools is an open source client tool to create concept maps. CmapServer enables users to collaborate and share concept maps anywhere on the Internet. Both programs can be downloaded freely for educational purposes from http://cmap.ihcm.us/download/index.php

Exercise 1

 * 1. Class activity would involve creating and maintaining a class wiki for future exercises and discussions.
 * 2. Each student should create an account at PBwiki. The class instructor should choose a name for the class and create a wiki for the class of that name. Example: If the class name is Digital Libraries, then a wiki of the name digitallibraries.pbwiki.com should be created.
 * 3. All the documents for the class and reading material should be uploaded on this wiki page by the instructor and the students should access this wiki to read notes and should leave comments for the same.
 * 4. This wiki should be used for further modules and in case of team projects, each team should create their own wiki and upload the description of the project, team member names and the progress report of the project.

Exercise 2

 * 1. Watch the video "Wiki in Plain English" and discuss the following questions in class.
 * a. Go to http://www.youtube.com/watch?v=-dnL00TdmLY
 * b. How different is a wiki from normal Web pages?
 * c. Is a wiki convenient to use and share information with multiple users?
 * d. Can wiki be used as a trusted source of information?
 * e. Does all the information in a wiki reach a stage of eventual consistency?
 * f. How can we increase the quality of information in a wiki?


 * 2. Watch the video "RSS in Plain English" and discuss the following questions in class.
 * a. Go to http://www.youtube.com/watch?v=0klgLsSxGsU
 * b. Is RSS your preferred technology for reading the Websites you visit frequently?
 * c. Does RSS help organize information, or does it scatter information into confusing feeds?
 * d. Does RSS make browsing on the internet faster?


 * 3. Watch the video "Blogs in Plain English" and discuss the following questions in class.
 * a. Go to http://www.youtube.com/watch?v=NN2I1pWXjXI&feature=channel
 * b. Compare and contrast wikis and blogs (and the use of RSS feeds along with blogs) in terms of the quality of information posted and the spreading of information in the community.

Glossary

 * a. Wiki - A collaborative Web site that comprises the perpetual collective work of many authors.
 * b. RSS - RSS is the acronym used to describe the de facto standard for the syndication of Web content. RSS is an XML-based format and while it can be used in different ways for content distribution, its most widespread usage is in distributing news headlines on the Web.
 * c. Blog - Short for Web log, a blog is a Web page that serves as a publicly accessible personal journal for an individual.
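
Because RSS is an XML-based format, as the glossary entry above states, a publisher can produce a feed with any XML library. The sketch below builds a minimal RSS 2.0 document with Python's standard library. The function name `build_feed` and the sample values are assumptions made for illustration; real feeds usually carry more elements per item, such as a description and a publication date.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document.

    `items` is a list of (item_title, item_link) pairs.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three elements on every channel.
    for tag, text in (("title", title), ("link", link),
                      ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "Example Library News",
    "http://example.org/",
    "Hypothetical feed used for illustration.",
    [("New collection online", "http://example.org/news/2")],
)
```

The returned string can be saved as a file on the Web server; RSS readers then fetch that file periodically to discover new items.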

Contributors

 * a. Developer:
 * i. Pratik Karia - Virginia Tech
 * b. Reviewers:
 * i. Seungwon Yang - Virginia Tech, Reviewer.
 * ii. Dr. Edward A. Fox - Virginia Tech, Reviewer.
 * iii. UNC-CH DL curriculum project team