Web application file-reference methods

Faculty of Engineering and Technology &rarr; School of Computer Science &rarr; Database Department

This research project examines the different methods employed by web applications to refer, from within the database, to non-application files in the filesystem. We aim to determine what attributes (of database schema, filesystem, file, user, etc.) are used to determine the location of the files in the filesystem.

Introduction
Most database-backed web applications store non-application files, such as those uploaded by users, in the filesystem rather than as objects in the database. These files relate to tables and rows in the databases in various ways, and there are many different ways of recording these relationships.

The 'non-application' files are those that are considered part of the application's data, rather than 'resource' files such as icons and other files used as part of, and distributed with, the application. Generally, these are files that are uploaded by users, and there will often be ancillary versions of them created by the application (such as, in the case of images for example, lower-resolution versions for quicker web display).

Here we investigate the methods used by a number of web applications to store file reference information in their databases. We

The web applications examined here all: use a database; and store attached files in the filesystem.

For each application, we aim to find out what attributes are stored about files, and how they are used when accessing files. Ultimately, we are trying to determine what information is required, given an arbitrary database schema, to determine whether files exist that relate to any given row, and where and how to access such files.

WordPress
WordPress is a MySQL-backed blogging platform in which uploaded files are saved to a user-configurable uploads directory. Files in this directory can be organised into a number of different subdirectory structures (by default, named by year and month) under  in the installation directory. Files are accessed directly by the web server (and not read/written via the application). Each uploaded file has a single row in the  table (which may or may not be linked to an actual post row via the  key). The  column of this row has a value of 'attachment'. Rows in the  table are linked to this row. Where the  column of   is '_wp_attached_file', the   column is a filename, relative to the WP upload directory.

Drupal
Drupal is a general-purpose Content Management System, again using MySQL for its database. For the purposes of this reseach, Drupal 7.4 was installed, using the 'minimal-7.4' installation profile. File information is stored in the  table, with a custom-URI in the   column. Files can be stored in either the public or private hierarchies, depending on the prefix of the  value: or.

Bespoke Application 'A'
This is a bespoke database application built to manage personnel training records for a medium-sized organisation. The source code is not freely available, but the method used is explained here and intended to stand alone, as it could be taken as a hyphothetical example of how this type of file reference could be done and so there is no need to be able to examine the original source.

MediaWiki
MediaWiki keeps its files in directories whose names are constructed from the first two characters of the MD5 hash of the file's name.

Conclusion
Given an arbitrary database schema and the absolute path to the highest directory in which files are stored, it is possible to construct an SQL statement that will provide the latter part of the path to the file or files.