Applied Programming/Files

This lesson introduces files and file processing.

Objectives and Skills
Objectives and skills for this lesson include:
 * Construct and analyze code segments that perform file input and output operations
 * Open; close; read; write; append; check existence; delete; with statement
 * Construct and analyze code segments that perform console input and output operations
 * Read input from console; print formatted text; use of command line arguments

Readings

 * 1) Computer file
 * 2) File system
 * 3) Directory (computing)
 * 4) Directory structure
 * 5) Text file
 * 6) Binary file

Multimedia

 * 1) YouTube: File Systems
 * 2) YouTube: Reading and Writing to Files in Python
 * 3) YouTube: Reading and Writing to Files in Java
 * 4) YouTube: Writing Binary Files

Examples

 * /Java/
 * /JavaScript/
 * /Python3/

Activities

 * 1) Review Password strength. Create a program that asks the user for an input password. If your programming language or library supports it, get the input without echoing the characters as they are entered. Determine the entropy of the input password based on the length of the password and the number of different character sets used in the password (e.g. Entropy/Strength Test). Use a separate function to determine password entropy. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
 * 2) Review Dictionary attack. Enhance the program above by downloading a dictionary of English words as a text file (e.g., from GitHub). Use a separate function to check the password and see if it matches one of the dictionary words. Inform the user if their password is susceptible to a dictionary attack. Use exception handling for all file operations. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
 * 3) Enhance the program above by downloading a text file of common passwords (e.g., from GitHub). Use the function above to check the password and see if it matches a common password. Inform the user if their password is susceptible to a common password attack. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
 * 4) Enhance the program above by saving all passwords entered by the user in a text file. Use the function above to check the password and see if it matches a previously entered password. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language. The final program will check password strength and validate passwords against an English dictionary, a common password list, and a recently-used password list.
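
As a starting point for the first two activities, here is a minimal Python sketch, not a reference solution. It assumes entropy is estimated as password length times log2 of the combined size of the character sets used, that only four ASCII character sets are counted, and that word-list matching is case-insensitive. The function names `password_entropy` and `is_in_word_list` are hypothetical.

```python
import math
import string


def password_entropy(password):
    """Estimate entropy in bits as length * log2(character pool size).

    Assumption: only four ASCII character sets are counted.
    """
    if not isinstance(password, str):
        raise TypeError("password must be a string")
    if not password:
        raise ValueError("password must not be empty")

    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # 32 ASCII symbols
    if pool == 0:
        # Spaces and non-ASCII characters are not modeled in this sketch.
        raise ValueError("password uses no recognized character set")
    return len(password) * math.log2(pool)


def is_in_word_list(password, filename):
    """Check a password against a text file with one word per line.

    Returns True or False, or None if the file could not be read.
    Assumption: matching ignores case and surrounding whitespace.
    """
    try:
        with open(filename, "r", encoding="utf-8") as file:
            words = {line.strip().lower() for line in file}
    except (OSError, UnicodeError) as exception:
        print(f"Unable to read {filename}: {exception}")
        return None
    return password.lower() in words
```

The same `is_in_word_list` function can serve the dictionary, common-password, and previously-entered-password checks, because each list is a text file with one entry per line. To read the password without echoing it, Python's standard `getpass` module provides `getpass.getpass()`.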

Lesson Summary

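 * Python allows you to open files in four basic modes: read ("r"), write ("w"), append ("a"), and exclusive create ("x"). Each can be combined with "t" (text, the default) or "b" (binary), and with "+" to allow both reading and writing.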
 * A computer's file system is a division of information and data into files and directories (folders).
 * A file system consists of two or three layers. Sometimes the layers are explicitly separated, and sometimes the functions are combined.
 * The logical file system is the first layer and is responsible for interaction with the user application.
 * The second, optional layer is the virtual file system, which allows support for multiple concurrent instances of physical file systems.
 * The third layer is the physical file system. This layer is concerned with the physical operation of the storage device and handles tasks such as buffering and memory management.
 * Windows makes use of the FAT, NTFS, exFAT, Live File System and ReFS file systems. Current versions of macOS use the Apple File System (APFS) by default; earlier versions used HFS Plus, also known as "Mac OS Extended".
 * File systems include utilities to initialize, alter parameters of and remove an instance of the file system.
 * Directory utilities may be used to create, rename and delete directory entries.
 * File utilities create, list, copy, move and delete files, and alter metadata. They may be able to truncate data, truncate or extend space allocation, append to, move, and modify files in-place.
 * Some of the most important features of file system utilities involve supervisory activities which may involve bypassing ownership or direct access to the underlying device.
 * A directory-based file system is one where directories coordinate the organization and retrieval of information.
 * Although rare, some embedded computers have no directories at all (every file is stored at a single level) or no ability to store directories inside other directories (thereby flattening the computer's storage).
 * If one is referring to a container of documents, the term folder is more appropriate. The term directory refers to the way a structured list of document files and folders is stored on the computer.
 * Current operating systems typically allow long filenames, commonly up to about 255 characters per pathname element.
 * A fully qualified filename is a text string that includes the path of the file as well as its name and extension (e.g., C:\Users\cklei\Desktop\hello_world.py).
 * Many UNIX-like operating systems, notably Linux distributions, follow the Filesystem Hierarchy Standard. All files and directories appear under the root directory "/", even if they are stored on different physical devices.
 * On Windows, files are arranged in a hierarchical forest of trees, where each tree root is a "drive letter" that labels a storage volume, such as C in the path C:\Program Files.
 * Text files store information in a human-readable form and are notable for their simplicity.
 * On Windows, a newline is marked by a carriage return followed by a line feed (CRLF); on UNIX-like systems, including macOS, a newline is the line feed character (LF) alone.
 * Binary files are usually thought of as a sequence of bytes, where each byte is a group of eight bits.
 * The bytes in a binary file usually represent something other than text characters (otherwise, you'd likely opt for a text file).
 * Binary files will sometimes be handled by mechanisms that can only deal with textual data; Base64 is an encoding scheme that makes such a translation possible.
 * When two computers (or systems) can run the same executable, they are said to be 'binary compatible'.
 * The term is also applied to data files: some software companies produce applications for Windows and the Macintosh whose file formats are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh.
 * Binary compatibility between operating system versions also makes it possible to run programs built for older versions of Windows on newer systems.
 * A hex editor is specially designed to view binary files as chunks of hexadecimal (or decimal or binary) values. If a binary file is opened in a text editor, meanwhile, each group of eight bits is usually translated into a single character.
 * When the 'with' statement is used together with the open() function in Python, the file doesn't need to be closed explicitly. It is closed automatically when the 'with' block exits, even if an exception is raised inside it.
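
The file modes and the 'with' statement summarized above can be sketched in Python as follows. This is an illustrative example only: the file name example.txt is hypothetical, and the work happens in a temporary directory so that nothing is left behind.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "example.txt")  # hypothetical file name

    # "x" creates a new file and fails if the file already exists.
    with open(filename, "x", encoding="utf-8") as file:
        file.write("first line\n")

    # "a" appends to the end of an existing file.
    with open(filename, "a", encoding="utf-8") as file:
        file.write("second line\n")

    # "r" reads; the file is closed as soon as the with block exits.
    with open(filename, "r", encoding="utf-8") as file:
        lines = file.read().splitlines()
    closed_after_with = file.closed

    # "w" truncates the file before writing.
    with open(filename, "w", encoding="utf-8") as file:
        file.write("replaced\n")
    with open(filename, "r", encoding="utf-8") as file:
        replaced = file.read()

    # Opening with "x" a second time raises FileExistsError.
    try:
        open(filename, "x").close()
        create_failed = False
    except FileExistsError:
        create_failed = True

    # Check existence, delete the file, and check again.
    existed_before = os.path.exists(filename)
    os.remove(filename)
    exists_after = os.path.exists(filename)

print(lines, closed_after_with, create_failed, existed_before, exists_after)
```

Because each open() call sits inside a 'with' statement, no explicit close() call is needed for those files, which covers the open, read, write, append, check existence, and delete operations listed in the lesson objectives.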

Key Terms

 * absolute path
 * A full path or location of a file, including the root directory, that points to the same location in a file system regardless of current working directory.


 * binary file
 * A file that is non-textual—a raw sequence of bytes.


 * directory structure
 * The hierarchical tree in which an operating system arranges files, where files are the childless leaves of directories (or folders).


 * directory
 * A structure that contains references to other computer files and possibly other subdirectories.


 * file system
 * The method and data structures that an operating system uses to control how data is stored on and retrieved from a storage device.


 * file utilities
 * File utilities allow users to create, list, copy, move and delete files, and alter metadata.


 * fully-qualified filename
 * A string that uniquely identifies a file stored on the computer by including the path, name, and extension of the file.


 * metadata
 * Information that is typically associated with each file within a file system. File systems might store the file creation time, the time it was last accessed, etc.


 * parent & children
 * A parent directory houses "children" files or subdirectories. A child is a file or subdirectory housed in a parent directory.


 * path
 * The location of a file among the hierarchy of directories (possibly indicating a storage device, as well).


 * relative path
 * A path or location of a file that starts from the current working directory.


 * root
 * The highest-level directory in a file system hierarchy. On UNIX-like operating systems it is written "/"; on Windows, each drive has its own root, such as C:\.


 * text file
 * A file that is structured as a sequence of lines of electronic text and is stored as data within a computer file system.
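
The path-related terms above (absolute path, relative path, root, parent, fully-qualified filename) can be explored with Python's standard pathlib module. The sketch below is illustrative: data/words.txt is a hypothetical relative path that does not need to exist, and the Windows example reuses the path from the lesson summary.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# A relative path starts from the current working directory.
relative = Path("data") / "words.txt"  # hypothetical path
# Joining it to the current working directory gives an absolute path.
absolute = Path.cwd() / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute.parent)         # the parent directory of words.txt

# Pure paths only manipulate text, so a Windows path can be
# examined on any operating system.
windows = PureWindowsPath(r"C:\Users\cklei\Desktop\hello_world.py")
print(windows.drive)   # C:
print(windows.name)    # hello_world.py
print(windows.suffix)  # .py
print(windows.parent)  # C:\Users\cklei\Desktop

# On UNIX-like systems every absolute path starts at the root "/".
posix = PurePosixPath("/usr/bin/python3")
print(posix.root)   # /
print(posix.parts)  # ('/', 'usr', 'bin', 'python3')
```

The pure path classes are used here because they never touch the file system, so the example behaves the same on Windows, macOS, and Linux.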