Python Programming/Files

This lesson introduces Python file processing.

Objectives and Skills
Objectives and skills for this lesson include:
 * Standard Library
 * os module
 * sys module
 * Input Output
 * Files I/O

Readings

 * 1)  File system
 * 2)  Directory (computing)
 * 3)  Directory structure
 * 4)  Text file
 * 5) PythonLearn: Automating common tasks on your computer
 * 6) Python for Everyone: Files

Multimedia

 * 1) YouTube: Python for Informatics - Chapter 7 Files
 * 2) YouTube: Python - How to Read and Write Files

The os.getcwd Method
The os.getcwd method returns a string representing the current working directory.
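
A minimal sketch of the call is shown here. The path printed depends on where the script is run, so the output that follows is only representative.

```python
import os

# Ask the operating system for the current working directory.
current_directory = os.getcwd()
print("Current working directory:", current_directory)
```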

Output: Current working directory: /home/ubuntu/workspace

The os.chdir Method
The os.chdir method changes the current working directory to the given path.
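
One way this might be exercised is sketched below. Moving to the parent directory with ".." is an assumption made for illustration; any existing path would do.

```python
import os

original_directory = os.getcwd()
print("Current working directory:", original_directory)

# Move up to the parent directory (an illustrative choice of target).
os.chdir("..")
parent_directory = os.getcwd()
print("Changed to:", parent_directory)

# Return to where we started so later code is unaffected.
os.chdir(original_directory)
print("Changed back to:", os.getcwd())
```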

Output: Current working directory: /home/ubuntu/workspace Changed to: /home/ubuntu Changed back to: /home/ubuntu/workspace

The os.path.isdir Method
The os.path.isdir method returns True if the given path is an existing directory.
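
A short sketch that checks the current working directory, which should normally exist:

```python
import os

current_directory = os.getcwd()

# isdir returns False for missing paths and for paths that are files.
if os.path.isdir(current_directory):
    print("Current working directory exists.")
else:
    print("Current working directory does not exist.")
```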

Output: Current working directory exists.

The os.path.join Method
The os.path.join method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.
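
A minimal sketch, using the demo directory name that appears in this lesson's output. Nothing is created on disk; os.path.join only builds the string.

```python
import os

path = os.getcwd()

# Build a child path without worrying about which separator the platform uses.
directory = os.path.join(path, "__python_demo__")

print("path:", path)
print("directory:", directory)
```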

Output: path: /home/ubuntu/workspace directory: /home/ubuntu/workspace/__python_demo__

The os.mkdir Method
The os.mkdir method creates a directory with the given path.

The os.rmdir Method
The os.rmdir method removes (deletes) the directory with the given path.
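
The two methods can be sketched together as follows. As an assumption for safety, this version works inside a throwaway temporary directory rather than the workspace shown in the output, so the printed paths will differ.

```python
import os
import tempfile

# Assumption: use a temporary base directory so nothing real is touched.
with tempfile.TemporaryDirectory() as base:
    original_directory = os.getcwd()
    directory = os.path.join(base, "__python_demo__")

    os.mkdir(directory)
    print("Created directory:", directory)
    created = os.path.isdir(directory)

    os.chdir(directory)
    print("Changed to:", os.getcwd())

    # Change back before removing; some platforms may refuse to remove
    # a directory that is still the current working directory.
    os.chdir(original_directory)
    print("Changed back to:", os.getcwd())

    # os.rmdir only removes empty directories.
    os.rmdir(directory)
    print("Removed directory:", directory)
    removed = not os.path.isdir(directory)
```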

Output: Created directory: /home/ubuntu/workspace/__python_demo__ Changed to: /home/ubuntu/workspace/__python_demo__ Changed back to: /home/ubuntu/workspace Removed directory: /home/ubuntu/workspace/__python_demo__

The os.walk Method
The os.walk method walks the directory tree rooted at the given path. For each directory it visits, it yields a 3-tuple of the directory path string, a list of subdirectory names, and a list of filenames.
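
A small sketch of the loop is shown below. The tree it walks (one subdirectory named "sub" holding "notes.txt") is hypothetical and is built in a temporary directory so the result is predictable.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny, hypothetical tree: root/sub/notes.txt
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "sub", "notes.txt"), "w") as file:
        file.write("demo")

    walked = []
    # Each step yields (directory path, subdirectory names, filenames).
    for dirpath, dirnames, filenames in os.walk(root):
        relative = os.path.relpath(dirpath, root)
        walked.append((relative, list(dirnames), list(filenames)))
        print(dirpath, dirnames, filenames)
```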

Output: ... 

The os.path.isfile Method
The os.path.isfile method returns True if the given path is an existing file.
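
A minimal sketch that tests for the demo file used later in this lesson. Whether it prints "File exists." or "File does not exist." depends on what is in the current directory.

```python
import os

# "__python_demo.tmp" is the demo filename used in this lesson's output.
filename = os.path.join(os.getcwd(), "__python_demo.tmp")

is_file = os.path.isfile(filename)
if is_file:
    print("File exists.")
else:
    print("File does not exist.")
```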

Output: File does not exist.

The open Function
The open function opens the given file in the given mode (read, write, append) and returns a file object.

The file.write Method
The file.write method writes the contents of the given string to the file, returning the number of characters written.

The file.close Method
The file.close method closes the file and frees any system resources taken up by the open file.
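
The three calls might be combined as in this sketch. As an assumption, the file is written to a temporary directory instead of the workspace shown in the output.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo.tmp")

    # Mode "w" creates the file, or truncates it if it already exists.
    file = open(filename, "w")
    count = file.write("Temporary Python Demo File")
    file.close()

    print("Created", filename)
    print("Characters written:", count)
    exists = os.path.isfile(filename)
```

In everyday code a with statement, such as `with open(filename, "w") as file:`, is often preferred because it closes the file automatically even if an error occurs.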

Output: Created /home/ubuntu/workspace/__python_demo.tmp

The file.read Method
The file.read method reads up to the given number of characters from a text-mode file (or bytes from a binary-mode file), or all remaining content if no size is given, and returns what was read as a string (or a bytes object).
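
A sketch of both forms of the call follows. It first writes the demo text to a temporary file (an assumption, so the example is self-contained) and then reads it back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    file = open(filename, "r")
    first_word = file.read(9)   # read at most 9 characters
    rest = file.read()          # read everything that remains
    file.close()

    text = first_word + rest
    print("File text:", text)
```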

Output: File text: Temporary Python Demo File

Reading Lines
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code.
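
A minimal sketch of the loop, assuming a one-line demo file written to a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File\n")

    lines = []
    with open(filename, "r") as file:
        # The file object yields one line at a time, newline included.
        for line in file:
            lines.append(line)
            print(line, end="")
```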

Output: Temporary Python Demo File

The file.tell Method
The file.tell method returns an integer giving the file object’s current position in the file.
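
The sketch below opens the demo file in append mode and reports the position before and after a write. The file is recreated in a temporary directory first, which is an assumption made so the example stands alone.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    # Append mode starts with the position at the end of existing content.
    file = open(filename, "a")
    open_position = file.tell()
    print("Open file position:", open_position)

    file.write(" - Appended to the end of the file")
    write_position = file.tell()
    print("Write file position:", write_position)
    file.close()
```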

Output: Open file position: 26 Write file position: 60

The file.seek Method
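The file.seek method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file (the default), 1 for the current position, and 2 for the end of the file. In text mode, a nonzero offset may only be given relative to the beginning of the file; seek(0, 2) still moves to the end.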
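
A sketch of seeking back to the start of a file opened for appending and reading. The "a+" mode and the temporary file are assumptions chosen to mirror the positions in the output.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")
        file.write(" - Appended to the end of the file")

    # "a+" opens for appending and reading, positioned at the end.
    file = open(filename, "a+")
    open_position = file.tell()
    print("Open file position:", open_position)

    file.seek(0)  # offset 0 from the beginning of the file
    seek_position = file.tell()
    print("Seek file position:", seek_position)

    text = file.read()
    print("File text:", text)
    file.close()
```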

Output: Open file position: 60 Seek file position: 0 File text: Temporary Python Demo File - Appended to the end of the file

The os.rename Method
The os.rename method renames the given source file or directory to the given destination name.
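
A minimal sketch, assuming the demo file is first created in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "__python_demo.tmp")
    destination = os.path.join(folder, "__python_demo2.tmp")
    with open(source, "w") as file:
        file.write("Temporary Python Demo File")

    os.rename(source, destination)
    print("Renamed", source, "to", destination)

    # After the rename only the destination name exists.
    source_exists = os.path.isfile(source)
    destination_exists = os.path.isfile(destination)
```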

Output: Renamed /home/ubuntu/workspace/__python_demo.tmp to /home/ubuntu/workspace/__python_demo2.tmp

The os.remove Method
The os.remove method removes (deletes) the given file.
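
A short sketch, again assuming a demo file created in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "__python_demo2.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    existed_before = os.path.isfile(filename)

    # os.remove deletes files; directories are removed with os.rmdir.
    os.remove(filename)
    print("Removed", filename)
    exists_after = os.path.isfile(filename)
```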

Output: Removed /home/ubuntu/workspace/__python_demo2.tmp

The sys.argv Property
The sys.argv property holds the list of command-line arguments passed to a Python script. sys.argv[0] is the script name, and any arguments follow from index 1.
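
A minimal sketch that lists each argument with its index. Saved as a script and run with two arguments, for example test1 and test2, it would print lines in the format shown in the output.

```python
import sys

# Pair each command-line argument with its position in the list.
arguments = list(enumerate(sys.argv))
for index, argument in arguments:
    print(f"sys.argv[{index}]: {argument}")
```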

Output: sys.argv[0]: /home/ubuntu/workspace/argv.py sys.argv[1]: test1 sys.argv[2]: test2

Tutorials

 * 1) Complete one or more of the following tutorials:
   * TutorialsPoint
     * Files I/O
   * Codecademy
     * File Input and Output
   * Wikiversity
     * Python/Files
   * Wikibooks
     * A Beginner's Python Tutorial/File I/O

Practice

 * 1) Create a Python program that displays high, low, and average quiz scores based on input from a file. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and add each score to a list. Display the list of entered scores sorted in descending order and then calculate and display the high, low, and average for the entered scores. Include error handling in case the file is formatted incorrectly. Create a text file of names and grade scores to use for testing based on the following format:
 * 2) Create a Python program that asks the user for a file that contains HTML tags, such as:     Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to search for and remove all HTML tags from the text, saving each removed tag in a dictionary. Print the untagged text and then use a function to display the list of removed tags sorted in alphabetical order and a histogram showing how many times each tag was used. Include error handling in case an HTML tag isn't entered correctly (an unmatched < or >). Use a user-defined function for the actual string processing, separate from input and output. For example:
 * 3) Create a Python program that asks the user for a file that contains lines of dictionary keys and values in the form:             Keys may contain spaces but should be unique. Values should always be an integer greater than or equal to zero. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and build a dictionary of key-value pairs. Then display the dictionary sorted in descending order by value (score). Include input validation and error handling in case the file accidentally contains the same key more than once.
 * 4) Create a Python program that checks all Python (.py) files in a given directory / folder. Check for a folder path parameter passed from the command line. If there is no parameter, ask the user to input a folder path for processing. Verify that the folder exists and then check all Python files in the folder for an initial docstring. If the file contains an initial docstring, continue processing with the next file. If the file does not start with a docstring, add a docstring to the beginning of the file similar to:     Add a blank line between the docstring and the existing file code and save the file. Test the program carefully to be sure it doesn't alter any non-Python files and doesn't delete existing file content.

File Concepts

 * A file system is used to control how data is stored and retrieved. There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more.
 * File systems are responsible for arranging storage space; reliability, efficiency, and tuning with regard to the physical storage medium are important design considerations.
 * File systems allocate space in a granular manner, usually multiple physical units on the device.
 * A filename (or file name) is used to identify a storage location in the file system.
 * File systems typically have directories (also called folders) which allow the user to group files into separate collections.
 * A file system stores all the metadata associated with the file—including the file name, the length of the contents of a file, and the location of the file in the folder hierarchy—separate from the contents of the file.
 * Directory utilities may be used to create, rename and delete directory entries.
 * File utilities create, list, copy, move and delete files, and alter metadata.
 * All file systems have some functional limit that defines the maximum storable data capacity within that system.
 * A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.
 * A text file is a kind of computer file that is structured as a sequence of lines of electronic text.
 * MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10.
 * Unix-like operating systems use a common text file format, with each line of text separated by a single newline character, normally LF.

Python Files

 * The os.getcwd method returns a string representing the current working directory.
 * The os.chdir method changes the current working directory to the given path.
 * The os.path.isdir method returns True if the given path is an existing directory.
 * The os.path.join method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.
 * The os.mkdir method creates a directory with the given path.
 * The os.rmdir method removes (deletes) the directory with the given path.
 * The os.walk method walks the directory tree rooted at the given path. For each directory it visits, it yields a 3-tuple of the directory path string, a list of subdirectory names, and a list of filenames.
 * The os.path.isfile method returns True if the given path is an existing file.
 * The open function opens the given file in the given mode (read, write, append) and returns a file object.
 * The file.write method writes the contents of the given string to the file, returning the number of characters written.
 * The file.close method closes the file and frees any system resources taken up by the open file.
 * The file.read method reads up to the given number of characters from a text-mode file (or bytes from a binary-mode file), or all remaining content if no size is given, and returns what was read as a string (or a bytes object).
 * For reading lines from a file, you can loop over the file object using a for loop. This is memory efficient, fast, and leads to simple code.
 * The file.tell method returns an integer giving the file object’s current position in the file.
 * The file.seek method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.
 * The os.rename method renames the given source file or directory to the given destination name.
 * The os.remove method removes (deletes) the given file.
 * The sys.argv property holds the list of command-line arguments passed to a Python script. sys.argv[0] is the script name, and any arguments follow from index 1.
 * Python text mode file processing converts platform-specific line endings (\n on Unix, \r\n on Windows) to just \n on input and \n back to platform-specific line endings on output.
 * Binary mode file processing must be used when reading and writing non-text files to prevent newline translation.
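
The newline translation described in the last two points can be observed with a sketch like this one. It writes two lines in text mode, then reads the same file back in binary mode and in text mode. The filename is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "newline_demo.txt")

    # Text mode: each "\n" is written as the platform's line ending.
    with open(filename, "w") as file:
        file.write("first\nsecond\n")

    # Binary mode: bytes are returned exactly as stored, untranslated.
    with open(filename, "rb") as file:
        raw = file.read()

    # Text mode: platform line endings are converted back to "\n".
    with open(filename, "r") as file:
        text = file.read()

print("Raw bytes:", raw)
print("Text:", repr(text))
```

On a Unix-like system the raw bytes and the text look the same, while on Windows the raw bytes would contain \r\n pairs.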

Key Terms

 * catch
 * To prevent an exception from terminating a program using the try and except statements.


 * newline
 * A special character used in files and strings to indicate the end of a line.


 * Pythonic
 * A technique that works elegantly in Python. “Using try and except is the Pythonic way to recover from missing files”.


 * Quality Assurance
 * A person or team focused on ensuring the overall quality of a software product. QA is often involved in testing a product and identifying problems before the product is released.


 * text file
 * A sequence of characters stored in permanent storage like a hard drive.
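
The terms catch and Pythonic can be illustrated together with a sketch like the following. The function name read_text and the filename missing.txt are hypothetical, chosen only for this example.

```python
import os
import tempfile


def read_text(filename):
    """Return the file's text, or None if the file is missing."""
    try:
        with open(filename) as file:
            return file.read()
    except FileNotFoundError:
        # Catch the exception so the program can continue running.
        print("File not found:", filename)
        return None


with tempfile.TemporaryDirectory() as folder:
    # This file is never created, so the except branch runs.
    result = read_text(os.path.join(folder, "missing.txt"))
```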

Assessments

 * Flashcards: Quizlet: Python File Processing Commands
 * Quiz: Quizlet: Python File Processing Commands