Python Concepts/Files

Objective

 * Learn a little bit about files.
 * Learn about the built-in function open().
 * Learn how to read, write to, and seek in a file.
 * Learn about abstractions that act like files.
 * Learn about the optional parameters for open().
 * Learn how to rename, remove, move, and create files.

What's A File
Anyone who works with a computer uses files every day. You may spend your days writing word processor documents for a news company, or you may like to listen to your mp3 files during your free time. You most likely already have an abstract idea of what a file is: a piece of information that's stored on a disk.

So what exactly is a file? A file is just a group of 1's and 0's that are stored on the disk. Since the operating system takes care of managing them, you don't have to worry about their technical details.

The data within a file may be as simple as a few words of text, or nothing at all. The data may be the audio and video of one of your favorite movies. A file may contain enormous quantities of numbers used to predict the path of a hurricane, to predict the existence of extrasolar planets, or to monitor the national debt.

Whatever the size and shape of a file, the usual operations on files are open, read and/or write, and close.

To open a single file in Python, use the built-in function open(). This code illustrates opening, reading and closing a file:

The content of file 'test.txt':
    Hello, world!
    Hola, mundo!
    Goodbye, world!
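Here is a minimal sketch of the open, read and close sequence. It assumes a scratch directory from tempfile.mkdtemp() and writes its own test.txt with the three lines shown, so it does not depend on an existing file.

```python
import os
import tempfile

# Scratch location (an assumption for this sketch): create test.txt ourselves.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

f = open(path, "r", encoding="utf-8")  # open for reading in text mode
content = f.read()                     # read the entire file as one str
f.close()                              # release the file so others may use it

print("The content of file 'test.txt':")
print(content, end="")
```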

Opening A File
The concept of opening a file seems simple enough, but it leads to significant questions:

Does the file exist? If not, why not?

If the file exists, is it where you expect it to be? If so, do you have permission to open it? You may be able to open it for reading, but what about writing or truncating the file?

If you open the file, is it OK if somebody else opens it while you have it open? You might decide to lock all others out of the file while you have it open. If so, normal etiquette requires that you do what you have to do quickly and then close it so that others may access it.

Computer scientists prepare for errors and handle them gracefully. Therefore, the above code is rewritten to handle errors:
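The following sketch shows one way to add that error handling. It is not the lesson's original code. It assumes UTF-8 text, reports problems on standard error, and uses the hypothetical helper name read_file(). The specific exceptions are caught before OSError because FileNotFoundError and PermissionError are subclasses of it.

```python
import os
import sys
import tempfile


def read_file(path):
    """Return the text in *path*, or None after reporting why it failed."""
    try:
        f = open(path, "r", encoding="utf-8")
    except FileNotFoundError:          # the file does not exist
        print(f"{path}: no such file", file=sys.stderr)
        return None
    except PermissionError:            # it exists but we may not read it
        print(f"{path}: permission denied", file=sys.stderr)
        return None
    except OSError as err:             # any other problem opening it
        print(f"{path}: cannot open: {err}", file=sys.stderr)
        return None
    try:
        return f.read()
    except (OSError, UnicodeDecodeError) as err:
        print(f"{path}: cannot read: {err}", file=sys.stderr)
        return None
    finally:
        f.close()                      # runs whether read() succeeded or not


folder = tempfile.mkdtemp()
good = os.path.join(folder, "test.txt")
with open(good, "w", encoding="utf-8") as out:
    out.write("Hello, world!\n")

print(read_file(good), end="")
print(read_file(os.path.join(folder, "missing.txt")))  # reports, then prints None
```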

Handling Errors
In the example immediately above there is more error-handling code than operational code. If you think this is unrealistic, remember that software engineers are notorious for overestimating their ability and underestimating the time to complete a given project. Simple mistakes can lead to disastrous and expensive consequences.

Milstar (Military Strategic and Tactical Relay): The third launch in the series, on 30 April 1999, failed because an engineer entered one parameter as -0.1992476 instead of the correct -1.992476. More than one billion dollars (that's billion with a 'b') was wasted.

"Milstar satellite overview" This page doesn't mention the failed third launch.

"History of Milstar" From Wikipedia.

"A single error can kill a mission"

"Examples from the Launch World"

Close to home -- The following code is copied directly from a famous instructional book for Perl: Can you spot the error? Rewrite the code to catch potential errors: Execution of this piece of code fails at  The code should be: Your attitude may be: "It doesn't matter about the  because the operating system closes the directory when the application exits." If so, the code should be: Catching errors often reveals simple mistakes in software that can go undetected for a long time.

Python's with statement
Python's with statement simplifies handling errors during operations on files:

On exiting the body of the with statement (for any reason), Python closes an open file:

    status1 = False
    status2 = True
    status3 = False
    Error detected in "with" statement.
    status4 = True
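The following sketch (not the lesson's original code) shows that guarantee. The file is closed both after a normal exit and after an exception raised inside the body. The ValueError is a deliberately simulated failure, and the scratch path comes from tempfile.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world!\n")
status1 = f.closed        # True: a normal exit from the body closed the file

try:
    with open(path, encoding="utf-8") as g:
        first_line = g.readline()
        raise ValueError("simulated failure inside the with body")
except ValueError:
    print('Error detected in "with" statement.')
status2 = g.closed        # True: closed even though the body raised

print("status1 =", status1, " status2 =", status2)
```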

Function os.stat()
Function os.stat() provides more information about the file. It returns an os.stat_result object.

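On the Unix command line:

    $ ls -laid test1.txt
    24051862 -rw-r--r-- 1 user  staff  249 Sep 15 06:53 test1.txt
    $ date -r 1505476401
    Fri Sep 15 06:53:21 CDT 2017
    $

st_ino is the inode number displayed by the Unix command ls -i.

st_size is the size in bytes displayed by the Unix command ls -l.

st_mtime, the time of last modification (1505476401 seconds since the epoch), is the time displayed by the Unix command ls -l. date -r converts it to a calendar date.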
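The next sketch reads those three fields from os.stat(). It uses a scratch file from tempfile containing the hypothetical sample text "Hello, world!" (13 bytes) instead of the lesson's test1.txt.

```python
import os
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "wb") as f:
    f.write(b"Hello, world!")            # 13 bytes, written in binary mode

st = os.stat(path)                       # an os.stat_result object
print("st_ino   =", st.st_ino)           # inode number, as shown by ls -i
print("st_size  =", st.st_size)          # size in bytes, as shown by ls -l
print("st_mtime =", st.st_mtime)         # seconds since the epoch
print("          ", time.ctime(st.st_mtime))  # the same time, human readable
```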

Seeking within a text file
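Within text files the method seek() has limited functionality. Only seeks relative to the beginning of the file are allowed, except that seek(0, 2) moves to the very end. The only valid offsets are 0 and values previously returned by tell():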
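This sketch tries each kind of text-mode seek in turn. The three allowed forms succeed, and a nonzero end-relative seek raises io.UnsupportedOperation. The sample file is created in a tempfile scratch directory.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

refused = False
with open(path, encoding="utf-8") as f:
    f.readline()
    mark = f.tell()              # an opaque position "cookie" in text mode
    second = f.readline()
    f.seek(mark)                 # allowed: a value previously returned by tell()
    second_again = f.readline()
    f.seek(0)                    # allowed: the start of the file
    first = f.readline()
    f.seek(0, os.SEEK_END)       # allowed: offset 0 relative to the end
    try:
        f.seek(-5, os.SEEK_END)  # refused: nonzero end-relative seek
    except io.UnsupportedOperation as err:
        refused = True
        print("seek refused:", err)
```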

f.read
f.read() may take an optional argument size, where size is numeric. In that case at most size characters (in text mode) or size bytes (in binary mode) are read and returned. The reference states: When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory.
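One way to avoid reading a large file in one piece is to read it in bounded chunks. In the sketch below, the chunk size of 10 characters is arbitrary and the file is a scratch copy of test.txt. An empty string from f.read() marks the end of the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

chunks = []
with open(path, encoding="utf-8") as f:
    while True:
        chunk = f.read(10)       # at most 10 characters in text mode
        if not chunk:            # an empty string signals end of file
            break
        chunks.append(chunk)

print(chunks)
```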

f.readline
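To iterate using f.readline():

    Hello, world!
    Hola, mundo!
    Goodbye, world!

f.readline() may also take an optional argument size, where size is numeric. In that case at most size characters are read and returned. With a size of 8, and the value of f.tell() printed before each read:

    0 Hello, w
    8 orld!
    14 Hola, mu
    22 ndo!
    27 Goodbye,
    35 world!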
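A loop like the following is one way to produce output like that. It is a sketch against a scratch copy of test.txt, not necessarily the lesson's exact code. Note that f.tell() works here because the loop calls readline(). Iterating with "for line in f" would make tell() raise OSError.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

pieces = []
with open(path, encoding="utf-8") as f:
    while True:
        position = f.tell()      # allowed with readline(); a for-loop would disable it
        piece = f.readline(8)    # at most 8 characters, stopping after a newline
        if not piece:
            break
        pieces.append((position, piece))
        print(position, piece.rstrip("\n"))
```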

File object as iterable
A file object is iterable: each iteration yields the next line of the file.

    Hello, world!
    Hola, mundo!
    Goodbye, world!

Display all lines in the file that contain the word 'world':
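In the sketch below, the file object itself drives the loop and a substring test filters the lines. The file is a scratch copy of test.txt created with tempfile.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

matches = []
with open(path, encoding="utf-8") as f:
    for line in f:               # a file object yields one line per iteration
        if "world" in line:
            matches.append(line.rstrip("\n"))

for m in matches:
    print(m)
```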

Reading international text
File test1.txt contains:

    η ρωμαϊκή μυθολογία (Roman mythology)
    Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
    ..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//

The last line is included to facilitate counting characters. Without knowing anything about Greek or Russian, we see immediately that the Greek characters for iota ϊ ί are different, as are the Russian characters и й. The next thing to notice is that the file contains 249 bytes, but the lines contain 38, 79 and 81 characters, or 198 characters for the whole file. Not to worry. In text files Python performs the appropriate encoding and decoding nicely:

    38 η ρωμαϊκή μυθολογία (Roman mythology)
    79 Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
    81 ..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//

The same again with different detail:

    0 η ρωμαϊκή μυθολογία (Roman myt
    47 hology)
    55 Всероссийская перепись населен
    113 ия 2010 года (2010 All-Russia
    149 Population Census)
    168 ..../...//..../...//..../...//
    198 ..../...//..../...//..../...//
    228 ..../...//..../...//
    249

Length of first line in bytes = 55. Length of first line in characters = 38.

Length of second line in bytes = 168-55 = 113. Length of second line in characters = 79.

Length of third line in bytes = 249-168 = 81. Length of third line in characters = 81.

The first invocation of f.readline(30) read 30 characters (17 Greek and 13 English) in 47 (17*2 + 13) bytes: 47-30 = 17 extra bytes for 17 Greek characters. Similarly, the fourth invocation of f.readline(30) read 30 characters (6 Russian and 24 English) in 36 (149-113, or 6*2 + 24) bytes: 36-30 = 6 extra bytes for 6 Russian characters. When you're reading international text, the number of bytes read will almost certainly be more than the number of characters read.
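The sketch below reproduces those counts. It rebuilds test1.txt in a tempfile scratch directory from the three lines shown above, then compares character lengths with UTF-8 byte lengths. It also records f.tell() before each f.readline(30). In text mode tell() returns an opaque cookie, and the sketch relies on it matching the byte offset at these character boundaries, as in the output above.

```python
import os
import tempfile

lines = [
    "η ρωμαϊκή μυθολογία (Roman mythology)\n",
    "Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)\n",
    "..../...//" * 8 + "\n",
]
path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "w", encoding="utf-8") as f:
    f.writelines(lines)

# Characters versus UTF-8 bytes for each line.
for line in lines:
    print(len(line), "characters,", len(line.encode("utf-8")), "bytes")

# Read at most 30 characters at a time, noting the position before each read.
positions = []
with open(path, encoding="utf-8") as f:
    while True:
        pos = f.tell()
        s = f.readline(30)
        positions.append(pos)
        if not s:
            break
        print(pos, s.rstrip("\n"))

print("file size:", os.path.getsize(path), "bytes")
```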

Take care if you reposition the stream into the middle of international text:

    Traceback (most recent call last):
      File "t3.py", line 11, in <module>
        s = f.readline(30)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
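This sketch provokes the same kind of error on purpose. It seeks to byte 1, which is the second byte of the two-byte UTF-8 encoding of 'η', and catches the UnicodeDecodeError. The offending byte in this sketch differs from the 0x81 in the traceback above, because the position is different.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("η ρωμαϊκή μυθολογία (Roman mythology)\n")

error = None
with open(path, encoding="utf-8") as f:
    f.seek(1)                    # byte 1 is the second byte of 'η' (b'\xce\xb7')
    try:
        s = f.readline(30)
    except UnicodeDecodeError as err:
        error = err
        print("UnicodeDecodeError:", err)
```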

Reading from multiple input streams
The function fileinput.input() is the primary interface of module fileinput. Successive invocations of the stream do not require reopening or resetting the stream:

The functions fileinput.filename(), fileinput.lineno() and fileinput.filelineno() provide information about the stream:

    filename = test.txt
    Hello, world!
    lineno = 1, filelineno = 1.
    filename = test.txt
    Hola, mundo!
    lineno = 2, filelineno = 2.
    filename = test.txt
    Goodbye, world!
    lineno = 3, filelineno = 3.
    filename = test1.txt
    η ρωμαϊκή μυθολογία (Roman mythology)
    lineno = 4, filelineno = 1.
    filename = test1.txt
    Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
    lineno = 5, filelineno = 2.
    filename = test1.txt
    ..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//
    lineno = 6, filelineno = 3.

The empty (null) file in the list of inputs was silently ignored.
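The sketch below reads several files as one stream. The file names and contents are hypothetical samples in a tempfile scratch directory, and empty.txt stands in for the null file. It opens every file as UTF-8 through fileinput.hook_encoded(). lineno() keeps counting across files, while filelineno() restarts at 1 for each new file.

```python
import fileinput
import os
import tempfile

folder = tempfile.mkdtemp()
contents = {                       # hypothetical sample files for this sketch
    "test.txt": "Hello, world!\nHola, mundo!\n",
    "empty.txt": "",               # an empty file contributes no lines
    "test1.txt": "η ρωμαϊκή μυθολογία (Roman mythology)\n",
}
paths = []
for name, text in contents.items():
    p = os.path.join(folder, name)
    with open(p, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(p)

records = []
with fileinput.input(files=paths, openhook=fileinput.hook_encoded("utf-8")) as stream:
    for line in stream:            # one stream over all files, in order
        record = (os.path.basename(fileinput.filename()),
                  fileinput.lineno(),      # cumulative line number
                  fileinput.filelineno(),  # line number within the current file
                  line.rstrip("\n"))
        records.append(record)
        print("filename = %s  lineno = %d, filelineno = %d  %s" % record)
```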

Writing text to a disk file
The method f.write() may be used to write to a file in text mode or binary mode.

In text mode it returns the number of characters written; in binary mode, the number of bytes.

    length_of_s = 125
    number_written = 125
    end_of_file = 163

On the UNIX command line:

    $ wc test1a.txt
    3     17     163 test1a.txt
    $

The UNIX executable wc (word count) shows that the output file contains 3 lines, 17 words and 163 bytes. The size in bytes agrees with end_of_file above.

The file actually contains 16 words. wc sees 'μυθολογία' as 2 words, but Python performs the output without error.

The difference between 163 and 125 (38) is due to the 17 Greek and 21 Russian letters: each of these international characters requires 2 bytes in UTF-8.
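The sketch below shows the two return values side by side. It uses the Greek line from the lesson as sample text and writes it once in text mode and once in binary mode, to hypothetical files greek.txt and greek.bin in a tempfile scratch directory. Each of the 17 Greek letters adds one byte, so 39 characters become 56 bytes. The sketch assumes a Unix platform, where text mode does not translate "\n".

```python
import os
import tempfile

folder = tempfile.mkdtemp()
s = "η ρωμαϊκή μυθολογία (Greek characters)\n"

with open(os.path.join(folder, "greek.txt"), "w", encoding="utf-8") as f:
    chars_written = f.write(s)                 # text mode: counts characters
    end_of_file = f.tell()                     # byte position after writing

with open(os.path.join(folder, "greek.bin"), "wb") as f:
    bytes_written = f.write(s.encode("utf-8"))  # binary mode: counts bytes

print("length_of_s =", len(s))
print("chars_written =", chars_written, " end_of_file =", end_of_file)
print("bytes_written =", bytes_written)
```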

Brief review of binary conversion
An int data type is conceptually a sequence of bytes. However, an int cannot be written to disk directly. Before an int can be written to disk in binary format, it must be converted to, eg, a bytes object. This section illustrates the conversion to and from bytes by means of examples.

Methods int.to_bytes() and int.from_bytes()
Method int.to_bytes() simplifies conversion from int to bytes.

Class method int.from_bytes() simplifies the reverse.

The following code ensures that the integer produced after encoding and decoding is the same as the original int:
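Here is a minimal round-trip sketch. The helper names int_to_bytes() and bytes_to_int() are hypothetical, and the sketch assumes big-endian order and non-negative integers. Negative values would need signed=True and an extra bit of room. The byte length is derived from int.bit_length().

```python
def int_to_bytes(n):
    """Big-endian bytes for a non-negative int (negatives not handled here)."""
    length = max(1, (n.bit_length() + 7) // 8)   # at least one byte, even for 0
    return n.to_bytes(length, byteorder="big")


def bytes_to_int(b):
    """Inverse of int_to_bytes()."""
    return int.from_bytes(b, byteorder="big")


i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2
b1 = int_to_bytes(i1)
print(len(b1), b1)

for value in (0, 1, 255, 256, i1):
    print(value == bytes_to_int(int_to_bytes(value)))   # True for each value
```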

Writing to disk in binary mode
Integer i1 contains binary data. Convert i1 to bytes and write to disk.

    i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2 # original int
    b1 = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\xb4\x9c\x8c\xcf\xf1\x96\xce?\n\xd2' # original int as bytes

b1a is b1 expanded so that each byte is expressed as '\xHH':

    b1a = b'\x13\xaa\xf5\x04\xe4\xbc\x1e\x62\x17\x3f\x87\xa4\x37\x8c\x37\xb4\x9c\x8c\xcf\xf1\x96\xce\x3f\x0a\xd2'
    b1 == b1a : True # b1a matches the hex representation of i1.

    len(b1) = 25
    number_written = 25
    end_of_file = 25

    $ od -t x1 test.bin
    0000000    13  aa  f5  04  e4  bc  1e  62  17  3f  87  a4  37  8c  37  b4
    0000020    9c  8c  cf  f1  96  ce  3f  0a  d2
    0000031
    $
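A minimal sketch of the write step follows. It converts i1 with int.to_bytes() and writes it in binary mode to test.bin in a tempfile scratch directory, rather than the current directory. In binary mode f.write() reports bytes, so both counters are 25.

```python
import os
import tempfile

i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2
b1 = i1.to_bytes((i1.bit_length() + 7) // 8, byteorder="big")   # 25 bytes

path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:          # "b": no encoding, no newline translation
    number_written = f.write(b1)     # number of bytes written
    end_of_file = f.tell()

print("len(b1) =", len(b1))
print("number_written =", number_written)
print("end_of_file =", end_of_file)
```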

Reading from disk in binary mode
The file on disk contains a large int in bytes format. Read the file from disk and convert to int.

    isinstance(b2, bytes): True
    len(b2) = 25
    end_of_file = 25
    i2 == i1: True
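The reverse step is sketched below. The sketch first recreates test.bin in a tempfile scratch directory so that it is self-contained. It then reads the bytes back and rebuilds the int with int.from_bytes(), using the same big-endian order as the writer.

```python
import os
import tempfile

i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2
path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:                  # recreate the file for this sketch
    f.write(i1.to_bytes(25, byteorder="big"))

with open(path, "rb") as f:
    b2 = f.read()                            # bytes, not str
    end_of_file = f.tell()

i2 = int.from_bytes(b2, byteorder="big")     # must match the writer's byte order
print("isinstance(b2, bytes):", isinstance(b2, bytes))
print("len(b2) =", len(b2), " end_of_file =", end_of_file)
print("i2 == i1:", i2 == i1)
```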

Reading text in binary mode
Data may be read from a text file in binary mode. The disadvantage is that Python does not automatically perform the necessary decoding.

    isinstance(b5, bytes): True
    len(b5) = 163 bytes
    end_of_file = 163

b5 = ( b'\xce\xb7 \xcf\x81\xcf\x89\xce\xbc\xce\xb1\xcf\x8a\xce\xba\xce\xae ' + b'\xce\xbc\xcf\x85\xce\xb8\xce\xbf\xce\xbb\xce\xbf\xce\xb3\xce\xaf\xce\xb1 (Greek characters)\n' + b'\xd0\x92\xd1\x81\xd0\xb5\xd1\x80\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd0\xb9\xd1\x81\xd0\xba\xd0\xb0\xd1\x8f ' + b'\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xbf\xd0\xb8\xd1\x81\xd1\x8c (Russian or Cyrillic)\n' + b'The Quick BROWN foX (English characters)\n' )

b5 decoded =

    η ρωμαϊκή μυθολογία (Greek characters)
    Всероссийская перепись (Russian or Cyrillic)
    The Quick BROWN foX (English characters)

length of b5 decoded = 125 characters
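A minimal sketch of reading text in binary mode and decoding it afterwards follows. It writes the three sample lines above as UTF-8 bytes to a scratch test1a.txt, reads them back with mode "rb", and calls bytes.decode() explicitly.

```python
import os
import tempfile

text = ("η ρωμαϊκή μυθολογία (Greek characters)\n"
        "Всероссийская перепись (Russian or Cyrillic)\n"
        "The Quick BROWN foX (English characters)\n")
path = os.path.join(tempfile.mkdtemp(), "test1a.txt")
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))        # create the sample file

with open(path, "rb") as f:              # binary mode: no automatic decoding
    b5 = f.read()
    end_of_file = f.tell()

decoded = b5.decode("utf-8")             # decoding is now our job
print("len(b5) =", len(b5), "bytes")
print("length of b5 decoded =", len(decoded), "characters")
```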

Seeking in binary mode
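Method seek() works as expected in binary mode. Unlike text mode, nonzero offsets relative to the current position or to the end of the file are allowed: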
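The sketch below uses all three reference points in binary mode. The 25 sample bytes 0x00 to 0x18 are hypothetical stand-ins for test.bin, written to a tempfile scratch directory.

```python
import os
import tempfile

data = bytes(range(25))          # hypothetical sample: bytes 0x00 .. 0x18
path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:
    f.write(data)

with open(path, "rb") as f:
    f.seek(-10, os.SEEK_END)     # 10 bytes before the end of the file
    last_ten = f.read()
    f.seek(5)                    # absolute position 5
    f.seek(3, os.SEEK_CUR)       # 3 bytes further on, relative to the current position
    pos = f.tell()
    next_byte = f.read(1)

print(last_ten, pos, next_byte)
```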

Truncating a file in binary mode
Function os.ftruncate() uses a file descriptor.

    File is writable: True
    File is readable: True
    b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\xb4\x9c\x8c\xcf\xf1\x96\xce?\n\xd2' # Original contents.
    Truncate the file to current size - 10
    b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7' # Last 10 bytes removed.
    Truncate the file to current size + 10
    b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' # 10 null bytes added at end.
    Truncate the file to 4 bytes
    b = b'\x13\xaa\xf5\x04' # Truncated to 4 bytes.
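This sketch repeats the three truncations with a raw descriptor from os.open(). It checks the file after each step with a fresh open() (the hypothetical helper contents()), so that no stale read buffer is involved. The file is a scratch test.bin holding i1's 25 bytes.

```python
import os
import tempfile

i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2
original = i1.to_bytes(25, byteorder="big")
path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:
    f.write(original)


def contents(p):
    """Read the whole file afresh, bypassing any stale buffers."""
    with open(p, "rb") as f:
        return f.read()


fd = os.open(path, os.O_RDWR)                      # low-level read/write descriptor
try:
    os.ftruncate(fd, os.fstat(fd).st_size - 10)    # remove the last 10 bytes
    shorter = contents(path)
    os.ftruncate(fd, os.fstat(fd).st_size + 10)    # extend: new bytes are null
    longer = contents(path)
    os.ftruncate(fd, 4)                            # keep only the first 4 bytes
    four = contents(path)
finally:
    os.close(fd)

for b in (shorter, longer, four):
    print(b)
```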

External operations on files
External operations are similar to those executed on the Unix command line. They do not depend on having a file open before execution:

Creating a file
On the Unix command line:

    $ ls -la test1.txt test5.txt
    ls: test5.txt: No such file or directory
    -rw-r--r-- 1 user  staff  249 Sep 15 06:53 test1.txt
    $

On the Unix command line:

    $ ls -la test1.txt test5.txt
    -rw-r--r-- 1 user  staff  249 Sep 15 06:53 test1.txt
    -rw-r--r-- 1 user  staff    0 Oct  4 08:28 test5.txt # File was created.
    $
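Here is one way to create an empty file without keeping it open. It is a sketch, not necessarily the call the lesson used, and it relies on pathlib.Path.touch() with exist_ok=False so that an existing file is not silently reused. open(path, "x").close() is an equivalent alternative. The scratch directory comes from tempfile.

```python
import tempfile
from pathlib import Path

target = Path(tempfile.mkdtemp()) / "test5.txt"
existed_before = target.exists()          # False
target.touch(exist_ok=False)              # create an empty file; FileExistsError if present
print(existed_before, target.exists(), target.stat().st_size)
```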

Truncating a file
On the Unix command line:

    $ ls -la test1.txt test5.txt
    -rw-r--r-- 1 user  staff  200 Oct  4 08:37 test1.txt
    -rw-r--r-- 1 user  staff  150 Oct  4 08:38 test5.txt
    $ od -h test5.txt
    0000000     0000    0000    0000    0000    0000    0000    0000    0000
    0000220     0000    0000    0000
    0000226
    $ # test5.txt contains 150 null bytes after truncation.
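In the sketch below, os.truncate() takes a path rather than a descriptor, so no file needs to be open. Extending an empty scratch file to 150 bytes fills it with null bytes, as in the od output above. The scratch location comes from tempfile.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test5.txt")
open(path, "x").close()          # mode "x": create, failing if the file exists
os.truncate(path, 150)           # by path: no file object is needed
with open(path, "rb") as f:
    data = f.read()
print(len(data), set(data))      # 150 {0}
```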

Accessing a file
Function os.access() may be used to determine the existence, readability, writability and executability of path.

    $ ls -la t*n
    --w-r- 1 user  staff  25 Sep 29 13:12 test.bin
    -rw-r--r-- 1 user  staff  25 Sep 29 08:03 test1.bin
    $
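The sketch below queries all four access modes for a scratch test1.bin, plus existence for a missing file. os.access() checks against the real user ID, so a root user may see True where others would not. The answer can also change between the check and a later open().

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "test1.bin")
with open(path, "wb") as f:
    f.write(bytes(25))                    # 25 null bytes as sample content

checks = {"exists": os.F_OK, "readable": os.R_OK,
          "writable": os.W_OK, "executable": os.X_OK}
for label, mode in checks.items():
    print(f"{label:10} {os.access(path, mode)}")

missing = os.path.join(folder, "no_such_file")
print("missing file exists:", os.access(missing, os.F_OK))
```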

Changing a file's u,g,o permissions
Function os.chmod() may be used to change the permissions for user, group and other.

On the Unix command line:

    $ ls -la t*n
    --w-r- 1 user  staff  25 Sep 29 13:12 test.bin
    -rw-r--r-- 1 user  staff  25 Sep 29 08:03 test1.bin
    $

On the python command line:

On the Unix command line:

    $ ls -la t*n
    rwx--- 1 user  staff  25 Sep 29 13:12 test.bin # Permissions changed to r,w,x for group.
    -rw-r--r-- 1 user  staff  25 Sep 29 08:03 test1.bin
    $
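This sketch gives the group read, write and execute permission and removes all permissions from user and other, using the stat.S_IRWXG constant. stat.filemode() renders the result the way ls does. It assumes a Unix system, because on Windows os.chmod() only toggles the read-only flag. The scratch file comes from tempfile.

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:
    f.write(bytes(25))

print(stat.filemode(os.stat(path).st_mode))    # depends on your umask
os.chmod(path, stat.S_IRWXG)                   # group: rwx; user and other: none
new_mode = stat.filemode(os.stat(path).st_mode)
print(new_mode)                                # ----rwx---
os.chmod(path, 0o644)                          # restore rw-r--r--
```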

Renaming a file
Function os.rename() may be used to rename a file.

On the Unix command line:

    $ ls -la test1*t
    -rw-r--r-- 1 user  staff  200 Oct  4 08:37 test1.txt
    -rw-r--r-- 1 user  staff  163 Sep 18 18:07 test1a.txt
    $

On the python command line:

On the Unix command line:

    $ ls -la test1*t
    -rw-r--r-- 1 user  staff  200 Oct  4 08:37 test1a.txt
    $

The old file test1a.txt was silently replaced by the renamed file.

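os.rename() and os.replace() are almost identical, with slight differences that depend on the operating system. If the destination file exists, os.replace() always replaces it, whereas os.rename() replaces it on Unix but raises an error on Windows.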
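The sketch below replaces an existing destination and uses os.replace() so that the behavior is the same on every platform. The file names mirror the listing above, but they are created fresh in a tempfile scratch directory with hypothetical contents.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "test1.txt")
dst = os.path.join(folder, "test1a.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("new contents\n")
with open(dst, "w", encoding="utf-8") as f:
    f.write("old contents\n")

# os.replace() overwrites an existing destination on every platform;
# os.rename() would do the same on Unix but raise FileExistsError on Windows.
os.replace(src, dst)

print(os.path.exists(src))            # False: the old name is gone
with open(dst, encoding="utf-8") as f:
    print(f.read(), end="")           # new contents
```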

Removing a file
Function os.remove() may be used to remove (delete) a file.

On the Unix command line:

    $ ls -la test1*t
    -rw-r--r-- 1 user  staff  200 Oct  4 08:37 test1a.txt
    $

On the python command line:

On the Unix command line:

    $ ls -la test1*t
    ls: test1*t: No such file or directory
    $

The file test1a.txt was deleted.

This function and os.unlink() are semantically identical.
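In the sketch below, a scratch file is removed once with os.remove(). A second attempt raises FileNotFoundError, which is worth catching when a file may already be gone.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test1a.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("temporary\n")

os.remove(path)                  # os.unlink(path) would do exactly the same
print(os.path.exists(path))      # False

try:
    os.remove(path)              # removing it again fails
    second_remove_failed = False
except FileNotFoundError:
    second_remove_failed = True
print("second remove failed:", second_remove_failed)
```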

The terminal
On Unix each terminal window has its unique device name, a name that looks like a file name, eg, /dev/ttys003. Communication with the console may be achieved by treating the console like a file:

$ cat t5.py

    $ python3.6 t5.py
    Name of my terminal is: /dev/ttys003
    File object opened for writing to my terminal is: <_io.TextIOWrapper name='/dev/ttys003' mode='wt' encoding='UTF-8'>
    Enter your date-of-birth [mm/dd/yyyy]: 12/31/1999 # Enter dob followed by new-line and ^D for end-of-file.
    File object opened for reading from my terminal is: <_io.TextIOWrapper name='/dev/ttys003' mode='r' encoding='UTF-8'>
    You entered: 12/31/1999
    $
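The following sketch shows a Unix-only way to find the terminal's device name and write to it as a file. It is not the lesson's t5.py. It uses os.ttyname() on file descriptor 0 and returns None when standard input is not a terminal, for example under a pipe or a test runner. The helper name terminal_name() is hypothetical, and the sketch deliberately does not read from the terminal.

```python
import os


def terminal_name(fd=0):
    """Device name of the terminal attached to *fd*, or None (Unix only)."""
    if os.isatty(fd):
        return os.ttyname(fd)
    return None


name = terminal_name()
print("Name of my terminal is:", name)
if name is not None:
    with open(name, "w") as tty:           # the terminal device, opened like a file
        print("Hello from", name, file=tty)
```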

Pipes
Function os.pipe() creates a pipe:

Standard input is usually file descriptor 0, standard output is 1, and standard error is 2. Further files opened by a process will then be assigned 3, 4, 5, and so forth. Hence file descriptors 3 and 4 above.

The pipe implements a FIFO (first-in, first-out) queue. Data added to the pipe is appended to the data already in the pipe, and data removed from the pipe is read from the beginning of the data in the pipe.

Before trying to read data from a pipe, ensure that there is data in the pipe. Reading from an empty pipe blocks while its write end is still open.

    number_of_bytes_in_pipe = 10
    data = b'9'
    data = b'8'
    data = b'7'
    data = b'6'
    data = b'5'
    data = b'4'
    data = b'3'
    data = b'2'
    data = b'1'
    data = b'0'
    number_of_bytes_read_from_pipe = 10
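In the sketch below, 10 bytes are written into a pipe and read back one byte at a time, in first-in, first-out order. It closes the write end first, so that os.read() returns b'' at the end instead of blocking. It does not reproduce the lesson's way of measuring number_of_bytes_in_pipe.

```python
import os

read_fd, write_fd = os.pipe()          # two new file descriptors
os.write(write_fd, b"9876543210")      # 10 bytes join the tail of the queue
os.close(write_fd)                     # no more writers: reads end at EOF

received = []
while True:
    data = os.read(read_fd, 1)         # take 1 byte from the head of the queue
    if not data:                       # b'' means the pipe is empty and closed
        break
    received.append(data)
    print("data =", data)
os.close(read_fd)
print("number_of_bytes_read_from_pipe =", len(received))
```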

It seems that functions do not work with pipes.

File objects and file descriptors
On the Unix command line:

    $ ls -l test.txt ; cat test.txt
    -rw-r--r-- 1 user  staff  43 Sep  9 17:26 test.txt
    Hello, world!
    Hola, mundo!
    Goodbye, world!
    $

The function open() returns a file object and silently creates a file descriptor:
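The descriptor behind a file object is available from its fileno() method, and it can be passed to low-level functions such as os.fstat(). The sketch below uses a scratch copy of test.txt. The actual descriptor number depends on what else the process has open.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Hello, world!\nHola, mundo!\nGoodbye, world!\n")

f = open(path, encoding="utf-8")
fd = f.fileno()                              # the descriptor behind the file object
size_via_fd = os.fstat(fd).st_size           # low-level call on the descriptor
print("file object:", f)
print("file descriptor:", fd)
print("size via descriptor:", size_via_fd)
f.close()                                    # also closes the descriptor
```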

Multiple file objects with same file descriptor
Two or more file objects may have the same file descriptor. If you have two file objects associated with the same file descriptor and you close one of them, the behavior of the other may be unpredictable. Unless you really know what you're doing, when you close one of several file objects associated with the same file descriptor, close them all. Also, don't close the file descriptor (with os.close()) before closing the file objects; this creates a really messy situation.
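In the sketch below, two file objects (hypothetically named first and second) share one descriptor. They were created with closefd=False, so closing them leaves the descriptor open, and it is closed last with os.close(). It also shows one surprise of sharing: both objects use the same file offset, so after first reads to the end, second sees nothing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Hello, world!\n")

fd = os.open(path, os.O_RDONLY)                  # one low-level descriptor
first = os.fdopen(fd, "r", encoding="utf-8", closefd=False)
second = os.fdopen(fd, "r", encoding="utf-8", closefd=False)
print(first.fileno(), second.fileno())           # the same number twice

first_text = first.read()      # moves the shared offset to end of file
second_text = second.read()    # starts at that shared offset: nothing left

# Close every file object first, then the descriptor itself.
first.close()
second.close()
os.close(fd)
```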

Temporary files
Python's tempfile module contains functions that can be used to generate temporary files. Depending on the function and parameters used, the file created may or may not be visible on the file system, may or may not be deleted when the file is closed, and may or may not be opened in binary mode.

Function tempfile.NamedTemporaryFile() is representative of the functions available for file creation in module tempfile.

Opening a temporary file in text mode for deletion on closing:
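Here is a minimal sketch of this case. It uses tempfile.NamedTemporaryFile() in text mode ("w+") with the default delete=True. The file has a visible name while it is open and disappears when the with block closes it. The ".txt" suffix is an arbitrary choice.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8", suffix=".txt") as tmp:
    tmp.write("Hello, world!\n")
    tmp.seek(0)                          # rewind to read back what was written
    text_back = tmp.read()
    name = tmp.name
    existed_while_open = os.path.exists(name)

print(name, existed_while_open, os.path.exists(name))   # ... True False
```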

Opening a temporary file in binary mode for retention on closing:
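The sketch below covers the binary, retained case with mode="w+b", suffix=".bin" and delete=False. The 26 sample bytes are hypothetical. The file survives closing, so deleting it becomes the program's job, which the last line does.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(mode="w+b", suffix=".bin", delete=False) as tmp:
    number_written = tmp.write(bytes(range(26)))   # 26 sample bytes
    name = tmp.name

kept = os.path.exists(name)        # True: delete=False keeps the file after closing
print(name, kept, os.path.getsize(name))
os.remove(name)                    # clean up when finished with it
```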

The temporary file exists after closing:

    $ ls -la /var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpdz14i44i.bin
    -rw--- 1 user  staff  26 Oct 16 07:22 /var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpdz14i44i.bin
    $

Further Reading or Review

 * Previous Lesson: Console Input
 * This Lesson:Files
 * Next Lesson: Directories
 * Course Home Page