Files

File names

File names in UNIX are arbitrary strings of up to 255 characters. If you wish you can have embedded spaces within the file name, and even control characters such as the ASCII bell. However, if you use such ridiculous names you will find it very difficult to refer to them with normal commands (partly because UNIX commands typed at the keyboard assume that file names don't contain spaces). If you need to transfer files between computers running different operating systems, you should use compatible filenames. As a general rule, restrict names to lowercase letters or digits, followed by a dot, followed by up to three letters or digits, e.g., file.txt, myprogram.f, thesis.tex.

Note that UNIX does not have version numbers like the VMS operating system.

It is important to realise that UNIX distinguishes between upper and lower-case characters: file.txt, FILE.TXT, and FiLe.txt are entirely different files. In general, it is best to use lowercase letters only. If you transfer files from a VAX to UNIX using ftp, you may find that the filenames appear in your UNIX directory in uppercase with what appear to be version numbers appended. You can get rid of the trailing version number by using

mmv "*\;*" "=1"
Type man mmv for documentation on mmv. Note that mmv is a locally added piece of public domain software (it appears to be a bit flaky, e.g., the lowercase conversion option doesn't work).

A more powerful alternative to mmv is rename, a perl script that uses regular expressions to provide extraordinary flexibility in changing filenames. For example, to convert filenames from uppercase to lowercase, you could do

rename y/A-Z/a-z/ *
Have a look in the file /usr/local/bin/rename for documentation, and in the perl manual for details on regular expressions.
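
If rename is not available on the machine you are using, the standard tr command can do the case conversion. The following is only a sketch, written in Bourne shell (sh) syntax rather than csh; the function names lower_name and lowercase_dir are made up for this illustration.

```sh
# Print the lowercase form of a file name (hypothetical helper).
lower_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Rename every file in directory $1 whose name contains uppercase letters.
# A file is skipped if the lowercase name is already taken.
lowercase_dir() {
    for old in "$1"/*; do
        [ -e "$old" ] || continue          # the pattern matched nothing
        base=${old##*/}                    # strip the leading directory
        new=$(lower_name "$base")
        [ "$base" = "$new" ] && continue   # already lowercase
        [ -e "$1/$new" ] && continue       # do not overwrite another file
        mv "$old" "$1/$new"
    done
}

# Example: lowercase_dir .
```

Try it on a scratch directory first, since the renaming cannot be undone automatically.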

You can achieve similar results using standard UNIX commands. For example, suppose that you have a number of VMS FORTRAN files ending in `.FOR;1' and you want to convert them to the UNIX standard `.f' extension without the version number, then you could type

foreach dummy (*.FOR\;1)
  mv $dummy ${dummy:r}.f
end
where dummy is a temporary variable which is used to hold the filenames to be processed, the backslash stops the shell from treating the semicolon as a command separator, `:r' is a special syntax which says ``strip everything from the filename after (and including) the last period'', and the `.f' simply indicates to add a `.f' to the result. See man csh for more details.
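
For comparison, here is roughly the same job in Bourne shell (sh) syntax, which has no `:r' modifier but can strip the extension with ${name%.*}. This is a sketch only, and the helper name vms_to_f is invented for the example.

```sh
# Print the UNIX name for a VMS FORTRAN file (hypothetical helper).
# ${1%.*} drops everything from the last period onwards, which is
# the job that :r does in csh.
vms_to_f() {
    printf '%s.f\n' "${1%.*}"
}

# The semicolon is escaped so that the shell does not treat it as a
# command separator.
for old in *.FOR\;1; do
    [ -e "$old" ] || continue      # no such files in this directory
    mv "$old" "$(vms_to_f "$old")"
done
```

If the current directory contains no `.FOR;1' files the loop does nothing.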

Files that begin with a period `.' are treated specially by the ls command. In particular, they do not appear in your directory listing unless you use the -a (for all) option (i.e., ls -a). The reason for this is that such files are generally used during the operation of certain system commands. If they did appear in the directory list, users might be tempted to delete them to tidy up their directories, thereby causing all sorts of problems. As you get more experienced with UNIX you will learn what all the `.' files are for, but initially just leave them alone.
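
You can see the effect safely in a scratch directory. The sketch below (sh syntax; the helper name make_dot_demo and the file names are invented) creates one ordinary file and one dot file, then lists the directory both ways.

```sh
# Build a scratch directory holding one ordinary file and one dot file,
# and print its name (make_dot_demo is a made-up helper).
make_dot_demo() {
    demo=$(mktemp -d) || return 1
    : > "$demo/thesis.tex"
    : > "$demo/.hidden_settings"
    printf '%s\n' "$demo"
}

demo_dir=$(make_dot_demo)
echo "ls shows:";    ls "$demo_dir"       # thesis.tex only
echo "ls -a shows:"; ls -a "$demo_dir"    # also ., .. and .hidden_settings
rm -rf "$demo_dir"
```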

Displaying the contents of a file

To examine the contents of a file use the less command, e.g.,

less thesis.tex

If the file is too big to fit on one page, less will display the first page and then wait for you to type a command. Commands are single characters, and do not require a carriage-return to terminate them. Type h to get a list of all the possible commands. The most common commands are the space bar to get the next full page of the file, carriage-return to go forward one line, b to go back a page, q to quit.

less is actually a locally added enhancement, and is an improvement on the usual UNIX pager more. One of the main advantages of less over more is that you can page backwards through pipes, e.g.,

ps -aux | less

will allow you to scroll up and down in the output from the ps -aux command, whereas more throws away output once you have read it. less also handles non-graphic ASCII characters better than more.

On an Xterminal you can use xless, which has a graphical user interface, instead of less.

Editing a file

  The editor emacs is the most powerful and flexible editor available on the workstation. It is also a standard in the UNIX world. Use it!!

Alternatives are axe (a nice X11 editor with pull-down menus), vi (a standard editor that you will find on all UNIX systems; quite powerful, but very confusing at first--do not try using vi until you have read about it since the command to exit vi is impossible to guess; there is a tutorial available--just type vilearn), ed (a simple line-editor that is sufficiently similar to other line-editors that it can be picked up quickly), dxnotepad (a window-based editor that you can use from an Xterminal--it is very easy to use, but quite limited in functionality--avoid), xedit (another Xterminal editor, from the standard MIT distribution), and joe (Joe's Own Editor--a simple full screen editor that has been specifically designed to be easy to use).

Recommendations:

emacs has the advantage of being available on many different operating systems (e.g., MS-DOS, VMS, UNIX), and (with the possible exception of vi) is the most widely used and supported editor in the UNIX environment. However, emacs can be a bit daunting when you first come to use it, particularly if you see the 300 page manual (and the 600 page supplement). Nevertheless, emacs is a superb text editor and well worth learning. Fortunately there is an on-line tutorial (see below for details) which explains most of the commands you need to know to get going.

There are two ways of using emacs. The first is the traditional way of starting it up for each new file that you want to edit, e.g.,

emacs file        ! not the best way to use emacs

You then edit the file, and when you have finished use C-x C-c (i.e., control-X followed by control-C) to save the file and exit emacs. However, there is a more efficient and convenient way of using emacs, and that is to start it up once at the beginning of your login session by typing

emacs

and then read-in a file to edit with the C-x C-f command. You can read in multiple files and edit them in separate windows on your terminal screen (if you have an Xterminal). When you have finished, type C-x s (i.e., control-X followed by s) to save the files, and then C-z to suspend emacs. If you type jobs you will see your emacs editing session listed as a stopped job. The trick is that you can now reconnect to that same copy of emacs and continue editing right where you left off. To reconnect to emacs, type

%emacs

In fact, most times you could type %em or %e to reconnect (or even simply % if you have no other jobs), or %n where n is the number of the emacs job in the jobs list. See man csh for more details.

Don't make the mistake of starting a new emacs each time you want to edit a new file. Rather, reconnect to your existing emacs and read-in a new file with C-x C-f. You will find this is a much faster way of editing once you get used to it.

Anyway, once you have emacs running, try the on-line tutorial (which is very good) by typing C-h t. emacs makes a lot of use of the ``meta'' key, e.g., it may ask you to type M-x to perform some function. You can get the effect of the ``meta'' key by pressing and holding down the COMPOSE key on an Xterminal, or by pressing and releasing the ESC key (F12 on many terminals, or C-[ if you are really desperate).

Note: emacs finds out what sort of terminal you have by examining the environment variable TERM. emacs will refuse to begin an editing session if it thinks that your terminal is not powerful enough (e.g., it doesn't have the ability to scroll). This can happen if you login from a modem pool, in which case TERM is often set to network. To stop the problem (if you really do have a powerful enough terminal), try

setenv TERM vt100

One problem with emacs is that the author, Richard Stallman, has used C-q and C-s as commands. The difficulty with this is that these two characters are used by most terminals and computers for flow control (i.e., if a computer receives a C-s it stops sending output to the terminal until it receives a C-q). There are two ways around this problem: (1) prevent the computer/terminal from interpreting C-q and C-s as flow control (also known as XON/XOFF processing), or (2) tell emacs to use different characters for the commands that C-q and C-s would normally invoke. The former method is ideologically superior, but is often tricky to do, particularly if you are logged in via modems and terminal switches, all of which can potentially intercept C-q and C-s.

Printing

Various line printers and PostScript laser printers are connected to newt via Xterminals. To find a definitive list of the available printers, have a look at the file /etc/printcap (the ``printer capabilities'' file). As of late 1994 there were six printers available in the School of Physics.

The laser printers accept PostScript files only. The colour printer accepts PostScript or HP PCL 5. You can recognize PostScript files by the fact that their first line usually starts with the characters %! and their contents look a bit like a terse programming language. Also, the file name usually ends with .ps. If you try to send any other sort of file, the printer will ignore it, and you won't get any indication of what happened. To display a PostScript file on an Xterminal before printing it, use Ghostscript (i.e., gs file.ps), or ghostview (i.e., ghostview file.ps).
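
Since the printer gives no indication when it ignores a file, it can be worth checking for the %! signature yourself before printing. The following is a minimal sketch in sh syntax; is_postscript is a made-up name, and the check looks only at the first two characters, so it cannot tell you whether the rest of the file is valid PostScript.

```sh
# Succeed if the first line of file $1 starts with %! (hypothetical helper).
is_postscript() {
    first_line=
    # read returns non-zero if the file has no newline; the text is still stored
    IFS= read -r first_line < "$1" || :
    case $first_line in
        '%!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example: is_postscript file.ps && lpr file.ps
```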

Important note: laser printers are expensive devices on which to print, so please try to keep your usage of them to the necessary minimum. It is particularly unfriendly, for example, to print out numerous copies of a long article or thesis chapter simply to correct a small typesetting error on page 43. Also, don't use the laser printer as a photocopier or for jobs that would normally be printed on a line printer (e.g., dozens of pages of numerical results from a program). Use options on your software to print only those pages that you want (e.g., dvips -n1 -p43 chapter16 will print one page beginning at page 43 of the file chapter16.dvi).

The command for printing files is lpr. You specify the printer that you want to use by using a command-line switch (e.g., -Pp206), or by setting the environment variable PRINTER. The default printer is ``p206''.

Some examples will make this clearer:

To print a PostScript file to the default printer (usually ``p206'') use

lpr file.ps

To print on the HP LJIII in the second year lab, use

lpr -PHP2 file.ps

To change the default printer from p206 to, say, p59, do the following

setenv PRINTER p59

which will affect all lpr commands in the current login session. Put the above line in your .login file if you always want to use this printer.

The lpr command works by copying the file you specify to a special place on the hard disk and adding it to the queue of files to be printed (so any modifications that you make to the file between typing the lpr command and when the file is printed do not appear on the printed output). You can examine the queue for the default printer with

lpq

or use

lpq -Pp1

for printer p1, and you can remove files from the queue with

lprm job_number

which takes an optional argument giving the number of the file in the queue (obtained from the output of lpq), e.g.,

lprm 7823

will remove job number 7823. As you would expect, lprm can be used with the -P option to delete files from other print queues.

To print an ASCII text file on a laser printer, you have to convert the text to PostScript first, and you can do this with the program a2ps. For example,

a2ps shortprogram.f | lpr

will convert the ASCII text file shortprogram.f (presumably containing the text of a small FORTRAN program) into PostScript, and pipe the result to the printer (see ``man a2ps'' for a list of all the options that a2ps supports). However, you are strongly urged not to use the laser printer for printing large ASCII text files, since it is much cheaper to print them on a line printer.

Numerous other printers can be used; have a look in the file /etc/printcap for a complete list.

Problems with printing

It occasionally happens that you may submit a job to the printer and nothing comes out. If your job has disappeared from the queue (as found by using lpq), it may still be being processed by the printer (some of which have large input buffers). You can determine what the HP printers are up to by examining their status displays. In some circumstances you may need to reset the printer by pushing the off-line button and then pushing (and holding for a few seconds) the continue/reset button. Other things to look for include:

To check whether your file is a valid PostScript file, you can preview it on an Xterminal using gs file.ps&. gs stands for Ghostscript, a freely available PostScript previewer distributed by the GNU project. ghostview is a graphical interface to Ghostscript, and may be preferred by some people.

Directories

Files are stored in directories. Directories may have subdirectories, and so on without limit. When you login you will be positioned in your ``login directory''. You can find out what files are in your directory using

ls

which stands for ``list directory''. To get an extended list of the files, try

ls -l

which will tell you the dates the files were last modified, the number of bytes in each file, who has privileges to access them, and a few other bits of information. The ls command has lots of other options (for example, ls -lt will list your files in increasing order of age); see man ls for details. If the directory listing scrolls off the page, use less to page it, i.e., type ls -l | less.

If you are familiar with MS-DOS programs such as XTREE, you might like to try the UNIX program utree. See man utree and man utree.prlist for more information. Briefly, utree gives you a graphical display of your files and directories in a form which you can easily navigate.

To change your current directory, use the

cd

command, which stands for ``change directory''. Some examples follow,

cd                ! returns to your login directory
cd ~mcba          ! goes to the login directory of user mcba
cd subdir         ! goes to the subdirectory subdir
cd ..             ! goes up one level of directories
cd /usr/local/man ! goes to the directory /usr/local/man
cd ~/tex          ! goes to the subdirectory tex in your login directory
cd ~/..           ! goes to the directory above your login directory
To compare with MS-DOS and VMS,
cd ~john/data                            ! UNIX
cd \john\data                            ! MS-DOS
set default sys$login_device:[john.data] ! VMS
Note that cd by itself under UNIX will return you to your login directory, whereas under MS-DOS it prints your working directory. To print your current directory under UNIX, type

pwd

which stands for ``print working directory''.
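
If you want a script to look at another directory without losing your place, run cd inside a subshell. The sketch below uses sh syntax, and parent_of is an invented name.

```sh
# Print the directory one level above $1, as cd .. would reach it.
# The subshell ( ... ) keeps your own working directory unchanged.
parent_of() {
    ( cd "$1" && cd .. && pwd )
}

echo "You are in:      $(pwd)"
echo "One level up is: $(parent_of .)"
```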

Note that there is no equivalent to the MS-DOS disk drive designators ( A:, B:, C:, etc.) or the VMS disk names (CSDVAXDBA0:, CSDVAXDBA1:, etc.). This makes for a much simpler and more logical file system structure. In UNIX, disks are mounted as parts of the file system, e.g., /usr/users is a directory which contains many subdirectories, several of which are actually other disks. In general you don't need to worry about this. You can use

df

to find out where the disks are mounted, and how much room is available on each disk, and

du

to obtain details of the amount of storage that you are using from the current directory downwards.
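
Both commands accept a directory argument and, on most systems, a -k option to report sizes in kilobytes. The following sketch (sh syntax; usage_kb is a made-up name) prints one total for a directory tree instead of a line per subdirectory.

```sh
# Print the space used below directory $1, in kilobytes (hypothetical helper).
usage_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

echo "Used below the current directory: $(usage_kb .) kbytes"
df -k .        # the disk holding the current directory, and its free space
```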

Copying, renaming, and deleting files

To copy a file use

cp file.in file.out       ! where file.out is the output file name
cp thesis.tex ~jtp/tex    ! if ~jtp/tex is a directory, then the new
                          ! file will be called ~jtp/tex/thesis.tex
To rename a file

mv original.name new.name ! mv stands for move

Note that on newt the default behaviour of cp and mv has been altered so that these commands will prompt you before overwriting existing files. You will find that this is not the case on any other UNIX system on campus.

To move a lot of files to another directory

mv data.* ~/subdir
Note that you cannot rename files by doing things like mv DATA.* data.*. If you want to do things like that, have a look at the file /usr/local/bin/rename, or check out man mmv and/or read man csh carefully.

To copy an entire directory tree do

(cd old_dir; tar cf - .) | (cd new_dir; tar xpf -)
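
The pipeline above assumes that new_dir already exists. A small wrapper can create it first; this is a sketch in sh syntax, and copy_tree is an invented name rather than a command installed on the workstation.

```sh
# Copy the directory tree $1 into $2 using the tar pipeline.
# copy_tree is a made-up name; $2 is created if it does not exist.
copy_tree() {
    mkdir -p "$2" || return 1
    ( cd "$1" && tar cf - . ) | ( cd "$2" && tar xpf - )
}

# Example: copy_tree ~/thesis /tmp/thesis_copy
```

The first tar writes the archive to standard output (the `-' after f), and the second reads it from standard input, so no intermediate file is needed.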

To delete a file

rm file.to.delete         ! rm stands for remove

Note that this command will ask you whether you really want to delete the file. Answer y if you want to, just hit carriage-return if you don't want to. On most UNIX systems the rm command will not prompt you, and will quietly do its job. To force the deletion of some files (useful in the case that there are many files that you wish to delete, and you don't want to answer y to all of them), do

rm -f *.dat

where the -f stands for ``force''.

Note that newt also has a public domain delete/undelete package which allows you to recover files removed with the delete command (but not with the rm command). delete works by renaming the specified files with leading periods so that they don't appear in a directory listing. undelete simply renames the files back again. At intervals the operating system will delete (permanently) all the files that you marked for deletion. You may like to use this system in preference to the ``sudden-death'' operation of the rm command.

File protection

UNIX allows you to choose who can have access to your files. Files have three types of permissions: read, write, and execute (abbreviated r, w, and x). Normally you have read and write permission on all your files, and read, write, and execute permission on all your programs and on directories (a directory must have execute permission otherwise you can't search it for filenames). When a person tries to access a file, UNIX distinguishes three types of people: the owner of the file, people in the same group as the owner, and everyone who is not in either of the first two categories. You can decide, as the owner of the file, what permissions you give to the three classes of users. To find out what the permissions are, use the following command

ls -l myfile.tex

which will result in a line something like

-rw-rw-rw- myfile.tex

The first strange group of 10 characters is the permission matrix for the file. The first character is special; don't worry about it for the time being (if it is `d' the file is a directory). The next 9 characters are the read, write, and execute permissions for the owner, the group, and others (i.e., everyone else), respectively. To change the permissions to something else you need to use the chmod command (stands for ``change mode''). There are two ways of using chmod: the first is to specify the file permissions absolutely using a three digit octal number, and the second is to use a mnemonic argument to add or subtract individual permissions.

To explain the octal code method, imagine the last 9 characters of the permission matrix as representing the bits in three octal numbers, with r counting as 4, w as 2, and x as 1 within each group of three. For example, rw-rw-rw- would be 666, rwx------ would be 700, rwxr-xr-x would be 755, and so on (don't be worried if this sounds too complicated, there is an easier method to use which will be explained shortly). Now, to make a file accessible to users within your group (normally those within your department), use the command

chmod 640 file

which will change the permission matrix to rw-r-----, which means that you can read and write the file, members of your group can read it, and everyone else has no access. You can use wildcards in the file specification to change the mode of many files at once.
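
If you want to check your arithmetic, the translation from a 9-character permission string to its octal code can be sketched in a few lines of sh. The function name perm_to_octal is invented here, and the sketch ignores special letters (such as s or t) that ls sometimes shows in place of x.

```sh
# Convert a 9-character permission string such as rw-r----- into its
# three-digit octal code (hypothetical helper; r=4, w=2, x=1).
perm_to_octal() {
    perms=$1
    result=
    while [ -n "$perms" ]; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)   # next group of three
        perms=$(printf '%s' "$perms" | cut -c4-)      # whatever is left over
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
    done
    printf '%s\n' "$result"
}

perm_to_octal rw-r-----     # prints 640
perm_to_octal rwxr-xr-x     # prints 755
```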

To find out which group you are in, use the id command. To find out who else is in your group, look in the /etc/group and /etc/passwd files. The command

ls ~/..
will give you a list of all the people in your default login group. Note that you can belong to more than one group. You can display the group associated with a file using

ls -lg

and you can change the group with the chgrp command (see man chgrp for more information). If you wish to change the groups that you are in, or would like to set up a new group for some purpose (such as sharing files) then send mail to mcba (Michael Ashley) with your request.

To allow all users read access to one of your directories, use

chmod 755 dirname

The default permission on the Physics workstation is 600 for normal files and 700 for directories, which means that no other users can read or write your files (although your top-level directory will most likely be readable by people in your group). If you like, you can change the default permissions by editing your .login file. 600 is good for security since it means that if you are working on sensitive documents, the files you create will automatically have maximum protection. However, it is a real nuisance if you want to share files with other users.
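
This guide does not say which line in .login sets the defaults; on most UNIX systems the mechanism is the umask command, and that is assumed in the sketch below (sh syntax; show_defaults is a made-up name). A umask of 077 removes all group and other permissions from newly created files, which gives the 600/700 defaults described above.

```sh
# Create a file and a directory under the given umask and print their
# permission strings (show_defaults is a made-up helper).
show_defaults() {
    (
        umask "$1"
        scratch=$(mktemp -d) || exit 1
        : > "$scratch/newfile"
        mkdir "$scratch/newdir"
        ls -ld "$scratch/newfile" | cut -c1-10
        ls -ld "$scratch/newdir"  | cut -c1-10
        rm -rf "$scratch"
    )
}

show_defaults 077     # -rw------- then drwx------
show_defaults 022     # -rw-r--r-- then drwxr-xr-x
```

The subshell means that the umask of your own login session is left alone.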

One thing to watch out for is that if someone has write permission to a directory, then they can delete any files in that directory whether or not they have any access to the individual files.

The alternative (and usually more convenient) method of using the chmod command is to use the mnemonics u, g, and o for user (i.e., you), group, and other, respectively, and r, w, and x for read, write, and execute, and = to set permissions, and + and - to make relative changes. For example,

chmod go+r *

will add read permission for all users except yourself (i.e., group and other) for all files in the current directory.

chmod u+x file

will make the file executable by you (this is often used to allow you to run shell scripts by typing the filename). A more complex example, albeit not a particularly useful one, is

chmod -R u+w,g-r,o=rwx *

which will add write permission for you, take away read permission for users in your group, and give all other users read, write, and execute permission on all the files in your current directory and all lower directories (that's the meaning of the -R, for recursive, option).

It often happens that you want to give users in your group read access to a directory tree (i.e., a directory and all files/directories below it), and execute access to any programs that happen to be there. To do this, type

chmod -R g=u,g-w directory

which recursively gives everyone in your group the same access as you have, and then removes write access. If you want everyone to have access, simply replace the g in the above example with go, for `group and other'.
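
To see what g=u,g-w does before trying it on real files, you can run it on a scratch tree. This is a sketch in sh syntax; mode_of and the file names are invented for the example.

```sh
# Print the 10-character permission matrix of $1 (hypothetical helper).
mode_of() {
    ls -ld "$1" | cut -c1-10
}

tree=$(mktemp -d)                        # scratch directory tree
printf 'echo hello\n' > "$tree/script"
printf 'some text\n'  > "$tree/notes"
chmod 700 "$tree" "$tree/script"         # start fully private
chmod 600 "$tree/notes"

chmod -R g=u,g-w "$tree"                 # the command from the text

mode_of "$tree"          # drwxr-x---
mode_of "$tree/script"   # -rwxr-x---
mode_of "$tree/notes"    # -rw-r-----
rm -rf "$tree"
```

Note that the program keeps its execute bit for the group while the plain text file does not gain one, which is the advantage over a blanket chmod -R 750.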

Giving another user access to a file

It often happens that you need to give another user access to one or more of your files. There are many ways of doing this, for example, the file can be mailed if it is a text file and small, or you could deprotect the file (and the directory in which it is stored) using the chmod command in the previous section. Sometimes the easiest way of giving a file to another user is to simply copy the file to the /tmp directory (which everyone can read) and deprotect it. The user who wants the file should use the mv command to ``move'' the file from /tmp to avoid extra copies lying around. For example,

cp ~user1/file_to_share /tmp            ! copy the file to /tmp
chmod 666 /tmp/file_to_share            ! deprotect the file so it can be read

mv /tmp/file_to_share ~user2/           ! the other user now moves the file
chmod ??? ~user2/file_to_share          ! protect the file as desired

Backing up files

The workstation has two backup devices available: an Exabyte tape drive and a TK-50 tape drive.

The Exabyte is ideal for storing large amounts of data and/or backing up important files. Before using the Exabyte drive you must seek instructions on its operation from Michael Ashley.

Before using any of the tape drives, you should first use the allocate command to give yourself sole control of the device, in order to prevent other users from accidentally reading or deleting the files on your tape.

allocate

will allocate the Exabyte, and

allocate -t 1

will allocate the TK-50. Use these commands before you put your tape in the device. If someone else already has the device allocated, the allocate command will inform you of this, and you should then speak with the person to find out if they are still using the device. When you have finished with it yourself, remove your tape and then type deallocate (or deallocate -t 1).

To archive files to tape you generally use the tar (tape archive) command. For example to write a new tape containing all the files in your login directory and all subdirectories below it, use

mt -f /dev/nrmt0h rew       ! this rewinds the tape, if necessary
tar cvf /dev/nrmt0h ~       ! this writes the new tape archive
The c option to the tar command stands for ``create'', and means that you are creating an archive; the v option stands for ``verbose'', i.e., tar lists the filenames on your terminal as it writes them to tape; and the f option allows you to specify the device that you are writing to (here /dev/nrmt0h, the Exabyte tape drive). /dev/nrmt0h stands for ``no-rewind mag-tape number 0, high density''. The significance of the ``no-rewind'' is that after the operation is completed, the tape doesn't rewind to the beginning (as it would if you specified /dev/rmt0h).

To understand how to read/write tapes you need to know something about the format of data on tapes. A tape begins with a beginning-of-tape marker, and ends with an end-of-tape marker (usually pieces of reflective metal foil on the tape). Data is written as 8-bit bytes in records of user selectable length. Records are delimited by inter-record gaps. A file consists of one or more records, and is delimited by a tape-mark (a special pattern of magnetisation on the tape that can not be duplicated by data). The last file on the tape has two tape-marks after it, and this serves to indicate the logical end-of-tape (as opposed to the physical end-of-tape, which is indicated by the end-of-tape marker).

When you write data to a tape, you must start either at the beginning-of-tape, or immediately after a tape-mark. Any data that was on the tape after the starting point is irrevocably lost. It is not possible to extend tar archives by writing new records before the end-of-file tape-mark.

To move around a tape, you use the mt command. For example,

mt -f /dev/nrmt0h rew        ! rewinds the tape
mt -f /dev/nrmt0h fsf 1      ! moves forward by one file-mark
mt -f /dev/nrmt0h bsf 2      ! moves backwards by two file-marks
Note that you must use the ``no-rewind'' magnetic tape device, otherwise the tape will rewind itself after performing any mt command.

The tar command produces an archive of the files and directories that you request, and writes the archive as one file on tape. After writing the file, tar writes two tape-marks (to indicate the new logical end-of-tape position) and then positions the tape between the two tape-marks. You can then write another tar archive by simply issuing another tar command immediately after the first.

If you wish to add a tar archive to the end of a tape that has been rewound, you need to first position the tape after the last file by using

mt -f /dev/nrmt0h fsf n
where the number ``n'' equals the number of tar archives on the tape. If you don't position the tape correctly prior to writing then you will lose all the information that existed downstream of your starting point.

To restore files from a tar archive to disk, you first position your working directory to where you wish the files to be written, then position the tape to the tar archive you wish to use, and finally use a tar extract command, e.g.,

cd ~/data
mt -f /dev/nrmt0h fsf 4
tar xpf /dev/nrmt0h

Note that you should take extreme care when using tar since it is easy to overwrite all your files. For example, if you accidentally use the ``c'' option rather than the ``x'' option when trying to extract files from a tape archive, you will irrevocably damage the tar archive on tape (unless you have the ``write protect'' switch set on the tape cartridge), and it will be exceedingly difficult to recover any information downstream of the error.
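
One habit that reduces the risk is to list an archive with the t (``table of contents'') option before extracting it, since t only reads. The t option is standard tar but is not otherwise covered in this guide. The sketch below uses sh syntax with an archive file on disk; safe_extract is an invented name, and with a tape on the no-rewind device you would also need to reposition with mt between the listing and the extraction.

```sh
# List an archive first, then extract it into a chosen directory.
# safe_extract is a made-up name; $1 is the archive file, $2 the target.
safe_extract() {
    archive=$1
    target=$2
    tar tf "$archive" || return 1     # t = list the contents, read-only
    mkdir -p "$target" || return 1
    # The archive is opened before the cd, so a relative path still works.
    ( cd "$target" && tar xpf - ) < "$archive"
}

# Example: safe_extract junk.tar ~/restored
```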

We also have gtar available, which is GNU's freely available version of tar. In general gtar is superior to tar, particularly in its ability to read tapes that have a few errors on them, in saving files with long filenames, and in numerous other ways. So use gtar rather than tar in all the above examples.

A convenient way to back up small files is to compress them (to reduce the space requirements) and send them to another computer (e.g., CSDVAX; don't forget to use a binary mode of transfer). This is a very sensible thing to do if you are working on important documents. To compress a file, use

compress myfile.tex

This will create a file called myfile.tex.Z which is often only about 35% of the size of the original file. To uncompress the file, use

uncompress myfile.tex.Z

and to send the file to another computer, see the section on file transfer. To send a bunch of files, make a tar saveset of the files, compress the saveset, and then ftp it:

tar cvf junk.tar mydirectory
compress junk.tar
ftp csdvax.csd.unsw.edu.au

Buying tapes

To purchase a tape for use in the Exabyte drive, try the Computing Sales shop up the hill. The tapes are available in several different lengths, up to about 120 (minutes?), for a cost of around $20. They can be purchased overseas for about one-third of this cost. Do not be tempted to use cheaper tapes (ones that are not specifically designed for storing data) since they may cause damage to the Exabyte drive itself. In fact, Exabyte Corp. recommends only its own tapes, made specially by Sony, and available from Com Net Solutions in Sydney for about $30 each. However, University tests have shown that the high quality Sony data tapes are quite reliable.

If you wish to use a TK-50 tape, you can purchase these for $25 (half our original purchase price) from the Department of Astrophysics; speak to Michael Ashley. In general, TK-50s are obsolete, but they may be useful when transferring data to another DEC computer.

Finding files

Suppose that you have a complex directory tree and somewhere in it you have some data files called data.001 through data.009 that you want to locate. This is done with the find command. First position yourself at the top of the directory tree, then run find. For example,

cd ~/datafiles
find . -name "data.00[1-9]" -print

If you expect pages of output from the find command, then simply pipe the output to less using

find . -name "data.*" -print|less
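
The quotes around the pattern matter: without them the shell would expand the wildcard itself before find ever saw it. The sketch below (sh syntax; find_runs is an invented name) wraps the search in a function and sorts the result so the files come out in order.

```sh
# List the files data.001 to data.009 anywhere below directory $1,
# in sorted order (find_runs is a made-up name).
find_runs() {
    find "$1" -name 'data.00[1-9]' -print | sort
}

# Example: find_runs ~/datafiles | less
```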

Note: you should give find as much assistance as possible by positioning yourself as low down in the directory tree as you can, to reduce the number of directories and sub-directories that find has to look through. Additionally, using find on an NFS-mounted filesystem is exceedingly inefficient and results in an enormous amount of network traffic: to avoid this, use find on the computer that has the filesystem mounted on a local disk.

The command utree may also be useful to find files.

Disk quotas

Currently we do not have disk quotas enforced since there is an adequate amount of space available. To make this work we require a cooperative spirit from users to keep their file usage to a reasonable minimum and to delete unwanted files. As the need arises we will be introducing disks for scratch storage that will be erased on a regular basis (perhaps daily or weekly). If you have need for a large amount of disk space, then you should consider purchasing your own disk for connection to the workstation. For example, the Departments of Astrophysics and Biophysics have each bought a number of disks for storing images.

Scratch disk space

There is a disk called /erased_at_5am_monday, with 1.5 Gbytes of space available. As its name implies, it is erased each Monday morning at 5am. You can use this for scratch space for storing data/programs/whatever during the week. Once erased, there is no way of recovering the data, so please be careful.


Michael C. B. Ashley
Fri Jun 28 13:34:23 EST 1996