SWC @ University of Twente (General information)
November 11, 09:00-14:00 CEST
Néstor DelaPaz-Ruíz
Joel H. Nitta
Download shell-lesson-data.zip and move the file to your Desktop.
Unzip/extract the file. You should end up with a new folder called shell-lesson-data on your Desktop.
For instructions by operating system, see the Shell Lesson
Humans interact with computers using GUI (graphical user interface) or CLI (command-line interface).
GUI: Intuitive, menu-driven, but not efficient for repetitive tasks.
CLI (Unix shell): Efficient for repetitive tasks, automates tasks quickly.
The shell interprets and runs the commands typed by the user.
Popular Unix shell: Bash (Bourne Again SHell).
Benefits of using the shell:
When you open the shell, you should see something like this:
$
The $ is the prompt, where you type your commands.
Depending on your setup, it may look a little different, for example:
nelle@localhost $
ls
The first command we will learn is ls, which lists the contents of your current directory (we will come back to this later):
Desktop Downloads Movies Pictures
Documents Library Music Public
goostats.sh to measure protein abundance.
Using a GUI, Nelle would need to process 1520 files by hand, taking over 12 hours. Can Nelle do this more efficiently with the shell?
How can I move around on my computer?
How can I see what files and directories I have?
How can I specify the location of a file or directory on my computer?
Use pwd to show your current working directory (where you “are” in your computer)
/
character.
For example, Nelle’s files are stored in /Users/nelle.
ls
Use the -F option to adjust the output:
/ indicates that this is a directory
@ indicates a link
* indicates an executable
You can clear a cluttered terminal with clear
Get a help menu by adding --help to the command.
Or, add man in front of the command.
ls
If pwd displays /Users/backup, and -r tells ls to display things in reverse order, what command(s) will result in the following output:
pnas_sub/ pnas_final/ original/
ls pwd
ls -r -F
ls -r -F /Users/backup
cd
..
.. takes you one directory higher
Note that if you use ls -a to show everything, you will see .. in the listing
~
You can use ~ to move to your home directory
-
You can use - to move back to the directory you just came from
If you type a path that does not start with /, it means you are talking about a folder or file relative to your current location
If you type a path that starts with /, it means you are talking about a path from the root of the file system
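The navigation shortcuts and the two kinds of path can be tried safely in a throwaway directory. This is a minimal sketch: the scratch directory from mktemp and the folder names backup, original and thing are only stand-ins for illustration, not part of the lesson data.

```shell
# Build a small scratch tree so the example does not touch your own files.
scratch=$(mktemp -d)
mkdir -p "$scratch/backup/original" "$scratch/thing"

cd "$scratch/thing"   # absolute path: starts with /
cd ../backup          # relative path: up one level, then into backup
here=$(pwd)           # remember where we are
cd original           # relative path: a folder inside the current one
cd -                  # back to the directory we just came from (prints it)
back=$(pwd)
echo "$back"
cd ~                  # jump to the home directory
```

After cd -, the working directory is the backup folder again, which is why here and back hold the same path.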
If pwd displays /Users/thing, what will ls -F ../backup display?
../backup: No such file or directory
2012-12-01 2013-01-08 2013-01-27
2012-12-01/ 2013-01-08/ 2013-01-27/
original/ pnas_final/ pnas_sub/
Long options start with -- (ls --all)
Short options start with - (ls -a)
Spaces matter (ls-F means the command “ls-F”)
Capitalization matters (ls -s vs ls -S)
The shell will finish typing the names of files and folders for you when you press the tab key
Try it from ~/Desktop/shell-lesson-data/ (press the tab key twice to see what files start with goo)
How can I create, copy, and delete files and directories?
How can I edit files?
mkdir
Make sure we are in shell-lesson-data, then enter exercise-data/writing
Have a look around, then create a new directory called thesis:
mkdir
You can create a nested directory using -p
Check what you did (-R is the option to ls that will list all nested subdirectories within a directory):
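The directory-creation steps above might look like the following. It is a sketch run in a scratch directory; the nested path project/data/raw is a hypothetical name chosen for illustration.

```shell
# Work in a scratch directory so nothing in the lesson data is changed.
scratch=$(mktemp -d)
cd "$scratch"

mkdir thesis                 # create a single new directory
mkdir -p project/data/raw    # -p also creates any missing parent directories
listing=$(ls -FR)            # -R lists nested directories, -F marks them with /
echo "$listing"
```

Without -p, mkdir project/data/raw would fail because project and project/data do not exist yet.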
Do not begin a name with - (dash).
Stick to letters, numbers, . (period or ‘full stop’), - (dash) and _ (underscore).
nano is a text editor program. It will create a file and open it for editing.
Press Ctrl+o to save (as indicated by ^O), then Ctrl+x to exit
touch
rm
rm is forever! (no recycle bin). Be very careful when you use it.
mv
Enter shell-lesson-data/exercise-data/writing:
Let’s rename draft.txt:
(check the results with ls)
mv
As with rm, there is no “undo” button for mv: it will overwrite any file with the same name, so use it carefully!
Let’s move quotes.txt into our current directory:
Check with ls thesis
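A sketch of the rename-then-move sequence, using a scratch directory and a stand-in draft.txt with placeholder content (the real lesson file is not needed to follow along):

```shell
# Scratch copy of the situation: a thesis folder containing draft.txt.
scratch=$(mktemp -d)
cd "$scratch"
mkdir thesis
echo "draft text" > thesis/draft.txt   # placeholder content

mv thesis/draft.txt thesis/quotes.txt  # rename: same folder, new name
mv thesis/quotes.txt .                 # move into the current directory (.)
ls thesis                              # prints nothing: thesis is now empty
ls                                     # quotes.txt and thesis are listed
```

Because mv overwrites silently, the -i option (mv -i) is a useful habit: it asks for confirmation before replacing an existing file.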
cp
cp is similar to mv, but copies instead of moves
Trying to cp a folder without any options fails:
cp: -r not specified; omitting directory 'thesis'
To copy a folder, use cp with the -r (recursive) option
rm -r
As with -r for cp, you need -r to delete a folder.
shell-lesson-data/exercise-data:
Moving multiple files at once is handy, but that was a lot of typing
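To recap copying and deleting before moving on, here is a small sketch in a scratch directory. The names draft-copy.txt and thesis_backup are hypothetical, chosen only for this illustration.

```shell
# Scratch setup: a thesis folder with one placeholder file in it.
scratch=$(mktemp -d)
cd "$scratch"
mkdir thesis
echo "draft text" > thesis/draft.txt

cp thesis/draft.txt draft-copy.txt   # copy a single file
cp -r thesis thesis_backup           # -r copies the folder and everything in it
rm draft-copy.txt                    # delete a file (there is no recycle bin)
rm -r thesis_backup                  # -r is needed to delete a folder
```

Adding -i (rm -i, or rm -r -i) makes rm ask before each deletion, which is a reasonable safety net while learning.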
We can use * and ? to match multiple file names. These are called “wildcards”
Consider the files in shell-lesson-data/exercise-data/alkanes:
*: Represents zero or more characters.
*.pdb matches ethane.pdb, propane.pdb, etc.
p*.pdb matches pentane.pdb, propane.pdb.
?: Represents exactly one character.
?ethane.pdb matches methane.pdb.
*ethane.pdb matches ethane.pdb, methane.pdb.
???ane.pdb matches cubane.pdb, ethane.pdb, octane.pdb.
ls *.pdf in a directory with only .pdb files results in an error.
When run in the alkanes directory, which ls command(s) will produce this output?
ethane.pdb methane.pdb
ls *t*ane.pdb
ls *t?ne.*
ls *t??ne.pdb
ls ethane.*
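The wildcard rules described earlier can be checked with echo, which simply prints whatever the shell expanded the pattern to. This sketch recreates the alkanes file names as empty files in a scratch directory and assumes bash with its default settings.

```shell
# Empty stand-ins with the same names as the alkanes files.
scratch=$(mktemp -d)
cd "$scratch"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

p_files=$(echo p*.pdb)        # * matches zero or more characters
one_char=$(echo ?ethane.pdb)  # ? matches exactly one character
no_match=$(echo *.pdf)        # nothing matches, so bash leaves the pattern as typed
echo "$p_files"
echo "$one_char"
echo "$no_match"
```

The last line explains the ls *.pdf error: the shell expands wildcards before the command runs, so when nothing matches, ls is handed the literal text *.pdf and reports that no such file exists.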
Jamie is working on a project, and she sees that her files aren’t very well organized:
The fructose.dat and sucrose.dat files contain output from her data analysis. What command(s) covered in this lesson does she need to run so that the commands below will produce the output shown?
You’re starting a new experiment and would like to duplicate the directory structure from your previous experiment so you can add new data.
Assume that the previous experiment is in a folder called 2016-05-18, which contains a data folder that in turn contains folders named raw and processed that contain data files. The goal is to copy the folder structure of the 2016-05-18 folder into a folder called 2016-05-20 so that your final directory structure looks like this:
2016-05-20/
└── data
    ├── processed
    └── raw
Which of the following set of commands would achieve this objective? What would the other commands do?
How can I combine existing commands to produce a desired output?
How can I show only part of the output?
pdb files?
Nelle needs to determine the pdb file with the fewest lines of text in the shell-lesson-data/exercise-data/alkanes directory.
She can do this with wc, which counts the lines, words, and characters in a file:
wc to all files
Check the number of lines of text in all .pdb files:
This works for a few, but what if we had thousands?
Let’s send the results to a file with > and read out the contents with cat:
(note that >> will add to an existing file)
Next, use sort to sort the output, then save it to a file, then finally get the first entry of that file with head:
Whew! We did it!
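The three steps above can be sketched as follows. The example uses three small stand-in files in a scratch directory (long.pdb, medium.pdb and short.pdb are hypothetical names), so the expected winner is obvious.

```shell
# Stand-in files with 3, 2 and 1 lines.
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\nb\nc\n' > long.pdb
printf 'a\nb\n' > medium.pdb
printf 'a\n' > short.pdb

wc -l *.pdb > lengths.txt                  # > saves the line counts to a file
cat lengths.txt                            # read the file back
sort -n lengths.txt > sorted-lengths.txt   # -n sorts numerically, smallest first
fewest=$(head -n 1 sorted-lengths.txt)     # the first line is the shortest file
echo "$fewest"
```

The -n option matters: without it, sort compares the counts as text, so 10 would sort before 9.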
tail is similar to head, but prints lines from the end of a file instead.
Consider the file shell-lesson-data/exercise-data/animal-counts/animals.csv. After these commands, select the answer that corresponds to the file animals-subset.csv:
animals.csv
animals.csv
animals.csv
animals.csv
So that worked, but it relied on two intermediate text files (lengths.txt and sorted-lengths.txt). That is confusing.
We can streamline the analysis by sending the results of one command directly into the input of another with the pipe: |
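With pipes, the same analysis fits on one line and leaves no intermediate files behind. As before, this is a sketch using hypothetical stand-in files in a scratch directory.

```shell
# Stand-in files with 3, 2 and 1 lines.
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\nb\nc\n' > long.pdb
printf 'a\nb\n' > medium.pdb
printf 'a\n' > short.pdb

# Each | sends the output of the command on its left to the command on its right.
fewest=$(wc -l *.pdb | sort -n | head -n 1)
echo "$fewest"
```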
north-pacific-gyre
300 NENE02040B.txt
300 NENE02040Z.txt
300 NENE02043A.txt
300 NENE02043B.txt
5040 total
Z?
A or B; by convention, her lab uses Z to indicate samples with missing information. To find others like it, she does this:
ls *Z.txt
NENE*A.txt or NENE*B.txt
Suppose you want to delete your processed data files, and only keep your raw files and processing script to save storage.
The raw files end in .dat and the processed files end in .txt.
Which of the following would remove all the processed data files, and only the processed data files?
rm ?.txt
rm *.txt
rm * .txt
rm *.*
A file called animals.csv (in the shell-lesson-data/exercise-data/animal-counts folder) contains the following data:
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
What are the contents of final.txt? (the sort -r command sorts in reverse)
Scenario: Extract classification from genome files.
Files: basilisk.dat, minotaur.dat, unicorn.dat in exercise-data/creatures
Structure:
Our goal: Print the classification (2nd line) for each file.
General form of a loop:
For our situation:
$ for filename in basilisk.dat minotaur.dat unicorn.dat
> do
> echo $filename
> head -n 2 $filename | tail -n 1
> done
$filename is a variable that gets filled in by the shell
The shell prompt changes from $ to > and back again as we were typing in our loop.
A semicolon, ;, can be used to separate two commands written on a single line.
If the shell prints > or $ then it expects you to type something, and the symbol is a prompt.
If you type > or $ yourself, it is an instruction from you that the shell should redirect output or get the value of a variable.
You can put the variable name in curly braces: ${filename}. This makes it easier to distinguish the variable from surrounding text (like ${file}name)
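Here is the same loop written as it would appear in a file, with the variable in curly braces and double quotes. It is a sketch: the three .dat files are created in a scratch directory with placeholder lines ("line 1 of ...") rather than the real creature data.

```shell
# Create three stand-in files, each with three placeholder lines.
scratch=$(mktemp -d)
cd "$scratch"
for name in basilisk minotaur unicorn
do
    printf 'line 1 of %s\nline 2 of %s\nline 3 of %s\n' "$name" "$name" "$name" > "${name}.dat"
done

# Print each file name followed by the second line of that file.
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    echo "${filename}"
    head -n 2 "${filename}" | tail -n 1
done > classifications.txt
cat classifications.txt
```

In "${name}.dat" the braces mark where the variable name ends, and the double quotes keep a file name containing spaces from being split into several words.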
for filename and $filename, but we could just as easily have said for x and $x. Is this a good idea?
No, because it is not clear what the variable refers to. It is better to use variable names that convey their meaning.
This exercise refers to the shell-lesson-data/exercise-data/alkanes directory. ls *.pdb gives the following output:
What is the output of the following code?
Now, what is the output of the following code?
Why do these two loops give different outputs?
What would be the output of running the following loop in the shell-lesson-data/exercise-data/alkanes directory?
cubane.pdb, octane.pdb and pentane.pdb are listed.
cubane.pdb is listed.
How would the output differ from using this command instead?
cubane.pdb and octane.pdb will be listed.
octane.pdb will be listed.
In the shell-lesson-data/exercise-data/alkanes directory, what is the effect of this loop?
cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
cubane.pdb, ethane.pdb, and methane.pdb, and the text from all three files would be concatenated and saved to a file called alkanes.pdb.
cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
Also in the shell-lesson-data/exercise-data/alkanes directory, what would be the output of the following loop?
cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb would be concatenated and saved to a file called all.pdb.
ethane.pdb will be saved to a file called all.pdb.
cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be concatenated and saved to a file called all.pdb.
cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be printed to the screen and saved to a file called all.pdb.
Context: Nelle needs to process protein sample files using goostats.sh.
Script: goostats.sh calculates statistics from a protein sample file.
Nelle decides to build commands step-by-step.
Step 1: Select the right input files.
A or B, not Z.
Next step: decide what to call the files that the goostats.sh analysis program will create.
Prefixing each input file’s name with stats seems clear:
The ; has the same effect as a line-break
goostats.sh just writes out the results file without printing anything to the screen. Let’s kill the script with Ctrl + c, then add an echo to display the name of the file:
for datafile in NENE*A.txt NENE*B.txt; do echo $datafile;
bash goostats.sh $datafile stats-$datafile; done
We can inspect the output by opening another shell window
alkanes/.
middle.sh and open it with nano:
cd alkanes
nano middle.sh
Type this in the script (it should look familiar):
Now we can run the script:
This script would be much more useful if we could run it on any file, not just octane.pdb.
Open it again in nano and modify it like so:
The "$1" means ‘the first filename (or other argument) on the command line’. Try it out!
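A script using "$1" might look like the sketch below. The script body (head followed by tail, printing lines 11 to 15) is an assumption about what middle.sh contains, and numbers-demo.txt is a hypothetical 20-line input created just for this example. The script is written with a here-document so the example is self-contained; in the lesson you would type the same line in nano.

```shell
# A 20-line stand-in input file containing the numbers 1 to 20.
scratch=$(mktemp -d)
cd "$scratch"
seq 1 20 > numbers-demo.txt

# Write the script to a file ('EOF' in quotes keeps $1 from being expanded now).
cat > middle.sh << 'EOF'
# Print lines 11-15 of the file named by the first argument.
head -n 15 "$1" | tail -n 5
EOF

middle=$(bash middle.sh numbers-demo.txt)
echo "$middle"
```

Putting "$1" in double quotes means the script still works when the file name contains spaces.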
In the alkanes directory, imagine you have a shell script called script.sh containing the following commands:
While you are in the alkanes directory, you type the following command:
Which of the following outputs would you expect to see?
.pdb in the alkanes directory
.pdb in the alkanes directory
alkanes directory
*.pdb
So far, we have been able to input one file into a script with something like "$1"
But what if we have many files we want to input?
Solution: "$@"
"$@" means ‘All of the command-line arguments to the shell script’
sorted.sh
Try it!
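One way sorted.sh could use "$@" is sketched below. The script body (counting lines and sorting the counts) is an assumption, and long.pdb and short.pdb are hypothetical stand-in files in a scratch directory.

```shell
# Stand-in files with 3 lines and 1 line.
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\nb\nc\n' > long.pdb
printf 'a\n' > short.pdb

# Write the script to a file ('EOF' in quotes keeps $@ from being expanded now).
cat > sorted.sh << 'EOF'
# Count lines in every file given on the command line, shortest first.
wc -l "$@" | sort -n
EOF

result=$(bash sorted.sh *.pdb)   # the shell expands *.pdb before the script starts
echo "$result"
```

The script never sees the wildcard itself: it receives the already-expanded list of file names, and "$@" passes each one to wc as a separate argument.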
For this question, consider the shell-lesson-data/exercise-data/alkanes directory once again. This contains a number of .pdb files in addition to any other files you may have created. Explain what each of the following three scripts would do when run as bash script1.sh *.pdb, bash script2.sh *.pdb, and bash script3.sh *.pdb respectively.
grep?
“To grep something” has become a verb, kind of like “to google something”
grep is a computer program that searches for text
Our examples will use haiku about programming that were featured in Salon magazine
grep example
not:
The
Thesis. How can we match only the whole word The?
-w (for “word”)
-n
-n: Show the line number
-w: Only match whole words
-i: Make search case-insensitive
(you can also combine these into -nwi)
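The three options can be compared on a tiny made-up file. The file name haiku-demo.txt and its three lines are invented for this sketch; they are not the haiku used in the lesson.

```shell
# A three-line stand-in file.
scratch=$(mktemp -d)
cd "$scratch"
printf 'The printer is out\nThesis due today\nnot the best timing\n' > haiku-demo.txt

grep -n "The" haiku-demo.txt                      # matches "The" and "Thesis"
whole_word=$(grep -n -w "The" haiku-demo.txt)     # -w: whole word only
any_case=$(grep -nwi "the" haiku-demo.txt)        # combined: numbers, whole word, any case
echo "$whole_word"
echo "$any_case"
```

With -w, "Thesis" no longer matches; adding -i brings in the lowercase "the" on the third line as well.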
-v:
-r:
The re in grep stands for “Regular Expressions”
Regular expressions are kind of like wildcards: they can match certain patterns in text
This is one of the most powerful features of grep
For example, this finds any line with an “o” as its second character (-E turns on extended regular expressions, the ^ matches the start of a line, and the . matches any single character):
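A pattern matching that description is "^.o". The sketch below runs it on a made-up three-line file (regex-demo.txt is a hypothetical name) rather than on the lesson's haiku file.

```shell
# A three-line stand-in file.
scratch=$(mktemp -d)
cd "$scratch"
printf 'Today it works\nThe rest does not\nNo output yet\n' > regex-demo.txt

# ^ anchors to the start of the line, . is any one character, o is a literal o.
matches=$(grep -E "^.o" regex-demo.txt)
echo "$matches"
```

The quotes around the pattern stop the shell from trying to interpret its special characters before grep sees them.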
grep
Which command would result in the following output:
grep "of" haiku.txt
grep -E "of" haiku.txt
grep -w "of" haiku.txt
grep -i "of" haiku.txt
find
While grep finds lines in files, the find command finds files themselves.
Try it out from shell-lesson-data/exercise-data:
-type d
-type f
-name
Wait a sec - I thought there were more text files?
find . -name numbers.txt
-name
wc and find
.txt files:
grep and find
.txt files in the current directory:
By combining relatively simple, small programs using techniques like $() and the pipe (|), we can achieve very powerful results with a small amount of code
This is the beauty of the shell!
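The combinations mentioned above (find with wc, and find with grep) can be sketched like this. The directory layout and file contents are invented for the example and live in a scratch directory.

```shell
# A small stand-in tree: two .txt files at different depths and one .csv file.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p writing/data
printf 'alpha\nbeta\n' > writing/haiku.txt
printf 'searching for one\n' > writing/data/notes.txt
printf 'not text\n' > writing/data/table.csv

find . -name "*.txt"                    # quotes keep the shell from expanding *.txt itself
wc -l $(find . -name "*.txt")           # $() runs find first, then hands the names to wc
hits=$(grep "searching" $(find . -name "*.txt"))
count=$(find . -name "*.txt" | wc -l)   # pipe: count how many files find printed
echo "$hits"
```

One caveat of the unquoted $(find ...) form is that file names containing spaces get split into separate words, so it is best kept for simple names like these.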
Remember, the -v option to grep inverts pattern matching, so that only lines which do not match the pattern are printed. Given that, which of the following commands will find all .dat files in creatures except unicorn.dat? Once you have thought about your answer, you can test the commands in the shell-lesson-data/exercise-data directory.
find creatures -name "*.dat" | grep -v unicorn
find creatures -name *.dat | grep -v unicorn
grep -v "unicorn" $(find creatures -name "*.dat")