2.1 - Moving, Renaming, and Copying Files
When using a CLI-based Linux system, it is often necessary to move or copy files to new directories, or to rename them. In a GUI-based environment, this could be accomplished by clicking and dragging a file icon to a new window, or by right-clicking it and selecting "Rename" or "Copy".
To move a file from one location to another, the user executes the mv command, which is short for "move". To use the command, both the source and the destination must be specified, like so:
mv sourcefilepath destinationfilepath
The path to both source and destination must be accurate down to the filename, or the command will not produce the desired result.
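For example, assuming a hypothetical file called notes.txt and an existing directory called archive, the following moves the file into that directory:
mv notes.txt archive/notes.txt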
The mv command can also be used to rename a file, like so:
mv oldname newname
If the filenames don't match when moving a file, it will also be renamed. This can cause a lot of confusion if done unintentionally. Additionally, if a file of the same name already exists in the destination, it could be overwritten. Adding -i to the command will require mv to prompt the user before overwriting a file.
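As a quick sketch, assuming a hypothetical file called draft.txt, the following renames it to final.txt, prompting first if a final.txt already exists:
mv -i draft.txt final.txt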
If desired, the user can also place a copy of a file in a new location, while leaving an identical copy in the original directory. For this, the cp command is used (short for "copy"). It takes mostly the same form as mv:
cp sourcefilepath destinationfilepath
If the destination file doesn't already exist, it will be created. If it does, its contents are overwritten with the contents of the source file.
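For instance, assuming the same hypothetical notes.txt, the following leaves the original in place and creates a duplicate called notes.bak (cp also accepts -i to prompt before overwriting):
cp -i notes.txt notes.bak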
2.2 - Gathering Information About Files
In section 1.4, the file command was introduced as a way to determine the data type of a file or set of files. This is a very useful tool for learning how to work with certain files, but it is not the only important utility to know.
The du command, short for "disk usage", will return information about the size of a file or directory. It has several helpful options:
- -h: "human readable". Outputs sizes with unit suffixes such as K, M, and G, rather than raw kilobyte counts.
- -s: "summarize". Reduces the lines for individual items into one line describing the size of the directory.
- -a: "all". Shows and analyzes all individual files in the directory.
- --time: shows the time each file was last changed.
- -c: "total". Adds a line to the output describing the total size of the combined items.
Another useful utility to know is xxd. This command creates a hex dump of a specified file. It is used like this:
xxd file
If the file is already a hex dump, it can be reversed by adding the -r option ("revert"). Normally, xxd prints to the terminal, but the output can be directed to a file like so:
xxd inputfile outputfile
xxd is useful because it can expose a file's signature. This can then be used to determine what type of file is being examined, and help determine how to work with it. Many file types have unique signatures, which positively identify them.
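For example, assuming a hypothetical image called picture.png, the following dumps only the first 16 bytes (the -l option limits the output length), which is enough to reveal the PNG signature, 89 50 4e 47:
xxd -l 16 picture.png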
2.3 - Extracting Content from Files
There are several options available for finding specific content within files. Often, they rely on regular expressions and on sorting data into a specific order.
The first option is grep, which stands for "globally search for a regular expression and print". Given a string or a more complex regular expression as an argument, grep will scan a file (or files) for strings matching the pattern. For each match it finds, it will return, in its entirety, the line where the match occurred. The output of a script or command can also be scanned by piping it into grep.
grep takes several options, following this usage:
grep options pattern file1 file2 ...
- grep -f patternfile: "file". Uses patterns from patternfile rather than a command-line argument.
- grep -i pattern: "ignore case". Makes the search case-insensitive.
- grep -v pattern: "invert". Returns lines that do NOT match the pattern.
Another useful tool is strings, which will scan through a non-human-readable file and extract any human-readable strings it finds. This is useful for analyzing binaries or other machine-language objects.
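For instance, assuming a hypothetical compiled binary called program, the extracted strings can be piped into grep to narrow the results:
strings program | grep -i version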
sort and uniq are often used together to order data and to find anomalies and elements that stand out. sort will organize the lines within a file into a specified order. If no specification is given, the default scheme is alphabetical order. However, these other options can come in handy:
- sort -n: numerical order, based on numbers found in lines.
- sort -r: reverse order; can be combined with other options.
- sort -f: ignore case, i.e. treat "a" and "A" the same.
Usage:
sort options inputfile
By default the sorted result prints to the terminal; it can be saved with the -o option or with output redirection.
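For example, assuming a hypothetical file called scores.txt containing one number per line, the following sorts it from highest to lowest and writes the result with -o:
sort -rn scores.txt -o sortedscores.txt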
uniq scans a file or input and returns unique lines. It works by removing adjacent duplicate lines, meaning that if a line appears more than once in a row, the repeats will be filtered out and the line will only be reported once. However, if duplicate lines are not adjacent, but separated, uniq will not recognize them as duplicates. In that case, even though a line is not actually unique, it will be reported as if it were.
To prevent this, the user can first sort the file, then direct that output into uniq, like so:
sort options inputfile | uniq options >> optionaloutputfile
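As a concrete sketch, assuming a hypothetical file called names.txt, the following counts how many times each line appears (the -c option prefixes each line with its count) and lists the most frequent first:
sort names.txt | uniq -c | sort -rn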
2.4 - Editing and Altering Strings
The way data is presented and written can be changed via the CLI. A common style of writing data is encoded data, produced with a standard encoding method. One method commonly encountered is Base64, which translates data from its raw byte values (for text, ASCII values) into a 64-character alphabet, and vice-versa. The purpose of this is to allow data to be transmitted across channels that are only able to handle text.
Linux systems use a built-in utility, base64, to encode a string as Base64 data. The -d option is used to decode the string.
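For example, encoding the string "hello" and decoding it again:
echo hello | base64
echo aGVsbG8K | base64 -d
The first command prints aGVsbG8K (note that echo appends a trailing newline, which is encoded along with the text), and the second recovers hello.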
Another built-in utility, tr ("translate"), is used for simple find-and-replace operations. Using the format tr oldcharset newcharset, the command replaces instances of certain characters with new characters. This is useful for many things, including capitalization, simple ciphers, etc.
The output of one command can be directed into the tr command, which can in turn be directed elsewhere. Example:
cat names.txt | tr 'a-z' 'A-Z' >> capitalnames.txt
This string of commands will send the contents of "names.txt" to tr, which will translate the text to all capital letters, and then append that data to "capitalnames.txt".
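As an example of a simple cipher, the following sketch applies the classic ROT13 cipher, shifting each letter 13 places; running the output through the same command recovers the original:
echo secret | tr 'a-zA-Z' 'n-za-mN-ZA-M'
This prints frperg.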
2.5 - Compressing and Decompressing Files
File compression is a commonly encountered procedure. Its purpose is to save space and ease transmission without compromising the integrity of the data. Two types of compression exist: "lossless", where size is reduced by identifying and eliminating redundancies, and "lossy", where less important bits are removed outright. As the names imply, no information is lost in lossless compression, but some information is lost in lossy compression.
Many different algorithms exist for the purpose of data compression. Each has its own advantages, disadvantages, and optimizations. Some important ones to know are:
- gzip: lossless algorithm; compressed files generally have a ".gz" extension.
- bzip2: lossless algorithm; compressed files generally have a ".bz2" extension.
- jpeg: lossy algorithm for compressing images; files generally have a ".jpeg" or ".jpg" extension.
- tar: not technically a compression utility, but able to compress and decompress its archives by invoking a compressor such as gzip. Archives generally have a ".tar" extension (".tar.gz" when compressed).
Each utility, having its own way of doing things, leaves a unique file signature.
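As a short sketch tying this back to section 2.2, assuming a hypothetical notes.txt, the following compresses it and then uses xxd to confirm the gzip signature (gzip files begin with the bytes 1f 8b):
gzip notes.txt
xxd -l 2 notes.txt.gz
The original file can be restored with gunzip notes.txt.gz.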