Post-fundamentals of C, behind the veils of libs, includes and file access - By AcidFlux
First of all, if things are not clear, try to find
things on the internet. My closest friends are
www.altavista.com
astalavista.box.sk
www.yahoo.com
I take it you know how to search for things, but here
are some tips still. Use a + sign between two words
if you want links that have both words in the text,
not just one. If you try to find something about the
GNU General Public License, for instance, use the
main keywords you won't find anywhere else that
easily, like GNU+License. This will probably give
you many URLs already to look for more information,
more below...
To find any program on your Linux system, you can
use the find command, explained in detail in
the man pages. Just type "man find". I mostly
use find / -name <filename>, which will search
the complete system for the occurrence of <filename>.
This document is intended to show you around a bit on
a linux system, see where your files you are actually
using are located and give you paths where to look for
more information on this subject. Let's start with some
practical things here.
UNIX is a commercial operating system for which you have to
pay money. However, Linux, as a UNIX clone, has the same
kind of software that's used on UNIX. This software, however,
does not come from the commercial distributions, but has been
written by many different programmers and freely contributed.
The software is subject to the GNU General Public License
(GPL); this means that it is 'copyleft'. This is intended to
prevent others from putting 'copyright' restrictions on the
software. A copy can be found on the internet very easily.
The software from GNU closely mimics the software found on
UNIX systems, but is not the same.
Some software you'll need throughout the course:
GCC <== your C compiler.
G++ <== your C++ compiler, which can also be invoked through
gcc, when using the right flags.
gdb <== source-code-level debugger. I never used it though,
so it's a good time for me to start as well.
gnumake <== this is a version of UNIX make. It compiles and
installs source code according to a makefile.
If you've compiled the kernel before, you'll remember
typing things like ./configure, make, make install,
make zlilo or whatever. The ./configure detects your
system and generates the build configuration, and make
compiles the sources according to the makefile. make
install copies the finished programs into the correct
subdirs. make zlilo places your kernel in the right
place under / and runs lilo.
bison <== parser generator compatible with UNIX yacc (no experience
herewith)
bash <== that rings a bell; it's the normal command shell for users.
emacs <== my favourite for editing. use any editor you like though.
other examples are pico, vi, vim, joe, xjed...
some info:
First of all, you'll probably have compiled something before,
exploits or port scanners, and placed them somewhere in some
bin directory. There's a difference between /bin and
/usr/local/bin: /bin holds the essential system binaries that
come with the distribution, while /usr/local/bin is the
traditional place for programs you install locally yourself.
Both can be executed by ordinary users.
When DOS users compile a program and link it, they'll be able
to execute it in that directory immediately. It's an .exe or
.com there. However, in Linux, you need to place it in a
directory that's in your PATH before execution OR run it with
a "./" in front to tell the shell the program can be found in
the current dir.
Let's browse around your system:
I'm running SuSE 6.0 and forgot where the files are for
other distributions, but I expect them to be the same.
"gcc" is in my /usr/bin directory. It executes as both
user and root.
your "library files" are located in some "lib" directory
under i486-linux-libc5. My full path is
"/usr/lib"
Other lib-directories exist, but many times these are
just versions of the /lib directory. I'm not completely
sure, but I believe some libraries do still exist in the
other directories under i486-linux-libc*x* that are
upgrades of previous versions or used for that version only.
DLL's in Windows are like library files in UN*X (yeah, correct,
Windows has in many ways actually been stolen from UN*X and
Apple, but I guess you already knew that)... nowadays
though, Windows seems to have matured in its own way
and diverges quite a bit from any UN*X system. Once in a while
though, MS claims to have created some new technique or
software which has already been around on UN*X for ages,
but never advertised so much... getting back to the
subject.
So the libraries contain all the functions and procedures
you can use to talk to the system or use for your own
convenience. For instance, when you invoke a sleep() call
that pauses the program for a while, a function exists
in the library that does exactly that. You can't just
throw plain text at the kernel to make it do anything
you want; a function/procedure has to be written first.
Normally, smart people who know ASM or C/C++ take really
good care of this. By layering some interfaces, less
experienced people can access the same functionality
without diving into the code too much.
You must have seen #include on many occasions. Every program
contains at least one. We call the files that are included
into the program "header files". These files can include other
header files in turn, and so on. Be careful not to
create circular references, including a header file that
includes the previous one; it may give strange results.
However, header files normally protect themselves against
being included twice with so-called include guards, so don't
worry too much.
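By the way, here's what such a guard looks like. This is a
sketch of my own; the names (MYMATH_H, twice) are made up
just for illustration:

```c
/* mymath.h-style include guard. The first time this file is
   included, MYMATH_H is not defined yet, so the body gets
   compiled and MYMATH_H gets defined. If the same file is
   included a second time, the #ifndef fails and everything
   up to the #endif is skipped. */
#ifndef MYMATH_H
#define MYMATH_H

static int twice(int x) { return 2 * x; }

#endif /* MYMATH_H */
```

Open a few headers in /usr/include and you'll spot this
pattern at the top of almost every one.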
In the header files is information about structures (oh shit,
haven't explained these yet)... umm... imagine a structure
to be a block of variables grouped together. For instance,
it can contain three separate numbers, four characters and
another structure, why not? The header file also declares
types, constants and function prototypes: it tells the
compiler what functions are stored in your library files
and how they're called. Basically, you don't need to
interfere too much with the header files, not even when
you've progressed, so I won't bother your minds with 'em yet
(here's a line that many mentors use when they're not so
sure about a subject and want to hide their inexperience)
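To make that 'block of variables' idea concrete, here's a
made-up structure of my own, with three numbers, four
characters and another structure, why not:

```c
/* A structure groups variables of different types into one block. */
struct point {
    int x;
    int y;
};

struct example {
    int a, b, c;        /* three separate numbers */
    char code[4];       /* four characters (room for "hi!" + '\0') */
    struct point where; /* and another structure, why not? */
};

/* Fill one in and read the fields back. */
int example_sum(void)
{
    struct example e = { 1, 2, 3, "hi!", { 10, 20 } };
    return e.a + e.b + e.c + e.where.x + e.where.y;
}
```

You access the members with a dot, like e.a or e.where.x.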
There are some subdirectories in there as well. Famous
for hackers is "/usr/include/net" or "/usr/include/netinet",
these header files are all about socket programming. Good
thing is that for every protocol your system understands
there's a header file. Try to find tcp.h and check it out,
there's the implementation of TCP for linux, caress it!
Especially, check out struct tcphdr, it's a TCP header,
you WON'T find this in windows, it's hidden through their
implementation. Therefore, TCP headers cannot be changed in
windows, except when you create a miniport driver and plug
into the vxd-stuff. It's a virtual device driver, very
low level that's passing the TCP packets on to the
TCP/IP stack. When passing through, you can catch the headers
and modify the headers there. Another way is to 'patch'
the winsock.dll, but ah well, without the source, you're
kinda lost in assembly there. These are the only ways
I could ever think of. Don't ask me to do it though, cause
it's beyond my level... You could use the info to look smart
on IRC though... :)
Anyway, since it's in the Linux headers and we'll be using
raw sockets a bit later on, you have access to the
headers and can freely modify them. Not easy, but
not too hard either.
Ok, for now, I have given you enough info. I'm sure you
want to get a kick out of yourself by programming your
first thing in linux. It'd be nice to actually see the
program do anything.
Pick up your favourite editor and clear the buffer. Next, type
in the following, it's just a simple "hello world!" program,
but by keeping the previous in mind, I'm sure you suddenly
get much more knowledge and insight in what the program
is actually doing and calling. Especially the library
calls and header files. Don't blame me for not giving you
the source to any "smurf" or IP spoofing program, but they
are freely around on the internet. If you just wish to
"look" smart, go ahead and download them and compile them.
We're here to actually learn something and get to understand
that. After knowing it, you'll see the need for doing
those things will dissolve.
#include <stdio.h> <== include the standard input/output header.
find it in the "include" dir and study
it for a bit. Although you won't understand
it (even I don't), you'll still see the
declaration of "printf" somewhere. The
actual code lives in the library files :)
This file should be saved as "hello.c"
#include <stdio.h>
#include <stdlib.h> // needed for exit()
int main() // main function, always present in a C/C++ program.
{
printf("Hello world!\n"); // printf function
exit(0); // exit function.
}
Ok, compile it with
"gcc -o hello hello.c"
next, run it with
"./hello"
/* There ya go. Let's notice some things already. // is used to
place any comment on the same line. If you wish to add more lines
of commenting, use /* to open it and */ to close your comments, like
is done in this paragraph. */
This way, the compiler knows that what's in there does not
require compiling and no errors will come up.
"printf" is a very simple print function that simply
displays what's between the parentheses on screen. The
\n is a special character called a newline; there are more
special characters like that, like \r for carriage return
and so on. You can recognize them by being 'escaped' with
the \ - backslash character. The compiler translates them
into ASCII codes 10 for newline and 13 for carriage return.
You can't easily type those characters literally into the
source code, so that's why the escape sequence is used!
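You can verify those ASCII codes yourself. In C a character
constant is just a small integer, so this little sketch of
mine simply returns the escaped characters as numbers:

```c
/* A character constant in C is a small integer, so we can
   return the escape sequences and look at their ASCII codes. */
int newline_code(void) { return '\n'; } /* 10 in ASCII */
int return_code(void)  { return '\r'; } /* 13 in ASCII */
```

Compile it into a test program and compare against 10 and 13;
the compiler really did translate the escapes for you.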
Well, you're free to put this proggie into your /usr/local/bin
directory and run it once in a while to have your system greet
the world, but it's not really of any use, so just discard it
and browse some more through the system. I'm sure you have
some .c files somewhere. Look what header files they use. By
the way, a good tip. If you need to find information about
a structure, the way it has been declared that is, browse through
the header files, the declaration is in there!!! (kewl tip! kewl tip!)
Ok, I can understand your mind rings and you're numb now,
but don't be discouraged, sleep on it and ask me questions, that's
why I am mentor. I'll be hearing it when you guys are through
reading it and want to get to know more! We're actually
advancing quite fast, but well, I'm focusing on C/C++ put into
practice, not actually understanding completely in theory
what it's all about...
printf can substitute variables into the line it prints.
example:
printf("hehe %d is actually not really a number. know what %s? I think %d is better!\n", number1, string1, number2);
bill.c:
---------------------------
#include <stdio.h>
void bill(char *arg)
{
printf("bill: you passed %s\n", arg);
}
---------------------------
fred.c:
---------------------------
#include <stdio.h>
void fred(int arg)
{
printf("fred: you passed %d\n", arg);
}
---------------------------
Ok, so there exist some substitutions here and there. Here's a list:
%d is substituted by an integer
%i same
%u = unsigned int
%ld = long int
%f = float
%c = single character
%e = floats in exponential format (e.g: 1.2575484e+01)
%s = array of characters (array of char, e.g.: char title[255] = "ya dude!")
%p = pointer address
there are some other ways, but they're not really interesting at
this point...
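If you want to play with these substitutions without staring
at the screen, sprintf() is printf's brother that writes into
a character array instead. Here's a little sketch of mine
with made-up values:

```c
#include <stdio.h>  /* sprintf */

/* Format an integer and an array of characters into 'out'
   using the %d and %s substitutions from the list above. */
void format_demo(char *out)
{
    int number = 7;
    char title[] = "ya dude!";
    sprintf(out, "number=%d text=%s", number, title);
}
```

After the call, out holds the text "number=7 text=ya dude!".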
Now, we should compile the bill.c and fred.c programs into
object modules. Use the -c option, which tells gcc to compile
only and not to link. Linking would fail here, because there
is no main routine in these files.
Type:
"gcc -c bill.c fred.c"
This has created files "fred.o" and "bill.o"
Next, we will start to create a program that calls the function "bill".
For that to work, we need to create a header file first.
lib.h:
----------------
/*
This is lib.h. It declares the functions fred and bill for users
*/
void bill(char *);
void fred(int);
----------
program.c:
---------------
#include <stdlib.h> /* for exit() */
#include "lib.h"
int main()
{
bill("Hello world!");
exit(0);
}
----------
save this as "program.c"
so, what we have done now is create a reference to bill.o by including "lib.h".
This time, lib.h is not between <> characters. This is because <>
specifies the include file should be looked for in the system
"include" directories, while "" specifies it should be looked for
in the current directory first, that is, where your source file
is located.
Let's do:
"gcc -c program.c"
This has created an object file program.o. We need to link this program.o with
bill.o in order for it to work:
gcc -o program program.o bill.o
This has linked program.o and bill.o together in "program". You can now execute
./program and it will look like as if the function bill was programmed
into "program.c".
Ok, next step is to actually create a library, not link object modules together.
"ar crv lib.a bill.o fred.o"
and:
"ranlib lib.a"
'ar' is a general UNIX utility for archiving. You see it can be used for libraries
as well. Because some UNIX systems, especially derived from Berkeley, need a table
of contents we use the ranlib utility which does that. it's harmless to use,
so just do it!
Final step:
"gcc -o program program.o lib.a"
there you go.... program has used an object module from the 'lib.a'
archive and linked that object code into its own binary. This is all
possible because of the header file and compiler options. Executing
"./program" does the same thing as before.
The bad thing is that the code is actually retrieved from the library and linked
inside the program. Sometimes, you may wish to use shared libraries, which load
themselves into memory and allow themselves to be shared across multiple programs.
DLL's work much the same way. The code is loaded in memory, but
cannot be executed by itself. However, the functions are ready for
execution. Once a program needs to use a published function or
procedure, it transfers execution to the address where the
function/procedure is located and executes from there. Next, it
returns to the program and continues execution from there. Multiple
programs have access to this memory space, so it works quite
efficiently. Linux has shared libraries as well, called .so files.
The .sa or .a libs are static libraries whose functions get linked
into your own programs.
questions????
Escape characters:
Someone asked me a question about escape characters.
These characters are used to put in character codes
in the text, without doing something ugly.
Special character codes are mostly the characters below
code 32 decimal (20 hex). These often will not show up
as normal text, because they are control characters used
as delimiters by the system. For instance:
\a = ASCII bell character
\b = backspace character
\f = formfeed character
\n = newline character
\r = Carriage return (no linefeed)
\t = Horizontal tab
\v = Vertical tab
\\ = Backslash character
\' = single quote
\" = double quote
\? = Question mark
\nnn = character value in octal
\xnn = character value in hexadecimal
Don't ask me what a vertical tab is, I really wouldn't
know. All chars above have not been tested in Linux,
so I can't be sure if they work. For \n and \r I'm very
sure, because that is really standard.
If you would program something like
void blabla(void)
{
printf("look at this");
printf(" line");
}
it would print all the stuff on one line. Now, by using
the \n at the end or somewhere in between where you like,
you can separate the lines so they start on different
lines, instead of writing the data on the same.
printf("look at this\n");
would be the solution to do it. Often, for system utils,
there's a help function. They often print multi lines,
in the program code, there are some printf's that use
the same concept.
Let's start with some bits on the command shell, that # or $
prompt you see so frequently. While in many ways it's similar
to DOS, it's much more powerful than DOS's command.com.
A shell is available on every UN*X system and lets you quickly
find out if simple ideas work. I'm not talking about big progs
of course, but simple progs that will convert your
text files or, basically, just gadgets you want to run now
and then.
UN*X has always been built very modular, that means it contains
utilities that can be re-used by other programs or users.
For instance, 'more' is a utility that enables you to
view output one screen at a time. This is done by piping
the output of another command to this utility, which
receives the output and makes sure the screen is not
scrolled past the end.
You can pipe many times, like
"man bash | col -b | lpr"
This will print a reference copy of the bash-shell.
Col is a utility to take out backspaces and
format the output a bit. lpr is the printer spooler,
it prints the input to the printer as soon as
resources become available.
For an example of shell 'programming', have a look
at 'startx' , '.xinitrc', '/etc/rc.d' to see what
it's all about. It's not really C++, but definitely
some issues come forward that you want to know and
have a look at. For C++ programming, you can often
make use of utilities without having to reprogram the
same functionality in your own utility.
A shell by definition:
"A shell is a program that acts as the interface between
you and the UNIX system, allowing you to enter commands
for the operating system to execute."
Because of this, it resembles DOS, but hides the details
of Kernel operation for you. It's a sort of high
level programming language for UNIX itself.
If we go further into Tcl (pronounced 'tickle') or
Tk, we shall meet the 'tclsh' or 'wish' shells.
For now, let's continue.
If you've worked around UNIX for a while, you'll know
how to output the directory to a txt file or something,
so you can read it later with 'less', which is nothing
less than more, actually more.
"ls -l > lsoutput.txt"
Is the way to do it.
The standard output is in this way 're-directed' to the
file system by using ">". If the file exists, it's overwritten,
so by using ">>" you can append to any existing file.
By using the command "set -C" you can disable the default
behavior of overwriting existing files with redirection.
It sets the 'noclobber' option.
There are three file descriptors:
0 = standard input to a program
1 = standard output
2 = standard error output
They are used by prefixing the ">" with those numbers.
If you want to use the 'kill' command from a script for
example, you can use these to generate error logs if
something can't be killed or redirect success logs for
everything that was successful.
"kill -1 1234 >killout.txt 2>killerr.txt"
This will 'kill -1' (send SIGHUP to) process '1234',
writing normal output to 'killout.txt' and any error
messages to 'killerr.txt'.
If you do not want to see any messages at all, there's
a nice feature in UNIX, called the 'bit bucket'. A bit
bucket is like a black hole, everything put in there
simply disappears.
"kill -1 1234 >/dev/null 2>&1"
This tells the system to redirect standard output to
/dev/null and redirect standard error output to the same
place as standard output (& + file descriptor, 1 = standard
output).
A silly example to redirect input:
"more < killout.txt"
This will feed the file "killout.txt" as input to
"more", but it's a bit silly, cause more can take the
filename as an argument and do the same thing.
More about pipes:
----------------
"ps > psout.txt"
"sort psout.txt > pssort.out"
will output the 'ps' command to 'psout.txt'. However,
psout.txt is sorted to 'pssort.out', where a sorted list
of all processes is stored. This can be piped to one single
line of command like:
"ps | sort > pssort.out"
output of 'ps' is hereby piped to 'sort', which will store
its output in pssort.out.
I'm hammering so much on pipes, because later we'll do some
programming in this respect and you really need to understand
the concept.
"ps | sort | more"
is nice, it 'mores' out a list of ps, which is sorted.
how about this one ?
"ps -a | sort | uniq | grep -v sh | more"
it takes the output of 'ps -a', sorts it in alphabetical
order, removes duplicate lines using 'uniq', uses
'grep -v sh' to remove lines containing 'sh',
and finally displays the result paginated on the screen.
For hacking, you may want to know some of this stuff in
order to take out things you do not want to show the
webmaster. By re-compiling the utilities and storing
them in the 'bin' again, you are cloaking your own
existence in this way.
Programming the shell:
-----------------------
It's not really programming, but more like scripting
the shell. All different shell commands and variables
can be done through any script. Example without script:
$for file in *
>do
>if grep -l POSIX $file
>then
>more $file
>fi
>done
Run these commands from your shell; notice the ">" secondary
prompt, which tells you the shell is waiting for more input
before it can run the whole construct.
$file expands to the current value of the loop variable.
"for file in *"
tells the shell to loop over every file in the current
directory, calling each one 'file' in turn. Inside the
loop you can use $file like any other shell variable.
'grep -l POSIX $file' prints the name of each file that
contains "POSIX" to standard output. Because grep returns
a success exit status when it finds a match, the 'if' then
displays the contents of that file with 'more' on the
screen.
Now, create a script that does about the same:
----------
#!/bin/sh
#first.sh
# This file looks through all the files in the current
# directory for the string POSIX and then prints those
# files on the standard output.
for file in *
do
if grep -q POSIX $file
then
more $file
fi
done
exit 0
---------------
'grep -q' suppresses all output and stops searching at the
first match; it just returns an exit status. The 'if' tests
that status: if a file has at least one match, its contents
are shown with 'more', and the loop continues to the next file.
Do a 'man grep' to find out much more about this really
useful tool.
This script is much like a cgi-script. I'd encourage you
to look at it in more detail, since it will help you if
you want to start cgi scripting later in your life or
possibly other lessons about this subject...
The script is treated essentially as standard input to
your shell, so setting your PATH parameter right, you
can reference any UNIX command in your script and have
it executed. It will give you the same authorisation
as yourself..
Administration of scripts:
--------------------------
Now that you have a script, we can run it in two ways:
invoking the shell with the script filename as a parameter,
or simply running it from your current shell...
"/bin/sh first.sh"
would do it, or first change the file mode to executable:
"chmod +x first.sh"
then use "first.sh" to execute it, or, if the PATH environmental
parameter is not set for you, run it with "./first.sh"
Hack:
-----
if the environmental parameters of 'root' are set like
'PATH=$PATH":."'
the shell will also look in the current directory when
searching for something to execute. By creating a hack
script with the name of a common command, you can have
the root user execute something he wouldn't have wanted.
Remember that a script works like standard input to the
shell; someone writing a malicious script can just have it
sit there and possibly trick 'root' into running it.
Moreover, if /bin is not 'write-protected' (dream on!),
you're possibly home free...
After shell programming, we really start to dive into
C/C++ programming: File access!!!
File access is really important and I'll try to
delve as much into it as I can. You'll need file
access a lot...
questions ????
Recently I found a good book for developing
Linux applications in the X environment using
GDK and GTK+. The advantage is that it gives
you a good idea of what object-oriented programming
is all about. It's quite easy as well, once you
understand its concepts. Since some OO concepts are
hidden from the programmer in Windows, but not in
GTK/GDK, it'll give you a much better understanding
of OO programming, what containers are, etc...
Don't worry about it yet though, it's future work
and I'm here to tutor, not give a headache. And
we still need a prelude to OO before we can take
it on anyway.
This week I started on an object-oriented C++
project in Windows. As soon as we get some more
done in C, I'll divide the text up a bit into
OO and advanced C (sockets/whatever).
Last time I showed some examples in Shell Programming.
I told you guys I'd carry on with that, but since
the content was quite basic and not that interesting
regarding the UNIX environment, I've decided to carry on
with.... FILES!! ah yeah.. why so hard at once? hmm..
Cause the best way to learn things is by doing them.
If you can't get to compile or have problems, lemme
know, I may be able to give you some tips where to look.
FILES:
We all should know what files are by now; they're rather
the same as in DOS, except that file access in Linux is
different. You won't be able to access your ext2 filesystem
from DOS; from Linux, however, you can access your DOS
filesystems. That is because many file system drivers are
compiled into the kernel or plugged in as modules. I can
work through my Windows files anyway, because I have told
the system where it is
mounted and what file system it contains. But that's
Linux administration, let's carry on with C here.
The basic operations on files are creating/opening/
reading/writing and closing them. On top of that we
need to organize files in directories, so you want to
know how to create, scan and delete directories, for
example.
In Linux, everything is a file. Therefore, file I/O
and programming there-in is really important. When
compiling your files as a user, you won't be able to
cause havoc in the system though. Examples are the
programs you are using to browse through directories
or try to manage them. If you, as a user, are not
permitted, not even a program will give you that
access.
I hope you all know how file permissions work in Linux.
DOS has no control over who can read/write/execute files.
NT has it though, in the implementation of NTFS;
UNIX was originally built that way, as a true
multi-user system (contrary to NT).
Information about a file is stored in the 'inode'.
Picture this as some sort of File Allocation Table.
The system will just use a number to access the
file, but for our convenience, all files have a
description, size and access privileges stored in
the 'inode' as well.
A directory is in essence a file as well, but to us
they seem transparent, as if files are stored 'within'
a directory. When you delete a file (or a directory, which
is a file itself), you delete a 'link' to the file's 'inode'
and the system loses track of the information in the file.
Just like DOS, deleting a file merely deletes its name.
Or rather, it marks the file's blocks as free to be
allocated again. Next time another file is created, the
system knows it can write to certain bytes on your hard
disk. If nothing has been written over it yet and the
information is still there, DOS lets you 'undelete' the
file by restoring the file name and thereby the link.
With UN*X you normally can't, so deleted files will really
be gone, unless you use the trick that is documented in
the Linux documentation, which should be available in your
HOWTOs somewhere.
Because everything is a file, the hardware - specific
control programming can be done by accessing the files
through the device 'driver'. Suppose you want to search
for a file somewhere. Doing that on a tape or on a
harddisk or cdrom really differs. Luckily, these have been
made transparent to us, so what we simply do is call
"open", "write", "ioctl()", and other functions on our
devices and the driver will handle the hardware-specific
actions to take by itself.
Here are five system calls we can perform:
- open : to open a file or device for access
- read : to read from an open file or device
- write : to write to a file or device
- close : to close a file or device from access
- ioctl : specific control of the device itself
ioctl (input/output control) can for instance be used
to rewind a tape drive or set the flow control
characteristics of a serial port. Therefore, not all
functionality can be called for every device/file.
do a:
"man open"
"man close"
"man ioctl"
to find out more, especially the last one is useful.
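To string the calls together, here's a sketch of mine that
creates a file, writes to it, reads it back and closes it
again. The function name file_roundtrip and the path
/tmp/acid_demo.txt are just things I made up:

```c
#include <fcntl.h>   /* open() and the O_* flags */
#include <unistd.h>  /* read(), write(), close() */
#include <string.h>  /* strlen() */

/* Write 'msg' to the file at 'path', then read it back into
   'out' (which holds 'n' bytes). Returns the number of bytes
   read back, or -1 on any error. */
int file_roundtrip(const char *path, const char *msg, char *out, int n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (write(fd, msg, strlen(msg)) != (int)strlen(msg)) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    int got = read(fd, out, n - 1);
    close(fd);
    if (got < 0) return -1;
    out[got] = '\0';  /* terminate so it can be used as a string */
    return got;
}
```

Notice the pattern: every open() gets a matching close(), and
every call's return value is checked before carrying on.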
One problem we face is the cost of making system calls.
Not that anyone minds ringing the kernel, but it causes
quite some overhead: you actually step out of your program,
the kernel executes its code, and then it switches back.
That's more expensive than your own function calls. Some
hardware also has restrictions on the size of the data you
can write at once. For instance, a tape may want blocks of
10K; when you write 4K to the tape, it will only write 4K
but still advance the tape by 10K. That's a limitation.
File access ain't usually that hard though.
The standard library functions we have for file access
are in the standard I/O library. We have discussed
header files and libraries in the previous three
lessons. To elaborate on this:
* A library is like a DLL in windows, it has function
calls you don't need to rewrite and they are already
stable. (as in not unstable).
* A header file is a file included into your own program.
The functions of the libraries are declared in there,
so when the compiler sees a call to a library function in
your program, it won't be scared shitless and throw up
errors.
So, we have the standard I/O functions compiled into the
C library somewhere in the /usr/lib directory (libc). Then
we have the header file <stdio.h> in the /usr/include
directory. (The "open", "write" and "ioctl" calls I
mentioned are declared in other headers, like <unistd.h>
and <sys/ioctl.h>.) If you browse through the <stdio.h>
file, you'll notice some things now, with clearer sight.
Do a "less /usr/include/stdio.h" and browse with me
(asynchronously that is).
#defines are used to declare constants in a program.
Suppose you want to check for errors later on. If you
have a function for instance that is able to determine
whether the guy at "www.icepick.com" has something in
his fridge and you want to return one of the next codes:
0 = he has absolutely nothing in his fridge.
1 = he has something in his fridge.
So you start coding more functions with the same protocol.
This time functions that determine whether someone
rang his doorbell, sat on his toilet, been in his
kitchen or was sitting at his computer.
Now, when your boss wants 0 to be replaced by -1, because
it looks better, you have to change every function you
wrote and believe me, sometimes you cannot be sure
you had every function. This increases bugs. So, what
does a #define do?
Find "#define NULL 0" in the header file. This means that
when you create a:
return(NULL);
it will actually do a "return(0);". But if your boss wants
the return value to change to -1, you simply change the
define into:
"#define NULL -1"
This means the program (with "return(NULL);"), will actually
do a "return(-1);" this time. Isn't that smart?
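Instead of messing with NULL (which the C library already
uses for the null pointer, so better leave that one alone
in real code), here's the same idea with our own made-up
fridge codes:

```c
/* Our own constants. Change them in ONE place here and every
   function that uses them follows automatically. */
#define FRIDGE_EMPTY     0
#define FRIDGE_HAS_STUFF 1

/* Hypothetical checkers with hardcoded results, just to show
   the constants in use. */
int check_fridge(void) { return FRIDGE_HAS_STUFF; }
int check_toilet(void) { return FRIDGE_EMPTY; }
```

When the boss wants FRIDGE_EMPTY to become -1, you edit one
#define line and recompile; no hunting through functions.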
FILES (for real this time):
===========================
There are three file descriptors already in your system.
These are 0 for standard input, 1 for standard output,
2 for standard error. We saw last time how to redirect
output to somewhere else.
A file descriptor is called a "handle" in windows. They
took that name, because it is a reference to your file.
The system will give you back a "handle" or descriptor
when you successfully open a file for write or read access
or whatever (you must specify what you want to do with
a file).
This file descriptor is just like the above numbers, just
an integer value. But it's used by the system to keep
track of files in some internal open files table.
(actually, an array of structures with dedicated information,
but let's not worry about that; have a look at the
struct _IO_FILE declaration in "libio.h").
Now, let's elaborate on file descriptors already.
Since your standard input, output and error are already
opened for use and default on your system, you can
already use them. We need access to a function called
write to be able to write to them. In "unistd.h" is one.
The next program illustrates files usage:
#include <unistd.h>
#include <stdlib.h> /* for exit() */
int main()
{
if ((write(1, "Here is some data\n", 18)) != 18)
write(2, "A write error has occurred on file descriptor 1\n", 48);
exit(0);
}
name this program anything you like (we call it <filename>.c).
compile using "gcc -o <filename> <filename>.c".
now run with "./<filename>".
See what it does!
Stepping through line by line:
* include unistd.h, the header file that declares the
"write" function we use (stdlib.h declares "exit").
* int main() or void main() or something. "main()" is the
entry point of your program and every program should have
one. Remember that! It says int main() to tell the compiler
this function WILL return a value at the end. So it should
contain an exit(<number>); or return(<number>); at the
end!
* opening and ending braces "{" and "}" give the context
of the function, the code that belongs to this function
itself. It's possible to have more functions in the same
file, they are enclosed with braces as well.
* the "if-line" is weird and needs more explaining:
the function write you use in the if-line is declared in <unistd.h>
as follows:
ssize_t write(int fildes, const void *buf, size_t nbytes);
This means it takes a file descriptor (fildes) to send
the buffer to, a buffer (const void *buf) holding the
data to write to that file, and the size of that buffer
at the end. (So you may need to measure the size at run
time when you, as the programmer, cannot be sure what
it is. Suppose you are reading another text file line
by line: you don't know the size of the lines, and thus
of the buffers you are reading. You read each one in,
determine its size, and write the buffer away again
along with that measured size.)
"const void *buf" means these things:
"*" means you pass along a pointer to a memory location.
In this case it points at the start of the line "Here is",
which the compiler stored somewhere in the program's
memory.
"buf" is the parameter that receives the pointer.
"void" means it can point at anything: integers, strings,
chars, whatever.
"const" means the buffer cannot be changed within the
function 'write'. This is for protection of the data,
neat code! Per Bothner is quite a nice guy!
"size_t" is an integer, but GNU dedicated "size_t" as
an integer that counts bytes (the return type is the
signed variant "ssize_t", so that -1 can signal an
error). Don't think too hard about this, cause your
program will work anyway!
So, what happens is you want to write the line
"Here is some data\n" to file 1 (standard output).
\n is replaced by ASCII code 10 (newline; the backslash
marks an escape character), and you tell the function
you want to write 18 bytes of data to this file.
"write" itself returns a value, the actual number of
bytes written to the file, and you compare this to 18.
Now, if you get 18 back from "write", you'll know it
all worked out ok. If you get -1 back, something went
totally wrong and you should raise an error. If it was
less than 18, a maximum block size could have been the
problem. Either way, if the two values differ, you'll
know something went wrong (taking into account it's
only standard output).
You want to alert the user to this and write the
"something went wrong blabla" line to the file
descriptor 2, standard error. There is no way to
get other feedback to the user, since if this doesn't
work out it simply means the user has no terminal
on the computer. Only thing you can still do is
write something to the log. Finally,
* exit(0);
means the program exits with return code 0. By
convention, 0 means the program finished successfully,
and any non-zero code signals some kind of failure
(better leave it at that remark, don't ask me for the
gory details, cause I don't know. Ppl doing assembly
language and CPU abuse would!).
Next, let's get kicking with "read" for the first
time. It is declared as follows:
ssize_t read(int fildes, void *buf, size_t nbytes);
it's a lot like "write", so I'll only explain what
it does:
* it reads up to nbytes of data. No more is read
with every function call.
* it returns the amount of bytes actually read.
* if it returns 0 it had nothing to read (end of file).
* if it returns -1 there was an error! (file not
open for reading or simply not open).
simple_read.c:
==============
#include <unistd.h>
#include <stdlib.h>
int main()
{
char buffer[128];
int nread;
nread = read(0, buffer, 128);
if (nread == -1) {
write(2, "A read error has occurred\n", 26);
exit(1);
}
if ((write(1, buffer, nread)) != nread)
write(2, "A write error has occurred\n", 27);
exit(0);
}
===============
compile again "gcc simple_read.c -o simple_read"
Something funny happens when you run it and type more
than 128 characters: the program reads off the first
128 bytes, but the rest stays in the input buffer and
is handed back to the shell, which treats it as a
command you wanted to execute.
Two more ways to run it:
"echo hello there | ./simple_read"
"./simple_read < some_filename_with_text_in_it.txt"
Assignment:
===========
Figure out the program logic.
- When is the read error line triggered and for what?
- When is the write error line triggered and for what?
- Look at previous explanation and I'm sure you'll know.
Open:
=====
We've played around with three files now. It was fun,
it was a bitch (possibly), but we want to do some more.
We actually want to create our own files (on disk or
on your soundcard or your parallel port and read/write
from it).
Declaration of open in the library:
int open(const char *path, int oflags);
int open(const char *path, int oflags, mode_t mode);
Look up this declaration in the "fcntl.h" header file
and read the remarks.
It will look like:
extern int open __P ((__const char *__file, int __oflags, ...));
this is because some dude with a serious 'underscore'
addiction wrote it. (Not really: the leading underscores
are reserved names, used so the library's internal
identifiers cannot collide with names in your own
program, and the "__P" macro is there so the prototype
also compiles on pre-ANSI compilers.) Take "__P" (that
is pee) away, plus the extra underscores, and you'll
be able to read it again...
Declarations of "open" and file related matter are in:
"fcntl.h"
"sys/types.h"
"sys/stat.h"
any includes referenced in "fcntl.h"
"open" returns a file descriptor. We now know what that
is. A path of access to a file, a "file handle" in
windows, a handle from a bucket to be able to fill it
with water, or empty it in the streams. Two users can
have access to the same file, but the file descriptors
are different. This means data may become overwritten
when write access is used. (but not part of our challenge
yet!).
oflags:
=======
O_RDONLY Open for read only
O_WRONLY Open for write-only
O_RDWR Open for reading and writing
oflags, the second parameter, is used to control
file access and the way data should be written, some
additional parameters. But wait, it's only an int,
that means an integer, how can I stuff all that data
into one number?
I'm quite sure you have done some binary arithmetic.
1, 2, 4, 8, 16, 32, 64, 128. (blabla)
When OR-ing these values together, the resulting value
can be used to determine whether one setting was made or
not. Basically, we use #defines here to give names to
constant values. Browse through "fcntlbits.h" to find
O_RDONLY and the lots defined. Because "fcntl.h" should
be included within programs with file access and fcntl.h
itself includes "fcntlbits.h", we have access to these
constants and #defines. Their values are different
from the series above though... anyways, we don't need
to concern ourselves with details. Here are other
parameters to open (also oflags):
O_APPEND place written data at the end of the file.
O_TRUNC set length of file to zero (kinda like delete).
O_CREAT create the file, with permissions set in "mode".
O_EXCL ensures that the caller of function actually creates
the file. (it's a multi-user OS!).
All the above oflags parameters are combined by
bitwise OR-ing them together, using the | character
(on my US win95 (*ouch*) keyboard it sits above the
backslash). Each flag sets its own bit, so OR-ing
them stacks several settings into one integer value;
an example follows below, after the other parameters
are defined.
When you use O_CREAT, you should specify the permissions,
something which is done automatically when saving a file
from emacs or whatever. Have a look into "sys/stat.h"
for the definition. Look for :
S_IRUSR = read permission for owner
S_IWUSR = write permission for owner
S_IXUSR = execute permission for owner
substitute USR with GRP for group privileges,
OTH for others permissions. (and look in stat.h).
example:
========
open("myfile", O_CREAT, S_IRUSR | S_IXOTH);
when this code is used in a program, you get the result:
(it's not compiled on my machine, taken from a book).
"ls -ls myfile"
"0 -r-------x 1 stupid_user software 0 Sep 22 08:11 myfile"
It's boring to let you read so much, so let's carry on
with close and compile a program already... umask should
have been covered here, but it is basically just three
digits in a row stating which permissions to take away
for owner, group and others:
digit 1 = user permissions,
digit 2 = group permission,
digit 3 = others permissions,
value 0 = no disallowments,
value 4 = read disallowed,
value 2 = write disallowed,
value 1 = execute disallowed.
So, 032 (a umask) has 0 in digit 1, 3 in digit 2, 2 in digit 3.
this means:
0 = no disallowments for user.
3 = 2 + 1 = write and execute disallowed for group.
2 = write disallowed for others (but execute allowed (weird setting anyway!)).
AAAAHHHHHHHHHHHHHH!!!!!!!!!!
It's boring, boring, boring!! let's get busy!
File copy program (char_copy.c):
================================
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
char c;
int in, out;
in = open("file.in", O_RDONLY);
out = open("file.out", O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR);
while(read(in, &c, 1) == 1)
write(out,&c,1);
exit(0);
}
compile with "gcc char_copy.c -o char_copy"
"file.in" should exist; just copy any text file into
the same directory under the name file.in (for example
"cp sometext.txt file.in").
This program copies a file character by character to file.out.
That is not very efficient; I already made this remark about
system calls and overhead shit... here's where it starts to
matter. The last example in this chapter covers block copying,
and with large files you may notice the difference... sure,
you think, as a single user on your multi-user environment,
this won't hit me much on my Cray@home, but once you do some
C/C++ for your organization, related to server management,
things start oozing around the corner there....
Quick! block_copy is the last one today. Pick up char_copy.c,
save it as block_copy.c and make three small changes:
char c becomes char block[1024], an int nread is added,
and the while line changes.
block_copy:
===========
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
char block[1024];
int in, out;
int nread;
in = open("file.in", O_RDONLY);
out = open("file.out", O_WRONLY | O_CREAT, S_IRUSR|S_IWUSR);
while((nread = read(in,block,sizeof(block))) > 0)
write(out,block,nread);
exit(0);
}
Beat your heart out:
====================
* three includes, we know what they do...
* int main() (what? don't know what it means? re-read previous text);
* char block[1024], create array of 1024 chars, called "block".
* int in, out (two integers that will hold the file descriptors);
* open (we just covered that!);
* while line:
interesting, I'll explain that one...
read from file descriptor "in", put the data into
"block", and give sizeof(block) (which is 1024, always,
no matter if end of file is near) as the maximum number
of bytes to read. The actual number of bytes read is
assigned to nread, and as long as this value is larger
than zero, we continue to write to the "out" file
(file.out) the block we just read (block) and the number
of bytes (nread) we just read. When end of file is
reached at "in", no more bytes can be read, nread
becomes zero, and the while loop exits...
all goes well, no core dumps, no segmentation fault, so
program exits with code zero (safe!).
I think that's enough for now!
Questions????