Post-fundamentals of C, behind the veils of libs, includes and file access - By AcidFlux

First of all, if things are not clear, try to find things on the internet. My closest friends are

www.altavista.com
astalavista.box.sk
www.yahoo.com


I take it you know how to search for things, but here are some tips anyway. Use a + sign between two words if you want links that contain both words in the text, not just one. If you are trying to find something about the GNU General Public License for instance, use the main keywords you won't find anywhere else that easily, like GNU+License; this will probably already give you plenty of URLs to look through for more information.

To find any program on your Linux system, you can use the find command, explained in detail in the man pages. Just type "man find". I mostly use "find / -name <filename>", which will search the complete system for any occurrence of <filename>.

This document is intended to show you around a bit on a Linux system, to see where the files you are actually using are located, and to give you paths where to look for more information on this subject. Let's start with some practical things here.

UNIX is a commercial operating system for which you have to pay money. Linux, as a UNIX clone, has the same kind of software that's used on UNIX. This software, however, does not come from the commercial distributions, but has been written by many different programmers and freely contributed.

The software is subject to the GNU General Public License, which makes it 'copyleft'. This is intended to prevent others from putting a restrictive 'copyright' on the software. A copy of the license can be found on the internet very easily. The software from GNU closely mimics the software found on UNIX systems, but is not the same.

Some software you'll need throughout the course:

GCC <== your C compiler.
G++ <== your C++ compiler, which can also be invoked through GCC with the right flags.
gdb <== source-code-level debugger. I've never used it though, so it's a good time for me to start as well.
gnumake <== the GNU version of UNIX make. It compiles and installs source code by following the rules in a Makefile.
If you've compiled the kernel before, you'll remember typing things like ./configure, make, make install or make zlilo. The ./configure step sets up the build scripts, make compiles the sources according to those rules, make install links/copies the resulting files into the correct subdirectories, and make zlilo puts your kernel in the right place under /.
bison <== parser generator compatible with UNIX yacc (no experience herewith)
bash <== that rings a bell, it's the normal command shell for users.
emacs <== my favourite for editing. Use any editor you like though; other examples are pico, vi, vim, joe, xjed...

some info:

First of all, you'll probably have compiled something before, exploits or port scanners, and placed the result in some bin directory. There's a difference between /bin and /usr/local/bin: /bin holds the essential system binaries and only root can write there, while /usr/local/bin is the usual place for locally installed programs. Any user can execute programs from either, as long as the permissions allow it.

When DOS users compile a program and link it, they can execute it in that directory immediately; it's an .exe or .com there. In Linux, however, you either need to place the binary in a directory that's in your PATH before you can run it by name, OR run it with "./" in front to tell the shell the program can be found in the current dir.

Let's browse around your system:

I'm running SuSE 6.0 and forgot where the files are for other distributions, but I expect them to be the same.

"gcc" is in my /usr/bin directory. It executes as both user and root.

your "library files" are located in some "lib" directory under i486-linux-libc5. My full path is

"/usr/lib"

Other lib directories exist, but many times these just hold other versions of the libraries in /lib. I'm not completely sure, but I believe some libraries in the other i486-linux-libc*x* directories are upgrades of previous versions, or are used by that libc version only.

DLL's in Windows are like library files in UN*X (yeah, correct, Windows has in many ways actually been stolen from UN*X and Apple, but I guess you already knew that)... nowadays though, Windows seems to have matured in its own way and diverges quite a bit from any UN*X system. Once in a while MS still claims to have created some new technique or software that has already been around on UN*X for ages, just never advertised so much... getting back to the subject.

So the libraries contain all the functions and procedures you can use to talk to the system, or simply use for your own convenience. For instance, when you invoke a sleep() call that pauses the program for a while, a function exists in the library that does exactly that. You can't just talk to the system in plain ASCII and make it do anything you want; a function/procedure has to be written first.
Normally, smart people who really know ASM or C/C++ take good care of this. By layering interfaces on top, less experienced people can access the same functionality without diving into the low-level code too much.

You must have seen #include on many occasions. Every program contains at least one. We call the files that are included into the program "header files". These files can in turn include other header files, and so on. Be careful not to create circular references, where a header file includes one that includes it back; that may give strange results. However, headers usually protect themselves against this, so don't worry too much.
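One common way headers protect themselves is an "include guard". Here's what one looks like in a hypothetical header of mine called mylib.h (not a real system header, just an illustration):

/* mylib.h - hypothetical header showing an include guard */
#ifndef MYLIB_H          /* if we haven't been included yet... */
#define MYLIB_H          /* ...remember that we have been now */

void do_something(int value);   /* declarations go here */

#endif /* MYLIB_H */

The second time the file gets included, MYLIB_H is already defined, so everything between #ifndef and #endif is simply skipped.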

The header files contain information about structures (oh shit, haven't explained those yet)... umm... imagine a structure to be something like an array, except the elements can have different types. For instance, it can contain three separate numbers, four characters and even another structure, why not? If you're not sure about this, just imagine a block of variables grouped under one name. The header file also declares types and variable names and contains the "links" to your library files (which functions exist, what parameters they take and so on). Basically, you don't need to interfere with the header files too much, not even when you've progressed, so I won't bother your minds with them yet (here's a line that many mentors use when they're not so sure about a subject and want to hide their inexperience)
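Just to give you a mental picture, here's a little made-up example of mine (nothing from a system header) of declaring and using a structure:

#include <stdio.h>
#include <string.h>

struct person {
    int numbers[3];      /* three separate numbers */
    char initials[4];    /* four characters (room for 3 letters + '\0') */
};

int main()
{
    struct person p;

    p.numbers[0] = 1;
    p.numbers[1] = 2;
    p.numbers[2] = 3;
    strcpy(p.initials, "AF");

    printf("%s has the numbers %d, %d and %d\n",
           p.initials, p.numbers[0], p.numbers[1], p.numbers[2]);
    return 0;
}

(A structure can also contain another structure as a member, or a pointer to one of its own kind, which is how linked lists are built. More on that later.)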

There are some subdirectories in there as well. Famous among hackers are "/usr/include/net" and "/usr/include/netinet"; those header files are all about socket programming. The good thing is that for every protocol your system understands there's a header file. Try to find tcp.h and check it out: there are the definitions Linux uses for TCP, caress them!

Especially check out struct tcphdr: it's a TCP header. You WON'T find this in Windows; it's hidden inside their implementation. Therefore TCP headers cannot be changed in Windows, except when you create a miniport driver and plug into the VxD stuff: a virtual device driver, very low level, that passes the TCP packets on to the TCP/IP stack. While they pass through, you can catch the headers and modify them there. Another way is to 'patch' winsock.dll, but ah well, without the source you're kinda lost in assembly there. These are the only ways I could ever think of. Don't ask me to do it though, cause it's beyond my level... You could use the info to look smart on IRC though... :)

Anyway, since it's in the Linux headers and we'll be using raw sockets after some more lessons, you have access to the headers and can freely modify them. Not easy, but not too hard either.
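Before we move on: if you want to convince yourself the struct is really there, here's a trivial check of my own (the exact field names inside struct tcphdr differ between libc versions, so I only print its size):

#include <stdio.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* this is where struct tcphdr lives */

int main()
{
    /* a plain TCP header without options is 20 bytes */
    printf("struct tcphdr is %d bytes on this system\n",
           (int)sizeof(struct tcphdr));
    return 0;
}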

Ok, for now, I have given you enough info. I'm sure you want to get a kick out of yourself by programming your first thing in linux. It'd be nice to actually see the program do anything.

Pick up your favourite editor and clear the buffer. Next, type in the following, it's just a simple "hello world!" program, but by keeping the previous in mind, I'm sure you suddenly get much more knowledge and insight in what the program is actually doing and calling. Especially the library calls and header files. Don't blame me for not giving you the source to any "smurf" or IP spoofing program, but they are freely around on the internet. If you just wish to "look" smart, go ahead and download them and compile them. We're here to actually learn something and get to understand that. After knowing it, you'll see the need for doing those things will dissolve.

#include <stdio.h> <== includes the standard input/output header. Find it in the "include" dir and study it for a bit. Although you won't understand all of it (I don't either), you'll still see "printf" declared somewhere. The actual code is pulled in from the library files :)

This file should be saved as "hello.c"

#include <stdio.h>  // for printf
#include <stdlib.h> // for exit

int main() // main function, always present in a C/C++ program.
{
printf("Hello world!\n"); // printf function
exit(0); // exit function.
}



Ok, compile it with

"gcc -o hello hello.c"

next, run it with

"./hello"

/* There ya go. Let's note some things already. // is used to place a comment on the rest of the same line. If you wish to add more lines of comments, use /* to open and */ to close them, like is done in this paragraph. (One warning: these comments do not nest, so don't put one inside another.) */

This way, the compiler knows that what's in there does not need compiling, and no errors will come up.

"printf" is a very simple print function that simply displays something between the brackets on screen. The \n is a special character called a newline, there are more special characters like that, like \r for carriage return and whatever. You can often recognize them being 'escaped' by the \ - backslash character. The compiler interprets them as ASCII codes 10 for newline and 13 for carriage return. It's impossible to put these in the source code, so that's why the escaping sequence is used!

Well, you're free to put this proggie into your /usr/local/bin directory and run it once in a while to have your system greet the world, but it's not really of any use, so just discard it and browse some more through the system. I'm sure you have some .c files somewhere. Look what header files they use. By the way, a good tip. If you need to find information about a structure, the way it has been declared that is, browse through the header files, the declaration is in there!!! (kewl tip! kewl tip!)

Ok, I can understand your mind rings and you're numb now, but don't be discouraged, sleep on it and ask me questions, that's why I am mentor. I'll be hearing it when you guys are through reading it and want to get to know more! We're actually advancing quite fast, but well, I'm focusing on C/C++ put into practice, not actually understanding completely in theory what it's all about...

printf can also substitute variables into the line, using format specifiers as placeholders.

example:

printf("hehe %d is actually not really a number. know what %s? I think %d is better!", number1, string1, number2);

Next, two small source files that we'll turn into a library: bill.c and fred.c.


---------------------------

bill.c:

#include <stdio.h>

void bill(char *arg)
{
printf("bill: you passed %s\n", arg);
}

---------------------------

fred.c:

#include <stdio.h>

void fred(int arg)
{
printf("fred: you passed %d\n", arg);
}

---------------------------

Ok, so there exist some substitutions here and there. Here's a list:

%d is substituted by an integer
%i same
%u = unsigned int
%ld = long int
%f = float
%c = single character
%e = floats in exponential format (e.g: 1.2575484e+01)
%s = array of characters (array of char, e.g.: char title[255] = "ya dude!")
%p = pointer address

there are some other ways, but they're not really interesting at this point...
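To see them in action, here's a small demo program of my own you can compile and run; every value is made up, it's only there to exercise the specifiers:

#include <stdio.h>

int main()
{
    int number = 42;
    unsigned int count = 7;
    long distance = 123456L;
    float ratio = 12.575484;
    char letter = 'x';
    char title[] = "ya dude!";

    printf("int: %d (or %i), unsigned: %u, long: %ld\n",
           number, number, count, distance);
    printf("float: %f, exponential: %e\n", ratio, ratio);
    printf("char: %c, string: %s, pointer: %p\n",
           letter, title, (void *)title);
    return 0;
}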

Now we should compile the bill.c and fred.c programs into object modules. Use the -c option; don't compile them the normal way, because then gcc will also try to link them, and that fails because neither file contains a main routine.

Type:

"gcc -c bill.c fred.c"

This has created files "fred.o" and "bill.o"

Next, we will start to create a program that calls the function "bill". For that to work, we need to create a header file first.

lib.h:

----------------
/*
This is lib.h. It declares the functions fred and bill for users.
*/

void bill(char *);
void fred(int);


----------


program.c:

---------------

#include "lib.h"

int main()
{
bill("Hello world!");
exit(0);
}

----------

save this as "program.c"

so, what we have done now is create a reference to the function in bill.o by including "lib.h".
This time, lib.h is not between <> characters. That is because <> means the include file should be looked for in the system "include" directories, while "" means it should be looked for in the current directory, that is, where the source file is located.

Let's do:


"gcc -c program.c"

This has created an object file program.o. We need to link this program.o with bill.o in order for it to work:

gcc -o program program.o bill.o

This has linked program.o and bill.o together into "program". You can now execute ./program and it will look as if the function bill was programmed into "program.c" itself.

Ok, next step is to actually create a library, not link object modules together.

"ar crv lib.a bill.o fred.o"

and:

"ranlib lib.a"

'ar' is a general UNIX utility for archiving; you see it can be used for libraries as well. Because some UNIX systems, especially those derived from Berkeley, need a table of contents in the archive, we run the ranlib utility, which adds one. It's harmless to use, so just do it!

Final step:

"gcc -o program program.o lib.a"

there you go.... program has taken the object module it needed from the 'lib.a' archive and linked that object code into its own binary. This is all possible because of the header file and the compiler options. Executing "./program" does the same thing as before.

The downside is that the code is physically copied out of the library and linked inside the program. Sometimes you may wish to use shared libraries instead, which are loaded into memory and can be shared across multiple programs. DLL's work exactly the same way: the code is loaded in memory, but cannot be executed by itself. However, its functions are ready for execution; once a program needs a published function or procedure, execution jumps to the address where that function/procedure is located, runs there, then returns to the program and continues. Multiple programs have access to this memory space, which is quite efficient. Linux has shared libraries as well, the .so files. The .sa or .a libraries are static libraries, whose functions you link into your own programs.

questions????

Escape characters:

Someone asked me a question about escape characters.
These characters are used to put character codes in the text without doing something ugly.

The special character codes are mostly the characters below code 32 decimal (20 hex). These are often characters that will not show up on screen, because they act as control characters or delimiters. For instance:

\a = ASCII bell character
\b = backspace character
\f = formfeed character
\n = newline character
\r = Carriage return (no linefeed)
\t = Horizontal tab
\v = Vertical tab
\\ = Backslash character
\' = single quote
\" = double quote
\? = Question mark
\nnn = ASCII value in octal
\xnnnn = ASCII value in hexadecimal


Don't ask me what a vertical tab is, I really wouldn't know. I haven't tested all of the characters above in Linux, so I can't be sure they all work. For \n and \r I'm very sure, because those really are standard.
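If you want to test a few of them yourself, here's a throwaway program of my own; compile it, run it and see what your terminal does with each one:

#include <stdio.h>

int main()
{
    printf("column1\tcolumn2\tcolumn3\n");              /* tabs */
    printf("a backslash: \\ and a double quote: \"\n");
    printf("octal \\101 gives %c, hex \\x41 gives %c\n", '\101', '\x41');
    printf("and now the bell...\a\n");                  /* may or may not beep */
    return 0;
}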

If you would program something like

#include <stdio.h>

void blabla(void)   /* just an example function */
{
printf("look at this");
printf(" line");
}


it would print all the text on one line. Now, by using \n at the end, or anywhere in between where you like, you can split the output so the parts start on separate lines, instead of everything being written on the same one.

printf("look at this\n");

would be the way to do it. Often, system utilities have a help function that prints many lines; in the program code that's just a series of printf's using this same concept.

Let's start with some bits on the command shell, that # or $ prompt you see so frequently. While in many ways it's similar to DOS, it is much bigger and more powerful than DOS's command.com.

A shell is available on every UN*X system, and it lets you quickly find out whether simple ideas work. I'm not talking about big programs of course, but about simple ones that convert your text files, or basically just gadgets you want to run now and then.

UN*X has always been built very modular, that means it contains utilities that can be re-used by other programs or users.

For instance, 'more' is a utility that lets you view the output of a listing one screen at a time. This is done by piping the output of the listing to this utility, which receives the output and makes sure the screen does not scroll past the end.

You can pipe many times, like

"man bash | col -b | lpr"

This will print a reference copy of the bash-shell. Col is a utility to take out backspaces and format the output a bit. lpr is the printer spooler, it prints the input to the printer as soon as resources become available.

For an example of shell 'programming', have a look at 'startx' , '.xinitrc', '/etc/rc.d' to see what it's all about. It's not really C++, but definitely some issues come forward that you want to know and have a look at. For C++ programming, you can often make use of utilities without having to reprogram the same functionality in your own utility.

A shell by definition:

"A shell is a program that acts as the interface between you and the UNIX system, allowing you to enter commands for the operating system to execute."

Because of this, it resembles DOS, but hides the details of Kernel operation for you. It's a sort of high level programming language for UNIX itself.

If we go further into Tcl (pronounced 'tickle') or Tk, we shall meet the 'tclsh' and 'wish' shells. For now, let's continue.

If you've worked with UNIX for a while, you'll know how to redirect a directory listing to a txt file or something, so you can read it later with 'less', which is nothing less than more, actually more.

"ls -l > lsoutput.txt"

Is the way to do it.

The standard output is in this way 're-directed' to a file by using ">". If the file exists, it's overwritten; by using ">>" you can append to an existing file instead.

By using the command "set -C" you can override the existing default behavior to overwrite existing files using redirection. It sets the 'noclobber' option.

There are three file descriptors:

0 = standard input to a program
1 = standard output
2 = standard error output


They are used by prefixing the ">" with those numbers. If you want to use the 'kill' command from a script for example, you can use these to generate error logs if something can't be killed or redirect success logs for everything that was successful.

"kill -1 1234 >killout.txt 2>killerr.txt"

This will 'kill -1' process '1234', write its normal output to 'killout.txt' and, if anything fails, write the error messages to 'killerr.txt'.

If you do not want to see any messages at all, there's a nice feature in UNIX, called the 'bit bucket'. A bit bucket is like a black hole, everything put in there simply disappears.

"kill - 1 1234 >/dev/null 2>&1"

This tells the system to redirect standard output to /dev/null and redirect standard error output to the same place as standard output (& + file descriptor, 1 = standard output).

A silly example to redirect input:

"more < killout.txt"

This will accept the file "killout.txt" as input to "more", but it's silly, cause more can accept parameters that will do the same thing.

More about pipes:
----------------

"ps > psout.txt"
"sort psout.txt > pssort.out"

will write the output of the 'ps' command to 'psout.txt'. Then psout.txt is sorted into 'pssort.out', where a sorted list of all processes ends up. This can be combined into one single command line with a pipe:

"ps | sort > pssort.out"

the output of 'ps' is hereby piped into 'sort', which stores its output in pssort.out.

I'm hammering so much on pipes, because later we'll do some programming in this respect and you really need to understand the concept.

"ps | sort | more"

is nice, it 'mores' out a list of ps, which is sorted.

how about this one ?

"ps -a | sort | uniq | grep -v sh | more"

it takes the output of 'ps -a', sorts it into alphabetical order, removes duplicate lines with 'uniq', uses 'grep -v sh' to filter out the processes called 'sh', and finally displays the result one screen at a time with 'more'.

For hacking, you may want to know some of this stuff in order to take out things you do not want to show the webmaster. By re-compiling the utilities and storing them in the 'bin' again, you are cloaking your own existence in this way.

Programming the shell:
-----------------------


It's not really programming, but more like scripting the shell. All the usual shell commands and variables can be used inside a script. Example without a script:

$for file in *
>do
>if grep -l POSIX $file
>then
>more $file
>fi
>done

Run these commands from your shell, and notice the ">" prompt, which tells you the shell expects more input before it can start working.
"for file in *"
tells the shell you want to loop over every file in the current directory, calling each one 'file' in turn.
Inside the loop you then use it like any other variable, as $file. 'grep -l POSIX $file' prints the name of the file if it contains the string POSIX, and its exit status tells 'if' whether there was a match. If there was, the contents of that file are displayed on the screen with 'more'.
Now, create a script that does about the same:
----------


#!/bin/sh

#first.sh
# This file looks through all the files in the current
# directory for the string POSIX and then prints those
# files on the standard output.

for file in *
do
if grep -q POSIX $file
then
more $file
fi
done


exit 0

---------------

'grep -q' suppresses any output and just returns a success status as soon as it finds a match, so the 'if' succeeds for every file that contains POSIX at least once, and that file gets displayed.

Do a 'man grep' to find out much more about this really useful tool.

This script is much like a cgi-script. I'd encourage you to look at it in more detail, since it will help you if you want to start cgi scripting later in your life or possibly other lessons about this subject...

The script is treated essentially as standard input to your shell, so with your PATH parameter set right you can reference any UNIX command in your script and have it executed. It runs with the same authorisation as you have yourself.

Administration of scripts:
--------------------------

Now that you have a script, we can run it in two ways: by invoking the shell with the script filename as a parameter, or by simply running it from your current shell...

"/bin/sh first.sh"

would do it, or first change the file mode to executable:

"chmod +x first.sh"

then use "first.sh" to execute it, or, if the PATH environmental parameter is not set for you, run it with "./first.sh"

Hack:
-----

if the environment variable PATH of 'root' is set like

'PATH=$PATH":."'

which means: also look in the current directory when resolving a command name, then by creating a hack-script you can have the root user execute something he wouldn't have wanted to. Remember that a script works because it's just standard input to the shell; someone writing a malicious script can leave it sitting there under a tempting name and possibly trick 'root' into running it. Moreover, if a system bin directory is not write-protected (dream on!), you're possibly home free...

After shell programming, we really start to dive in into C/C++ programming, File access!!!

File access is really important and I'll try to delve as much into it as I can. You'll need file access a lot...

questions ????
I recently found a good book about developing Linux applications in the X environment using GDK and GTK+. The advantage is that it gives you a good idea of what object-oriented programming is all about. It's quite easy as well, once you understand its concepts. Since some OO concepts are hidden from the programmer in Windows, but not in GTK/GDK, it'll give you a much better understanding of OO programming, what containers are, etc... Don't worry about it yet though, it's future work and I'm here to tutor, not to give you a headache. And we still need a prelude to OO before we can take it on anyway.

This week I started on an object-oriented C++ project in Windows. As soon as we get some more done in C, I'll divide the text up a bit into OO and advanced C (sockets and whatnot).

Last time I showed some examples of shell programming. I told you guys I'd carry on with that, but since the content was quite basic and not that interesting as far as the UNIX environment goes, I've decided to carry on with.... FILES!! ah yeah.. why so hard at once? hmm.. Because the best way to learn things is by doing them. If you can't get something to compile or run into problems, let me know, I may be able to give you some tips on where to look.

FILES:

We all should know what files are by now; they're much the same as in DOS, except that file access in Linux works differently. You won't be able to access your ext2 filesystem from DOS, but from Linux you can get at your DOS/Windows files, because many filesystem drivers are built into the kernel or plugged in as modules. I can work with my Windows files just fine, because I have told the system where they are mounted and what filesystem they contain. But that's Linux administration, let's carry on with C here.

The basic operations on files are creating/opening/ reading/writing and closing them. On top of that we need to organize files in directories, so you want to know how to create, scan and delete directories, for example.

In Linux, everything is a file. Therefore file I/O, and programming it, is really important. When compiling and running your programs as a normal user, you won't be able to cause havoc on the system though. Take the programs you use to browse through directories or manage them: if you, as a user, are not permitted to do something, a program you run won't get that access either.

I hope you all know how file permissions work in Linux. DOS has no control over who can read/write/execute files. NT has it, in the implementation of NTFS; UNIX was originally built that way, as a multi-user system (true multi-user, contrary to NT).

Information about a file is stored in the 'inode'. Picture this as some sort of File Allocation Table. The system will just use a number to access the file, but for our convenience, all files have a description, size and access privileges stored in the 'inode' as well.

A directory is in essence a file as well, but to us it seems transparent, as if files are stored 'within' the directory. When you delete a file from a directory, you delete a 'link' to the file's inode, and once the last link is gone the system loses track of the information in the file.
Just like in DOS, deleting a file merely deletes its name. Or rather, it marks the file's space as free to be allocated again; the next time a file is created, the system knows it may write to those bytes on your hard disk. If nothing has been written over them yet and the information is still there, DOS lets you 'undelete' the file by restoring the file name and thereby the link. With UN*X you normally can't, so files you delete really are gone, unless you use the trick that is documented in the Linux documentation, which should be available in your HOWTOs somewhere.

Because everything is a file, the hardware - specific control programming can be done by accessing the files through the device 'driver'. Suppose you want to search for a file somewhere. Doing that on a tape or on a harddisk or cdrom really differs. Luckily, these have been made transparent to us, so what we simply do is call "open", "write", "ioctl()", and other functions on our devices and the driver will handle the hardware-specific actions to take by itself.

Here are five system calls we can perform:

- open : to open a file or device for access
- read : to read from an open file or device
- write : to write to a file or device
- close : to close a file or device from access
- ioctl : specific control of the device itself


ioctl (input/output control) can for instance be used to rewind a tape drive or set the flow control characteristics of a serial port. Because of that, not every operation is available for every device/file.

do a:

"man open"
"man close"
"man ioctl"


to find out more, especially the last one is useful.

One problem we do face is the cost of making system calls. Not that anyone minds ringing the kernel, but it causes quite some overhead: you actually step out of your program, kernel code executes, and then control switches back. That's more expensive than an ordinary function call. Some hardware also has restrictions on the size of the blocks you can write. For instance, a tape drive may have a block size of 10K; write 4K to the tape and it will only store 4K, but still advance the tape by 10K. That's a limitation. File access usually isn't that hard though.

The standard library functions we have for file access are in the standard I/O library. We have discussed header files and libraries in the previous three lessons. To elaborate on this:

* A library is like a DLL in Windows: it has function calls you don't need to rewrite, and they are already stable (as in: not unstable).

* A header file is a file included into your own program. The declarations of the library functions are in there, so when the compiler sees a call to one of those functions in your program, it won't be scared shitless and won't spit out an error.

So, we have the standard I/O routines in the C library somewhere under the /usr/lib directory (they are part of libc rather than a separate library), and we have the header file <stdio.h> in the /usr/include directory.
If you browse through the <stdio.h> file, you'll notice some things now, with clearer sight. I mentioned the "open", "write" and "ioctl" functions; their declarations actually live in other headers that we'll meet in a minute, but do a "less /usr/include/stdio.h" anyway and browse with me (asynchronously that is).

#defines are used to declare constants in a program. Suppose you want to check for errors later on. If you have a function for instance that is able to determine whether the guy at "www.icepick.com" has something in his fridge and you want to return one of the next codes:

0 = he has absolutely nothing in his fridge.
1 = he has something in his fridge.

So you start coding more functions with the same protocol.
This time functions that determine whether someone rang his doorbell, sat on his toilet, been in his kitchen or was sitting at his computer.
Now, when your boss wants 0 to be replaced by -1, because it looks better, you have to change every function you wrote and believe me, sometimes you cannot be sure you had every function. This increases bugs. So, what does a #define do?

Find "#define NULL 0" in the header file. This means that when you create a:

return(NULL);

it will actually do a "return(0);". But if your boss wants the return value changed to -1, you simply change the define into:

"#define NULL -1"

and the program (with "return(NULL);") will actually do a "return(-1);" this time. Isn't that smart? (NULL is only borrowed here as an example; in practice you'd define a constant of your own instead of touching one the system headers already own, but the mechanism is exactly the same.)
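To make the fridge story concrete, here's a little sketch of my own; the fridge_check() function and its return values are made up, the point is only that the #define names live in one place:

#include <stdio.h>

/* change these two lines and every function using them follows along */
#define FRIDGE_EMPTY 0
#define FRIDGE_FULL  1

/* pretend this peeks into the fridge at www.icepick.com */
int fridge_check(void)
{
    return FRIDGE_FULL;
}

int main()
{
    if (fridge_check() == FRIDGE_FULL)
        printf("he has something in his fridge.\n");
    else
        printf("he has absolutely nothing in his fridge.\n");
    return 0;
}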

FILES (for real this time):
===========================

There are three file descriptors already in your system. These are 0 for standard input, 1 for standard output, 2 for standard error. We saw last time how to redirect output to somewhere else.

A file descriptor is called a "handle" in windows. They took that name, because it is a reference to your file. The system will give you back a "handle" or descriptor when you successfully open a file for write or read access or whatever (you must specify what you want to do with a file).

This file descriptor is, just like the numbers above, simply an integer value. But it's used by the system to keep track of the file in an internal table of open files (actually an array of structures with dedicated information, but let's not worry about that; have a look at the struct _IO_FILE declaration in "libio.h" if you're curious).

Now, let's put file descriptors to use right away. Since your standard input, output and error are already opened by default on your system, you can use them immediately. We need a function called write to be able to write to them; its declaration is in "unistd.h".

The next program illustrates their usage:

#include <unistd.h>  /* for write */
#include <stdlib.h>  /* for exit */

int main()
{
/* "Here is some data\n" is 18 bytes long */
if ((write(1, "Here is some data\n", 18)) != 18)
write(2, "A write error has occurred on file descriptor 1\n", 48);
exit(0);
}

name this program anything you like (we'll call it <filename>.c). Compile it using "gcc <filename>.c -o <filename>", then run it with "./<filename>".

See what it does!

Stepping through it line by line:

* include unistd.h, which declares the write function the C library provides.
* int main(): "main()" is the entry point of your program, and every program must have one. Remember that! It says int main() to tell the compiler this function WILL return a value at the end, so it should end with an exit(<number>); or return <number>;.
* the opening and closing braces "{" and "}" delimit the body of the function, the code that belongs to the function itself. It's possible to have more functions in the same file; they are enclosed in braces as well.
* the "if" line is weird and needs more explaining:

the write function you use in the "if" line is declared in <unistd.h> as follows:

ssize_t write(int fildes, const void *buf, size_t nbytes);

This means it takes a file descriptor (fildes) to write to, a buffer (const void *buf) holding the data to write, and the number of bytes to write (nbytes). You may need to measure the size of the buffer at run time when you, as the programmer, don't know it in advance: suppose you are reading another text file line by line; you don't know the size of the lines, so you read them in, determine the length of each buffer, and write the buffers out again along with that measured length.

"const void *buf" means these things:

"*" means it sends along a pointer to a memory location. In this case it's the location of where the program was stored in memory and then the start of the line "Here is".

"buf" is a variable that will receive the pointer.

"void" it can contain integers, strings, chars whatever.

"const" means the buffer cannot be changed within the function 'write'. This is for protection of the data, a neat way actually of the program, neat code! Per Bothner is quite a nice guy!

"size_t" is an integer, but GNU dedicated "size_t" as an integer that counts bytes. Don't think too hard about this, cause your program will work anyway!

So, what happens is: you want to write the line "Here is some data\n" to file descriptor 1 (standard output). \n is replaced by ASCII code 10 (newline; the \ marks an escape character, in this case \n), and you tell the function you want to write 18 bytes of data to this file. "write" itself returns a value, which is the actual number of bytes written to this file, or less, and you compare this to 18. If you get 18 back from "write", you know it all worked out ok. If you get -1 back, something went totally wrong and you should raise an error. If it was less than 18, a maximum block size could have been the problem.
Either way, if the two values differ, you know something went wrong (taking into account that this is only standard output).
You want to alert the user to this, so you write the "something went wrong blabla" line to file descriptor 2, standard error. There's no other way to get feedback to the user; if even that fails, it simply means the user has no terminal on the computer, and the only thing you could still do is write something to a log. Finally,

* exit(0);

means the program exits with return code 0, which means the program ended successfully, without core dumps or segmentation faults (better leave it at that remark; don't ask me what those are about, cause I don't know. People doing assembly language and CPU abuse would!).
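By the way, counting the bytes by hand is error-prone. A variation of my own (not the book program) lets strlen() do the counting:

#include <unistd.h>   /* write */
#include <string.h>   /* strlen */
#include <stdlib.h>   /* exit */

int main()
{
    const char *msg = "Here is some data\n";
    size_t len = strlen(msg);          /* measured at run time */

    if (write(1, msg, len) != (ssize_t)len)
        write(2, "A write error has occurred on file descriptor 1\n", 48);
    exit(0);
}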

Next, let's get kicking with "read" for the first time. It is declared as follows:

ssize_t read(int fildes, void *buf, size_t nbytes);

it's a lot like "write", so I'll only explain what it does:

* it reads up to nbytes of data. No more is read with every function call.
* it returns the amount of bytes actually read.
* if it returns 0 it had nothing to read (end of file).
* if it returns -1 there was an error! (file not open for reading or simply not open).


simple_read.c:
==============


#include <unistd.h>  /* for read and write */
#include <stdlib.h>  /* for exit */

int main()
{
char buffer[128];
int nread;

nread = read(0, buffer, 128);
if (nread == -1) {
write(2, "A read error has occurred\n", 26);
exit(1);  /* don't try to write a buffer we never filled */
}
if ((write(1, buffer, nread)) != nread)
write(2, "A write error has occurred\n", 27);

exit(0);
}

===============

compile again "gcc simple_read.c -o simple_read"

Something funny happens when you run it interactively: it reads at most 128 bytes from your keyboard and echoes them back, but anything beyond that stays in the input buffer and is picked up by the shell afterwards, which treats it as a command you wanted to execute.
Two more ways to run it:

"echo hello there | ./simple_read"

"./simple_read < some_filename_with_text_in_it.txt"

Assignment:
===========


Figure out the program logic.
- When is the read error line triggered and for what?
- When is the write error line triggered and for what?
- Look at previous explanation and I'm sure you'll know.


Open:
=====

We've played around with three file descriptors now. It was fun, it was a bitch (possibly), but we want to do some more. We actually want to open our own files (on disk, on your sound card or your parallel port) and read from and write to them.
The declaration of open in the library:
int open(const char *path, int oflags);
int open(const char *path, int oflags, mode_t mode);

Look up this declaration in the "fcntl.h" header file and read the remarks.
It will look like:

extern int open __P ((__const char *__file, int __oflags, ...));

this is because some dude with a serious 'underscore' addiction wrote it. (Not really: the C library uses names beginning with underscores, like __const and the __P() macro, so they can't clash with names in your own programs; I was too lazy to look up the exact story behind each one. Just mentally strip the __P and the extra underscores and you can read it as the declaration above again...)

Declarations of "open" and file related matter are in:

"fcntl.h"
"sys/types.h"
"sys/stat.h"
any includes referenced in "fcntl.h"

"open" returns a file descriptor. We now know what that is. A path of access to a file, a "file handle" in windows, a handle from a bucket to be able to fill it with water, or empty it in the streams. Two users can have access to the same file, but the file descriptors are different. This means data may become overwritten when write access is used. (but not part of our challenge yet!).

oflags:
=======

O_RDONLY Open for read only
O_WRONLY Open for write-only
O_RDWR Open for reading and writing

oflags, the second parameter, is used to control file access and the way data should be written, some additional parameters. But wait, it's only an int, that means an integer, how can I stuff all that data into one number?

I'm quite sure you have done some binary arithmetic before: 1, 2, 4, 8, 16, 32, 64, 128 (blabla).

When OR-ing such values together, each one sets its own bit, so the resulting value can later be tested to see whether a particular setting was made or not. Basically, #defines are used here to give each flag a constant value. Browse through "fcntlbits.h" to find O_RDONLY and the lot defined. Because "fcntl.h" should be included in programs with file access, and fcntl.h itself includes "fcntlbits.h", we have access to these constants and #defines. Their values are not literally the sequence above, but the idea is the same... anyway, we don't need to concern ourselves with the details. Here are some other oflags you can pass to open:

O_APPEND place written data at the end of the file.
O_TRUNC set length of file to zero (kinda like delete).
O_CREAT create the file, with permissions set in "mode".
O_EXCL ensures that the caller of the function actually creates the file: used together with O_CREAT, open fails if the file already exists. (it's a multi-user OS!).

All the oflag values above are combined by bitwise OR-ing them together, using the | character (on my US win95 (*ouch*) keyboard it sits above the backslash). Each flag sets its own bit in the integer, so several settings fit into one value; a small sketch of the mechanism follows below, after the permission flags.
When you use O_CREAT, you should specify the permissions, something which is done automatically when you save a file from emacs or whatever. Have a look into "sys/stat.h" for the definitions. Look for:
S_IRUSR = read permission for owner
S_IWUSR = write permission for owner
S_IXUSR = execute permission for owner

substitute USR with GRP for group privileges, OTH for others permissions. (and look in stat.h).
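Here's the bitwise mechanism in isolation first, as promised above (a sketch of my own with made-up flag values; the real O_* and S_I* constants are defined in the headers):

#include <stdio.h>

#define FLAG_A 1   /* binary 001 */
#define FLAG_B 2   /* binary 010 */
#define FLAG_C 4   /* binary 100 */

int main()
{
    int flags = FLAG_A | FLAG_C;   /* pack two settings into one int: 101 = 5 */

    if (flags & FLAG_A) printf("FLAG_A was set\n");
    if (flags & FLAG_B) printf("FLAG_B was set\n");   /* won't print */
    if (flags & FLAG_C) printf("FLAG_C was set\n");
    return 0;
}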

example:
========

open("myfile", O_CREAT, S_IRUSR | S_IXOTH);

when this code is used in a program, you get the result: (it's not compiled on my machine, taken from a book).

"ls -ls myfile"

"0 -r-------x 1 stupid_user software 0 Sep 22 08:11 myfile"

It's boring to let you read so much, so let's carry on with close and compile a program already... umask should have been covered here as well, but that is basically just three digits next to each other stating which permissions to take away for owner, group and others:

digit 1 = user permissions,
digit 2 = group permission,
digit 3 = others permissions,
value 0 = no disallowments,
value 4 = read disallowed,
value 2 = write disallowed,
value 1 = execute disallowed.

So, 032 (a umask) has 0 in digit 1, 3 in digit 2, 2 in digit 3. This means:
0 = nothing disallowed for the user.
3 = 2 + 1 = write and execute disallowed for the group.
2 = write disallowed for others (but execute allowed; a weird setting anyway!).

AAAAHHHHHHHHHHHHHH!!!!!!!!!!

It's boring, boring, boring!! let's get busy!

File copy program (char_copy.c):
================================

#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>  /* for exit */

int main()
{
char c;
int in, out;

in = open("file.in", O_RDONLY);
out = open("file.out", O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR);
while(read(in, &c, 1) == 1)   /* copy one character at a time */
write(out, &c, 1);

exit(0);
}

compile with "gcc char_copy.c -o char_copy"

"file.in" should exist, just copy it over there from any text file and rename it to file.in in the same process.

this program copies a file character by character to file.out. That is not very efficient; I already made the remark about system calls and their overhead... here's where it starts to matter. The last example in this chapter covers block copying, and when copying large files you may notice the difference... sure, you think, as a single user on your multi-user environment this won't hit me much on my Cray@home, but once you're doing some C/C++ for your organization, related to server management, things start oozing around the corner there....

Quick! block_copy is the last one for today. Just pick up char_copy.c, look for the changes and save it as block_copy.c... two things changed: char block[1024] was added and the while line changed.

block_copy:
===========

#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>  /* for exit */

int main()
{
char block[1024];
int in, out;
int nread;

in = open("file.in", O_RDONLY);
out = open("file.out", O_WRONLY | O_CREAT, S_IRUSR|S_IWUSR);
while((nread = read(in, block, sizeof(block))) > 0)   /* read up to 1024 bytes at a time */
write(out, block, nread);

exit(0);
}

Beat your heart out:
====================

* three includes, we know what they do...
* int main() (what? don't know what it means? re-read previous text);
* char block[1024]: create an array of 1024 chars, called "block".
* int in, out (two integers that will hold the file descriptors);
* open (we just covered that!);
* the while line:

interesting, I'll explain that one...
read from file descriptor "in" into "block", asking for sizeof(block) bytes (that's 1024, always, no matter whether the end of file is near), assign the actual number of bytes read to nread, and as long as this value is larger than zero, keep writing to the "out" file (file.out) the block we just read and the number of bytes (nread) we just read. When the end of file is reached on "in", no more bytes can be read, nread becomes zero, and the while loop exits...

all goes well, no core dumps, no segmentation fault, so program exits with code zero (safe!).

i think that's enough for now!

questions????