João Pedro Favoretti - Blog: March 2024

Saturday, March 30, 2024

How to create a Shell: Linux Pipes

After having a program that can execute new programs, the next step would be to add new features to the program. One such feature is the pipe, which should behave just like that:

$ ls -l
total 52
drwxrwxr-x 2 joao joao  4096 mar 11 19:09 include
-rw-rw-r-- 1 joao joao   137 mar 11 19:17 Makefile
-rwxrwxr-x 1 joao joao 28768 mar 30 08:49 shell
drwxrwxr-x 2 joao joao  4096 mar 29 18:19 src
drwxrwxr-x 2 joao joao  4096 mar 30 08:12 tests
$ ls -l | wc -l
6
$ ls -l | rev 
25 latot
edulcni 90:91 11 ram 6904  oaoj oaoj 2 x-rxwrxwrd
elifekaM 71:91 11 ram 731   oaoj oaoj 1 --r-wr-wr-
llehs 94:80 03 ram 86782 oaoj oaoj 1 x-rxwrxwr-
crs 91:81 92 ram 6904  oaoj oaoj 2 x-rxwrxwrd
stset 21:80 03 ram 6904  oaoj oaoj 2 x-rxwrxwrd
$ ls -l | rev | rev  
total 52
drwxrwxr-x 2 joao joao  4096 mar 11 19:09 include
-rw-rw-r-- 1 joao joao   137 mar 11 19:17 Makefile
-rwxrwxr-x 1 joao joao 28768 mar 30 08:49 shell
drwxrwxr-x 2 joao joao  4096 mar 29 18:19 src
drwxrwxr-x 2 joao joao  4096 mar 30 08:12 tests

In the first example using pipe, we can see that all the content from the ls -l command goes to the wc -l command. Therefore, the wc command can properly count the number of lines that were printed out by the ls command.

To do that, the shell has to create a pipe that can receive information at one end and push it to the other end so that another process can read information. To be more explicit, what really happens is something like this:

Fortunately, all the work that is associated with the information flow is handled by the operating system. The only thing the shell has to do is to create the pipe and define what is at each end.

But how can we create a pipe? Well, that is easy to create and a little hard to use. To create it, you can just call the syscall pipe(2), and use the provided example. However, if you want information about how it works, refer to the pipe(7) manual page.

One thing at a time. First, how do we call it?

#include <unistd.h>

int pipe(int pipefd[2]);

So, taking only the information from the beginning of the manual page, we could use it like this:

int main(int argc, char *argv[]) {
    int pipefd[2];

    pipe(pipefd);

    return 0;
}

It still doesn't do much, but first, let us understand what the array pipefd is. Remember back in the first post, File Descriptors are just integer numbers that we use to tell the operating system where to read from or write to. Well, the same thing can be used in pipes to tell the OS which end of the pipe we want to access. Each pipe has two ends:

So, the OS opens two different File Descriptors (stored at pipefd[0] and pipefd[1]), one for each end. We use pipefd[1] if we want to Write something to the pipe, and pipefd[0] if we want to Read something from the pipe.

First, here is an example using error handling and printing out each file descriptor returned by the pipe syscall:

int main(int argc, char *argv[]) {
    int pipefd[2];

    if (pipe(pipefd) == -1) {
        perror("pipe");
        return 1;
    }

    printf("pipefd[0]: %d\n", pipefd[0]);
    printf("pipefd[1]: %d\n", pipefd[1]);

    return 0;
}

Additionally, here is an easy example to understand how you can use the syscalls write and read using a pipe:

int main(int argc, char *argv[]) {
    char write_msg[] = "Hello, World!";
    char read_msg[sizeof(write_msg)];

    int pipefd[2];

    pipe(pipefd);

    write(pipefd[1], write_msg, sizeof(write_msg));    

    read(pipefd[0], read_msg, sizeof(read_msg));

    printf("read_msg: \"%s\"\n", read_msg);

    return 0;
}

* You can only write to pipefd[1] and read from pipefd[0].

Alternatively, you can think of a Pipe as an OS inner buffer that stores data and gives you two different file descriptors to access it. The special thing about pipes is that it is shared between processes, so if you adapt the fork example from the first post, you can read and write stuff between different processes:

int main(int argc, char *argv[]) {
    int pipefd[2];
    pipe(pipefd);

    if (!fork()) {
        char write_buffer[] = "Hello!";
        write(pipefd[1], write_buffer, sizeof(write_buffer));
        printf("Child wrote: \"%s\"\n", write_buffer);
    } else {
        char buffer[1024] = {0};
        read(pipefd[0], buffer, sizeof(buffer));
        printf("Parent read: \"%s\"\n", buffer);
        wait(NULL);
    }

    return 0;
}

That example should work, you can see that we used the Read End of the pipe in the Parent Process and the Write End of the pipe in the Child Process, so that both processes can communicate.

Since the fork(2) syscall copies the process, each process has its own copy of pipefd[0] and pipefd[1], however, neither process is using both File Descriptors, so we should close the File Descriptors that are not being used:

int main(int argc, char *argv[]) {
    int pipefd[2];
    pipe(pipefd);

    if (!fork()) {
        char write_buffer[] = "Hello!";
        close(pipefd[0]);
        write(pipefd[1], write_buffer, sizeof(write_buffer));
        printf("Child wrote: \"%s\"\n", write_buffer);
    } else {
        char buffer[1024] = {0};
        close(pipefd[1]);
        read(pipefd[0], buffer, sizeof(buffer));
        printf("Parent read: \"%s\"\n", buffer);
        wait(NULL);
    }

    return 0;
}

It is always good practice to close unused file descriptors, especially when handling multiprocessing and pipes. That is because if you have multiple processes with pipefd[0] open, the OS will figure that there still might be something to be entered, so it will hang the program, causing errors that are difficult to track because of multiprocessing.

But taking a few more steps, what if you want to try to create a pipe operation just like the one we tried in the shell:

$ ls -l | wc -l

Well, to do that you would have to create one pipe, execute both programs ls and wc, and redirect everything that goes to ls stdout to the Write End of the pipe and use everything that is at the Read End of the pipe as stdin for the wc program.

int main(int argc, char *argv[]) {
    int pipefd[2];
    pipe(pipefd);

    if (!fork()) {
        close(pipefd[0]);

        dup2(pipefd[1], 1);

        execl("/bin/ls", "ls", "-l", NULL);
    } else {
        close(pipefd[1]);

        dup2(pipefd[0], 0);

        execl("/usr/bin/wc", "wc", "-l", NULL);
    }

    printf("This line should not execute!\n");

    return 0;
}

And if you don't close the unused File Descriptors, that is where pipes start to become a problem, take a moment to try the following code, and see that it does not finish:

int main(int argc, char *argv[]) {
    int pipefd[2];
    pipe(pipefd);

    if (!fork()) {
        dup2(pipefd[1], 1);

        execl("/bin/ls", "ls", "-l", NULL);
    } else {
        dup2(pipefd[0], 0);

        execl("/usr/bin/wc", "wc", "-l", NULL);
    }

    printf("This line should not execute!\n");

    return 0;
}

Since the parent process still has pipefd[0] opened, the wc program still thought that there might be some more input available, so it didn't finish.

However, even though our second-last program executes the pipe operation correctly, what should we do if wanted our program to continue execution after running ls -l | wc -l. Well, you would have to fork() one more time, just like this:

int main(int argc, char *argv[]) {

    if (!fork()) {
        int pipefd[2];
        pipe(pipefd);
        
        if (!fork()) {
            close(pipefd[0]);

            dup2(pipefd[1], 1);

            execl("/bin/ls", "ls", "-l", NULL);
        } else {
            close(pipefd[1]);

            dup2(pipefd[0], 0);

            execl("/usr/bin/wc", "wc", "-l", NULL);
        }
    } else {
        wait(NULL);
    }

    printf("This line should execute!\n");

    return 0;
}

The bad side of that solution is that it will create an unsustainable amount of if-else conditions. So if you wanted to create one more pipe, it would start becoming messy. To solve that, you can do the same thing as the last example, using the following:

int main(int argc, char *argv[]) {
    int pipefd[2];
    pipe(pipefd);

    if (!fork()) {
        close(pipefd[0]);

        dup2(pipefd[1], 1);

        execl("/bin/ls", "ls", "-l", NULL);

    }

    if (!fork()) {
        close(pipefd[1]);

        dup2(pipefd[0], 0);

        execl("/usr/bin/wc", "wc", "-l", NULL);
    }

    close(pipefd[0]);
    close(pipefd[1]);        

    wait(NULL);

    printf("This line should execute!\n");

    return 0;
}

So now, if you want to nest more pipes, such as:

ls -l | rev | wc -l

You would have to create two pipes, doing as follows:

int main(int argc, char *argv[]) {
    int pipefd1[2];
    pipe(pipefd1);

    if (!fork()) {
        close(pipefd1[0]);

        dup2(pipefd1[1], 1);

        execl("/bin/ls", "ls", "-l", NULL);

    }

    int pipefd2[2];
    pipe(pipefd2);

    if (!fork()) {
        close(pipefd1[1]);

        dup2(pipefd1[0], 0);
        dup2(pipefd2[1], 1);

        execl("/usr/bin/rev", "rev", NULL);
    }

    close(pipefd1[0]);
    close(pipefd1[1]);
    
    if (!fork()) {
        close(pipefd2[1]);

        dup2(pipefd2[0], 0);

        execl("/usr/bin/wc", "wc", "-l", NULL);
    }
       
    close(pipefd2[0]);
    close(pipefd2[1]);        

    wait(NULL);

    printf("This line should execute!\n");

    return 0;
}

And that should be enough information to increment your shell with the pipe feature.

References

1. Implementing a Job Control Shell. https://www.gnu.org/software/libc/manual/html_node/Implementing-a-Shell.html

Sunday, March 17, 2024

How to create a Shell: Linux process creation

Any Linux user has already used a shell to get something compiled or installed. But just some have already built a shell of their own. Therefore, in this post, I would like to explain how processes are created and how to manipulate process file descriptors. With that said, you can create your own shell that might work just like this:

cmd> /usr/bin/ls
1.in  Makefile  hey  hey.c  in.txt  include  out.txt  output.txt  shell  src
cmd> /usr/bin/ls -l
total 72
-rw-rw-r-- 1 joao joao    76 Mar 12 19:50 1.in
-rw-rw-r-- 1 joao joao   137 Mar 11 19:17 Makefile
-rwxrwxr-x 1 joao joao 16064 Mar 15 14:21 hey
-rw-rw-r-- 1 joao joao   280 Mar 15 14:21 hey.c
-rw-rw-r-- 1 joao joao     5 Mar 12 19:47 in.txt
drwxrwxr-x 2 joao joao  4096 Mar 11 19:09 include
-rw-rw-r-- 1 joao joao     8 Mar 12 19:50 out.txt
-rw-rw-r-- 1 joao joao    11 Mar 11 23:13 output.txt
-rwxrwxr-x 1 joao joao 23320 Mar 12 19:48 shell
drwxrwxr-x 2 joao joao  4096 Mar 11 19:09 src
cmd> ./hey   
Enter your name: Joao
Hello Joao!
cmd> ./hey Joao
Hello Joao
cmd> ./hey Joao > out.txt 
cmd> /usr/bin/cat out.txt 
Hello Joao

* Note that in that simple shell, the program path is used (e.g. /usr/bin/ls instead of the alias ls). That is done to simplify the shell, but a good exercise would be to use the PATH environment variable to allow users to use those aliases.

During some of the explanations, I find it easier to understand certain functions if we know how they are implemented. Thus, I will take examples from the very first version of Linux, v0.01 [1]. That is due to its simplicity, and even though the main algorithm might be the same nowadays, the code might not be as friendly.

What is a System Call

In this section, I want to take a second and explain what a system call is, and how a user might use it.

When you are writing code in C, you might have to call your own functions to solve specific tasks, such as a program that might calculate the power of a number:

int power(int a, int b) {
    int result = 1;
    for (int i = 0; i < b; i++) {
        result *= a;
    }
    return result;
}

int main() {
    int a = 2;
    int b = 3;
    printf("2 ^ 3 = %d\n", power(a, b));
    return 0;
}

However, there are more complex tasks that you may want to use in a program, such as telling the shell to display a letter on the screen. Particularly, that is not a complex task to think about, but it is complex system-wise because you need to make your program communicate the letter to the shell so that it can display it on the screen.

Fortunately, someone very smart had this problem way before us, because they created a fundamental function called write() that allows your program to communicate that letter correctly. But how would I use it in a C code?

First of all, you would have to look at the manual page for that function to figure out what parameter it accepts, and which header files you have to include in your program to use it. You would face something like this:

#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count);

It simply means that, to call that system call you would have to indicate what file should the operating system make this write to, what buffer contains the data that will be used, and how many bytes should be used. As an example, there is a very simple program that uses the write() system call.

#include <unistd.h>

int main() {
    char buffer[] = "Hello, World!\n";
    write(1, buffer, sizeof(buffer));
}

After the buffer was defined, we simply told the operating system to write it to the stdout, where the shell might be expecting something as input to show the user.

Besides that, there are many other system calls shown on the syscalls manual page. Another example that can be used is the read() which has a similar reason to be used as the write() system call, that is why I am not spending time with it. Instead, I will show how the nanosleep() system call can be used.

Repeating the process that was done before, and looking at the nanosleep() manual page, we can see the parameters expected by that function.

#include <time.h>

int nanosleep(const struct timespec *req,
              struct timespec * rem);

That function declaration means that the function must receive a struct containing the time that the process was requested to sleep for, and a struct that the OS will use in case there are any problems during execution but there is still some time remaining. Next, all that is left is knowing what is inside the struct timespec which is easily solved if you are using an IDE. Nevertheless, if you are not, you can check the time.h header file and figure out for yourself:


#ifndef _STRUCT_TIMESPEC
#define _STRUCT_TIMESPEC
struct timespec {
	__kernel_old_time_t	tv_sec;		/* seconds */
	long			tv_nsec;	/* nanoseconds */
};
#endif

Finally, with all variable types searched, using the syscall is as simple as that:

#include <time.h>

int main() {
    struct timespec req = {2, 0};
    struct timespec rem;
    nanosleep(&req, &rem);
}

All things considered, we can go back to the idea of creating a shell.

Process Creation

In the example shown in the beginning, the program /usr/bin/ls was used to display the files in the current working directory. In this section, I will explain how to create a brand-new process with the content that is stored at /usr/bin/ls.

First of all, in the Linux operating system, the information about all processes is stored in a struct called task_struct:

struct task_struct {
/* these are hardcoded - don't touch */
	long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
	long counter;
	long priority;
	long signal;
	fn_ptr sig_restorer;
	fn_ptr sig_fn[32];
/* various fields */
	int exit_code;
	unsigned long end_code,end_data,brk,start_stack;
	long pid,father,pgrp,session,leader;
	unsigned short uid,euid,suid;
	unsigned short gid,egid,sgid;
	long alarm;
	long utime,stime,cutime,cstime,start_time;
	unsigned short used_math;
/* file system info */
	int tty;		/* -1 if no tty, so it must be signed */
	unsigned short umask;
	struct m_inode * pwd;
	struct m_inode * root;
	unsigned long close_on_exec;
	struct file * filp[NR_OPEN];
/* ldt for this task 0 - zero 1 - cs 2 - ds&ss */
	struct desc_struct ldt[3];
/* tss for this task */
	struct tss_struct tss;
};

Looking at that struct you can find out what are all the features that are stored by the OS for each process that is created, such as the state, priority, id. All that is needed to be known by the system to decide which process is executed at each time.

You don't really need to understand what all of those fields mean, I put it there just for you to understand deeply what each system call is doing. Additionally, as you would expect of a mu nniprogramming operating system, there is an array of task_struct, called task, to store all the created processes.

struct task_struct * task[NR_TASKS] = {&(init_task.task), };

Having said that, we should seek an operating system API (also known as system call) that adds a new process to that array. If you care to search for it, you will find the fork() system call, defined inside the Linux Kernel, which is the only routine that adds a newly allocated variable inside the array that stores all processes. For us to understand what it does, there is the assembly code that is executed when a user executes the fork() system call:

_sys_fork:
	call _find_empty_process
	testl %eax,%eax
	js 1f
	push %gs
	pushl %esi
	pushl %edi
	pushl %ebp
	pushl %eax
	call _copy_process
	addl $20,%esp
 	ret

Although there are many operations, we can focus just on the call operations that execute the functions _find_empty_process() and _copy_process(). Those function names are very interesting so we can find out what the syscall does by looking at them.

First, the OS looks for an empty place on the tasks array and then it defines a new task_struct to place it in there with exactly the same content that is inside the process that is called the fork() syscall. That behavior also gives the syscall its name, because it works like a "process mitosis".

Furthermore, it is possible to check even how the find_empty_process and copy_process functions work. However, besides some error-checking code, they work just like you would have thought they work:

int find_empty_process(void)
{
	int i;

	repeat:
		if ((++last_pid)<0) last_pid=1;
		for(i=0 ; i<NR_TASKS ; i++)
			if (task[i] && task[i]->pid == last_pid) goto repeat;
	for(i=1 ; i <NR_TASKS ; i++)
		if (!task[i])
			return i;
	return -EAGAIN;
}

/*
 *  Ok, this is the main fork-routine. It copies the system process
 * information (task[nr]) and sets up the necessary registers. It
 * also copies the data segment in it's entirety.
 */
int copy_process(int nr,long ebp,long edi,long esi,long gs,long none,
		long ebx,long ecx,long edx,
		long fs,long es,long ds,
		long eip,long cs,long eflags,long esp,long ss)
{
	struct task_struct *p;
	int i;
	struct file *f;

	p = (struct task_struct *) get_free_page();
	if (!p)
		return -EAGAIN;
	*p = *current;	/* NOTE! this doesn't copy the supervisor stack */
	p->state = TASK_RUNNING;
	p->pid = last_pid;
	p->father = current->pid;
	p->counter = p->priority;
	p->signal = 0;
	p->alarm = 0;
	p->leader = 0;		/* process leadership doesn't inherit */
	p->utime = p->stime = 0;
	p->cutime = p->cstime = 0;
	p->start_time = jiffies;
	p->tss.back_link = 0;
	p->tss.esp0 = PAGE_SIZE + (long) p;
	p->tss.ss0 = 0x10;
	p->tss.eip = eip;
	p->tss.eflags = eflags;
	p->tss.eax = 0;
	p->tss.ecx = ecx;
	p->tss.edx = edx;
	p->tss.ebx = ebx;
	p->tss.esp = esp;
	p->tss.ebp = ebp;
	p->tss.esi = esi;
	p->tss.edi = edi;
	p->tss.es = es & 0xffff;
	p->tss.cs = cs & 0xffff;
	p->tss.ss = ss & 0xffff;
	p->tss.ds = ds & 0xffff;
	p->tss.fs = fs & 0xffff;
	p->tss.gs = gs & 0xffff;
	p->tss.ldt = _LDT(nr);
	p->tss.trace_bitmap = 0x80000000;
	if (last_task_used_math == current)
		__asm__("fnsave %0"::"m" (p->tss.i387));
	if (copy_mem(nr,p)) {
		free_page((long) p);
		return -EAGAIN;
	}
	for (i=0; i < NR_OPEN;i++)
		if (f=p->filp[i])
			f->f_count++;
	if (current->pwd)
		current->pwd->i_count++;
	if (current->root)
		current->root->i_count++;
	set_tss_desc(gdt+(nr<<1)+FIRST_TSS_ENTRY,&(p->tss));
	set_ldt_desc(gdt+(nr<< 1)+FIRST_LDT_ENTRY,&(p->ldt));
	task[nr] = p;	/* do this last, just in case */
	return last_pid;
}

That might lead to a very interesting question: If all processes are forked, where does the first process come from? Well, if you look at your Linux Operating system to check which is the process with ID 1, you can see that it came from a program called init.

$ ps aux | head -n 2
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.1  0.0 167660 12848 ?        Ss   11:57   0:43 /sbin/init splash

Moreover, you can look inside the Linux source code and find what the init.c file really is, but it essentially initializes some code and runs the first shell of the system:

void main(void)		/* This really IS void, no error here. */
{			/* The startup routine assumes (well, ...) this */
/*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
 */
	time_init();
	tty_init();
	trap_init();
	sched_init();
	buffer_init();
	hd_init();
	sti();
	move_to_user_mode();
	if (!fork()) {		/* we count on this going ok */
		init();
	}
/*
 *   NOTE!!   For any other task 'pause()' would mean we have to get a
 * signal to awaken, but task0 is the sole exception (see 'schedule()')
 * as task 0 gets activated at every idle moment (when no other tasks
 * can run). For task0 'pause()' just means we go check if some other
 * task can run, and if not we return here.
 */
	for(;;) pause();
}

void init(void)
{
	int i,j;

	setup();
	if (!fork())
		_exit(("/bin/update",NULL,NULL));
	(void) open("/dev/tty0",O_RDWR,0);
	(void) dup(0);
	(void) dup(0);
	printf("%d buffers = %d bytes buffer space\n\r",NR_BUFFERS,
		NR_BUFFERS*BLOCK_SIZE);
	printf(" Ok.\n\r");
	if ((i=fork()) < 0)
		printf("Fork failed in init\r\n");
	else if (!i) {
		close(0);close(1);close(2);
		setsid();
		(void) open("/dev/tty0",O_RDWR,0);
		(void) dup(0);
		(void) dup(0);
		_exit(execve("/bin/sh",argv,envp));
	}
	j=wait(&i);
	printf("child %d died with code %04x\n",j,i);
	sync();
	_exit(0);	/* NOTE! _exit, not exit() */
}

That process is the father of all processes that are executed. In fact, the process that executes the fork() operation is called "father" and the process that is created is called "child". Furthermore, that relationship is so important for the OS, that it is stored inside the task_struct as the parameter "father".

Back to the track, how do you use the fork() system call? Repeating the important process of looking at the manual page we can find the following declaration:

#include <unistd.h>

pid_t fork(void);

Quite easy, it means that I program like that executes correctly the fork system call:

#include <sys/types.h>
#include <unistd.h>

int main() {

    fork();

    return 0;
}

Right, but after the child process is created, where is the next line that the father process executes? And what about the child process? If we try adding a print function just behind the fork call, we can find that out:

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main() {

    fork();

    printf("Hey!\n");

    return 0;
}

As we can see in the result, the print function will be executed twice, once by the father-process and later by the child-process. Interestingly, it is not executed necessarily in that order. In other words, the child-process can be executed first if the OS decides so.

But if both processes continue the execution just below the fork() function, it does not help us much if we can't control what each process executes independently. That is why the return value of the fork() function is used. If we take a look at what is return, we can see that there are different return values on each execution, so that we can control what each execution is doing.

int main() {
    pid_t pid = fork();

    printf("Hey! %d\n", pid);

    return 0;
}

If we execute it many times we will see that one of the PIDs printed changes, but there is one that is always zero. Therefore, to understand which process receives each value, we can go back to the man page at the RETURN VALUE section from the manual page:

RETURN VALUE
       On success, the PID of the child process is returned in the
       parent, and 0 is returned in the child.  On failure, -1 is
       returned in the parent, no child process is created, and errno is
       set to indicate the error.

All that means that the process that receives 0 is the child-process, and the process that something else is the father-process. So to control what each process executes, we can do something like that:

int main() {
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child-Process\n");
    } else {
        printf("Parent-Process\n");
    }

    return 0;
}

Also, if you look closely at the init function, you can see that this exact strategy was used to create a new shell. But now there is a tricky part, what happens when we want the child-process to ask for the user input?

int main() {
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child-Process\n");
        
        int number;
        printf("Enter a number: ");
        scanf("%d", &number);
        printf("You entered: %d\n", number);

    } else {
        printf("Parent-Process\n");
    }

    return 0;
}

You will figure out that the execution is canceled before you can try to type anything in the terminal. That is because both father and child processes are executed in parallel, but as soon as the father process is done, the shell prompt is executed again. Thus, to solve that we can execute another syscall in the father process for it to wait for the child to execute, such as the nanoseconds() syscall:

int main() {
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child-Process\n");
        
        int number;
        printf("Enter a number: ");
        scanf("%d", &number);
        printf("You entered: %d\n", number);

    } else {
        printf("Parent-Process\n");

        struct timespec req = {10, 0};
        struct timespec rem;
        nanosleep(&req, &rem);
    }

    return 0;
}

Great, now you have 10 seconds to enter something inside the child process. However, the use case of having the father-process to wait for the child-process seems too regular for a solution like that. In fact, the solution to that process is to use another syscall called wait(), which was built exactly for this use case:

int main() {
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child-Process\n");
        
        int number;
        printf("Enter a number: ");
        scanf("%d", &number);
        printf("You entered: %d\n", number);

    } else {
        printf("Parent-Process\n");

        wait(NULL);
    }

    return 0;
}

In that program, the father-process executes the print function and hangs until the child process is terminated.

Although using fork() that way is useful for parallel processing, there is another usage of process creation which is to create a new process with the content of another binary file. Going back to the shell idea, the process being executed is the shell process, so if the user might want to execute a binary to list the files in the current working directory (e.g. /usr/bin/ls) or a binary to print out the content of a specific file (e.g. /usr/bin/cat), we have to create a new process using fork() with the instructions inside those binary files.

Lucky for us, there is a syscall called execve() that can do just that. Referencing the manual page again we can find out the function declaration:

#include <unistd.h>

int execve(const char *pathname, char *const argv[], char *const envp[]);

That function will replace all the process memory with the content of the binary that you reference using the pathname variable. Consider the following example that uses the execve() syscall:

int main() {
    char *const argv[] = {"/usr/bin/ls", "-l", NULL};
    
    execve("/usr/bin/ls", argv, NULL);

    printf("This line should not be printed\n");

    return 0;
}

That function receives the path from the binary that you want it to use, and then it allows the user to give arguments to the binary using the argv array, and environment variables using the envp array. Those variables are the same ones that you can access if you create a C program like that:

int main(int argc, char *argv[], char *envp[]) {
    
    /* Your Code Here */

    return 0;
}

* Note that both arrays argv and envp arrays must finish with a NULL pointer, because otherwise, the program that is being executed will not be able to know when the array ends.

* Also, note that the process changes itself completely when the execve is executed, and even though the execve() example has a print function to execute after the execve, it will never be called.

As a result of that, we can improve the fork() example by making the child process execute another binary, just like that:

int main() {
    pid_t pid = fork();

    if (pid == 0) {
        
        char pathname[] = "/usr/bin/ls";
        char *argv[] = {pathname, "-l", NULL};
        char *envp[] = {NULL};

        execve(pathname, argv, envp);

    } else {
        wait(NULL);
    }

    return 0;
}

Manipulating File Descriptors

Next, the second feature of your Mini-Shell is being able to use the content of a file as input or redirect the output of the process to a certain file, just like in the example:

cmd> /bin/cat in.txt 
Joao
cmd> ./hey < in.txt
Enter your name: Hello Joao!
cmd> ./hey < in.txt > out.txt 
cmd> /bin/cat out.txt 
Enter your name: Hello Joao!

Every time you open a file to write or read it, you receive a File Descriptor representing that file that can be used just inside the process that called open(). That File Descriptor is an integer number that serves as an identifier of that opened file. With that ID, you can tell the OS which file it is supposed to "read from" or "write to". Take the following example:

int main() {
    int fd = open("in.txt", O_RDONLY);
    printf("fd: %d\n", fd);
    return 0;
}

If you run the example above you will see that the opened file is now identified as 3 to the process, and you will use that number on the read/write calls. Additionally, that value means that File Descriptors 0, 1, and 2 already exist. Indeed they do exist and they are:

0: Standard Input.
1: Standard Output.
2: Standard Error.

Those are not exactly files with bytes stored in your disk but they are files to the OS. And you use them when:

0: You want to read something from the user. So you can only read from it.
1: You want to write something to the user. So you can only write to it.
2: You want to write an error to the user. So you can only write to it.

There is a code to show an example of that usage:

int main() {
    char buffer[1024] = {0};
    
    ssize_t count = read(0, buffer, 1023);

    if (count == -1) {
        write(2, "An error occurred in the read.\n", 31);
    } else {
        write(1, buffer, count);
    }
}

In summary, the next opened File Descriptor was 3 because it was the next on the list of available identifiers.

Furthermore, you can go back to the Linux source code to check exactly what happens to the task_struct of the process that calls the syscall to open a file:

int sys_open(const char * filename,int flag,int mode)
{
	struct m_inode * inode;
	struct file * f;
	int i,fd;

	mode &= 0777 & ~current->umask;
	for(fd=0 ; fd<NR_OPEN ; fd++)
		if (!current->filp[fd])
			break;
	if (fd>=NR_OPEN)
		return -EINVAL;
	current->close_on_exec &= ~(1<<fd);
	f=0+file_table;
	for (i=0 ; i<NR_FILE ; i++,f++)
		if (!f->f_count) break;
	if (i>=NR_FILE)
		return -EINVAL;
	(current->filp[fd]=f)->f_count++;
	if ((i=open_namei(filename,flag,mode,&inode))<0) {
		current->filp[fd]=NULL;
		f->f_count=0;
		return i;
	}
/* ttys are somewhat special (ttyxx major==4, tty major==5) */
	if (S_ISCHR(inode->i_mode))
		if (MAJOR(inode->i_zone[0])==4) {
			if (current->leader && current->tty<0) {
				current->tty = MINOR(inode->i_zone[0]);
				tty_table[current->tty].pgrp = current->pgrp;
			}
		} else if (MAJOR(inode->i_zone[0])==5)
			if (current->tty<0) {
				iput(inode);
				current->filp[fd]=NULL;
				f->f_count=0;
				return -EPERM;
			}
	f->f_mode = inode->i_mode;
	f->f_flags = flag;
	f->f_count = 1;
	f->f_inode = inode;
	f->f_pos = 0;
	return (fd);
}

It is pretty simple, inside the task_struct of the process, there is a variable called filp, which is an array that will store information about each file that is opened by that process. So the filp[0] will store information about the Standard Input File, the filp[1] will store information about the Standard Output File, and so on. And that is what the File Descriptor is, just an index in the filp array that is returned to the user.

When you call the open system call, that function looks for the next available spot in the filp array and creates a struct file with important information about the file that you requested to open. In the future, the OS will look at this information to access the correct memory addresses on your disk.

However, a certain file on your disk can be opened more than once by the same process, so that it has the same struct file in two different spots of the filp array:

int main() {
   int fd = open("in.txt", O_RDONLY);

   printf("fd: %d\n", fd);

   int fd2 = open("in.txt", O_RDONLY);

   printf("fd2: %d\n", fd2);

   return 0;
}

That same process is done by the dup() system call, in the following way:

int main() {
    int fd = open("in.txt", O_RDONLY);

    printf("fd: %d\n", fd);

    int fd2 = dup(fd);

    printf("fd2: %d\n", fd2);

    return 0;
}

The previous code will get the next free index at the array filp and duplicate the same struct file in there. Furthermore, there is a dup2() system call, that is an extension of dup() because it allows the user to choose which File Descriptor to duplicate it to. Consider the following code:

int main() {
    int fd = open("in.txt", O_RDONLY);

    printf("fd: %d\n", fd);

    int fd2 = dup2(fd, 5);

    printf("fd2: %d\n", fd2);

    return 0;
}

In that code, we said that we want index 5 to store the struct from the file that we opened before. That is very serious, what might happen if we say that we want to duplicate it to the index 0?

int main() {
    int fd = open("in.txt", O_RDONLY);

    printf("fd: %d\n", fd);

    int fd2 = dup2(fd, 0);

    printf("fd2: %d\n", fd2);

    return 0;
}

Remember that inside the index 0 of the filp array, there is information about the Standard Input File? It means that every time you use a function that tries to ask for information from the user (e.g. scanf()), it will use the File Descriptor 0 for it. Therefore, if you duplicate the File Descriptor of an ordinary file to that specific index 0, you will modify where those functions get the information from. For example:

int main() {
    char buffer[1024] = {0};
    
    int fd = open("in.txt", O_RDONLY);

    int fd2 = dup2(fd, 0);

    scanf("%[^\n]", buffer);

    printf("Buffer: %s\n", buffer);

    return 0;
}

In that example, you won't have to write anything to the console, because the scanf() will get the information it needs from the "in.txt" file.

The same result can be done both to the Standard Output FIle and the Standard Error File. Using the same logic, the following code example stores all the information printed to the "out.txt" file:

int main() {
    int fd = open("out.txt", O_WRONLY | O_CREAT, 0644);
    
    dup2(fd, 1);

    printf("Hello, world!\n");

    return 0;
}

Conclusion

That sums it all up, with all that information and some string parsing you will be able to create something that looks just like that simple shell I showed you in the beginning.

References

Linux Version 0.01 Source Code. https://elixir.bootlin.com/linux/0.01/source
System calls in the Linux kernel. https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html
The TTY demystified. https://www.linusakesson.net/programming/tty/