So far we have two ways to generate output from kernel modules: we can register a device driver and mknod a device file, or we can create a /proc file. This allows the kernel module to tell us anything it likes. The only problem is that there is no way for us to talk back. The first way we'll send input to kernel modules will be by writing back to the /proc file.
Because the proc filesystem was written mainly to allow the kernel to report its situation to processes, there are no special provisions for input. The proc_dir_entry struct doesn't include a pointer to an input function, the way it includes a pointer to an output function. Instead, to write into a /proc file, we need to use the standard filesystem mechanism.
In Linux there is a standard mechanism for file system registration. Since every file system has to have its own functions to handle inode and file operations, there is a special structure to hold pointers to all those functions, struct inode_operations, which includes a pointer to struct file_operations. In /proc, whenever we register a new file, we're allowed to specify which struct inode_operations will be used for access to it. This is the mechanism we use: a struct inode_operations which includes a pointer to a struct file_operations, which in turn includes pointers to our module_input and module_output functions.
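The nesting described above can be modeled in ordinary user-space C. The sketch below is not kernel code: the names demo_file_ops, demo_inode_ops, demo_dispatch_read and demo_dispatch_write are hypothetical stand-ins, and the real structures have many more members. It only shows how a caller reaches the handlers through two levels of pointers, and how a NULL slot means the operation is not supported.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct demo_file_ops {
    int (*read)(char *buf, int len);         /* output from the "module" */
    int (*write)(const char *buf, int len);  /* input to the "module" */
};

/* Hypothetical, cut-down stand-in for struct inode_operations. */
struct demo_inode_ops {
    struct demo_file_ops *default_file_ops;
};

/* The last message received, as in the module's Message buffer. */
static char demo_message[80];

/* "read" handler: copy the stored message out to the caller. */
static int demo_output(char *buf, int len)
{
    int i;

    for (i = 0; i < len && demo_message[i]; i++)
        buf[i] = demo_message[i];
    return i;  /* number of bytes produced */
}

/* "write" handler: store the caller's bytes as the new message. */
static int demo_input(const char *buf, int len)
{
    int i;

    for (i = 0; i < (int)sizeof(demo_message) - 1 && i < len; i++)
        demo_message[i] = buf[i];
    demo_message[i] = '\0';
    return i;  /* number of bytes consumed */
}

static struct demo_file_ops demo_fops = { demo_output, demo_input };
static struct demo_inode_ops demo_iops = { &demo_fops };

/* Follow inode ops -> file ops -> handler; refuse if any link is NULL. */
static int demo_dispatch_write(struct demo_inode_ops *iops,
                               const char *buf, int len)
{
    if (!iops || !iops->default_file_ops || !iops->default_file_ops->write)
        return -1;  /* operation not supported */
    return iops->default_file_ops->write(buf, len);
}

static int demo_dispatch_read(struct demo_inode_ops *iops,
                              char *buf, int len)
{
    if (!iops || !iops->default_file_ops || !iops->default_file_ops->read)
        return -1;  /* operation not supported */
    return iops->default_file_ops->read(buf, len);
}
```

Calling demo_dispatch_write(&demo_iops, "hello", 5) stores the text, and a later demo_dispatch_read on the same table hands it back.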
It's important to note that the standard roles of read and write are reversed in the kernel. Read functions are used for output, whereas write functions are used for input. The reason for that is that read and write refer to the user's point of view -- if a process reads something from the kernel, then the kernel needs to output it, and if a process writes something to the kernel, then the kernel receives it as input.
Another interesting point here is the module_permission function. This function is called whenever a process tries to do something with the /proc file, and it can decide whether to allow access or not. Right now it is based only on the operation and the effective uid of the current user (as available in current, a pointer to a structure which includes information on the currently running process), but it could be based on anything we like, such as what other processes are doing with the same file, the time of day, or the last input we received.
The reason for put_user and get_user is that Linux memory is segmented (at least under Intel architecture; it may be different under some other processors). This means that a pointer, by itself, does not reference a unique location in memory, only a location in a memory segment, and you need to know which memory segment it is to be able to use it. There is one memory segment for the kernel, and one for each of the processes.
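The decision itself is easy to try out away from the kernel. The following is a minimal user-space sketch, assuming the operation codes 1, 2 and 4 for execute, write and read; demo_permission and the DEMO_MAY_* names are hypothetical, and the effective uid is passed in as a parameter instead of being taken from current.

```c
#include <errno.h>

/* Operation codes, assumed to match the values the kernel passes in. */
#define DEMO_MAY_EXEC  1
#define DEMO_MAY_WRITE 2
#define DEMO_MAY_READ  4

/* Hypothetical model of the permission check: return 0 to allow the
 * operation, or a negative errno value to refuse it. */
static int demo_permission(int op, unsigned int euid)
{
    /* Everybody may read */
    if (op == DEMO_MAY_READ)
        return 0;

    /* Only root (uid 0) may write */
    if (op == DEMO_MAY_WRITE && euid == 0)
        return 0;

    /* Anything else, including execute, is denied */
    return -EACCES;
}
```

Because the function is the only gatekeeper, extending the policy (for example, refusing writes at certain times) would only mean adding conditions here.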
The only memory segment accessible to a process is its own, so when writing regular programs to run as processes, there's no need to worry about segments. When you write a kernel module, normally you want to access the kernel memory segment, which is handled automatically by the system. However, when the content of a memory buffer needs to be passed between the currently running process and the kernel, the kernel function receives a pointer to the memory buffer which is in the process segment. The put_user and get_user macros allow you to access that memory.
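To make the idea of "a location in a segment" concrete, here is a toy user-space model. It is only an analogy, not how the kernel implements these macros: kernel_seg, user_seg, model_get_user, model_put_user and model_copy_in are all hypothetical names, and the two segments are just two arrays. The same offset names a different byte in each array, and the model_* helpers always resolve an offset against the user array.

```c
#include <stddef.h>

#define SEG_SIZE 64

/* Two separate "segments". An offset alone does not say which one is meant. */
static char kernel_seg[SEG_SIZE];
static char user_seg[SEG_SIZE];

/* Hypothetical stand-ins for get_user and put_user: the offset is always
 * interpreted inside the user segment. Offsets must be below SEG_SIZE. */
static char model_get_user(size_t user_off)
{
    return user_seg[user_off];
}

static void model_put_user(char c, size_t user_off)
{
    user_seg[user_off] = c;
}

/* Copy up to len bytes from the user segment into the kernel segment,
 * always leaving room for a terminating zero. Returns the bytes copied. */
static size_t model_copy_in(size_t kernel_off, size_t user_off, size_t len)
{
    size_t i;

    if (kernel_off >= SEG_SIZE || user_off >= SEG_SIZE)
        return 0;

    for (i = 0;
         i < len && kernel_off + i < SEG_SIZE - 1 && user_off + i < SEG_SIZE;
         i++)
        kernel_seg[kernel_off + i] = model_get_user(user_off + i);

    kernel_seg[kernel_off + i] = '\0';
    return i;
}
```

In this model, writing through model_put_user at offset 8 leaves kernel_seg[8] untouched, which is the point: the number 8 means nothing until you know which segment it refers to.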
/* procfs.c - create a "file" in /proc, which allows both input and
 * output. */

/* Copyright (C) 1998 by Ori Pomerantz */


/* The necessary header files */

/* Standard in kernel modules */
#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/module.h>   /* Specifically, a module */

/* Deal with CONFIG_MODVERSIONS */
#if CONFIG_MODVERSIONS==1
#define MODVERSIONS
#include <linux/modversions.h>
#endif

/* Necessary because we use proc fs */
#include <linux/proc_fs.h>


/* The module's file functions ************************************* */

/* Here we keep the last message received, to prove that we can process
 * our input */
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];


/* Since we use the file operations struct, we can't use the special proc
 * output provisions - we have to use a standard read function, which is
 * this function */
static int module_output(struct inode *inode, /* The inode read */
                         struct file *file,   /* The file read */
                         char *buf, /* The buffer to put data to (in the
                                     * user segment) */
                         int len)   /* The length of the buffer */
{
  static int finished = 0;
  int i;
  char message[MESSAGE_LENGTH+30];

  /* We return 0 to indicate end of file, that we have no more information.
   * Otherwise, processes will continue to read from us in an endless loop. */
  if (finished) {
    finished = 0;
    return 0;
  }

  /* We use put_user to copy the string from the kernel's memory segment
   * to the memory segment of the process that called us. get_user, BTW, is
   * used for the reverse. */
  sprintf(message, "Last input:%s", Message);
  for(i=0; i<len && message[i]; i++)
    put_user(message[i], buf+i);

  /* Notice, we assume here that the size of the message is below len, or
   * it will be received truncated. In a real life situation, if the size
   * of the message is greater than len then we'd return len and on the
   * second call start filling the buffer with the len+1'th byte of the
   * message. */
  finished = 1;

  return i;  /* Return the number of bytes "read" */
}


/* This function receives input from the user when the user writes to
 * the /proc file. */
static int module_input(struct inode *inode, /* The file's inode */
                        struct file *file,   /* The file itself */
                        const char *buf,     /* The buffer with the input */
                        int length)          /* The buffer's length */
{
  int i;

  /* Put the input into Message, where module_output will later be
   * able to use it */
  for(i=0; i<MESSAGE_LENGTH-1 && i<length; i++)
    Message[i] = get_user(buf+i);
  Message[i] = '\0';  /* we want a standard, zero terminated string */

  /* We need to return the number of input characters used */
  return i;
}


/* This function decides whether to allow an operation (return zero) or
 * not allow it (return a non-zero which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 1 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file permissions. The permissions
 * returned by ls -l are for reference only, and can be overridden here.
 */
static int module_permission(struct inode *inode, int op)
{
  /* We allow everybody to read from our module, but only root (uid 0)
   * may write to it */
  if (op == 4 || (op == 2 && current->euid == 0))
    return 0;

  /* If it's anything else, access is denied */
  return -EACCES;
}


/* The file is opened - we don't really care about that, but it does mean
 * we need to increment the module's reference count. */
int module_open(struct inode *inode, struct file *file)
{
  MOD_INC_USE_COUNT;

  return 0;
}


/* The file is closed - again, interesting only because of the reference
 * count. */
void module_close(struct inode *inode, struct file *file)
{
  MOD_DEC_USE_COUNT;
}


/* Structures to register as the /proc file, with pointers to all the
 * relevant functions. ********************************************** */

/* File operations for our proc file. This is where we place pointers
 * to all the functions called when somebody tries to do something to
 * our file. NULL means we don't want to deal with something. */
static struct file_operations File_Ops_4_Our_Proc_File =
  {
    NULL,           /* lseek */
    module_output,  /* "read" from the file */
    module_input,   /* "write" to the file */
    NULL,           /* readdir */
    NULL,           /* select */
    NULL,           /* ioctl */
    NULL,           /* mmap */
    module_open,    /* Somebody opened the file */
    module_close    /* Somebody closed the file */
    /* etc. etc. etc. (they are all given in /usr/include/linux/fs.h).
     * Since we don't put anything here, the system will keep the default
     * data, which in Unix is zeros (NULLs when taken as pointers). */
  };


/* Inode operations for our proc file. We need it so we'll have some
 * place to specify the file operations structure we want to use, and
 * the function we use for permissions. It's also possible to specify
 * functions to be called for anything else which could be done to an
 * inode (although we don't bother, we just put NULL). */
static struct inode_operations Inode_Ops_4_Our_Proc_File =
  {
    &File_Ops_4_Our_Proc_File,
    NULL,  /* create */
    NULL,  /* lookup */
    NULL,  /* link */
    NULL,  /* unlink */
    NULL,  /* symlink */
    NULL,  /* mkdir */
    NULL,  /* rmdir */
    NULL,  /* mknod */
    NULL,  /* rename */
    NULL,  /* readlink */
    NULL,  /* follow_link */
    NULL,  /* readpage */
    NULL,  /* writepage */
    NULL,  /* bmap */
    NULL,  /* truncate */
    module_permission  /* check for permissions */
  };


/* Directory entry */
static struct proc_dir_entry Our_Proc_File =
  {
    0,  /* Inode number - ignore, it will be filled by
         * proc_register_dynamic */
    7,  /* Length of the file name */
    "rw_test",  /* The file name */
    S_IFREG | S_IRUGO | S_IWUSR,
        /* File mode - this is a regular file which
         * can be read by its owner, its group, and everybody
         * else. Also, its owner can write to it.
         *
         * Actually, this field is just for reference, it's
         * module_permission that does the actual check. It
         * could use this field, but in our implementation it
         * doesn't, for simplicity. */
    1,  /* Number of links (directories where the file is referenced) */
    0, 0,  /* The uid and gid for the file - we give it to root */
    80,  /* The size of the file reported by ls. */
    &Inode_Ops_4_Our_Proc_File,
        /* A pointer to the inode structure for
         * the file, if we need it. In our case we
         * do, because we need a write function. */
    NULL  /* The read function for the file. Irrelevant, because we put it
           * in the inode structure above */
  };


/* Module initialization and cleanup ********************************** */

/* Initialize the module - register the proc file */
int init_module()
{
  /* Success if proc_register_dynamic is a success, failure otherwise */
  return proc_register_dynamic(&proc_root, &Our_Proc_File);
}


/* Cleanup - unregister our file from /proc */
void cleanup_module()
{
  proc_unregister(&proc_root, Our_Proc_File.low_ino);
}
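The comment in module_output points out that a real read function has to continue where it left off when the message does not fit in the caller's buffer. A minimal user-space sketch of that bookkeeping follows. It is an illustration only: demo_reader and demo_read are hypothetical names, and in a real module the position would normally be kept with the open file rather than in a structure of our own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state: the message and how much was already sent. */
struct demo_reader {
    const char *message;
    size_t pos;
};

/* Copy the next chunk of the message into buf (at most len bytes).
 * Returns the number of bytes copied, or 0 at end of file. */
static int demo_read(struct demo_reader *r, char *buf, int len)
{
    size_t total = strlen(r->message);
    size_t left, n;

    /* Nothing requested, or everything already delivered: end of file */
    if (len <= 0 || r->pos >= total)
        return 0;

    left = total - r->pos;
    n = left < (size_t)len ? left : (size_t)len;

    memcpy(buf, r->message + r->pos, n);
    r->pos += n;  /* the next call resumes from here */

    return (int)n;
}
```

A caller simply keeps calling demo_read until it returns 0, which is the same loop a process runs when it reads a file to the end.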