Bytes won’t bite …

Networks | OS | Games | Fun | 11111000011

Linux network buffers

1. Introduction

I borrowed this document from Mr. Harald Welte (laforge@gnumonks.org). I found it very useful, so I
thought I would share it with you all, and more importantly I can always keep it as a note for myself.
So please don't sue me for copyright issues, OK? :)

2. skbuff

skbuffs are the buffers in which the Linux kernel handles network packets. A packet is received by the network card, put into an skbuff, and then passed to the network stack, which works on that skbuff for the rest of the packet's lifetime.

2.1 struct sk_buff

The struct sk_buff is defined in <linux/skbuff.h> as follows:

  • next – next buffer in list
  • prev – previous buffer in list
  • list – list we are on
  • sk – socket we belong to
  • stamp – timeval we arrived at
  • dev – device we are leaving by
  • rx_dev – device we arrived at
  • h – transport layer header (tcp, udp, icmp, igmp, spx, raw)
  • nh – network layer header (ip, ipv6, arp, ipx, raw)
  • mac – link layer header
  • dst – FIXME:
  • cb – control buffer, used internally
  • len – length of actual data
  • csum – checksum
  • used – FIXME: data moved to user and not MSG_PEEK
  • is_clone – we are a clone
  • cloned – head may be cloned
  • pkt_type – packet class
  • ip_summed – driver fed us ip checksum
  • priority – packet queuing priority
  • users – user count
  • protocol – packet protocol from driver
  • security – security level of packet
  • truesize – real size of the buffer
  • head – pointer to head of buffer
  • data – data head pointer
  • tail – tail pointer
  • end – end pointer
  • destructor – destructor function

  • nfmark – netfilter mark
  • nfcache – netfilter internal caching info
  • nfct – associated connection, if any

  • tc_index – traffic control index

2.2 skb support functions

There are a bunch of skb support functions provided by the sk_buff layer. I briefly describe the most important ones in this section.

allocation / free / copy / clone and expansion functions

struct sk_buff *alloc_skb(unsigned int size, int gfp_mask)

This function allocates a new skb. It is provided by the skb layer so that some private data can be initialized and memory statistics kept. The returned buffer has no headroom and a tailroom of /size/ bytes.

void kfree_skb(struct sk_buff *skb)

Decrements the skb’s usage count by one and frees the skb if no references are left.

struct sk_buff *skb_get(struct sk_buff *skb)

Increments the skb’s usage count by one and returns a pointer to it.

struct sk_buff *skb_clone(struct sk_buff *skb, int gfp_mask)

This function clones a skb. Both copies share the packet data but have their own struct sk_buff. The new copy is not owned by any socket, reference count is 1.

struct sk_buff *skb_copy(const struct sk_buff *skb, int gfp_mask)

Makes a real copy of the skb, including packet data. This is needed if you wish to modify the packet data. Reference count of the new skb is 1.

struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int new_headroom, int new_tailroom, int gfp_mask)

Makes a copy of the skb, including packet data. Additionally, the new skb has a headroom of /new_headroom/ bytes and a tailroom of /new_tailroom/ bytes.

ancillary functions

int skb_cloned(struct sk_buff *skb)

Is the skb a clone?

int skb_shared(struct sk_buff *skb)

Is this skb shared? (is the reference count > 1)?
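To make the difference between skb_get, skb_clone and skb_copy a bit more concrete, here is a small toy model in Python. It is only a sketch, not kernel code: ToySkb and the toy_* helpers are names I made up for illustration, and the real struct sk_buff tracks far more state than this.

```python
from dataclasses import dataclass


@dataclass
class ToySkb:
    """Hypothetical stand-in for struct sk_buff: a header plus packet data."""
    data: bytearray        # packet data, possibly shared with clones
    users: int = 1         # reference count on this header
    cloned: bool = False   # True once the data is shared with another header


def toy_skb_get(skb: ToySkb) -> ToySkb:
    """Like skb_get: take one more reference on the same header."""
    skb.users += 1
    return skb


def toy_skb_clone(skb: ToySkb) -> ToySkb:
    """Like skb_clone: new header with reference count 1, same packet data."""
    skb.cloned = True
    return ToySkb(data=skb.data, users=1, cloned=True)


def toy_skb_copy(skb: ToySkb) -> ToySkb:
    """Like skb_copy: new header and a private copy of the packet data."""
    return ToySkb(data=bytearray(skb.data), users=1, cloned=False)


def toy_skb_shared(skb: ToySkb) -> bool:
    """Like skb_shared: is the reference count greater than 1?"""
    return skb.users > 1


original = ToySkb(bytearray(b"abcd"))
clone = toy_skb_clone(original)
copy = toy_skb_copy(original)

# Writing through the clone is visible in the original, because the
# packet data is shared; the copy keeps its own bytes.
clone.data[0] = ord("X")
print(original.data, copy.data)
```

This is why a real copy is needed before modifying packet data: a clone only gets its own header.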

operations on lists of skb’s

struct sk_buff *skb_peek(struct sk_buff_head *list_)

peek a skb from front of the list; does not remove skb from the list

struct sk_buff *skb_peek_tail(struct sk_buff_head *list_)

peek a skb from tail of the list; does not remove skb from the list

__u32 skb_queue_len(struct sk_buff_head *list_)

return the length of the given skb list

void skb_queue_head(struct sk_buff_head *list_, struct sk_buff *newsk)

enqueue a skb at the head of a given list

void skb_queue_tail(struct sk_buff_head *list_, struct sk_buff *newsk)

enqueue a skb at the end of a given list.

struct sk_buff *skb_dequeue(struct sk_buff_head *list_)

dequeue a skb from the head of the given list.

struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list_)

dequeue a skb from the tail of the given list
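The list operations behave like a double-ended queue. As a rough sketch, assuming nothing beyond the descriptions above, they can be mimicked in Python with collections.deque. The toy_* names are hypothetical, and the kernel list is an intrusive linked list, not a deque. Returning None for an empty list stands in for the NULL the kernel functions return.

```python
from collections import deque


def toy_skb_queue_head(queue, skb):
    """Enqueue at the head of the list."""
    queue.appendleft(skb)


def toy_skb_queue_tail(queue, skb):
    """Enqueue at the end of the list."""
    queue.append(skb)


def toy_skb_peek(queue):
    """Look at the first element without removing it."""
    return queue[0] if queue else None


def toy_skb_peek_tail(queue):
    """Look at the last element without removing it."""
    return queue[-1] if queue else None


def toy_skb_dequeue(queue):
    """Remove and return the first element."""
    return queue.popleft() if queue else None


def toy_skb_dequeue_tail(queue):
    """Remove and return the last element."""
    return queue.pop() if queue else None


def toy_skb_queue_len(queue):
    """Number of elements on the list."""
    return len(queue)


rx_queue = deque()
toy_skb_queue_tail(rx_queue, "packet-1")
toy_skb_queue_tail(rx_queue, "packet-2")
toy_skb_queue_head(rx_queue, "urgent")
print(toy_skb_peek(rx_queue), toy_skb_queue_len(rx_queue))
```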

operations on skb data

unsigned char *skb_put(struct sk_buff *skb, int len)

Extends the data area of the skb at the tail. If the total size exceeds the size of the skb, the kernel will panic. A pointer to the first byte of the new data is returned.

unsigned char *skb_push(struct sk_buff *skb, int len)

Extends the data area of the skb at the head, taking the space from the headroom. If len exceeds the available headroom, the kernel will panic. A pointer to the first byte of the new data is returned.

unsigned char *skb_pull(struct sk_buff *skb, int len)

Removes data from the start of a buffer, returning the bytes to headroom. A pointer to the next data in the buffer is returned.

int skb_headroom(struct sk_buff *skb)

return the amount of bytes of free space at the head of skb

int skb_tailroom(struct sk_buff *skb)

return the amount of bytes of free space at the end of skb

struct sk_buff *skb_cow(struct sk_buff *skb, int headroom)

if the buffer passed lacks sufficient headroom or is a clone it is copied and additional headroom made available.
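The data operations are easiest to follow as arithmetic on the four pointers head, data, tail and end. Below is a minimal toy model in Python that uses offsets into one bytearray instead of real pointers. ToyBuffer and its method names are made up for illustration. The reserve() method is modelled on the kernel's skb_reserve, which this section does not cover, and raising an exception stands in for the kernel panic.

```python
class ToyBuffer:
    """Toy model of the sk_buff pointers as offsets into one bytearray."""

    def __init__(self, size):
        # Like alloc_skb(size, ...): no headroom, tailroom of `size` bytes.
        self.buf = bytearray(size)
        self.head = 0
        self.data = 0
        self.tail = 0
        self.end = size

    def headroom(self):
        """Free bytes between head and data."""
        return self.data - self.head

    def tailroom(self):
        """Free bytes between tail and end."""
        return self.end - self.tail

    def length(self):
        """Bytes of actual data, between data and tail."""
        return self.tail - self.data

    def reserve(self, n):
        """Create headroom in an empty buffer by moving data and tail forward."""
        if self.length() != 0 or n > self.tailroom():
            raise ValueError("reserve needs an empty buffer with enough room")
        self.data += n
        self.tail += n

    def put(self, n):
        """Grow the data area at the tail; return the offset of the new bytes."""
        if n > self.tailroom():
            raise OverflowError("not enough tailroom (the kernel would panic)")
        old_tail = self.tail
        self.tail += n
        return old_tail

    def push(self, n):
        """Grow the data area at the head; return the new start of data."""
        if n > self.headroom():
            raise OverflowError("not enough headroom (the kernel would panic)")
        self.data -= n
        return self.data

    def pull(self, n):
        """Drop n bytes from the start of the data; return the new start."""
        if n > self.length():
            raise ValueError("cannot pull more bytes than the buffer holds")
        self.data += n
        return self.data


skb = ToyBuffer(64)
skb.reserve(16)          # leave room for headers
payload_at = skb.put(20) # payload goes in at the tail
header_at = skb.push(8)  # a header is prepended from the headroom
skb.pull(8)              # the receiver strips that header again
print(payload_at, header_at, skb.headroom(), skb.length(), skb.tailroom())
```

The model shows the asymmetry between the two: put hands back the old tail, while push hands back the new start of the data.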

April 7, 2009 Posted by swordfish1987 | OS, networks

Managing /proc

First of all, the filesystem contains a huge set of numbered directories that come and go. Each of these numbered directories contains information pertaining to one of the currently active processes on the machine. When a new process is started, a new directory is created in the /proc filesystem for it, holding a lot of data about the process, such as the command line with which the program was started, a link to the “current working directory”, environment variables, where the executable is located, and so on.

Besides these, there are also quite a few files and directories in the root of the /proc filesystem. Here is a listing of them:

[blueflux@work1 ]$ ls -l /proc
total 0
....
-r--r--r--    1 root     root            0 Sep 19 18:09 apm
dr-xr-xr-x    4 root     root            0 Sep 19 10:52 bus
-r--r--r--    1 root     root            0 Sep 19 18:09 cmdline
-r--r--r--    1 root     root            0 Sep 19 18:09 cpuinfo
-r--r--r--    1 root     root            0 Sep 19 18:09 devices
-r--r--r--    1 root     root            0 Sep 19 18:09 dma
dr-xr-xr-x    4 root     root            0 Sep 19 18:09 driver
-r--r--r--    1 root     root            0 Sep 19 18:09 execdomains
-r--r--r--    1 root     root            0 Sep 19 18:09 fb
-r--r--r--    1 root     root            0 Sep 19 18:09 filesystems
dr-xr-xr-x    2 root     root            0 Sep 19 18:09 fs
dr-xr-xr-x    4 root     root            0 Sep 19 18:09 ide
-r--r--r--    1 root     root            0 Sep 19 18:09 interrupts
-r--r--r--    1 root     root            0 Sep 19 18:09 iomem
-r--r--r--    1 root     root            0 Sep 19 18:09 ioports
dr-xr-xr-x   18 root     root            0 Sep 19 18:09 irq
-r--------    1 root     root     268374016 Sep 19 18:09 kcore
-r--------    1 root     root            0 Sep 19 10:52 kmsg
-r--r--r--    1 root     root            0 Sep 19 18:09 ksyms
-r--r--r--    1 root     root            0 Sep 19 18:09 loadavg
-r--r--r--    1 root     root            0 Sep 19 18:09 locks
-r--r--r--    1 root     root            0 Sep 19 18:09 mdstat
-r--r--r--    1 root     root            0 Sep 19 18:09 meminfo
-r--r--r--    1 root     root            0 Sep 19 18:09 misc
-r--r--r--    1 root     root            0 Sep 19 18:09 modules
lrwxrwxrwx    1 root     root           11 Sep 19 18:09 mounts -> self/mounts
-rw-r--r--    1 root     root          208 Sep 19 11:02 mtrr
dr-xr-xr-x    3 root     root            0 Sep 19 18:09 net
dr-xr-xr-x    2 root     root            0 Sep 19 18:09 nv
-r--r--r--    1 root     root            0 Sep 19 18:09 partitions
-r--r--r--    1 root     root            0 Sep 19 18:09 pci
dr-xr-xr-x    3 root     root            0 Sep 19 18:09 scsi
lrwxrwxrwx    1 root     root           64 Sep 19 12:01 self -> 2864
-rw-r--r--    1 root     root            0 Sep 19 18:09 slabinfo
-r--r--r--    1 root     root            0 Sep 19 18:09 stat
-r--r--r--    1 root     root            0 Sep 19 18:09 swaps
dr-xr-xr-x   10 root     root            0 Sep 19 14:39 sys
dr-xr-xr-x    2 root     root            0 Sep 19 18:09 sysvipc
dr-xr-xr-x    4 root     root            0 Sep 19 18:09 tty
-r--r--r--    1 root     root            0 Sep 19 18:09 uptime
-r--r--r--    1 root     root            0 Sep 19 18:09 version
[blueflux@work1 proc]$

Most of the files contain rather “human readable” information, but a few do not. A few of them you should not touch at all, such as the kcore file. The kcore file contains debugging information regarding the kernel, and if you try to ‘cat’ it, your system may very well hang up and die. If you try to copy it to a real file on the hard drive, you will very soon have filled up your whole partition. What all of this tells you is to be very careful. Most of the entries in the /proc filesystem are not dangerous to look at, but a few of them are. A brief walkthrough of the most important files:

  • cmdline – The command line issued when starting the kernel.
  • cpuinfo – Information about the Central Processing Unit, who made it, known bugs, flags etcetera.
  • dma – Contains information about all DMA channels available, and which driver is using it.
  • filesystems – Contains short information about every single filesystem that the kernel supports.
  • interrupts – Gives you a brief listing of all IRQ channels, how many interrupts they have seen and what driver is actually using it.
  • iomem – A brief file containing all IO memory mappings used by different drivers.
  • ioports – Contains a brief listing of all IO ports used by different drivers.
  • kcore – Contains a complete memory dump. Do not cat or anything like that, you may freeze your system. Mainly used to debug the system.
  • kmsg – Contains messages sent by kernel, is not and should not be readable by users since it may contain vital information. Main usage is to debug the system.
  • ksyms – This contains the kernel symbol table, which is mainly used to debug the kernel.
  • loadavg – Gives the load average of the system during the last 1, 5 and 15 minutes.
  • meminfo – Contains information about memory usage on the system.
  • modules – Contains information about all currently loaded modules in the kernel.
  • mounts – Symlink to another file in the /proc filesystem which contains information about all mounted filesystems.
  • partitions – Contains information about all partitions found on all drives in the system.
  • pci – Gives tons of hardware information about all PCI devices on the system, also includes AGP devices and built in devices which are connected to the PCI bus.
  • swaps – Contains information about all swap partitions mounted.
  • uptime – Gives the time, in seconds, since the computer was last booted.
  • version – Gives the exact version string of the kernel currently running, including build date and gcc versions etcetera.
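Since most of these files are plain text, reading them from a program is just ordinary file I/O. As a small sketch, here is how the three load averages from loadavg could be parsed in Python. The function names are my own, and the sample line in the comment is only an assumed example of the usual layout: three averages first, followed by other fields that this sketch ignores.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into the three load averages."""
    # Assumed layout, e.g. "0.20 0.18 0.12 1/80 11206":
    # the 1, 5 and 15 minute averages come first.
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    return {"1min": one, "5min": five, "15min": fifteen}


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file; only works where /proc is mounted."""
    return parse_loadavg(Path(path).read_text())


if __name__ == "__main__" and Path("/proc/loadavg").exists():
    print(read_loadavg())
```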

And here is a list of the main directories and what you can expect to find in there:

  • bus – Contains information about all the buses, hardware-wise, such as USB, PCI and ISA buses.
  • ide – Contains information about all of the IDE buses on systems that have IDE buses.
  • net – Some basic information and statistics about the different network systems compiled into the system.
  • scsi – This directory contains information about SCSI buses on SCSI systems.
  • sys – Contains lots of variables that may be changed, including the /proc/sys/net/ipv4 which will be deeply discussed in this document.

As you can see, there are literally hundreds of files in the /proc filesystem that may be read and checked for information, and we haven’t looked at half of them here. As has already been said, we will only look closer at the ipv4 part and the variables that are tunable through sysctl inside the /proc filesystem.

The ipsysctl variables may be set in two totally different ways. The first is via the sysctl application, which most distributions provide by default these days. The other is via the /proc filesystem, which comes with any Linux installation whose kernel has the /proc filesystem turned on (in other words, practically any Linux system you find).

The sysctl command is a bit more complex than the /proc filesystem, depending on how you see things. Also, as already mentioned, the sysctl application requires more than just the kernel, which is almost all that the /proc filesystem needs. One of the better things about the sysctl command is that it makes it much easier to maintain a larger list of changes. All of the changes that we want to apply to the system can be saved in a special configuration file which contains all of the variables and their values. This way of doing things is, in other words, more suitable for setting variables that we want to use under all circumstances.

The /proc filesystem way of doing things is a little bit easier while tweaking settings. Once we have figured out the proper setting, we may as well put it in the sysctl.conf file and see to it that sysctl is run at boot, so that the kernel always has our settings applied. Command lines in a script which set variables through the /proc filesystem look much worse than sysctl commands and are generally less readable. Therefore, if you are planning to implement a large set of ipsysctl settings in a script, or if you find that you need to set a lot of them, you should generally use the sysctl command instead.

There is a lot more information at http://ipsysctl-tutorial.frozentux.net/ipsysctl-tutorial.html
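The two methods address the same variables: a sysctl key such as net.ipv4.tcp_rmem corresponds to the file /proc/sys/net/ipv4/tcp_rmem. Here is a minimal Python sketch of that mapping. The helper names are hypothetical, and the simple split on dots does not handle keys whose components themselves contain a dot.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return PROC_SYS.joinpath(*name.split("."))


def sysctl_name(path):
    """Map a file under /proc/sys back to its dotted sysctl key."""
    return ".".join(Path(path).relative_to(PROC_SYS).parts)


def read_sysctl(name):
    """Read the current value(s) of a key; needs a mounted /proc."""
    return sysctl_path(name).read_text().split()


print(sysctl_path("net.ipv4.tcp_rmem").as_posix())
print(sysctl_name("/proc/sys/net/core/rmem_max"))
```

Reading a value this way needs nothing but the kernel, which is the point made above about the /proc method.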


April 7, 2009 Posted by swordfish1987 | OS

How to get into GSoC

One big mistake people make after submitting an abstract for GSoC is that they simply wait for the results to come out and do nothing. That way you have to be really lucky to get through.

I just applied for a project with UMIT and they sent me these guidelines, or rather tasks, for getting into a sponsored project. Although this is exclusively for UMIT projects, it is strongly advised that you do something similar for every application you have posted.

Most importantly, submit a very detailed proposal via mail, not the 500-character abstract you submitted. It should reflect the knowledge, ideas and motivation you possess.

Then come the tasks they sent me.

Here are the tasks and their scores:

1 – Find a bug and report it in Umit’s bug tracker, with relevant details (1 pt)
2 – Find a bug and report it in Umit’s bug tracker, with relevant details and a description of how to reproduce the bug (2 pt)
3 – Suggest a patch in the devel mailing list to fix a bug in the tracker, and get the patch accepted and applied (3-7 pt)
4 – Give relevant comments and suggestions about a patch that was sent to the devel mailing list (1 pt)
5 – Find Umit’s easter eggs, and explain how they work (shhish… Don’t tell anyone how you found them ;-) (2 pt) – Students from last year who already know where the easter eggs are don’t get points here. Sorry…
6 – Relevant translation work (3-6 pt)
7 – Relevant documentation (howto, manual, help, article, etc.) (3-8 pt)
8 – Relevant improvements to code base, discussed and approved at the devel mailing list (2-6 pt)

Bug trackers like these exist for most open source projects (Bugzilla, JIRA and so on); you just need to register, that's all. So all the best, and if you get in, send me a note. :)

April 7, 2009 Posted by swordfish1987 | google

TCP/UDP Tuning in Linux

These are some useful tweaks for network performance; they
work on kernel versions since 2.4.XX.

1. Make sure that you have root privileges.

2. Type: sysctl -a | grep mem
This will display your current buffer settings. Save these! You may want to roll back these changes later.

3. Type: sysctl -w net.core.rmem_max=8388608
This sets the max OS receive buffer size for all types of connections.

4. Type: sysctl -w net.core.wmem_max=8388608
This sets the max OS send buffer size for all types of connections.

5. Type: sysctl -w net.core.rmem_default=65536
This sets the default OS receive buffer size for all types of connections.

6. Type: sysctl -w net.core.wmem_default=65536
This sets the default OS send buffer size for all types of connections.

7. Type: sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
TCP Autotuning setting. “The tcp_mem variable defines how the TCP stack should behave when it comes to memory usage. … The first value specified in the tcp_mem variable tells the kernel the low threshold. Below this point, the TCP stack do not bother at all about putting any pressure on the memory usage by different TCP sockets. … The second value tells the kernel at which point to start pressuring memory usage down. … The final value tells the kernel how many memory pages it may use maximally. If this value is reached, TCP streams and packets start getting dropped until we reach a lower memory usage again. This value includes all TCP sockets currently in use.”

8. Type: sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
TCP Autotuning setting. “The first value tells the kernel the minimum receive buffer for each TCP connection, and this buffer is always allocated to a TCP socket, even under high pressure on the system. … The second value specified tells the kernel the default receive buffer allocated for each TCP socket. This value overrides the /proc/sys/net/core/rmem_default value used by other protocols. … The third and last value specified in this variable specifies the maximum receive buffer that can be allocated for a TCP socket.”

9. Type: sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
TCP Autotuning setting. “This variable takes 3 different values which holds information on how much TCP sendbuffer memory space each TCP socket has to use. Every TCP socket has this much buffer space to use before the buffer is filled up. Each of the three values are used under different conditions. … The first value in this variable tells the minimum TCP send buffer space available for a single TCP socket. … The second value in the variable tells us the default buffer space allowed for a single TCP socket to use. … The third value tells the kernel the maximum TCP send buffer space.”

10. Type: sysctl -w net.ipv4.route.flush=1
This will ensure that immediately subsequent connections use these values.
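One thing worth checking with the steps above: the rmem and wmem values are in bytes, while the quoted description of tcp_mem speaks of memory pages. Here is a quick bit of arithmetic in Python to see what that means. This is only a sketch, and the 4096 byte page size is an assumption (it is common, but depends on the architecture).

```python
PAGE_SIZE = 4096  # assumed page size in bytes; varies by architecture


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a tcp_mem style value (pages) to bytes."""
    return pages * page_size


def bytes_to_pages(size, page_size=PAGE_SIZE):
    """Convert a byte count to whole pages, rounding up."""
    return -(-size // page_size)


tcp_mem_pages = 8388608                       # value used for tcp_mem in step 7
as_bytes = pages_to_bytes(tcp_mem_pages)
print(as_bytes, "bytes =", as_bytes / 2**30, "GiB")

# For comparison, the 8388608 byte limit used for rmem_max, in pages:
print(bytes_to_pages(8388608), "pages")
```

Under that assumption, 8388608 pages is far more than the 8388608 byte socket limits set in the other steps, so it is worth deciding which of the two you actually mean for tcp_mem.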

Quick Step
Cut and paste the following into a Linux shell with root privileges:

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1
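Settings made with sysctl -w are lost on reboot. To keep them, the same keys can go into a sysctl.conf file as "key = value" lines, which sysctl -p loads. Here is a small Python sketch that renders such a fragment from the values above. The function name is my own, and I left out route.flush on the assumption that it is a one-off trigger and not a setting to persist.

```python
SETTINGS = {
    "net.core.rmem_max": "8388608",
    "net.core.wmem_max": "8388608",
    "net.core.rmem_default": "65536",
    "net.core.wmem_default": "65536",
    "net.ipv4.tcp_rmem": "4096 87380 8388608",
    "net.ipv4.tcp_wmem": "4096 65536 8388608",
    "net.ipv4.tcp_mem": "8388608 8388608 8388608",
}


def render_sysctl_conf(settings):
    """Render a mapping of sysctl keys to values as sysctl.conf lines."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


# Print the fragment; appending it to the real file is left to the reader.
print(render_sysctl_conf(SETTINGS), end="")
```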

April 7, 2009 Posted by swordfish1987 | OS, networks