Saturday, September 11, 2010

kernel ticks, jiffies and high resolution timers...

kernel ticks and jiffies are explained beautifully in many places and best explanation can be found in "understanding Linux kernel" - O'reilly. I'm just trying to summarize some of the facts here!!!!

'jiffies' OR 'ticks' is the finest time resolution that we can get on Linux system. Like any sleep/poll/select/timers functionality in the kernel OR user land, the unit of time is jiffies. For eg., we can block for minimum 1 jiffies whether in the kernel OR user space. What does all this mean ?

Any event like sleep/poll/timers etc., need notification mechanism to wakeup after specified time. To be more specific, if a process says that it has to be woken up after 1 second by way of sleep OR poll, kernel registers timer for this event. Timer will fire after 1 second and the callback routine will do nothing but put the process on the run-queue after removing it from wait-queue so that it get's scheduled any time in future. Any wait event is converted into timer event by the kernel this way. What is that minimum time that one can ask kernel to notify you ? How is this time measured ?

This minimum time that we can request kernel to notify is one 'jiffy'. This 'jiffy' is nothing but a counter that is incremented periodically in timer interrupt handler. This timer interrupt handler is programmed to happen periodically at a pre-defined rate. This rate or frequency of timer interrupt is defined by 'HZ'. This value says how many times timer interrupt will happen in a second. Earlier this value was set to 100 but sometime back this value is changed to 250 for X86 systems.

100 HZ means, timer interrupt will happen 100 times in a second i.e., every 1/100 seconds OR (1000/100)10 ms. Which essentially mean 'jiffy' will be incremented every 10 ms OR highest time resolution that can be achieved is 10 ms. With respect to high speed CPU, this can be much long time. If we increase HZ, we can achieve much higher resolution but CPU will be spending more time in processing timer interrupt per second. So, there is a tradeoff between high timer resolution and % time spend in timer interrupt processing. On high speed CPU's this should not be an issue.

How do we relate jiffies to kernel notification event(timers). As we already discussed that sleep/poll/timers etc is converted into notification event(timers) by the kernel in terms of 'number of jiffies'. This can be minimum 1 jiffy away from now. When are these kernel timers processed ?

kernel timers are processed whenever timer interrupt occur. So, if we register a timer to expire after one jiffy, we will be woken up on occurrence of next timer interrupt. Imagine a situation where we need to be woken up after 1 ms OR imagine a situation where we need to be woken up after 5 ms. In prior case, we might get woken up after 1 ms if timer interrupt is 1 ms away OR max we will be woken up after 10 ms if timer interrupt has just occurred. In the later case, we face the same problem - either we can be woken up prior to 5 ms OR max after 10 ms. Once classic example is RTT calculations. In case RTT is below 1 jiffy, we still will see it as minimum 1 jiffy OR if transmit packet just before timer interrupt and we get ACK just after the timer interrupt, we still see RTT as 1 jiffy where as it can be few micro seconds. All this will get us into serious issues as far as timer resolution for the system is concerned. All these issues can be, to some extent, solved by increasing the frequency of timer interrupt(HZ). So, current X86 systems are programmed to achieve timer interrupt frequency of 250 (see CONFIG_HZ is set to 250 in autoconf.h) which means every 4 ms. On high end systems this HZ value is as high as 1000 where one jiffy will be 1 ms. Normally, we see HZ defined as -

include/*/param.h

#ifdef __KERNEL__
# define HZ CONFIG_HZ /* Internal kernel timer frequency */
# define USER_HZ 100 /* some user interfaces are */
# define CLOCKS_PER_SEC (USER_HZ) /* in "ticks" like times() */
#endif


Let's throw some light on the implementation -

there are two variables jiffies and jiffies_64. jiffies_64 is a 64-bit value where as jiffies is long. On 32-bit, jiffies is just 32-bit value where as jiffies_64 is a 64-bit value. The reason to keep two variables is that with higher timer-interrupt frequency, 32-bit jiffies value wrap around much faster. So, 64-bit jiffies_64 will help. Lower order 32-bit value in this 64-bit jiffies_64 is actually used by any part of the kernel as 32-bit satisfies our requirements of time resolution and accessing 64-bit value on 32-bit architecture takes more than one memory access -

include/linux/jiffies.h

extern u64 __jiffy_data jiffies_64;
extern unsigned long volatile __jiffy_data jiffies;

Once jiffies_64 is incremented on timer interrupt in do_timer() call -

void do_timer(unsigned long ticks)
{
jiffies_64 += ticks;
....
}

So, where is jiffies incremented? We don't increment jiffies explicitly. 'Jiffies' is accessing to lower order 32-bits of 64-bit jiffies_64 variable and we can see this in the linker file -

arch/x86/kernel/vmlinux.lds.S

#ifdef CONFIG_X86_32
OUTPUT_ARCH(i386)
ENTRY(phys_startup_32)
jiffies = jiffies_64;


What happens on SMP machine ? On SMP machine we have timer interrupt for each CPU. This timer interrupt will occur independently on each CPU. So, how do we increment jiffies ?

On SMP, only one CPU is responsible for incrementing jiffies and all other CPU's will cover complete functionality of timer interrupt as usual. Whichever CPU gets to tick_setup_device() routine first, does the jiffies management -

kernel/time/tick-common.c

static void tick_setup_device(struct tick_device *td,
....)
{
...
if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
tick_do_timer_cpu = cpu;
....
}
....
}

So, tick_do_timer_cpu will store CPU ID that will manage jiffies. This variable is checked against CPU ID of the current CPU in tick_periodic(). If we are the same CPU, do_timer() is called else not. Similarly we call tick_do_update_jiffies64() to update jiffies_64 in tick_sched_timer().


What is high resolution timers?

This is related to high resolution timer of the order of seconds. In case kernel and hardware supports high resolution timer of the order of nano seconds, it is not recommended to have periodic timer interrupt every few nano-seconds as it will pull down system performance because CPU will be spending most of it's time in the timer interrupt handler. For this, hardware(timer interrupt) is programmed to occur in the future on demand. This will depend on next pending timer event. If it is after few nano seconds, we program this interrupt to occur at that time else we program it to happen after default time(10 ticks, if ticks is 1 ms - then after 10 ms). check tick_do_update_jiffies64() called from tick_sched_timer(). All high resolution timer code is compiled with CONFIG_HIGH_RES_TIMERS macros.

Let's learn more about tickless kernel and high-resolution timers in our next discussion....

Let's find out more ....