Re: [linuxtools-dev] [TMF] Advanced statistics view

> suppose we want to calculate per-cpu CPU utilization of
> each process

> In the meantime, suppose we are also interested in having the
> reverse statistic: per-process CPU utilization for each CPU.
> By using the current organization of the attribute tree, we may
> need to duplicate the data and store it twice in the history
> tree, a separate value for each attribute pair
> (e.g. cpu1 --> process1 and process1 --> cpu1).
> However, it may be useful to somehow relax the definition of
> the attribute tree and let different applications define their
> own organizations of the attributes.

I agree that it is wasteful to keep everything symmetrical and thus compute/store some values more than once. This brings up two choices to make:

- select an arbitrary canonical order for such values, for instance process / cpu / system call, and thus avoid the variants (cpu / process / system call, cpu / system call / process, ...). The optimal order depends on the typical branching pattern, as discussed below.

- decide on the desired granularity level. For example, we could store for each process the execution time spent in "system calls" as a whole, or separately in each specific system call (read, write, ...), or even further on each CPU and each specific system call. Moreover, we may use different granularity levels for different resources, for instance only looking at all system calls for each process but accounting for each specific system call globally for the system. This is a compromise between granularity and memory.
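To make these two choices concrete, here is a minimal sketch, in plain Java rather than the actual TMF attribute-tree API, of how a statistic's attribute path could be built for the canonical order pid / syscall / cpu and then truncated to the chosen granularity. All names here (the enum, the path segments) are illustrative assumptions, not existing TMF attributes.

// Illustrative sketch only, not the TMF API: build a canonical attribute
// path for a statistic, truncated according to the chosen granularity.
public final class StatPath {

    /** Granularity chosen for the statistics sub-tree (illustrative). */
    public enum Granularity {
        PER_PID,                 // pid
        PER_PID_SYSCALL,         // pid / syscall
        PER_PID_SYSCALL_CPU      // pid / syscall / cpu
    }

    /** Canonical order pid / syscall / cpu, cut off at the desired depth. */
    public static String[] forSyscallTime(Granularity g, int pid, String syscall, int cpu) {
        switch (g) {
            case PER_PID:
                return new String[] { "Statistics", "pid", String.valueOf(pid), "syscall_time" };
            case PER_PID_SYSCALL:
                return new String[] { "Statistics", "pid", String.valueOf(pid),
                        "syscall", syscall, "time" };
            case PER_PID_SYSCALL_CPU:
            default:
                return new String[] { "Statistics", "pid", String.valueOf(pid),
                        "syscall", syscall, "cpu", String.valueOf(cpu), "time" };
        }
    }
}

Choosing a coarser granularity simply drops the trailing path segments, which is what keeps the number of leaves, and thus the current-state size, under control.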

For those interested, let's look in more detail at the cost and optimization of these computations.

Assuming that we create an attribute leaf to store a statistic only when needed, the number of leaves will be the same irrespective of the canonical order. The number of intermediate nodes, which is smaller than the number of leaves, may be minimized by using a category with fewer members first. A typical case is to have 8 CPUs, hundreds of processes (let's say 1000), each running on mostly 1 or 2 CPUs, and a few hundred system calls, with each process calling only about 10 different ones.

The number of leaves will be about 1000 * 1.5 * 10 = 15000. If we go pid / syscall / cpu, there will be 1 + 1000 + 1000 * 10 = 11001 intermediate nodes and 1000 + 1000 * 10 + 1000 * 10 * 1.5 = 26000 edges. If we go the other way around, cpu / syscall / pid, there will be approximately 1 + 8 + 8 * 50 = 409 intermediate nodes and 8 + 8 * 50 + 1000 * 10 * 1.5 = 15408 edges. There is thus a slight advantage to selecting the smallest branching factors first.
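For reference, this arithmetic can be checked with a few lines of Java; the branching factors (8 CPUs, ~1000 processes, ~10 syscalls per process, ~1.5 CPUs per process, ~50 distinct syscalls per CPU) are the same rough assumptions used above.

// Back-of-the-envelope check of the node/edge counts above.
public class TreeSizeEstimate {
    public static void main(String[] args) {
        double leaves = 1000 * 1.5 * 10;               // 15000, same for both orders

        // pid / syscall / cpu
        double nodes1 = 1 + 1000 + 1000 * 10;          // 11001 intermediate nodes
        double edges1 = 1000 + 1000 * 10 + leaves;     // 26000 edges

        // cpu / syscall / pid
        double nodes2 = 1 + 8 + 8 * 50;                // 409 intermediate nodes
        double edges2 = 8 + 8 * 50 + leaves;           // 15408 edges

        System.out.printf("leaves=%.0f%n", leaves);
        System.out.printf("pid/syscall/cpu: nodes=%.0f edges=%.0f%n", nodes1, edges1);
        System.out.printf("cpu/syscall/pid: nodes=%.0f edges=%.0f%n", nodes2, edges2);
    }
}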

In terms of computation speed and state history storage, the situation is quite interesting. Each time a system call or scheduling switch event comes in, some value needs to be summed. Whether it is always summed into the same value (e.g. pid / user_space and pid / system) or into different branches (cpu / syscall / pid), the computation time is pretty much the same and should not be a concern. The state history currently stores an interval for each update to the statistics (basically for each event, since event counts are kept). Thus, updating values and creating a state history entry for each scheduling switch or syscall entry / exit is already much less than what is required for event counts. Furthermore, there will be one entry in the history for each change, whether the changes always go to the same entry (pid / system) or to detailed entries (cpu / syscall / pid). Of course, the partial history optimization is always applicable.
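As a rough illustration of that per-event cost, here is what the update could look like. The StateWriter interface below is invented for this example and is not the actual TMF state system API; it only captures the point that each syscall exit (or scheduling switch) produces exactly one leaf update and one history interval, regardless of the chosen granularity.

// Hypothetical sketch, not the TMF state system API.
interface StateWriter {
    /** Get (or create) the integer quark for an attribute path. */
    int quarkFor(String... path);
    /** Add 'delta' to the attribute's current value, recording one history interval. */
    void increment(long timestamp, int quark, long delta);
}

final class SyscallStatsHandler {
    private final StateWriter writer;

    SyscallStatsHandler(StateWriter writer) {
        this.writer = writer;
    }

    /** Called once per syscall exit event. */
    void onSyscallExit(long exitTs, long entryTs, int pid, String syscall, int cpu) {
        long delta = exitTs - entryTs;
        // One leaf update per event, whatever the granularity: whether the path
        // is just "pid/<pid>/system" or the detailed
        // "pid/<pid>/syscall/<name>/cpu/<n>", a single interval is written.
        int quark = writer.quarkFor("Statistics", "pid", String.valueOf(pid),
                "syscall", syscall, "cpu", String.valueOf(cpu), "time");
        writer.increment(exitTs, quark, delta);
    }
}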

What happens when we want to interactively navigate the statistics? We may want to show a 100 GB window within a 1 TB trace. We can seek to the window's beginning and end, t0 and t1, and gather the values for cpu / syscall / pid at each. We could also do the same thing 1000 times to show a histogram with a value computed for each pixel of a 1000-pixel-wide window. Once we have the values for a specific time, we can dynamically compute the needed sums (all syscalls / CPUs for each process, all processes / syscalls for a CPU, ...) relatively quickly.
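Here is a rough sketch of that navigation step: two full-state queries at t0 and t1, a per-leaf difference, and then whatever aggregation the view needs, computed on the fly. The StatSnapshot type and the string-keyed leaf map are assumptions made for the example, not existing TMF classes.

// Hedged sketch: window statistics from two full-state queries.
import java.util.HashMap;
import java.util.Map;

final class WindowStats {

    /** Cumulative value of every pid/syscall/cpu leaf at one point in time. */
    interface StatSnapshot {
        Map<String, Long> leafValues(); // key e.g. "pid/42/syscall/read/cpu/1"
    }

    /** Per-leaf deltas over [t0, t1], from the two snapshots. */
    static Map<String, Long> delta(StatSnapshot atT0, StatSnapshot atT1) {
        Map<String, Long> result = new HashMap<>();
        atT1.leafValues().forEach((leaf, v1) -> {
            long v0 = atT0.leafValues().getOrDefault(leaf, 0L);
            result.put(leaf, v1 - v0);
        });
        return result;
    }

    /** Example aggregation: total time per pid, summing over syscall and cpu. */
    static Map<String, Long> perPid(Map<String, Long> deltas) {
        Map<String, Long> perPid = new HashMap<>();
        deltas.forEach((leaf, value) -> {
            String pid = leaf.split("/")[1]; // "pid/<pid>/..." by construction
            perPid.merge(pid, value, Long::sum);
        });
        return perPid;
    }
}

With on the order of 15000 leaves, these sums should be cheap compared to the two state queries themselves.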

My conclusion is that storing statistics with a fine granularity has little impact on the computation time and state history size. The largest impact is on the size of the "current state". Of course, we could always have an option to specify the desired level of granularity in the state computation. We had the cpu / syscall / pid granularity in LTTV and it was very useful (especially pid / syscall, but not so much pid / cpu).
