
Re: [linuxtools-dev] [TMF] Advanced statistics view

Hello all,

I have implemented some of the ideas discussed here, mostly as a proof of concept.
The idea was to see where the major hurdles were in implementing more advanced statistics in the current framework.

For simplicity, I have only implemented the CPU usage of every TID on each CPU.

For storage, I used a state system separate from the one used by the current statistics.
The attribute tree is first separated by CPU and then by TID.
The TID nodes contain the cumulative time spent by this TID on the parent CPU.
Each TID node also contains an additional child node to keep track of the time spent in the current interval only.
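To make the layout concrete, here is a rough sketch of how such a tree could be updated at a schedule-out, assuming the TMF state-system builder API (getQuarkAbsoluteAndAdd / queryOngoingState / modifyAttribute). The attribute names and the surrounding method are simplifications of mine, not the actual provider code, and the exact interface names depend on the TMF version.

    // Sketch: update the statistics tree when `tid` is scheduled out of `cpu`.
    // Assumed layout: <cpu>/<tid> holds the cumulative time,
    // <cpu>/<tid>/current the time spent in the current interval only.
    void onSchedOut(ITmfStateSystemBuilder ss, String cpu, String tid,
            long schedInTs, long schedOutTs) throws Exception {
        long runTime = schedOutTs - schedInTs;

        int cumulQuark = ss.getQuarkAbsoluteAndAdd(cpu, tid);
        int currentQuark = ss.getQuarkAbsoluteAndAdd(cpu, tid, "current");

        // Previous cumulative value (0 if this TID was never seen on this CPU).
        ITmfStateValue prev = ss.queryOngoingState(cumulQuark);
        int cumulative = prev.isNull() ? 0 : prev.unboxInt();

        // 32-bit values only; see the overflow discussion further down.
        ss.modifyAttribute(schedOutTs,
                TmfStateValue.newValueInt(cumulative + (int) runTime), cumulQuark);
        ss.modifyAttribute(schedOutTs,
                TmfStateValue.newValueInt((int) runTime), currentQuark);
    }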

Here is a timeline of events and how the corresponding intervals relate to that timeline.

t0: TID1000 scheduled in CPU1
t1: TID1000 scheduled out of CPU1
t5: TID1000 scheduled in CPU1
t7: TID1000 scheduled out of CPU1
t9: end of trace

From t0 to t5: cumulative time for (CPU1, TID1000) = 1; current time for (CPU1, TID1000) = 1
From t5 to t9: cumulative time for (CPU1, TID1000) = 3; current time for (CPU1, TID1000) = 2

To know the exact cumulative time at t6, you must first query the raw cumulative time (3 in this case).
We know this TID ran for 2 units of time in this interval, but only 1 unit has passed since the beginning of the interval.
Therefore, we must subtract 1 unit of time from the raw total to get the exact total at t6 (2).
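In code, the adjustment for a query at time t inside an interval is just arithmetic on the two stored values (a minimal sketch; variable names are mine):

    // Query at t = t6, which falls in the interval starting at t5.
    long rawCumulative  = 3;   // cumulative node for (CPU1, TID1000)
    long timeInInterval = 2;   // "current" child node: time run during this interval
    long intervalStart  = 5;   // start time of the queried interval
    long t              = 6;   // query timestamp

    long elapsed = t - intervalStart;                                   // 1 unit
    long exactCumulative = rawCumulative - (timeInInterval - elapsed);  // 3 - (2 - 1) = 2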

My first intuition was that the current running time is information that is already stored in the kernel state system used by the control flow view.
Two cases can happen:
1. The query is during the execution of the process.
2. The query is outside the execution of the process.

In the first case, we must do interpolation to get the correct value. By querying the kernel state system, we can confirm that we are in this case.
The kernel state system also tells us how long the process has been executing, which is what we need to do the interpolation.

In the second case, we do not have to do interpolation. By querying the kernel state system, we find that the process is not executing and that the cumulative time can be used as is.
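A sketch of how that could look, assuming the cumulative counter is updated at schedule-out and that the kernel state system exposes a per-TID status attribute as used by the control flow view. The statusQuark and the isRunning helper are placeholders of mine, not the actual attribute names or state values:

    // Exact cumulative CPU time of a TID at query time t, using the kernel
    // state system to decide whether interpolation is needed.
    long exactCpuTime(ITmfStateSystem statsSs, ITmfStateSystem kernelSs,
            int cumulQuark, int statusQuark, long t) throws Exception {
        // Raw cumulative time as of the last schedule-out before t.
        long cumulative = statsSs.querySingleState(t, cumulQuark).getStateValue().unboxInt();

        ITmfStateInterval status = kernelSs.querySingleState(t, statusQuark);
        if (isRunning(status.getStateValue())) {
            // Case 1: the process is on the CPU at t. The kernel interval's start
            // time tells us when this run began, so add only the part before t.
            cumulative += t - status.getStartTime();
        }
        // Case 2: not running at t, the stored cumulative time can be used as is.
        return cumulative;
    }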

By using the kernel state system, we have effectively halved the amount of information we have to store in the statistics state system.
By doing so, we have also tightly coupled the kernel state system and the statistics state system. If a new event is added to the kernel state provider, the statistics provider must be aware of that change for the statistics to remain coherent. At the moment, this is not too problematic, as only the sched_switch event triggers the state change used by the statistics.
Still, the compromise between file size and the ease of maintenance is an interesting problem.

Here are some history tree file sizes I measured with the statistics state system using the current running time (without optimization):
Trace 1: 286 MB
CPU usage stats history: 53 MB
event count stats history: 572 MB
kernel state history: 578 MB

Trace 2: 10.6 MB
CPU usage stats history: 2.24 MB
event count stats history: 17.6 MB
kernel state history: 12.2 MB

Another minor problem is the storage of the cumulative time. The current history implementation only supports 32-bit integers and strings. Cumulative time in nanoseconds can easily overflow a 32-bit integer in long enough traces.
I have not tried implementing 64-bit integer support, but from my experience with the history tree format, it should not be very hard to add. In my implementation, overflows occur silently.
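For scale, a signed 32-bit nanosecond counter wraps after only about 2.1 seconds of accumulated CPU time:

    // Integer.MAX_VALUE nanoseconds is roughly 2.147 seconds, so any TID that
    // accumulates more CPU time than that overflows a 32-bit counter.
    System.out.println(Integer.MAX_VALUE / 1e9); // 2.147483647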

The second part I worked on was the integration of this new information to the existing statistics view.
Unfortunately, the current implementation of the statistics view is very specific to displaying event counts.

I still wanted to have meaningful information displayed, so I worked around the limitations.
Only longs can be displayed in the view, so instead of showing CPU usage as a percentage, I have elected to display the raw time in nanoseconds (see the sketch after this paragraph).
Also, there was no way to specify an organization for the statistics; the flat structure of the event counts was pretty much hard-coded.
Anyway, I feel this part would require some work before it can be integrated seamlessly into the current view.
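For reference, going from the raw nanosecond count to the percentage the view would ideally show is a one-line computation over the selected time range (a sketch with my own names, not view code):

    // CPU usage of one TID over a selected time range, from the raw nanosecond count.
    double cpuUsagePercent(long cpuTimeNs, long rangeStartNs, long rangeEndNs) {
        return 100.0 * cpuTimeNs / (rangeEndNs - rangeStartNs);
    }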

The code can be found here:
http://git.dorsal.polymtl.ca/~frajotte?p=org.eclipse.linuxtools.git;a=shortlog;h=refs/heads/cpustats

The lightweight version that uses the kernel state system can be found here:
http://git.dorsal.polymtl.ca/~frajotte?p=org.eclipse.linuxtools.git;a=shortlog;h=refs/heads/cpustatslight

Please keep in mind that this was mostly done as a proof of concept to show that it was possible. There are some hard-coded strings that should be refactored, and there are no unit tests.
I would love to have your feedback. I can also submit this to Gerrit if you would prefer to review this contribution there.

On Tue, Feb 12, 2013 at 5:04 PM, Alexandre Montplaisir <alexmonthy@xxxxxxxxxxxx> wrote:
On 13-02-12 04:45 PM, François Rajotte wrote:
>
> I believe the main concern was whether we want the information about _many_
> processes at a given timestamp.
> If only one query is necessary (Michel's proposal), then we are sure
> to have all the information in one query (constant in the number of
> processes).
> But if we need to go back in time because we fall on a null
> interval, the second query is at a different time for each process
> (linear in the number of processes).
> The number of queries then becomes proportional to the number of processes
> rather than constant.

Ah OK, I see what you mean. There is another advantage in querying at the
same timestamp: if we do a full query, we can get both/all values at
no additional cost.

Note that with a partial history, all queries are full queries anyway
(full, partial, this is getting confusing :P ).
We can implement a single query to fill the API, but in the background it
would do a full query.
By the way, scratch what I said earlier about not being able to use
partial histories for this; both algorithms we are talking about here
would work fine with a partial history, since we don't need the end times of
the intervals.

>
> (see Michel's response)
>
>
>     >> One possible alternative is to store the cumulative CPU time in
>     one attribute and the entryTime for the current interval if
>     scheduled in and thus ongoing (or NULL if scheduled out). This
>     would be 2 attributes instead of 1 in the current state, 1 change
>     at schedule in and 2 at schedule out (thus 3 changes instead of 2
>     in the history). However, you would not need any of the additional
>     queries and there should be no problem with partial history
>     storage optimizations.
>     > Yes, this is an interesting idea. About the partial history vs full
>     > history, this is something where partial history IMO is not at all
>     > beneficial since the intervals are large and the changing events
>     are few
>     > and far between, relatively on the kernel front.  this state system
>     > takes (empirically) approx 10 mb, 20 mb for the new system for
>     every gb
>     > of the full state system, so trying compression to save space
>     here is
>     > like trying to balance the US economy by cutting PBS's funding. Alex
>     > will probably disagree with me here.
>     >
>     > With very very large traces, I can't tell if this state system
>     will be
>     > larger than say a partial stats history tree. I think some
>     investigation
>     > is needed.
>
>     If you can fit it in a partial history, it will most likely be smaller
>     than a full history, unless you use 1000 times more attributes ;)
>
>     An interesting point with the proposed method is that we already store
>     the status of each process in the standard kernel state system,
>     that can
>     indicate if the process was on or off the CPU at any given time. So we
>     could avoid duplicating this information.
>
>
> I had the same idea. While trying to understand Michel's proposal, I
> noticed that the information we would want to store (process schedule
> in and out) is already in the kernel state system.
> If we could somehow link the two state systems together, we could
> reuse that information and save space.

This information could go in a second, LTTng-kernel-specific state
system, alongside which the standard kernel state system is guaranteed to
exist. Or it could go in the same one. Probably the same one, if you
want to benefit from full queries.

>
>
>     I still have to wrap my head around it (I'm not sure if it implies
>     using
>     back-assignments or not), but it's definitely good to experiment.
>
>

30 mins on the drawing board later, I have to say I really like this new
proposed algorithm. We never have to "seek back", and seeking back with
a partial history is very costly (you have to re-read from the last
checkpoint). Especially during the construction... There is no
back-assignment needed either. We need to keep in RAM the current
cumulative CPU time for each process, but that's not a problem.

