Re: [linuxtools-dev] [TMF] Advanced statistics view

On 13-02-12 04:45 PM, François Rajotte wrote:
>
> I believe the main concern was whether we want the information about
> _many_ processes at a given timestamp.
> If only one query is necessary (Michel's proposal), then we are sure
> to have all the information in one query (constant in the number of
> processes).
> But if we have to go back in time because we land on a null
> interval, the second query happens at a different time for each
> process (linear in the number of processes).
> The number of queries is then proportional to the number of
> processes instead of constant.

Ah ok, I see what you mean. There is another advantage to querying at
the same timestamp: if we do a full query, we can get both/all values
at no additional cost.

Note that with a partial history, all queries are full queries anyway
(full, partial, this is getting confusing :P )
We can implement a single-value query to fill the API, but in the
background it would do a full query.
By the way, scratch what I said earlier about not being able to use
partial histories for this; both algorithms we are talking about here
would work fine with a partial history, since we don't need the end
times of the intervals.
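
To make the cost difference concrete, here is a rough sketch against
the state system API (the method and package names below are from
memory, so treat the exact signatures as approximate):

import java.util.List;

import org.eclipse.linuxtools.tmf.core.interval.ITmfStateInterval;
import org.eclipse.linuxtools.tmf.core.statesystem.ITmfStateSystem;

public class QueryCostSketch {

    /* Constant in the number of processes: one full query at time t
     * returns the intervals of *all* attributes at once. */
    static List<ITmfStateInterval> statsAtTime(ITmfStateSystem ss, long t)
            throws Exception {
        return ss.queryFullState(t);
    }

    /* Linear in the number of processes: if the interval at t is null,
     * we need a second query at a per-process timestamp (just before
     * the null interval started). */
    static ITmfStateInterval statsForProcess(ITmfStateSystem ss, long t,
            int processQuark) throws Exception {
        ITmfStateInterval interval = ss.querySingleState(t, processQuark);
        if (interval.getStateValue().isNull()) {
            /* Seek back: one extra query per process, each at a
             * different timestamp. */
            interval = ss.querySingleState(interval.getStartTime() - 1,
                    processQuark);
        }
        return interval;
    }
}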

>
> (see Michel's response)
>
>
>     >> One possible alternative is to store the cumulative CPU time
>     >> in one attribute, and the entryTime for the current interval if
>     >> scheduled in and thus ongoing (or NULL if scheduled out). This
>     >> would be 2 attributes instead of 1 in the current state, 1
>     >> change at schedule-in and 2 at schedule-out (thus 3 changes
>     >> instead of 2 in the history). However, you would not need any
>     >> of the additional queries, and there should be no problem with
>     >> partial history storage optimizations.
>     > Yes, this is an interesting idea. About the partial history vs.
>     > full history, this is something where a partial history IMO is
>     > not at all beneficial, since the intervals are large and the
>     > changing events are few and far between, relatively speaking, on
>     > the kernel front. This state system takes (empirically) approx.
>     > 10 MB (20 MB for the new system) for every GB of the full state
>     > system, so trying compression to save space here is like trying
>     > to balance the US economy by cutting PBS's funding. Alex will
>     > probably disagree with me here.
>     >
>     > With very very large traces, I can't tell if this state system
>     > will be larger than, say, a partial stats history tree. I think
>     > some investigation is needed.
>
>     If you can fit it in a partial history, it will most likely be smaller
>     than a full history, unless you use 1000 times more attributes ;)
>
>     An interesting point with the proposed method is that we already
>     store the status of each process in the standard kernel state
>     system, which can indicate whether the process was on or off the
>     CPU at any given time. So we could avoid duplicating this
>     information.
>
>
> I had the same idea. While trying to understand Michel's proposal, I
> noticed that the information we would want to store (process
> schedule-in and schedule-out) is already in the kernel state system.
> If we could somehow link the two state systems together, we could
> reuse that information and save space.

This information could go in a second, LTTng-kernel-specific state
system, alongside which the standard kernel state system is guaranteed
to exist. Or it could go in the same one. Probably the same one, if you
want to benefit from full queries.
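
As a sketch, reusing the standard kernel state system could look
something like this (the Threads/<tid>/Status attribute path and the
status constant are assumptions from memory, not the actual
definitions):

import org.eclipse.linuxtools.tmf.core.interval.ITmfStateInterval;
import org.eclipse.linuxtools.tmf.core.statesystem.ITmfStateSystem;

public class ProcessStatusLookup {

    /* Hypothetical constant; the real values live in the kernel state
     * provider's state-value definitions. */
    private static final int CPU_RUNNING = 2;

    /* Was the given thread on a CPU at time t, according to the
     * standard kernel state system? */
    static boolean wasScheduledIn(ITmfStateSystem kernelSs, String tid,
            long t) throws Exception {
        int statusQuark = kernelSs.getQuarkAbsolute("Threads", tid, "Status");
        ITmfStateInterval status = kernelSs.querySingleState(t, statusQuark);
        return !status.getStateValue().isNull()
                && status.getStateValue().unboxInt() == CPU_RUNNING;
    }
}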

>  
>
>     I still have to wrap my head around it (I'm not sure if it implies
>     using back-assignments or not), but it's definitely good to
>     experiment.
>
>

30 minutes at the drawing board later, I have to say I really like this
newly proposed algorithm. We never have to "seek back", and seeking back
with a partial history is very costly (you have to re-read from the last
checkpoint), especially during construction. There is no back-assignment
needed either. We do need to keep the current cumulative CPU time for
each process in RAM, but that's not a problem.
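
For reference, a minimal sketch of the algorithm as I understand it,
with plain maps standing in for the two attributes (the names
CumulativeCpuTime and EntryTime are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class CumulativeCpuTimeSketch {

    /* Kept in RAM during construction: running totals per TID. */
    private final Map<Integer, Long> cumulative = new HashMap<>();
    /* Entry time of the ongoing interval; absent if scheduled out. */
    private final Map<Integer, Long> entryTime = new HashMap<>();

    /* sched_in: 1 state change (set EntryTime). */
    void schedIn(int tid, long timestamp) {
        entryTime.put(tid, timestamp);
    }

    /* sched_out: 2 state changes (update CumulativeCpuTime, clear
     * EntryTime). */
    void schedOut(int tid, long timestamp) {
        Long in = entryTime.remove(tid);
        if (in != null) {
            cumulative.merge(tid, timestamp - in, Long::sum);
        }
    }

    /* Query at time t: read the two values (here from the maps; in the
     * real thing, from the state system at time t) and combine them.
     * No seeking back, no back-assignment. */
    long cpuTimeAt(int tid, long t) {
        long total = cumulative.getOrDefault(tid, 0L);
        Long in = entryTime.get(tid);
        return (in == null) ? total : total + (t - in);
    }
}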

