Re: [linuxtools-dev] [TMF] Advanced statistics view



On Wed, Feb 13, 2013 at 9:26 AM, Alexandre Montplaisir <alexmonthy@xxxxxxxxxxxx> wrote:

>> This method only requires a single "attribute" to store the cumulative time and whether or not the process is in an active interval. Its value is changed at schedule in and at schedule out. However, you need to do additional queries. For instance, if you want to show the CPU usage per process for an interval [t0, t1] (a pixel on screen), you need to query at t0 and t1 and, for each process, you also need the value at its respective interval.startTime - 1, leading to an additional query per process...
> Yes, this is an issue. However, since the number of processes is
> relatively small (under 65535, in reality under 1k) and most
> threads/processes would not change within a given time slice, this may
> or may not be a performance issue. It should be benchmarked.

If you fall on a "real" interval, there are no additional queries. If you
fall on a null interval, you have to do one additional query to get the
previous interval. If you want to query a range, you have to do 2
queries already, by definition. So in the worst case, you end up doing 4
queries instead of 2. I don't think this is a big problem from a
performance point of view (especially since those intervals are close,
they will be close in the history backend, perhaps even in the same
block, in which case it doesn't even have to go to the disk).
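Concretely, the lookup for one process could look something like this (a minimal sketch against the TMF state system API; querySingleState() does exist, but the attribute layout and the "null while scheduled in" convention here are assumptions on my part):

    /* Cumulative CPU time of one process at time t, under the
     * single-attribute scheme: the attribute holds the running total,
     * written at sched_out, and is null while the process is running. */
    static long cpuTimeAt(ITmfStateSystem ss, int quark, long t)
            throws Exception { /* error handling elided */
        ITmfStateInterval ival = ss.querySingleState(t, quark);
        if (!ival.getStateValue().isNull()) {
            /* "Real" interval: the stored total is exact, no extra query. */
            return ival.getStateValue().unboxLong();
        }
        /* Null interval: the total dates from the last sched_out, so do
         * one additional query just before this interval started, then
         * add the ongoing run. */
        ITmfStateInterval prev =
                ss.querySingleState(ival.getStartTime() - 1, quark);
        return prev.getStateValue().unboxLong() + (t - ival.getStartTime());
    }

Usage over [t0, t1] is then cpuTimeAt(t1) - cpuTimeAt(t0): two single queries in the best case, four in the worst, as described above.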

I believe the main concern was the case where we want the information about _many_ processes at a given timestamp.
If only one query is necessary (Michel's proposal), then we are sure to get all the information in that single query (constant in the number of processes).
But if we have to go back in time because we fell on a null interval, the second query happens at a different time for each process (linear in the number of processes).
The number of queries then becomes proportional to the number of processes instead of constant.
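To make the constant-vs-linear distinction concrete (queryFullState() is a real ITmfStateSystem method that returns every attribute's interval at one timestamp; the processQuarks list and attribute layout are assumptions):

    /* One full query gives every process's interval at time t... */
    List<ITmfStateInterval> full = ss.queryFullState(t);
    for (int quark : processQuarks) {
        ITmfStateInterval ival = full.get(quark);
        if (ival.getStateValue().isNull()) {
            /* ...but each process sitting on a null interval needs its
             * own extra single query, each at a different timestamp. */
            ival = ss.querySingleState(ival.getStartTime() - 1, quark);
        }
        /* ... use ival.getStateValue() ... */
    }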

(see Michel's response)

>> One possible alternative is to store the cumulative CPU time in one attribute, and in a second attribute the entryTime of the current interval if the process is scheduled in and thus ongoing (or NULL if it is scheduled out). This would be 2 attributes instead of 1, with 1 change at schedule in and 2 at schedule out (thus 3 changes instead of 2 in the history). However, you would not need any of the additional queries, and there should be no problem with partial history storage optimizations.
> Yes, this is an interesting idea. About partial vs. full history, this
> is a case where a partial history is IMO not beneficial at all, since
> the intervals are large and the changing events are few and far
> between, relatively speaking, on the kernel front. This state system
> empirically takes approx. 10 MB (20 MB for the new system) for every
> GB of the full state system, so trying compression to save space here
> is like trying to balance the US economy by cutting PBS's funding.
> Alex will probably disagree with me here.
>
> With very, very large traces, I can't tell if this state system will be
> larger than, say, a partial stats history tree. I think some investigation
> is needed.

If you can fit it in a partial history, it will most likely be smaller
than a full history, unless you use 1000 times more attributes ;)
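Coming back to the two-attribute scheme itself, the bookkeeping could be wired up roughly like this (a sketch assuming an ITmfStateSystemBuilder-style API; the cumulQuark/entryQuark attributes, and their initialization to 0/null at trace start, are made up for illustration):

    /* sched_in: 1 state change. */
    static void schedIn(ITmfStateSystemBuilder ss, int entryQuark, long t)
            throws Exception {
        ss.modifyAttribute(t, TmfStateValue.newValueLong(t), entryQuark);
    }

    /* sched_out: 2 state changes (so 3 per in/out cycle instead of 2). */
    static void schedOut(ITmfStateSystemBuilder ss, int cumulQuark,
            int entryQuark, long t) throws Exception {
        long entry = ss.queryOngoingState(entryQuark).unboxLong();
        long cumul = ss.queryOngoingState(cumulQuark).unboxLong();
        ss.modifyAttribute(t, TmfStateValue.newValueLong(cumul + (t - entry)),
                cumulQuark);
        ss.modifyAttribute(t, TmfStateValue.nullValue(), entryQuark);
    }

    /* Query at t: two lookups, never a second pass back in time, so it
     * stays constant in the number of processes. */
    static long cpuTimeAt(ITmfStateSystem ss, int cumulQuark, int entryQuark,
            long t) throws Exception {
        long total = ss.querySingleState(t, cumulQuark)
                .getStateValue().unboxLong();
        ITmfStateValue entry = ss.querySingleState(t, entryQuark)
                .getStateValue();
        return entry.isNull() ? total : total + (t - entry.unboxLong());
    }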

An interesting point with the proposed method is that we already store
the status of each process in the standard kernel state system, which
can indicate whether the process was on or off the CPU at any given
time. So we could avoid duplicating this information.

I had the same idea. While trying to understand Michel's proposal, I noticed that the information we would want to store (process schedule in and out) is already in the kernel state system.
If we could somehow link the two state systems together, we could reuse that information and save space.
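As a very rough sketch of what that reuse could look like (queryHistoryRange() is part of ITmfStateSystem; the statusQuark for a thread's Status attribute and the isRunning() helper are assumptions based on the standard kernel state provider):

    /* CPU time of one thread over [t0, t1], computed only from the
     * Status attribute the kernel state system already keeps. */
    static long cpuTime(ITmfStateSystem kernelSs, int statusQuark,
            long t0, long t1) throws Exception {
        long total = 0;
        for (ITmfStateInterval ival :
                kernelSs.queryHistoryRange(statusQuark, t0, t1)) {
            if (isRunning(ival.getStateValue())) { /* hypothetical check
                     against the "running" status value */
                /* Clamp each scheduling interval to the query range. */
                total += Math.min(ival.getEndTime(), t1)
                        - Math.max(ival.getStartTime(), t0);
            }
        }
        return total;
    }

The obvious cost is that this walks every scheduling interval in the range instead of doing a constant number of queries, which is exactly the time/space trade-off being discussed.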
 

I still have to wrap my head around it (I'm not sure whether it implies using
back-assignments or not), but it's definitely worth experimenting with.
