[linuxtools-dev] Modeled state in TMF/LTTng

Both the older LTTng standalone viewer, LTTV, and the TMF viewer partially reconstruct the state of the traced system. This is useful to keep track of the current "modeled state" as events bring information about process creation and deletion, scheduling changes, opened and closed files, and more. Different people have been prototyping tools to match patterns of events and state in order to detect problems or frequently occurring sequences which can be "abstracted". Moreover, as more kernel subsystems (KVM, block IO...) and user-space applications (QEMU, database servers...) are instrumented to generate events, it will be interesting to allow the modular extension of the "modeled state".

It is therefore a good time to discuss the notion of "modeled state" for the advanced trace analysis tools. A proper organization can make a big difference both in terms of performance and conceptual simplicity for users. My goal is to trigger a discussion which will help define a common jargon, identify important use cases, and discuss possible avenues to have a good, simple and efficient foundation for advanced analysis modules. Several people have been experimenting with internal tools at different companies and should have extremely valuable advice on the topic. Please comment on the outlined "assertions" and "questions"!

Any application, database server, telephony server, file server, operating system... maintains a state, and different events trigger state changes. Theoretically, all the data memory constitutes the state, all interactions with the outside are events, and each memory write is a state change. In practice, we want to use a much simpler abstract model which is a small subset of the complete data memory. Similarly, instrumented events are carefully chosen to report only the most significant events.

-> proposed term: "modeled state", simple abstract model of a complex application maintained in the trace analysis module.

For example, in LTTng, we started modeling the Linux OS by maintaining the table of processes. We later added the open file descriptors for each process and the table of block devices. We may eventually want to add memory mappings, locks and other structures to study virtual memory performance or locking correctness.

-> just as new instrumented events may easily be added as the need arises, the modeled state should be easily extended.

For a database server, the modeled state would typically store the current connections with clients and possibly the cached database blocks. In a telephony server, the list of active connections and their current state would typically be part of the modeled state.

The modeled state in the trace analyzer obviously duplicates a subset of what the traced application already stores. There is therefore a tradeoff between the amount of traced data to generate and the amount of recreated modeled state to maintain. For example, each trace event in a telephony application could contain more fields to report state information that would otherwise have to be modeled in the analysis tool. Similarly, in LTTng we could avoid modeling the file descriptors and remembering the opened files if each "read" event contained not only the number of bytes read but also the name of the associated file.
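
To make the tradeoff concrete, here is a minimal Python sketch of the "lean event" side: the "read" event carries only a file descriptor number and a byte count, and the analysis tool recovers the file name from state it recorded at open time. The event layout (dictionaries with "pid", "fd", "filename" and "count" keys) and all function names are assumptions made for illustration, not the actual LTTng event format.

```python
# Hypothetical modeled state: (pid, fd) -> filename, filled in by "open" events.
fd_table = {}

def on_open(event):
    """Remember which file a descriptor refers to."""
    fd_table[(event["pid"], event["fd"])] = event["filename"]

def describe_read(event):
    """Resolve the file name of a lean "read" event from the modeled state."""
    name = fd_table.get((event["pid"], event["fd"]), "<unknown>")
    return f"pid {event['pid']} read {event['count']} bytes from {name}"

on_open({"pid": 42, "fd": 3, "filename": "/etc/passwd"})
summary = describe_read({"pid": 42, "fd": 3, "count": 512})
orphan = describe_read({"pid": 7, "fd": 9, "count": 16})
```

The second call shows the cost of this choice: a read whose open was not traced cannot be resolved, whereas a "read" event carrying the file name would be self-describing at the price of a larger trace.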

-> our experience with LTTng is that maintaining a modeled state is important for tracing efficiency and for advanced analysis. Is this also the case for user-space applications? Can you provide use cases?

The modeled state could be seen as a tree structure, just like a filesystem, with access paths to each item in the modeled state. It is thus feasible to have a generic container for the modeled state and therefore have general utility routines to navigate, store... the modeled state. Currently, hooks (possibly provided in plugins) register to be called when specific event types are encountered and contain code to modify the modeled state. For example, a file open event (with filename and file descriptor number fields) will trigger a call to a hook which adds this opened file descriptor to a table associated with the current process.
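
The following Python sketch illustrates the idea of a generic tree container with filesystem-like paths, plus hooks registered per event type. It is not the LTTV or TMF implementation: the class ModeledState, the decorator on, the path layout "/processes/<pid>/fds/<fd>" and the event dictionaries are all hypothetical. It also assumes each event carries a "pid" field, where a real analyzer would take the current process from the scheduling state.

```python
from collections import defaultdict

class ModeledState:
    """Generic container: a tree of nested dicts addressed by paths."""

    def __init__(self):
        self.root = {}

    def set(self, path, value):
        *parents, leaf = path.strip("/").split("/")
        node = self.root
        for name in parents:
            node = node.setdefault(name, {})  # create missing branches
        node[leaf] = value

    def get(self, path, default=None):
        node = self.root
        for name in path.strip("/").split("/"):
            if not isinstance(node, dict) or name not in node:
                return default
            node = node[name]
        return node

    def remove(self, path):
        *parents, leaf = path.strip("/").split("/")
        node = self.root
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                return  # nothing stored under this path
        node.pop(leaf, None)

# Hook registry: event type -> list of callables taking (state, event).
hooks = defaultdict(list)

def on(event_type):
    """Decorator registering a hook for one event type."""
    def register(fn):
        hooks[event_type].append(fn)
        return fn
    return register

@on("file_open")
def handle_open(state, event):
    state.set(f"/processes/{event['pid']}/fds/{event['fd']}", event["filename"])

@on("file_close")
def handle_close(state, event):
    state.remove(f"/processes/{event['pid']}/fds/{event['fd']}")

def replay(state, events):
    """Feed events in trace order to every hook registered for their type."""
    for event in events:
        for hook in hooks[event["type"]]:
            hook(state, event)

state = ModeledState()
replay(state, [
    {"type": "file_open", "pid": 42, "fd": 3, "filename": "/etc/passwd"},
    {"type": "file_open", "pid": 42, "fd": 4, "filename": "/tmp/log"},
    {"type": "file_close", "pid": 42, "fd": 3},
])
```

Because the container knows nothing about processes or files, a plugin for another subsystem could register its own hooks and store its items under a different subtree without changing the navigation or storage routines.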

The Multi-Core Association is working on a Trace Format Standard. While they are looking at metadata describing the events (type of events, the type of their fields...), the idea has been proposed to also document the associated "modeled state".

-> Is it possible to express in a declarative way the relationship between an event, the associated "modeled state" and how the state is updated?
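
As one possible starting point for this question, the Python sketch below describes state updates as data rather than code: each rule names an operation, a path template filled in from the event fields, and the field providing the value. The rule vocabulary ("set", "delete"), the field names and the flat path-keyed dictionary are assumptions for illustration only; they do not come from any existing trace format or from the standard under discussion.

```python
# Hypothetical declarative rules: event type -> how the modeled state changes.
RULES = {
    "file_open":  {"op": "set",    "path": "/processes/{pid}/fds/{fd}", "value": "filename"},
    "file_close": {"op": "delete", "path": "/processes/{pid}/fds/{fd}"},
}

def apply_event(state, event, rules=RULES):
    """Generic interpreter: update `state` (a path -> value dict) from one event."""
    rule = rules.get(event["type"])
    if rule is None:
        return  # event type has no effect on the modeled state
    path = rule["path"].format(**event)  # fill the template from event fields
    if rule["op"] == "set":
        state[path] = event[rule["value"]]
    elif rule["op"] == "delete":
        state.pop(path, None)

state = {}
for event in [
    {"type": "file_open", "pid": 42, "fd": 3, "filename": "/etc/passwd"},
    {"type": "sched_switch", "pid": 42},
    {"type": "file_open", "pid": 42, "fd": 4, "filename": "/tmp/log"},
    {"type": "file_close", "pid": 42, "fd": 3},
]:
    apply_event(state, event)
```

A table like this could in principle be shipped alongside the event metadata, although updates that depend on existing state (for example, resolving the current process from the scheduler state) would need a richer vocabulary than the two operations shown here.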

Let's start with these four assertions/questions!
