[jetty-dev] NoSQL Session manager

The servlet 3.1 EG will soon be considering how to better support
cloud/clusters in the servlet spec and one of the things that will
need to be considered is how to better support HttpSession that are
backed with NoSQL style scalable stores.

We do have a prototype NoSQL session manager for jetty, but before
moving forward with that, I'd like to discuss here a number of the
issues/options for how it can proceed.   But first, let's review what
the jetty HashSession and JDBCSession managers do - as they are a lot
more than just maps of IDs to session instances.

Sessions are created with an ID that is unique within all the contexts
that share a session ID manager.  Multiple contexts can share session
IDs (as there is only 1 session cookie), but cannot share session
values.   If one context invalidates a session, then all contexts with
that session ID have their sessions invalidated.    So the data
structure is not:

   ID --> Attribute --> Value

but is instead

   ID --> Context --> ID --> Attribute --> Value

the double ID mapping is for efficiency within a context.
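
As a rough illustration of that shape (a sketch only, not the actual
jetty classes; SharedSessionIds and its method names are made up for
this mail), the shared ID index and the per-context session maps could
look something like this:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Jetty's real classes.
class SharedSessionIds {
    // ID --> Context: which contexts currently hold a session with this ID.
    private final Map<String, Set<String>> contextsById = new ConcurrentHashMap<>();

    // Context --> ID --> Attribute --> Value: each context has its own
    // ID-keyed map so lookups within a context stay cheap.
    private final Map<String, Map<String, Map<String, Object>>> sessionsByContext =
            new ConcurrentHashMap<>();

    // Returns the attribute map for (context, id), creating it if needed.
    Map<String, Object> getOrCreate(String context, String id) {
        contextsById.computeIfAbsent(id, k -> ConcurrentHashMap.newKeySet()).add(context);
        return sessionsByContext
                .computeIfAbsent(context, k -> new ConcurrentHashMap<>())
                .computeIfAbsent(id, k -> new ConcurrentHashMap<>());
    }

    // Returns the attribute map for (context, id), or null if there is none.
    Map<String, Object> find(String context, String id) {
        Map<String, Map<String, Object>> sessions = sessionsByContext.get(context);
        return sessions == null ? null : sessions.get(id);
    }

    // Invalidating in one context invalidates the ID in every context.
    void invalidateAll(String id) {
        Set<String> contexts = contextsById.remove(id);
        if (contexts == null) {
            return;
        }
        for (String context : contexts) {
            Map<String, Map<String, Object>> sessions = sessionsByContext.get(context);
            if (sessions != null) {
                sessions.remove(id);
            }
        }
    }
}
```

Two contexts calling getOrCreate with the same ID get separate
attribute maps, and a single invalidateAll removes both.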

Currently the session ID is split between a clusterID and a nodeID
part.  The clusterID is unique within the cluster, while the nodeID
includes a suffix that can be used for load balancer stickiness.
Jetty supports session migration such that if a node receives a
session ID with a mismatched nodeID, then the session cookie is
rewritten and the session is considered migrated.
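
To make the migration check concrete, here is a minimal sketch.  The
'.' separator between the cluster part and the node suffix, and the
names SplitSessionId, needsMigration and rewriteFor, are assumptions
for illustration rather than a description of the real code:

```java
// Hypothetical sketch; the '.' separator and all names are assumptions.
final class SplitSessionId {
    final String clusterId;  // unique within the cluster
    final String nodeSuffix; // load balancer hint, empty when absent

    SplitSessionId(String cookieValue) {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) {
            clusterId = cookieValue;
            nodeSuffix = "";
        } else {
            clusterId = cookieValue.substring(0, dot);
            nodeSuffix = cookieValue.substring(dot + 1);
        }
    }

    // True when the cookie was issued by (or last rewritten for) another node.
    boolean needsMigration(String localNode) {
        return !nodeSuffix.equals(localNode);
    }

    // The cookie value to send back once this node has taken the session over.
    String rewriteFor(String localNode) {
        return clusterId + "." + localNode;
    }
}
```

A node named node2 that receives "abc123.node1" would see
needsMigration("node2") return true and would reissue the cookie as
rewriteFor("node2").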

For a NoSQL session manager, we need to ask if explicit stickiness in
the session ID is needed.  Connection stickiness may be sufficient for
efficient operation of NoSQL sessions.

The HashSessionManager holds sessions in memory as attribute maps,
however it essentially works as a cache as sessions may be idled to
disk (but an in memory holder remains) or saved to disk (nothing left
in memory) and restored lazily as requests are received.  The JDBC
session manager similarly uses in memory sessions as a cache in front
of the DB.    Both of these map well to the servlet spec's requirement
that only a single version of a session should be in memory at any
given time.  However there is a suggestion that for scalability this
constraint should be relaxed and a session might be able to exist on
multiple nodes at the same time.

Given this, I think we need to consider whether the session manager
should be explicitly caching sessions in memory at all.  Perhaps it
should instead delegate caching to the NoSQL layer and rely on its
mechanisms for maintaining distributed cache consistency.

However, having our own in memory map of sessions was very useful for
doing invalidation of old sessions, as we could iterate over our in
memory sessions without worrying if another node in the cluster was
duplicating that work.    The JDBC manager also has a sweep of the DB
for old sessions and we'd have to do something similar for NoSQL.

If we don't want to have sessions concurrently in different nodes,
then we really need to think of a way that can be enforced
efficiently... perhaps using the nodeID in the cookie? But I think
for scalability it is inevitable that concurrent instances will
eventually be allowed, which brings up the question of what the
semantics and granularity of concurrent session updates should be.

If session 12345 exists in both node A and node B, and both are
updated at about the same time, with attribute xyz updated in node A
and attribute pqy updated in node B, then should the resulting
session state reflect the changes to both xyz and pqy, or should the
update of one be overwritten by the update of the other?  I.e., should
we persist at session or attribute granularity?
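
The difference between the two choices can be shown with plain maps.
This is only a sketch of the semantics (GranularityDemo and its methods
are made-up names, and a HashMap stands in for the NoSQL store):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for whatever the store holds.
class GranularityDemo {
    // Session granularity: the node writes its whole copy of the session,
    // so whatever was stored before is discarded (last writer wins).
    // 'stored' is unused here and kept only for symmetry with the method below.
    static Map<String, Object> saveWholeSession(Map<String, Object> stored,
                                                Map<String, Object> nodeCopy) {
        return new HashMap<>(nodeCopy);
    }

    // Attribute granularity: the node writes only the attributes it changed,
    // so updates to different attributes from different nodes both survive.
    static Map<String, Object> saveDirtyAttributes(Map<String, Object> stored,
                                                   Map<String, Object> dirty) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(dirty);
        return merged;
    }
}
```

With session granularity, node A's change to xyz is lost as soon as
node B saves its copy.  With attribute granularity both changes
survive, although two nodes updating the same attribute would still
need a tie-break rule.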

Also, the perennial bugbear of distributed sessions is how to handle
last access time.  While many sessions are read-mostly, the last
access time is updated on every request.  We don't want this to
trigger full serialisation of the session on every request - nor do we
want multiple nodes to fight about who has the correct last access
time.  This is something best kept in memory and only occasionally
swept to the persistent store.
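
One possible shape for that, as a sketch under assumptions: the class
LastAccessTracker is hypothetical, a Map stands in for the persistent
store, and "the later timestamp wins" is just one way to stop nodes
fighting over the value:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; a Map<String, Long> stands in for the persistent store.
class LastAccessTracker {
    // Updated on every request, never serialised with the session.
    private final Map<String, Long> inMemory = new ConcurrentHashMap<>();
    // What this node last wrote to the store, to avoid redundant writes.
    private final Map<String, Long> persisted = new ConcurrentHashMap<>();

    // Called per request: a cheap in-memory update only.
    void touch(String sessionId, long now) {
        inMemory.merge(sessionId, now, Long::max);
    }

    // Called occasionally: writes only the times that changed since the last
    // sweep and returns how many writes were made.  The store keeps the later
    // of its own value and ours, so no node has to "own" the timestamp.
    int sweep(Map<String, Long> store) {
        int writes = 0;
        for (Map.Entry<String, Long> e : inMemory.entrySet()) {
            Long last = persisted.get(e.getKey());
            if (!e.getValue().equals(last)) {
                store.merge(e.getKey(), e.getValue(), Long::max);
                persisted.put(e.getKey(), e.getValue());
                writes++;
            }
        }
        return writes;
    }
}
```

Any number of touch calls between sweeps costs a single write per
session, and a sweep with nothing new writes nothing.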

Which brings me to the really perennial distributed session issues:
+ Should changes made to the attribute value objects after the
setAttribute call be reflected in the persisted/distributed state? Is
setAttribute pass-by-reference or pass-by-value?
+ When is a session attribute persisted/distributed? When setAttribute
is called? When the current request completes? When all simultaneous
requests complete? At regular intervals? On container shutdown?

Our current managers are somewhat configurable and can have
variations of these semantics, but essentially they are
pass-by-reference and sessions are persisted either at regular
intervals or on shutdown.
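
For anyone who wants to see the pass-by-reference versus pass-by-value
difference in code, here is a small sketch.  AttributeSemantics and
copyByValue are made-up names, and copying via Java serialisation is
just one way a manager could snapshot a value at setAttribute time:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of pass-by-value attribute storage.
class AttributeSemantics {
    // Serialises the value and reads it back, producing an independent copy.
    // A pass-by-reference manager would simply keep 'value' itself.
    static Object copyByValue(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("attribute could not be copied", e);
        }
    }
}
```

If a list is copied this way when it is set and the application then
adds an item to its own list, the copy does not see the new item.  With
pass-by-reference the new item would show up at the next persist.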


Your thoughts?

