Re: [jetty-dev] NoSQL Session manager

Hi,

On Wed, Jun 1, 2011 at 05:10, Greg Wilkins <gregw@xxxxxxxxxxx> wrote:
> The servlet 3.1 EG will soon be considering how to better support
> cloud/clusters in the servlet spec and one of the things that will
> need to be considered is how to better support HttpSession that are
> backed with NoSQL style scalable stores.
>
> We do have a prototype NoSQL session manager for jetty, but before
> moving forward with that, I'd like to discuss here a number of the
> issues/options for how it can proceed.  But first, let's review what
> the jetty HashSession and JDBCSession managers do - as they are a lot
> more than just maps of IDs to session instances.
>
> Sessions are created with an ID that is unique within all the contexts
> that share a sessionIDmanager.  Multiple contexts can share session
> IDs (as there is only 1 session cookie), but cannot share session
> values.   If one context invalidates a session, then all contexts with
> that session ID have their sessions invalidated.    So the data
> structure is not:
>
>   ID --> Attribute --> Value
>
> but is instead
>
>   ID --> Context --> ID --> Attribute --> Value
>
> the double ID mapping is for efficiency within a context.
>
> Currently the session ID is split between a clusterID and a nodeID
> part.  The clusterID is unique within the cluster, while the nodeID
> includes a suffix that can be used for load balancer stickiness.
> Jetty supports session migration such that if a node receives a
> session ID with a mismatched nodeID, then the session cookie is
> rewritten and the session is considered migrated.
>
> For a nosql session manager, we need to ask if explicit stickiness in
> the session ID is needed.  Connection stickiness may be sufficient for
> efficient operation of nosql sessions.

I think, as you also note below, that removing stickiness is asking
for *big* trouble.
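
To make the clusterID/nodeID split and the migration check concrete,
here is a minimal sketch. The class name, the method names and the
"<clusterId>.<nodeSuffix>" layout with a dot separator are assumptions
made for illustration, not Jetty's actual API.

```java
// Hypothetical helper, not Jetty's API: splits a session ID of the assumed
// form "<clusterId>.<nodeSuffix>" and detects a node mismatch.
final class StickySessionId {
    final String clusterId;   // unique within the cluster
    final String nodeSuffix;  // routing hint for the load balancer, may be null

    private StickySessionId(String clusterId, String nodeSuffix) {
        this.clusterId = clusterId;
        this.nodeSuffix = nodeSuffix;
    }

    // Parses the cookie value; an ID without a dot has no node suffix.
    static StickySessionId parse(String cookieValue) {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) {
            return new StickySessionId(cookieValue, null);
        }
        return new StickySessionId(cookieValue.substring(0, dot),
                                   cookieValue.substring(dot + 1));
    }

    // True when the request landed on a node other than the one named in the ID.
    boolean isMigratedTo(String localNode) {
        return !localNode.equals(nodeSuffix);
    }

    // The value the session cookie would be rewritten to after migration.
    String rewrittenFor(String localNode) {
        return clusterId + "." + localNode;
    }
}
```

With this layout, a node that receives "abc123.node1" but is itself
"node2" would treat the session as migrated and rewrite the cookie to
"abc123.node2", so later requests stick to the new node.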

> The HashSessionManager holds sessions in memory as attribute maps,
> however it essentially works as a cache as sessions may be idled to
> disk (but an in memory holder remains) or saved to disk (nothing left
> in memory) and restored lazily as requests are received.  The JDBC
> session manager similarly uses in memory sessions as a cache in front
> of the DB.  Both of these map well to the servlet spec's requirement
> that only a single version of a session should be in memory at any
> given time.  However there is a suggestion that for scalability this
> constraint should be relaxed and a session might be able to exist on
> multiple nodes at the same time.
>
> Given this, I think we need to consider if the session manager should
> be explicitly caching sessions in memory.  Rather it should delegate
> caching to the nosql layer and rely on its mechanisms for maintaining
> distributed cache consistency.

Perhaps I am missing something, but I fail to see how this is possible
without fully coherent, transactional cache replication with
pessimistic lock handling.

> However, having our own in memory map of sessions was very useful for
> doing invalidation of old sessions, as we could iterate over our in
> memory sessions without worrying if another node in the cluster was
> duplicating that work.

Not that easy; in Jetty6 we integrated Terracotta, and that iteration
was basically migrating all the sessions to all the nodes, so we had to
figure out an alternative solution to avoid that mass migration.

> The JDBC manager also has a sweep of the DB
> for old sessions and we'd have to do something similar for NoSQL.

Agreed.

> If we don't want to have sessions concurrently in different nodes,
> then we really need to think of a way that can be enforced
> efficiently.... perhaps using the nodeID in the cookie? But I think
> for scalability it is inevitable that concurrent instances will
> eventually be allowed, which brings up the question of what are the
> semantics/granularity of concurrent session updates?

I am questioning "for scalability we need to allow concurrent
instances in different nodes", which IMHO is not true (as scalability
can be achieved with stickiness).
What are the drivers behind this view?

> If session 12345 exists in both node A and node B, and both are
> updated at about the same time, with attribute xyz updated in node A
> and attribute pqy updated in node B, then should the resulting
> session state reflect the changes to both xyz and pqy, or should the
> update of one be overwritten by the update of the other?  i.e. should
> we persist at session or attribute granularity?

You can't solve this problem other than via pessimistic locking, and
that would be a scalability killer.
For example, what if nodeA removes attribute "foo" and nodeB changes
its value at the same time?
You need pessimistic locking because with optimistic locking you can't
decide (you can detect that a concurrent update happened, but you
cannot decide the final status of the attribute - changed or removed).
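
The remove-versus-change case can be sketched with a version-checked
(optimistic) write. This is only an illustration under stated
assumptions: VersionedSession, Snapshot and writeIfUnchanged are
hypothetical names, and a single in-process object stands in for the
shared store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared store holding one session.
final class VersionedSession {
    // A node's private copy of the session plus the version it was read at.
    static final class Snapshot {
        final long version;
        final Map<String, Object> attributes;

        Snapshot(long version, Map<String, Object> attributes) {
            this.version = version;
            this.attributes = attributes;
        }
    }

    private long version = 0;
    private Map<String, Object> attributes;

    VersionedSession(Map<String, Object> initial) {
        this.attributes = new HashMap<>(initial);
    }

    // Returns a mutable copy so each node works on its own state.
    synchronized Snapshot read() {
        return new Snapshot(version, new HashMap<>(attributes));
    }

    // Optimistic write: succeeds only if nobody wrote since the snapshot.
    synchronized boolean writeIfUnchanged(long expectedVersion,
                                          Map<String, Object> updated) {
        if (version != expectedVersion) {
            return false; // conflict detected, nothing is applied
        }
        attributes = new HashMap<>(updated);
        version++;
        return true;
    }
}
```

If nodeA and nodeB both read at version 0, nodeA removes "foo" and
writes first, then nodeB's write of a new "foo" value is rejected. The
store can report the conflict, but it has no basis for choosing between
"removed" and "changed"; that decision is left to the caller.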

> Also, the perennial bugbear of distributed sessions is how to handle
> last access time.  While many sessions are read mostly, the last
> access time is updated on every request.  We don't want this to
> trigger full serialisation of the session on every request - nor do we
> want multiple nodes to fight about who has the correct last update
> time.  This is something best kept in memory and only occasionally
> swept to the persistent store.

It's worse than that. You need atomic updates of the lastAccessedTime,
and you need to be able to iterate over them when you sweep without
migrating the whole session (if you have a non-sticky model).
Again, that is why you do not want to give up on stickiness.
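
A minimal sketch of the sticky case, where the last access time lives
in memory on the owning node, is updated atomically on every request
and is only occasionally written to the persistent store.
LastAccessTracker and its methods are hypothetical names, and the flush
interval is an assumed parameter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-session tracker kept on the node that owns the session.
final class LastAccessTracker {
    private final AtomicLong lastAccessed;
    private final AtomicLong lastPersisted;
    private final long flushIntervalMillis;

    LastAccessTracker(long createdAt, long flushIntervalMillis) {
        this.lastAccessed = new AtomicLong(createdAt);
        this.lastPersisted = new AtomicLong(createdAt);
        this.flushIntervalMillis = flushIntervalMillis;
    }

    // Called on every request; concurrent requests never move time backwards.
    void access(long now) {
        lastAccessed.accumulateAndGet(now, Math::max);
    }

    // True when the in-memory time has drifted far enough from the stored one.
    boolean shouldFlush() {
        return lastAccessed.get() - lastPersisted.get() >= flushIntervalMillis;
    }

    // Returns the value to write to the store and records it as persisted.
    long markFlushed() {
        long value = lastAccessed.get();
        lastPersisted.set(value);
        return value;
    }

    // Used by the sweeper; reads one long, the session itself is not touched.
    boolean isExpired(long now, long maxInactiveMillis) {
        return now - lastAccessed.get() > maxInactiveMillis;
    }
}
```

The sweeper only calls isExpired, so it never needs to load or
serialise the session attributes. In a non-sticky model the same value
would have to live in the shared store and be updated atomically there,
which is the cost described above.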

> Which brings me to the really perennial distributed session issues:
> + should changes made to the attribute value objects after the
> setAttribute be reflected in the persisted/distributed state? Is
> setAttribute pass-by-reference or pass-by-value?
> + when is a session attribute persisted/distributed? When setAttribute
> is called? when the current request completes? when all simultaneous
> requests complete? at regular intervals? on container shutdown?
>
> Our current managers are somewhat configurable and can have
> variations of these semantics, but essentially they are pass by
> reference and sessions are persisted either at regular intervals or on
> shutdown.
>
> your thoughts?

I think a viable solution is to have stickiness, which means there is
a master session on one node, and that session is the only one that
can be updated.
This solution is deployed around the world and has proven to scale.

Another solution is to have pessimistically locked, pass-by-reference
distributed state that is updatable from any node, but I personally
have experienced very bad scalability with this solution because of
the distributed locking involved.

Simon
-- 
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless.   Victoria Livschitz