
Re: [jetty-dev] NoSQL Session manager

My initial thoughts on this are that it is exceedingly difficult to
take a broad group of technologies such as those that comprise
NoSQL [1] and come up with a generic implementation or specification
for something like the session management interface that applies
equally to all NoSQL implementations.

To shamelessly pull from Wikipedia to illustrate the point:

   NoSQL architectures often provide weak consistency guarantees, such
   as eventual consistency, or transactions restricted to single data
   items.  Some systems, however, provide full ACID guarantees in some
   instances by adding a supplementary middleware layer (e.g., CloudTPS).

The technology you're talking about putting the session manager on
top of will drive the decisions that need to be made in that
implementation.  NoSQL solutions vary in how they address the CAP
theorem [2]...and this drives the architecture of the session manager
as well.  One solution may provide native ways of dealing with
consistency, while another might require you to manage consistency
manually.

Pulling from the Cassandra (a very popular NoSQL solution) wiki [3]:

   The CAP theorem states that you have to pick two of Consistency,
   Availability, Partition tolerance: you can't have all three at the
   same time and get an acceptable latency.

   Cassandra values Availability and Partition tolerance (AP).
   Tradeoffs between consistency and latency are tunable in Cassandra.
   You can get strong consistency with Cassandra (with an increased
   latency).  But you can't get row locking: that is a definite win
   for HBase.

   Note: HBase values Consistency and Partition tolerance (CP)

From what I have seen, you really need to consider the underlying
technology in play before designing something like a session manager.
Even something as simple as whether or not to cache in memory could
have an effect on what is happening under the covers; HBase and
Cassandra cannot be considered 'equivalent' in every way.  Even the
concept of relying on a master node is not necessarily a safe
assumption: Cassandra doesn't even have that concept.
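
To make that a little more concrete, here is a rough sketch (the
names are hypothetical, not an existing Jetty or Cassandra API) of
how a session store abstraction might expose a tunable consistency
level per operation, so an AP store like Cassandra can trade latency
for consistency while a CP store could simply ignore the hint:

    // Hypothetical sketch only - not an existing API.  Shows how the
    // session manager could ask for a consistency level per operation
    // on backends (like Cassandra) where that tradeoff is tunable.
    import java.util.Map;

    public interface SessionStore
    {
        enum Consistency { ONE, QUORUM, ALL }

        /** Load the attribute map for a session in a context, hinting
         *  how consistent the read needs to be. */
        Map<String, Object> load(String clusterId, String context,
                                 Consistency consistency);

        /** Persist the attribute map; QUORUM/ALL writes cost latency
         *  but narrow the window for stale reads on other nodes. */
        void save(String clusterId, String context,
                  Map<String, Object> attributes, Consistency consistency);
    }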

Greg, you mentioned that some users of sessions are mostly read-only
and others are heavily read/write oriented.  It is probably the case
that different NoSQL technologies are better suited to these
different usage patterns.

This is a nice topic to see being discussed though, good stuff.

cheers,
jesse

[1] http://en.wikipedia.org/wiki/NoSQL
[2] http://en.wikipedia.org/wiki/CAP_theorem
[3] http://wiki.apache.org/cassandra/ArchitectureOverview



--
jesse mcconnell
jesse.mcconnell@xxxxxxxxx



On Wed, Jun 1, 2011 at 05:15, Simone Bordet <sbordet@xxxxxxxxxxx> wrote:
> Hi,
>
> On Wed, Jun 1, 2011 at 05:10, Greg Wilkins <gregw@xxxxxxxxxxx> wrote:
>> The servlet 3.1 EG will soon be considering how to better support
>> cloud/clusters in the servlet spec and one of the things that will
>> need to be considered is how to better support HttpSessions that are
>> backed by NoSQL-style scalable stores.
>>
>> We do have a prototype NoSQL session manager for jetty, but before
>> moving forward with that, I'd like to discuss here a number of the
>> issues/options for how it can proceed.  But first, let's review what
>> the jetty HashSession and JDBCSession managers do - as they are a lot
>> more than just maps of IDs to session instances.
>>
>> Sessions are created with an ID that is unique within all the contexts
>> that share a sessionIDmanager.  Multiple contexts can share session
>> IDs (as there is only 1 session cookie), but cannot share session
>> values.   If one context invalidates a session, then all contexts with
>> that session ID have their sessions invalidated.    So the data
>> structure is not:
>>
>>   ID --> Attribute --> Value
>>
>> but is instead
>>
>>   ID --> Context --> ID --> Attribute --> Value
>>
>> The double ID mapping is for efficiency within a context.
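
For illustration, the double mapping Greg describes might look
roughly like this in memory (a minimal sketch, not Jetty's actual
classes):

    // Sketch of the double mapping described above.  A shared id
    // manager knows which contexts hold a session for a given
    // cluster-wide ID; each context then keeps its own ID -> attribute
    // map so lookups within a context stay cheap.
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    class SharedIdManager
    {
        // cluster-wide session ID -> contexts that have a session for it
        final Map<String, Set<ContextSessions>> idToContexts =
            new ConcurrentHashMap<String, Set<ContextSessions>>();

        void invalidateAll(String id)
        {
            Set<ContextSessions> contexts = idToContexts.remove(id);
            if (contexts != null)
                for (ContextSessions c : contexts)
                    c.invalidate(id);   // one cookie, so all contexts go
        }
    }

    class ContextSessions
    {
        // per-context: session ID -> attribute name -> value
        final Map<String, Map<String, Object>> sessions =
            new ConcurrentHashMap<String, Map<String, Object>>();

        void invalidate(String id) { sessions.remove(id); }
    }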
>>
>> Currently the session ID is split between a clusterID and a nodeID
>> part.  The cluster ID is unique within the cluster, while the nodeID
>> includes a suffix that can be used for load balancer stickiness.
>> Jetty supports session migration such that if a node receives a
>> session ID with a mismatched nodeID, then the session cookie is
>> rewritten and the session is considered migrated.
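
A rough sketch of that split, purely for illustration (the '.'
separator and worker-name suffix are assumptions about the format,
not a statement of what Jetty actually emits):

    // Hypothetical: session IDs of the form "<clusterId>.<nodeId>",
    // e.g. "3k9z72ab.node1", where the suffix drives load balancer
    // stickiness.  Not Jetty's actual code.
    class SessionIds
    {
        static String clusterId(String nodeQualifiedId)
        {
            int dot = nodeQualifiedId.lastIndexOf('.');
            return dot < 0 ? nodeQualifiedId : nodeQualifiedId.substring(0, dot);
        }

        static String migrateTo(String nodeQualifiedId, String localNode)
        {
            // Suffix doesn't match this node: rewrite it so the cookie
            // (and the load balancer) sticks future requests here.
            return clusterId(nodeQualifiedId) + "." + localNode;
        }
    }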
>>
>> For a NoSQL session manager, we need to ask if explicit stickiness in
>> the session ID is needed.  Connection stickiness may be sufficient for
>> efficient operation of NoSQL sessions.
>
> I think, as you also note below, that removing stickiness is asking
> for *big* trouble.
>
>> The HashSessionManager holds sessions in memory as attribute maps,
>> however it essentially works as a cache as sessions may be idled to
>> disk (but an in memory holder remains) or saved to disk (nothing left
>> in memory) and restored lazily as requests are received.  The JDBC
>> session manager similarly uses in memory sessions as a cache in front
>> of the DB.  Both of these map well to the servlet spec's requirement
>> that only a single version of a session should be in memory at any
>> given time.  However there is a suggestion that for scalability this
>> constraint should be relaxed and a session might be able to exist on
>> multiple nodes at the same time.
>>
>> Given this, I think we need to consider whether the session manager
>> should be explicitly caching sessions in memory at all.  Rather, it
>> should delegate caching to the NoSQL layer and rely on its mechanisms
>> for maintaining distributed cache consistency.
>
> Perhaps I am missing something, but I fail to see how this is possible
> without fully coherent transactional cache replication with
> pessimistic lock handling.
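
For discussion's sake, the "no local cache" variant could be as
simple as the sketch below (reusing the hypothetical SessionStore
from earlier).  Every access goes to the backend, so cross-node
coherence becomes the store's problem - at the price of a round trip
per lookup, which is exactly the tradeoff being questioned here:

    // Hypothetical: a manager that keeps no in-memory session map and
    // delegates every access to the backing store.  Consistency between
    // nodes is whatever the store provides; the cost is one read per
    // session access unless the store itself caches.
    import java.util.Map;

    class StoreBackedSessions
    {
        private final SessionStore store;

        StoreBackedSessions(SessionStore store) { this.store = store; }

        Map<String, Object> getSessionAttributes(String clusterId, String context)
        {
            return store.load(clusterId, context, SessionStore.Consistency.QUORUM);
        }
    }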
>
>> However, having our own in memory map of sessions was very useful for
>> doing invalidation of old sessions, as we could iterate over our in
>> memory sessions without worrying if another node in the cluster was
>> duplicating that work.
>
> Not that easy; in Jetty6 we integrated Terracotta, and that iteration
> was basically migrating all the sessions to all nodes, so we had to
> figure out an alternative solution to avoid that mass migration.
>
>> The JDBC manager also has a sweep of the DB
>> for old sessions and we'd have to do something similar for NoSQL.
>
> Agreed.
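
For a NoSQL store that sweep would presumably be a periodic scavenge
keyed off a stored expiry time; a sketch follows (the query and
delete calls are made up for illustration, not a real driver API):

    // Hypothetical scavenger.  Assumes each persisted session carries
    // an "expires" timestamp and the store can be queried by it.
    class SessionScavenger
    {
        // Both methods are invented here purely to outline the sweep.
        interface ExpiryIndex
        {
            Iterable<String> findExpiredBefore(long epochMillis);
            void delete(String clusterId);
        }

        void scavenge(ExpiryIndex store)
        {
            long now = System.currentTimeMillis();
            // In a cluster several nodes may run this sweep at once, so
            // delete must be idempotent (or guarded by a lease/lock).
            for (String clusterId : store.findExpiredBefore(now))
            {
                store.delete(clusterId);    // remove the persisted state
                // ...and notify any context that still caches it locally
            }
        }
    }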
>
>> If we don't want to have sessions concurrently in different nodes,
>> then we really need to think of a way that can be enforced
>> efficiently.... perhaps using the nodeID in the cookie? But I think
>> for scalability it is inevitable that concurrent instances will
>> eventually be allowed, which brings up the question of what is the
>> semantics/granularity of concurrent session updates?
>
> I am questioning "for scalability we need to allow concurrent
> instances in different nodes", which IMHO is not true (as scalability
> can be achieved with stickiness).
> What are the drivers in supporting this view?
>
>> If session 12345 exists in both node A and node B, and both are
>> updated at about the same time, with attribute xyz updated in node A
>> and attribute pqy updated in node B,   then should the resulting
>> session state reflect the changes to both xyz and pqy, or should the
>> update of one be overwritten by the update of the other?  I.e. should
>> we persist at session or attribute granularity?
>
> You can't solve this problem except via pessimistic locking, and that
> would be a scalability killer.
> For example, what if nodeA removes attribute "foo" and nodeB changes
> its value at the same time?
> You need pessimistic locking because with optimistic locking you can't
> decide (you can detect that a concurrent update happened, but you
> cannot decide the final status of the attribute - changed or removed).
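
Just as one concrete example of attribute granularity (not a decided
design - simply what it could look like if the backing store happened
to be MongoDB, using the plain com.mongodb driver): each attribute
becomes a field under "attributes", so nodes touching *different*
attributes don't clobber each other, but - as Simone says - a
concurrent $set and $unset of the *same* attribute is still a race
the store won't resolve for you:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;

    class MongoAttributeWriter
    {
        // Attribute-level writes; values must be BSON-serialisable.
        void setAttribute(DBCollection sessions, String id, String name, Object value)
        {
            sessions.update(new BasicDBObject("_id", id),
                            new BasicDBObject("$set",
                                new BasicDBObject("attributes." + name, value)));
        }

        void removeAttribute(DBCollection sessions, String id, String name)
        {
            sessions.update(new BasicDBObject("_id", id),
                            new BasicDBObject("$unset",
                                new BasicDBObject("attributes." + name, 1)));
        }
    }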
>
>> Also, the perennial bugbear of distributed sessions is how to handle
>> last access time.  While many sessions are read mostly, the last
>> access time is updated on every request.  We don't want this to
>> trigger full serialisation of the session on every request - nor do we
>> want multiple nodes to fight about who has the correct last update
>> time.  This is something best kept in memory and only occasionally
>> swept to the persistent store.
>
> It's worse than that. You need atomic updates of the lastAccessedTime,
> and you need to be able to iterate over them when you sweep without
> migrating the whole session (if you have a non-sticky model).
> Again, that is why you do not want to give up on stickiness.
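
One common compromise - sketched here as an assumption, not what
Jetty currently does - is to keep the access time in memory and only
push it to the store when it has drifted by more than a save period,
writing just that one field:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: avoid re-serialising the whole session on every request.
    // The partial write assumes the store supports field-level updates
    // (e.g. the $set example above); otherwise it degrades to a full
    // save.  AccessTimeWriter is hypothetical.
    class AccessTimeSaver
    {
        interface AccessTimeWriter { void updateAccessTime(String id, long time); }

        private final Map<String, Long> lastSaved =
            new ConcurrentHashMap<String, Long>();
        private final long savePeriodMs = 60000;   // assumption, e.g. one minute

        void access(AccessTimeWriter store, String id, long now)
        {
            Long prev = lastSaved.get(id);
            if (prev == null || now - prev > savePeriodMs)
            {
                store.updateAccessTime(id, now);    // partial write of one field
                lastSaved.put(id, now);
            }
        }
    }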
>
>> Which brings me to the really perennial distributed session issues:
>> + should changes made to the attribute value objects after the
>> setAttribute be reflected in the persisted/distributed state? Is
>> setAttribute pass-by-reference or pass-by-value ?
>> + when is a session attribute persisted/distributed? When setAttribute
>> is called? when the current request completes? when all simultaneous
>> requests complete? at regular intervals? on container shutdown?
>>
>> Our current managers are somewhat configurable and can have
>> variations of these semantics, but essentially they are
>> pass-by-reference, and sessions are persisted either at regular
>> intervals or on shutdown.
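
For reference against those questions, the usual middle ground is a
dirty flag on setAttribute, with the manager choosing when to flush -
on request completion, on an interval, or on shutdown.  A sketch
(reusing the hypothetical SessionStore from earlier; note that
pass-by-reference mutations made after setAttribute are invisible to
the flag, which is exactly the ambiguity raised above):

    import java.util.HashMap;
    import java.util.Map;

    class TrackedSession
    {
        private final Map<String, Object> attributes = new HashMap<String, Object>();
        private volatile boolean dirty;

        public void setAttribute(String name, Object value)
        {
            attributes.put(name, value);
            dirty = true;                  // value-level changes are not tracked
        }

        void saveIfDirty(SessionStore store, String id, String context)
        {
            if (dirty)
            {
                store.save(id, context, attributes, SessionStore.Consistency.QUORUM);
                dirty = false;
            }
        }
    }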
>>
>> your thoughts?
>
> I think a viable solution is to have stickiness, which means there is
> a master session on one node, and that session is the only one that
> can be updated.
> This solution is deployed around the world and has proved to scale.
>
> Another solution is to have a pessimistic pass-by-reference
> distributed state updateable from any node, but I have personally
> experienced very bad scalability with this solution because of the
> distributed locking involved.
>
> Simon
> --
> http://bordet.blogspot.com
> ---
> Finally, no matter how good the architecture and design are,
> to deliver bug-free software with optimal performance and reliability,
> the implementation technique must be flawless.   Victoria Livschitz
>

