The problem of end-user concurrency
by Frans
The problem of end-user concurrency arises when multiple
end-users are making changes to shared data.
End-user transactions and database transactions
Traditionally, end-user transactions are mapped one-to-one
to database transactions. Although this is technically
the easiest solution, it does not produce the most natural
behaviour from an end-user perspective. It completely
ignores the fact that there are different types of end-user
interactions with a system. Some of these types are:
- Modifying. During this type of interaction, the end-user
claims the exclusive right to modify a logical part of
the shared data. Once the changes have been made, the
exclusive right is released.
- Browsing. During this type of interaction, the end-user
is looking through the data, either to find a certain
item or to find some "work" to be done. The user does
not want any exclusive rights to the data being
browsed, and the data may have changed when requeried.
- Reserving. During this type of interaction, the end-user
is looking through the data, with the aim of making a
certain modification, usually in the form of a resource
allocation. Whenever allocatable items are shown, the user
wants a guarantee that the displayed items can
be claimed for a certain period of time, before making the
actual claim. In practice, this means that whenever a
requery is done (within a predefined time frame), the
items will not have disappeared.
- Monitoring. During this type of interaction, the end-user
wants an up-to-date view of a selected part of the data.
Any change made to the data should become visible within
a specified period of time after the change was applied.
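
The "Reserving" interaction can be illustrated with a minimal
sketch. It assumes a single in-memory process; the class name
ReservationBoard, its methods, and the injectable clock are all
hypothetical and only serve to show the time-limited guarantee
described above.

```python
import time


class ReservationBoard:
    """Hypothetical in-memory board of allocatable items with timed holds."""

    def __init__(self, items, hold_seconds, clock=time.monotonic):
        self._unclaimed = set(items)
        self._holds = {}  # item -> (user, time at which the hold expires)
        self._hold_seconds = hold_seconds
        self._clock = clock

    def _drop_expired_holds(self):
        now = self._clock()
        for item, (_user, expires) in list(self._holds.items()):
            if expires <= now:
                del self._holds[item]

    def show_allocatable(self, user):
        """Return the items this user may claim, placing a hold on free ones."""
        self._drop_expired_holds()
        now = self._clock()
        shown = []
        for item in sorted(self._unclaimed):
            hold = self._holds.get(item)
            if hold is None:
                # A free item is held for the user from the moment it is shown.
                self._holds[item] = (user, now + self._hold_seconds)
                shown.append(item)
            elif hold[0] == user:
                # A requery within the time frame still shows the item;
                # the original expiry time is kept, not extended.
                shown.append(item)
        return shown

    def claim(self, user, item):
        """Claim an item; succeeds only while the user's hold is valid."""
        self._drop_expired_holds()
        hold = self._holds.get(item)
        if hold is None or hold[0] != user:
            return False
        del self._holds[item]
        self._unclaimed.discard(item)
        return True
```

In this sketch a second user simply does not see items that are
held by someone else until the hold expires. Whether held items
should be hidden or shown as unavailable is a design choice the
text above leaves open.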
Database transactions are an implementation issue
It should be understood that the notion of (database)
transactions is purely an implementation issue. The common
example of transferring a sum of money from one account to
another account suggests that transactions are needed for
maintaining consistency. They are not. They are solely the
result of how certain database functionality is implemented.
The amount stored on a certain account is the sum of all
financial transactions it was involved in. A database
implementation that stored all financial transactions in a
single table would not need any database transactions, because
each transfer would be recorded by a single insert. (Obviously,
from a performance point of view, such a database would not be
very well designed.) The amounts stored in the accounts can be
viewed as a materialized view.
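
The idea can be sketched as follows. This is a minimal
illustration, not a real database: the Ledger class and its
method names are hypothetical, and amounts are assumed to be
whole numbers (for example cents).

```python
from collections import defaultdict


class Ledger:
    """Hypothetical append-only record of transfers between accounts."""

    def __init__(self):
        self._entries = []  # list of (source, target, amount)

    def transfer(self, source, target, amount):
        # One append records both sides of the transfer, so there is
        # no intermediate state in which money is missing or duplicated.
        self._entries.append((source, target, amount))

    def balance(self, account):
        # The balance is derived from the entries, never stored.
        total = 0
        for source, target, amount in self._entries:
            if target == account:
                total += amount
            if source == account:
                total -= amount
        return total

    def materialize(self):
        # Computes all balances at once, like a materialized view
        # over the table of transfers.
        balances = defaultdict(int)
        for source, target, amount in self._entries:
            balances[source] -= amount
            balances[target] += amount
        return dict(balances)


ledger = Ledger()
ledger.transfer("bank", "alice", 100)
ledger.transfer("alice", "bob", 30)
print(ledger.materialize())
```

Recomputing every balance from the full list of entries is
exactly the performance problem mentioned above; storing the
balances is a cache of this computation.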
(To be continued)
Identifier generation
Whenever items are added to a set of data, they
usually need to be identified uniquely. The identifier
is what uniquely identifies a piece of data.
Identifiers are determined either by the
end-users or by the system. Both approaches have their own
problems with respect to end-user concurrency.
System generated identifiers
There are at least three methods for generating unique
identifiers by a system. These are:
- Random unique identifiers. Whenever the system is
requested to produce an identifier, it generates
one, but it cannot be said beforehand
which identifier will be generated next. There does
not need to be a strict order in the generated identifiers.
Usually, a combination of time stamps and random number
generators is used to produce random unique identifiers.
Random unique identifier generation is often used for
distributed identifier generation in a loosely connected
system, meaning that there is no guarantee that all
components of the system can be reached at any given time,
or that it is not feasible to connect all distributed
identifier generators. (There is always a theoretical chance
that non-unique identifiers are generated, but this chance
is regarded as practically zero.) Random unique identifiers
are usually rather long.
- Sequential identifiers. Whenever the system is requested
to produce an identifier, it generates the next
identifier according to some predefined numbering
scheme. Such a numbering scheme can include date and
time elements. This usually involves some counters that
are incremented every time an identifier needs
to be generated. It is thus known which identifier will be
generated next (apart from possible date and
time elements), and a complete ordering exists.
Whenever two (or more) end-users request an identifier
within overlapping end-user transactions, the system must
not give them the same identifier. Whenever users
abort their end-user transaction, the identifiers they
requested are not used. If these identifiers are
discarded, this results in gaps in the sequence of
identifiers used in the system. If they are reused, the
sequence of identifiers does not match the sequence in
which the identified items were created, and there is
the possibility of momentary gaps.
- Strict sequential identifiers. This means that the
identifiers of all items in the system form a sequence
without gaps that matches the order in which the items
were added. This is only possible when identifiers are
generated when end-user transactions are committed.
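
The three methods above can be sketched side by side. This is
a simplified, single-process illustration: the names
random_identifier, SequentialIds and StrictSequentialStore are
hypothetical, and a real system would need locking or a
database facility to make the counters safe under concurrency.

```python
import itertools
import uuid


def random_identifier():
    # Random unique identifier: long and unordered, but it can be
    # generated without contacting any other component.
    return uuid.uuid4().hex


class SequentialIds:
    """Hands out the next number on request, before the commit."""

    def __init__(self):
        self._counter = itertools.count(1)

    def request(self):
        return next(self._counter)


class StrictSequentialStore:
    """Assigns the identifier only at commit time, so no gaps can occur."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def commit(self, payload):
        identifier = self._next_id
        self.items[identifier] = payload
        self._next_id += 1
        return identifier


# Sequential: two overlapping end-user transactions request identifiers.
ids = SequentialIds()
first = ids.request()   # requested by a user who later aborts
second = ids.request()  # requested by a user who commits
used = [second]         # the first identifier is discarded: a gap

# Strict sequential: an aborted transaction never reaches commit(),
# so it never consumes an identifier.
store = StrictSequentialStore()
store.commit("item from the second user")
print(used, sorted(store.items))
```

The price of the strict variant is that the end-user does not
know the identifier of a new item until the end-user
transaction has been committed.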
User defined identifiers
In this case it is the responsibility of the user to come up
with an identifier when a new item needs to be added to a
database. User-defined identifiers are always chosen inside
end-user transactions. There is a possibility that two users,
independently of each other, pick the same identifier while
they are concurrently adding items to the database. Traditional
database transactions only discover non-unique identifiers when
the transaction commits. If end-user transactions are
implemented by database transactions, this means that one of
the users will receive an error when the end-user transaction
is committed, not when the identifier was chosen.
Uniqueness of identifiers should transcend the traditional
concept of isolation in database transactions.
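
One way to picture uniqueness that is checked outside the
isolation of a transaction is a shared registry of claimed
identifiers. The sketch below is only an assumption about how
this could look; the IdentifierRegistry class and its methods
are hypothetical and kept in memory for simplicity.

```python
class IdentifierRegistry:
    """Hypothetical registry that is visible to all end-user transactions."""

    def __init__(self):
        self._committed = set()
        self._claimed = {}  # identifier -> name of the claiming transaction

    def claim(self, transaction, identifier):
        # The conflict is reported at the moment the identifier is
        # chosen, not when the end-user transaction is committed.
        if identifier in self._committed:
            return False
        owner = self._claimed.get(identifier)
        if owner is not None and owner != transaction:
            return False
        self._claimed[identifier] = transaction
        return True

    def commit(self, transaction):
        # Claims of the transaction become permanent identifiers.
        mine = [i for i, t in self._claimed.items() if t == transaction]
        for identifier in mine:
            del self._claimed[identifier]
            self._committed.add(identifier)
        return sorted(mine)

    def abort(self, transaction):
        # Claims of an aborted transaction become available again.
        mine = [i for i, t in self._claimed.items() if t == transaction]
        for identifier in mine:
            del self._claimed[identifier]


registry = IdentifierRegistry()
print(registry.claim("user-1", "INV-7"))  # True
print(registry.claim("user-2", "INV-7"))  # False: already claimed
```

Here the second user learns about the clash immediately, even
though the first user's end-user transaction has not been
committed yet.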
(To be continued)