The problem of end-user concurrency

by Frans

The problem of end-user concurrency arises when multiple end-users are making changes to shared data.

End-user transactions and database transactions

Traditionally, end-user transactions are mapped one-to-one to database transactions. Although this is technically the easiest solution, it does not produce the most natural behaviour from an end-user perspective. It completely ignores the fact that there are different types of end-user interactions with a system. Some of these types are:

Modifying. During this type of interaction, the end-user claims the exclusive right to modify a logical part of the shared data. After the changes have been made, the exclusive right is released.
Browsing. During this type of interaction, the end-user is looking through the data, either to find a certain item or to find some "work" to be done. The user does not want exclusive rights to the data being browsed, and the data may have changed when it is requeried.
Reserving. During this type of interaction, the end-user is looking through the data with the aim of making a certain modification, usually in the form of a resource allocation. Whenever allocatable items are shown, the user wants the guarantee that the displayed items can be claimed for a certain period of time, before making the actual claim. In practice, this means that whenever a requery is done (within a predefined time frame), the items will not have disappeared.
Monitoring. During this type of interaction, the end-user wants an up-to-date view of a selected part of the data. Any change made to the data should become visible within a specified time lapse after the change was applied.
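
The Reserving interaction can be sketched in code. The following is only a minimal illustration, assuming a single in-memory process; the class name ReservationBoard, its methods and the hold_seconds parameter are all hypothetical and do not come from any existing system. Showing items places a time-limited hold on them, so a requery within the hold period returns the same items, and a claim only succeeds while the hold is still valid.

```python
import time


class ReservationBoard:
    """Hypothetical helper: holds displayed items for one user for a while."""

    def __init__(self, items, hold_seconds, clock=time.monotonic):
        self._free = set(items)
        self._holds = {}          # item -> (user, expiry time)
        self._hold_seconds = hold_seconds
        self._clock = clock       # injectable, so the sketch can be tested

    def _release_expired(self):
        # Holds that have run out make their items available again.
        now = self._clock()
        for item, (_, expiry) in list(self._holds.items()):
            if expiry <= now:
                del self._holds[item]
                self._free.add(item)

    def show(self, user):
        """Return the items this user may claim, placing a hold on free ones."""
        self._release_expired()
        expiry = self._clock() + self._hold_seconds
        for item in self._free:
            self._holds[item] = (user, expiry)
        self._free.clear()
        return sorted(i for i, (u, _) in self._holds.items() if u == user)

    def claim(self, user, item):
        """Turn a hold into an allocation; False if the hold is gone."""
        self._release_expired()
        if self._holds.get(item, (None, 0))[0] != user:
            return False
        del self._holds[item]
        return True
```

In this sketch a user who browses for reservation blocks the shown items for everybody else until the hold expires, which is exactly the trade-off that distinguishes Reserving from Browsing.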

Database transactions are an implementation issue

It should be understood that the notion of (database) transactions is purely an implementation issue. The common example of transferring a sum of money from one account to another suggests that transactions are needed for maintaining consistency. They are not. They are solely the result of how a certain database functionality is implemented. The amount stored on a certain account is the sum of all (financial) transactions it was involved in. A database implementation that stored all financial transactions in a single table would not need any database transactions, because each transfer is then recorded as a single row. (Obviously, from a performance point of view, such a database would not be very well designed.) The amounts stored in the accounts can be viewed as a materialized view.
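
As a minimal sketch of this idea, assuming an in-memory list in place of a database table (the names Ledger, transfer and balance are hypothetical), a transfer becomes a single append and the balance of an account is derived from the recorded transfers:

```python
from decimal import Decimal


class Ledger:
    """Hypothetical append-only table of financial transactions."""

    def __init__(self):
        self._rows = []  # each row: (from_account, to_account, amount)

    def transfer(self, src, dst, amount):
        # One append records the debit and the credit together, so there
        # is no intermediate state in which the money exists twice or
        # not at all.
        self._rows.append((src, dst, Decimal(amount)))

    def balance(self, account):
        # The balance is derived from the rows, like a view over the table.
        total = Decimal(0)
        for src, dst, amount in self._rows:
            if dst == account:
                total += amount
            if src == account:
                total -= amount
        return total
```

Recomputing the balance on every request is what makes this design slow. Storing the balances separately is the materialized view, and keeping that stored copy consistent with the rows is where database transactions come in.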

(To be continued)

Identifier generation

Whenever items are added to a set of data, they usually need to be identified uniquely. The identifier is that which uniquely identifies a piece of data. Identifiers are determined either by the end-users or by the system. Both approaches have their own problems with respect to end-user concurrency.

System generated identifiers

There are at least three methods for generating unique identifiers by a system. These are:

Random unique identifiers. Whenever the system is requested to produce an identifier, it generates one, but it cannot be said beforehand which identifier will be generated next. There does not need to be a strict order in the generated identifiers. Usually, a combination of time stamps and random generators is used to produce random unique identifiers. Random unique identifier generation is often used for distributed identifier generation in a loosely connected system, meaning that there is no guarantee that all components of the system can be reached at any given time, or that it is not feasible to connect all distributed identifier generators. (There is always a theoretical chance that non-unique identifiers are generated, but this chance is regarded as practically zero.) Random unique identifiers are usually rather long.
Sequential identifiers. Whenever the system is requested to produce an identifier, it generates the next identifier according to some predefined numbering scheme. Such a numbering scheme can include date and time elements. This usually involves counters that are incremented every time an identifier needs to be generated. It is thus known which identifier will be generated next (apart from possible date and time elements), and a complete ordering does exist. Whenever two (or more) end-users request an identifier within overlapping end-user transactions, the system must not give them the same identifier. Whenever users abort their end-user transactions, the identifiers they requested are not used. When these identifiers are discarded, this results in gaps in the sequence of identifiers used in the system. If they are reused, the sequence of identifiers does not match the sequence in which the identified items were created, and there is the possibility of momentary gaps.
Strict sequential identifiers. This means that all items in the system have strictly sequential identifiers, without gaps. This is only possible when identifiers are generated at the moment end-user transactions are committed.
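
The three methods can be illustrated with a small sketch. It assumes a single process and in-memory state, and all names in it (random_identifier, SequentialIds, StrictSequentialStore) are hypothetical. Using uuid4 from the Python standard library is just one possible way of producing random unique identifiers, not necessarily the combination of time stamps and random generators mentioned above.

```python
import itertools
import uuid


def random_identifier():
    # Assumption: a version 4 UUID stands in for "random unique identifier".
    # It is long (32 hexadecimal digits) and has no meaningful order.
    return uuid.uuid4().hex


class SequentialIds:
    """Hands out the next number on request, before the commit."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_id(self):
        # If the requesting end-user transaction is aborted later on,
        # the number handed out here is never used and leaves a gap.
        return next(self._counter)


class StrictSequentialStore:
    """Assigns identifiers only when an end-user transaction commits."""

    def __init__(self):
        self.items = {}

    def commit(self, pending_items):
        # Numbers are assigned at commit time, so aborted transactions
        # never consume one and the stored sequence has no gaps.
        assigned = []
        for item in pending_items:
            new_id = len(self.items) + 1
            self.items[new_id] = item
            assigned.append(new_id)
        return assigned
```

The price of the strict variant is that the end-user does not know the identifier of a new item until the end-user transaction has been committed.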

User defined identifiers

In this case it is the responsibility of the user to come up with an identifier when a new item needs to be added to a database. User defined identifiers are always chosen inside end-user transactions. There is a possibility that two users, independently of each other, pick the same identifier while they are concurrently adding items to the database. Traditional database transactions only discover non-unique identifiers when the transaction commits. If end-user transactions are implemented by database transactions, this means that one of the users will receive an error when the end-user transaction is committed, not when the identifier was chosen. Uniqueness of identifiers should transcend the traditional concept of isolation in database transactions.
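
One way to let uniqueness transcend isolation is to make the choice of an identifier visible to all end-user transactions as soon as it is made. The sketch below assumes a registry shared by all end-user transactions within one process; the class IdentifierRegistry and its methods are hypothetical and only illustrate the idea.

```python
import threading


class IdentifierRegistry:
    """Hypothetical registry that checks uniqueness when an identifier is chosen."""

    def __init__(self, committed=()):
        self._lock = threading.Lock()
        self._committed = set(committed)
        self._claimed = {}  # identifier -> user with an open transaction

    def claim(self, user, identifier):
        """Called when the user picks an identifier, not at commit time."""
        with self._lock:
            if identifier in self._committed:
                return False
            # The first user to claim the identifier keeps it.
            owner = self._claimed.setdefault(identifier, user)
            return owner == user

    def commit(self, user, identifier):
        with self._lock:
            if self._claimed.get(identifier) != user:
                raise KeyError(f"{identifier!r} is not claimed by {user!r}")
            del self._claimed[identifier]
            self._committed.add(identifier)

    def abort(self, user, identifier):
        # An aborted end-user transaction frees the identifier again.
        with self._lock:
            if self._claimed.get(identifier) == user:
                del self._claimed[identifier]
```

With such a registry the second user is told about the conflict at the moment the identifier is chosen, instead of when the end-user transaction is committed.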

(To be continued)