Thursday, December 20, 2007

Making PBXT Fully Durable

Until now PBXT has been ACId (with a lower-case d). This is soon to change as I have had some weeks to work on a fully durable version of the transactional engine (http://www.primebase.com/xt).

My first concern in making PBXT fully durable was to what extent I would have to abandon the original "write-once" design. While there are a number of ways to implement durability, the only method used by databases (as far as I know) is the write-ahead log.

The obvious advantage of this method is that all changes can be flushed at once. However, this requires that all data be written twice: once to the log and after that, to the database itself.

My solution to this problem is a compromise, but when it comes to durability I believe it is a good one. In a nutshell: short records are written twice, and long records are written once.

If a transaction writes only short records, then one flush will suffice to commit it. Because the records are short, contention on the write-ahead log is at a minimum. If a transaction writes any long records, most of the data will be written once to a data log (as opposed to a transaction log). Contention for writing on the data log is zero (because each writer has its own data log), but two flushes are required to commit the transaction.

By doing this I have saved other transactions from having to wait while one transaction copies a large amount of data to the transaction log. Although the transaction log uses a double-buffering system, such a copy would still cause a hold-up.

In summary: a transaction that writes large records basically only holds itself up, not everybody else.
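
To make this concrete, here is a minimal sketch of how a commit could choose between the one-flush and two-flush paths. It is illustrative Java, not PBXT's actual code: the class, the log interfaces and the record-size threshold are all invented for the example.

import java.io.IOException;
import java.util.List;

// Illustrative sketch only: short records go straight into the shared
// transaction log (one flush), long records go into a per-writer data log
// and only a small reference record enters the transaction log (two flushes).
class CommitSketch {
    static final int SHORT_RECORD_LIMIT = 16 * 1024; // assumed threshold

    interface Log {
        void write(byte[] data) throws IOException;
        long writeAndGetOffset(byte[] data) throws IOException;
        void flush() throws IOException; // fsync-style flush to disk
    }

    private final Log transactionLog; // shared by all writers
    private final Log privateDataLog; // one per writer, so no contention

    CommitSketch(Log transactionLog, Log privateDataLog) {
        this.transactionLog = transactionLog;
        this.privateDataLog = privateDataLog;
    }

    void commit(List<byte[]> records) throws IOException {
        boolean wroteToDataLog = false;
        for (byte[] record : records) {
            if (record.length <= SHORT_RECORD_LIMIT) {
                // Short record: written twice in total (log now, database later).
                transactionLog.write(record);
            } else {
                // Long record: the bulk of the data is written only once, here.
                long offset = privateDataLog.writeAndGetOffset(record);
                // A small reference is all that enters the shared transaction log.
                transactionLog.write(("DATA-LOG-REF:" + offset).getBytes());
                wroteToDataLog = true;
            }
        }
        if (wroteToDataLog) {
            privateDataLog.flush(); // first flush: the data log
        }
        transactionLog.flush();     // second (or only) flush: the transaction log
    }
}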

Another innovation I have introduced to reduce contention on the transaction log is an "operation sequence number".

Normally operations must be synchronized on the transaction log to ensure consistency. For example, the allocation of a block must be written to the log before usage. But this means all threads need to lock the transaction log when performing an operation.

Instead of doing this, I issue a unique sequence number for each operation done on a table. The operations are then written to the log in batches without concern about the order.

The process that then applies the changes in the log to the database sorts the operations by sequence number before they are applied. This is also done on restart, during recovery.
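
To illustrate the idea, here is a small sketch, again in Java and with invented names rather than PBXT's own code: each writer takes its number from an atomic counter, appends to the log without waiting for anyone else, and the applier sorts by sequence number before touching the table files.

import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the "operation sequence number" idea: writers never
// serialize on the transaction log; only the applier restores the order.
class OperationSequencer {
    private final AtomicLong nextSeq = new AtomicLong(0);

    // Every operation on a table gets a unique, monotonically increasing number.
    long issueSequenceNumber() {
        return nextSeq.incrementAndGet();
    }
}

class LogApplier {
    record Operation(long seq, byte[] payload) {}

    // Batches arrive in whatever order the writers produced them.
    void apply(List<Operation> batch) {
        batch.sort(Comparator.comparingLong(Operation::seq));
        for (Operation op : batch) {
            applyToDatabase(op); // the same path is used during recovery
        }
    }

    private void applyToDatabase(Operation op) {
        // ... apply the change to the table files ...
    }
}

Because the applier (and recovery) sorts before applying, the physical order of the entries in the log no longer matters.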

Thursday, November 08, 2007

BLOB Streaming presentation at the Hamburg MySQL User Group

I have just posted the presentation that I gave at the Hamburg MySQL User Group last Tuesday. You can download the presentation here.

I have added a few slides on advanced topics: backup, replication and the distributed repository, which did not actually make it into my talk. However, these topics came up in the discussion over a few drinks afterwards.

Thanks to Lenz for the opportunity to present the BLOB Streaming Project and to those that were there for the good feedback.

As Lenz said, it was a "pretty technical crowd". For example, it did not go unnoticed that a denial of service attack could be launched by a malicious client that establishes many upload connections and fills up the server's file system. Although unreferenced BLOBs of this type are deleted from the repository after 5 minutes, this is still a serious threat for anyone who exposes the MyBS HTTP port to the internet.

To prevent this it may be necessary to limit uploads to clients with specific IP addresses (which could be specified by a system variable). Lenz suggested using HTTP-based authentication such as digest access authentication. Any other ideas would be welcome.

Another question was whether a BLOB could be deleted while it is being downloaded from the repository. Although BLOBs are not locked while they are downloaded, I have just realized that this is not a problem. The BLOB remains in the repository after deletion until the compactor thread removes it by deleting the repository file that contains the BLOB, and this is only done once all readers have released the repository file.

I have submitted this talk under the heading An Introduction to BLOB Streaming for MySQL Project as a proposal for the MySQL Conference & Expo 2008. And, if it is approved, I will also be presenting Inside the PBXT Storage Engine at the conference. Ronald mentioned that there have been nearly 300 submissions, so I will be quite lucky to get both talks approved! :)

Friday, October 19, 2007

New PBXT/MyBS release enables JDBC-based BLOB streaming!

This is quite a milestone for me! At last it is possible to actually do some practical work with the BLOB streaming engine (MyBS)!

For this release I have completed changes to MySQL Connector/J 5.0.7 that allow BLOB data to be transparently stored in and retrieved from the MyBS BLOB repository. The new version of the driver is called MySQL Connector/J SE (streaming enabled).

Uploading a BLOB is as simple as using setBinaryStream() or setBlob() on INSERT or UPDATE. By using getBinaryStream() or getBlob() after a SELECT you get direct access to the data stream coming from the repository. More information and some examples are provided in the documentation at: http://www.blobstreaming.org/documentation.
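
As an illustration, a minimal round trip could look like the sketch below. It reuses my standard notes_tab example table (n_id INT primary key, n_text LONGBLOB, ENGINE=PBXT); the file name, user and password are invented, and the EnableBlobStreaming parameter, the PBXT/LONGBLOB requirement and the need to pass an explicit length are covered by the notes further down.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical example: stream a file into a LONGBLOB column of a PBXT table
// and read it back using the streaming enabled Connector/J SE driver.
public class BlobRoundTrip {
    public static void main(String[] args) throws Exception {
        // EnableBlobStreaming=true switches the SE driver into streaming mode.
        String url = "jdbc:mysql://localhost/test?EnableBlobStreaming=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            File file = new File("picture.jpg"); // invented input file

            // Upload: specify the length, otherwise the driver falls back to
            // storing the data directly in the table (not streamable).
            try (PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO notes_tab (n_id, n_text) VALUES (?, ?)");
                 InputStream in = new FileInputStream(file)) {
                ins.setInt(1, 1);
                ins.setBinaryStream(2, in, (int) file.length());
                ins.executeUpdate();
            }

            // Download: getBinaryStream() returns the data stream coming
            // directly from the BLOB repository.
            try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT n_text FROM notes_tab WHERE n_id = ?")) {
                sel.setInt(1, 1);
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) {
                        try (InputStream blob = rs.getBinaryStream(1)) {
                            System.out.println("First byte of BLOB: " + blob.read());
                        }
                    }
                }
            }
        }
    }
}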

To try this out you need to install the latest versions of PBXT and MyBS. Both are available from: http://www.blobstreaming.org/download.

Binary versions of the storage engines are also available for MySQL 5.1.22 running on 32-bit Linux and x86 Mac OS X. The modified version of the JDBC source code is included in the MyBS source code distribution, but the driver can also be downloaded here.

I have included a small test program, TestJDBC.java, as part of the JDBC driver. So once you have installed the engines, you can test BLOB streaming as follows:

java -cp mysql-connector-java-5.0.7se-bin.jar TestJDBC

TestJDBC connects to a local MySQL server, creates a PBXT table and tests INSERT and SELECT of rows containing BLOBs. The program also serves as an example of how to do BLOB streaming with JDBC, but this is all pretty much standard stuff.

To get started quickly, the most important things to note are:
  • Set EnableBlobStreaming=true in your JDBC connection URL.
  • Streamable BLOBs can only be stored in LONGBLOB columns in PBXT tables.
  • Use setBinaryStream(), setAsciiStream() or setBlob() and specify the length to upload a BLOB.
A streamable BLOB is a BLOB where the data is stored in the MyBS BLOB repository and a reference is inserted into the row. If you use setBinaryStream() on INSERT, for example, but specify a length of -1, then the JDBC driver reverts to the default (non-streaming enabled) behavior which is to store the data directly in the table. The data will be returned correctly on SELECT, but is not streamable.

As usual, any comments, questions or bug reports can be sent directly to me: paul-dot-mccullagh-at-primebase-dot-com. Make sure you put the word PBXT or MyBS in the e-mail title to make it through my spam filter! :)

Tuesday, September 25, 2007

PBXT & MyBS at the MySQL Developer Meeting in Heidelberg

I was glad to have the opportunity to join the MySQL developers in Heidelberg for a few days, so thanks to MySQL for the invitation. In between great food, quite a few beers and a number of boat trips we managed to get a significant amount of work done!

In what could be considered a follow-up to the engine summit at Google following the MySQL User's conference, I joined Calvin Sun, Brian Aker, Jeffrey Pugh, Monty and others from MySQL and the engine developing community to discuss things concerning storage engines.

One of the main topics of the meeting was features and other changes to the MySQL front-end required by the engines. Some of the requirements (such as an interface to the MySQL optimizer) would really require huge changes, but most agreed that freely defined attributes on tables, columns and indexes would be very useful (and relatively easy to implement). Monty would like error handling on the commit call to be added ASAP, but Jeffrey said that's a feature, not a bug fix, and so for MySQL 5.1 it's a no-go. It will be interesting to see who wins that one! I would also like to have engine-defined custom data types. My most pressing problem: how can I indicate that a BLOB column is streamable?

I presented the ideas behind the BLOB Streaming engine to the connector developers and we discussed how the PHP and JDBC connectors could be extended to support BLOB Streaming. Mark Matthews, responsible for JDBC, showed me the spot in the code where the ResultSet would handle a MyBS data stream. Mark also pointed out that either JDBC will need to upload a BLOB without knowing which table it is going into, or the JDBC driver will have to parse the SQL statement. Hmm, ... I should have realized this before!?

I am also looking forward to discussing things further with Andrey Hristov, developer of the mysqlnd PHP Connector, after he has tried out the new engine. Making the BLOB streaming functionality easily available to PHP developers will be a great step forward.

I was also glad to be able to meet with Mats Kindahl whose experience on the MySQL replication team is very useful to the BLOB streaming project. His main concern is to maintain the flexibility of the system as he points out in his blog. He suggested a more loosely coupled system, for example to use database triggers instead of the MyBS server-side API calls. While flexibility is important, I want to avoid too many moving parts, and make sure that the basic setup is simple. We both agreed that an embedded scripting language (ala MySQL proxy) may be a good compromise.

In a bit of time between sessions Stewart Smith and I took a look at adding the BLOB streaming functionality into the NDB cluster engine. We didn't get all that far with our quick hack, but we both saw that it could be done relatively easily. The potential for the combination of MySQL cluster and BLOB streaming is huge.

Altogether it is very helpful to any developer in the community to have such concentrated access to the MySQL developers as is possible at the internal developer's conference. This is a great offer on the part of MySQL, and I can only imagine that they will have to continue to limit the number of external developers that can be accommodated at these meetings.

So my recommendation: try to book a ticket as early as possible for next year!

Tuesday, August 28, 2007

MySQL Camp: a Secret Tip?

Where can you get access to some of the most informed people from MySQL and the community, for free?

The answer: at MySQL Camp. Throw in free lunch and breakfast, and the chance to influence the session topics, and you have quite a package deal.

So it is strange that so few people took up the offer in New York this year!?

My talk was about the BLOB Streaming engine, MyBS, and I have posted the slides: Presentation - MySQL Camp 2007: The BLOB Streaming Project.

OK, so I got pretty much ragged about the name, MyBS. Why, I was asked, did I name it that? Jay even suggested a session to find a new name for the engine! Thanks, Jay, very considerate of you... :)

But it was quite unnecessary, because I really can't see what the problem is. I think the name is cool. Uhm, totally ... cool.

Friday, July 27, 2007

BLOB streaming engine (MyBS), version 0.5 Alpha released!

With some effort just before my holiday, I have managed to complete the release of the next version of MyBS, the BLOB streaming engine for MySQL.

This version includes all the basic functionality required to stream BLOB data in and out of MySQL tables.

The main features are:
  • Uploading of BLOB data directly into the database using HTTP PUT or GET methods.

  • Downloading of BLOB data directly from the database using HTTP GET.

  • BLOB size may exceed 4GB - theoretical BLOB size limit of 256 Terabytes.

  • BLOBs are stored in a repository which manages references from other storage engine tables.

  • BLOBs are referenced by a URL.

  • URLs referencing BLOBs in the repository have a unique access code, for security.

  • The theoretical maximum repository size is 4 Zettabytes (2^72 bytes) per database.

  • The server-side streaming API allows any storage engine to store BLOB data in the repository.

  • MyBS system tables provide a view of the BLOBs and associated references in the repository.
MyBS works together with the PBXT transactional storage engine, version 0.9.88, which supports the MyBS streaming API. Both engines can be downloaded from: http://www.blobstreaming.org/download.

Documentation for MyBS is also available. It includes details about all features so far, and some examples of use: http://www.blobstreaming.org/documentation.

If you try out the new engine, I'd like to hear from you. Any comments, questions and bug reports can be sent directly to me.

Tuesday, July 17, 2007

The MyBS Engine and the BLOB Repository

After some consideration I have decided to move the BLOB repository from PBXT to MyBS (§). This has the advantage that any engine that does not have its own BLOB repository (or is otherwise not suitable for storing large amounts of BLOB data) can reference BLOBs in the MyBS BLOB repository.

(§) MyBS stands for "BLOB Streaming for MySQL". The BLOB Streaming engine is a new storage engine for MySQL which allows you to stream media data directly in and out of the database. More info at www.blobstreaming.org.

Let's look at an example of this. Assume my standard example table:
CREATE TABLE notes_tab (
n_id int PRIMARY KEY,
n_text longblob
) ENGINE=PBXT;
And assume we have a file called blob_eg.txt with the contents "This is a BLOB Streaming upload test".

Firstly, I can upload a BLOB to the MyBS BLOB Repository using the HTTP PUT method:

% curl -T blob_eg.txt http://localhost:8080/test/notes_tab
test/1-326-4891cdae-0


Here I uploaded a BLOB to the repository and specified the database, test, and the table, notes_tab. The URL returned, test/1-326-4891cdae-0, is the reference to the BLOB in the BLOB repository, returned by MyBS. Note that the BLOB is not yet in the table (to store the BLOB directly in the table, I would have to specify a column and a condition which identifies a particular row in the table).

However, the BLOB is already stored in the database, and I can download it as follows:

% curl http://localhost:8080/test/1-326-4891cdae-0
This is a BLOB Streaming upload test


Since the BLOB is not yet referenced by a table, the MyBS BLOB repository sets a timer. If the BLOB is not retained (reference count incremented) within a certain amount of time it is removed from the BLOB repository.

To actually insert the BLOB into the table you just insert the BLOB reference, for example:

mysql> insert notes_tab values (1, "test/1-326-4891cdae-0");

On the MySQL server the notes_tab table engine will call the MyBS engine (using the server-side BLOB Streaming API) and retain the test/1-326-4891cdae-0 BLOB reference. So I can now download the BLOB by referencing the table, column and row as follows:

% curl http://localhost:8080/test/notes_tab/n_text/n_id=1
This is a BLOB Streaming upload test


Note: this example will only work with MyBS 0.5 (www.blobstreaming.org/download) or later. Coming soon!

Thursday, June 28, 2007

PBXT: Top 5 wishes of a Storage Engine

In response to Ronald's challenge in Top 5 wishes for MySQL, here is my top 5 wish list. However, it makes sense for me to put a slightly different spin on the top 5 series, and write from a storage engine developer's perspective.

1. A generic engine test suite

A set of mysql-test-run test scripts and results that are intended to be run by all engines. The tests will verify basic functionality and compatibility, and form the basis for an engine certification process.

2. Internal APIs

PBXT already has to call into MySQL to open .frm files, and transform path and file names. The BLOB Streaming engine will need to access user privilege information. Other engines use the cross-platform functionality provided by mysys. What we need is a number of official, well-defined APIs to various MySQL internal functionality.

3. Customizable table and column attributes

Specialized engines require specialized information. Right now, this information is being packed into table and column comments (hack, hack, ...).

4. Push-down restrict and join conditions

This is a big one for engines in general. Many engines are being created that can do certain searches better than the MySQL query processor. However, for the optimizer to know whether to push down a condition or not will probably require a better performance metric.

5. Custom data types

SQL-92 has the concept of a domain, which is basically a named data type. This could be used as the basis for custom data types provided by a storage engine, made available in the form of a new domain.

And without numbering them, let me slip in a few more wishes. How about MySQL community project development hosted on MySQLForge, complete with integration into the MySQL bug tracking system?! And I have heard that this may also be possible: PBXT and other GPL community engines on the MySQL Community distribution :)

Tuesday, June 26, 2007

First release of the BLOB Streaming engine for MySQL

I have just released the first version of the BLOB Streaming engine for MySQL (MyBS). You can download the source code of the engine from http://www.blobstreaming.org/download. Pluggable binaries for MySQL 5.1.19 (32-bit Linux and Mac OS X) are also available.

To install the plug-in copy libmybs.so to the /usr/local/mysql/lib/mysql directory, connect to your server using mysql, and enter:

mysql> install plugin MyBS soname "libmybs.so";

This version allows you to download BLOBs that are already stored in the database using HTTP. The URL is specified as follows:

http://mysql-host-name:8080/database/table/blob-column/condition

Where condition has the form: column1=value1&column2=value2&...

I gave an example of this in my previous blog: "GET"ing a BLOB from the database with the BLOB Streaming Engine

8080 is the default port, which can be set using the mybs_port system variable on the mysqld command line. For example: mysqld --mybs_port=8880

In order for BLOB streaming to work you also need PBXT version 0.9.87 which is streaming enabled. Streaming enabled simply means the engine supports the MyBS server-side streaming API.

This version of PBXT is also available from www.blobstreaming.org, or from Sourceforge.net.

Note that this version is currently only for use behind the firewall because the HTTP access is unrestricted.

The next step will be to enable the uploading of BLOBs using the HTTP PUT method, and the implementation of basic security.

Tuesday, June 05, 2007

"GET"ing a BLOB from the database with the BLOB Streaming Engine

Current plans call for the use of the HTTP protocol to upload and retrieve BLOBs to and from the database. The BLOB Streaming Engine makes this possible by integrating a lightweight HTTP server directly into the MySQL server.

I am currently working on an alpha implementation of this, and would like to give a short demonstration of what is possible using this system.

We start by creating a table using any streaming enabled storage engine (a streaming enabled storage engine is an engine that supports the server-side streaming API):
use test;
CREATE TABLE notes_tab (
n_id INTEGER PRIMARY KEY,
n_text BLOB
) ENGINE=pbxt;
INSERT notes_tab VALUES (1, "This is a BLOB streaming test!");

Now assuming the MySQL server is on the localhost, and the BLOB streaming engine has been set to port 8080, you can open your browser, and enter this URL:

http://localhost:8080/test/notes_tab/n_text/n_id=1

With the following result displayed in the browser:

This is a BLOB streaming test!

So without even doing a SELECT, you can GET a BLOB directly out of the database!

Note that there is no need for the BLOB in the database to be explicitly "streamable" for this to work.

Friday, May 18, 2007

The Scalable BLOB Streaming discussion so far

Having discussed BLOB streaming for MySQL with a number of people I have gathered quite a bit of input on the subject.

So here are some details of the issues raised so far:

Server-side API

One of the first things that Brian Aker pointed out to me was that the server-side API must make it possible to use the sendfile() system call. The call does direct disk to network transfer and can speed up delivery of a BLOB stream significantly.

The server-side API must also include a mechanism to inform the streaming enabled engine that an upload is complete. Assuming the streaming protocol used is HTTP, there are 2 ways of determining the end of an upload: either the connection is closed when the upload is complete, or the length of the BLOB data is specified in an HTTP header.

ODBC Issues

Monty was concerned about compatibility with the MySQL ODBC driver. The ODBC function SQLPutData() is used to send BLOB data, and SQLGetData() is used to retrieve BLOB data. Both functions transfer data in chunks. The implementation of these functions would have to be made aware of streamable BLOB values.

This can probably be done in a way that is transparent to the user. It may be necessary to designate certain database types as streaming BLOBs. One suggestion is to add 2 new types to MySQL: LONGBIN and LONGCHAR. These keywords are not yet used by MySQL.

However, from the ODBC side the data types LONGVARBINARY and LONGVARCHAR are already mapped to the MySQL types BLOB and TEXT respectively. This simply means that it would be transparent to an ODBC application whether a BLOB is being streamed, or retrieved using the standard client/server connection.

Upload Before or After INSERT

There is some debate about how exactly a BLOB should be inserted. The 2 main possibilities are:

1) Upload the BLOB. The upload returns a handle (or key). Perform the INSERT setting the BLOB column to the key value.

2) Perform an INSERT, but specify in a CREATE_BLOB() function the size of the BLOB that is to be inserted into the column. The caller then SELECTs back the inserted data. In the BLOB column the caller finds a handle. The handle can then be used to upload the data.

For method (1) it was also suggested that the client application specify a name, instead of using a handle returned by the server. The name could then be used as input to a "create_blob" function which is specified in the INSERT statement. The advantage of this is that the text of the insert statement is readable.

One advantage of method (1), as pointed out by Monty, is that the client does not have to wait for the upload to complete before doing the INSERT. The server could wait for the upload to complete. This would require a client side generated identifier for each BLOB uploaded.

Note that method (2) requires use of a transaction to prevent the row with the BLOB from being selected before the BLOB data has been uploaded.

Storage of BLOBs

Stewart Smith suggested that the BLOBs be stored by the streaming engine itself. This is something to be considered, and probably will be necessary in most scale-out scenarios. Otherwise the current plan is that each engine that is streaming BLOB enabled (i.e. that supports the server-side streaming API) will store the BLOBs themselves.

Scaling Writes

Also mentioned in regard to scaling is the fact that read scale-out is usually more important than write scale-out. However, there are websites that have a heavy write load. To be investigated is how NDB could be used in combination with the streaming engine to scale out the uploading of BLOBs.

mysqldump & mysql

As pointed out by Monty, the output of mysqldump is a readable stream. So any changes to mysqldump in order to support streamable BLOBs should ensure that this continues to be the case. This is related to the question of how to upload BLOBs using the mysql client. One idea would be to provide a command in mysql to upload a file or a block of binary data or text.

Security

In some cases it may be necessary to check whether a client is authorized to download or upload a BLOB. The most straightforward method would be for the server to issue authorization tokens which are submitted with an upload or download. These tokens expire after a certain amount of time, or when the associated server connection is closed. However, if the streaming protocol is to be HTTP then we need to investigate the possibility of using standard HTTP authentication when retrieving and sending BLOBs.
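
As a rough illustration of what such a token scheme could look like (just one possible shape, with invented names and an assumed lifetime, not a decided design), the server could hand out random tokens with an expiry time and check them on each upload or download request:

import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough illustration only: authorization tokens that expire after a fixed
// time. A real implementation would also tie each token to the issuing
// connection and drop it when that connection closes.
class BlobAccessTokens {
    private static final long TOKEN_LIFETIME_MS = 5 * 60 * 1000; // assumed lifetime

    private final SecureRandom random = new SecureRandom();
    private final Map<String, Long> expiryByToken = new ConcurrentHashMap<>();

    // Issued by the server when a client is granted upload/download access.
    String issueToken() {
        String token = Long.toHexString(random.nextLong());
        expiryByToken.put(token, System.currentTimeMillis() + TOKEN_LIFETIME_MS);
        return token;
    }

    // Checked by the HTTP layer before serving an upload or download.
    boolean isValid(String token) {
        Long expiry = expiryByToken.get(token);
        if (expiry == null) {
            return false;
        }
        if (System.currentTimeMillis() > expiry) {
            expiryByToken.remove(token);
            return false;
        }
        return true;
    }
}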

In-place Updates

In-place updates of BLOBs are allowed by some SQL servers (for example, using the UPDATETEXT and WRITETEXT functions supported by MS SQL Server). In this case the streaming enabled engine would be responsible for locking, or using some other method, to ensure that concurrent reads of the BLOB remain consistent.

PHP Upload

Georg Richter, who is responsible for the PHP Native Driver at MySQL, noted that using BLOB streaming, data could be uploaded from the Web directly into the database. The PHP extension to the standard MySQL client software should include a function to make this possible.

ALTER TABLE

Georg also pointed out that it should be possible to convert BLOBs currently stored in MySQL tables into streamable BLOBs using ALTER TABLE. This would enable developers to quickly try out streaming BLOBs and test the performance characteristics, in combination with their applications.

These are the main issues raised so far. Any further ideas, comments and questions are welcome! :)

Tuesday, May 01, 2007

PBXT and the MySQL Conference & Expo 2007

The conference is over, and it was a great week, but pretty exhausting! From what I saw, and from what I have heard, the sessions were of a very high standard this year. I have come back with a number of new ideas and quite a few things I would like to implement in PBXT. One of these is the new online backup API for MySQL, which will enable you to make a snapshot backup of data stored by all engines, and do point-in-time recovery.

For me the week started off with the BoF on scalable BLOB streaming on Tuesday evening. The BoF was well attended and there was significant interest in the topic. I will be reporting on some of the issues discussed soon.

On Wednesday morning I presented: PrimeBase XT - Design and Implementation of a Transactional Storage Engine. I was pleased with the number of questions, interest and feedback I received.

When I mentioned to Mårten Mickos that someone had said it was the best session they had heard so far at the conference, Mårten told me that they pay guys to say this to the speakers so that they feel good! So I must thank MySQL for this very thoughtful gesture - joke, of course ;)

Thanks also to Sheeri Kritzer for video recording the talk so I hope to get a link to that soon. In the meantime I have posted my slides to the PBXT home page:

http://www.primebase.com/xt/download/pbxt_mysql_uc_2007.pdf

After that I also took part in 2 lightning round sessions: Top MySQL Community Contributors and State of the Partner Engines. Both sessions were interesting in their diversity. One of the top contributors, Ask Bjørn Hansen, described how meeting his goal of filing a bug a week was a lot easier in the early days of MySQL. However, more recently he has been helped by the current state of the GUI tools!

During the Partner Engines session, it came as a surprise to some people in the audience that not all engines are free. Actually there is a wide spectrum, from GPL through partially free to highly priced. In fact, one of the developers of a data-warehousing engine found the question as to whether it may be free quite amusing. Solid has not decided to what extent its high-availability offering will be commercial, and the Amazon S3 engine is free, but the service behind it is not. So that's something to watch out for in general.

I took the opportunity during the Partner Engines session to mention our plans for the future of PBXT, which involve building a scalable BLOB streaming infrastructure for MySQL. This is relevant to a number of engine developers as we will be providing an open system which will enable all engines to stream BLOBs directly in and out of the server. So please check out blobstreaming.org for more details.

Thursday, April 19, 2007

What makes the design of PBXT similar to MyISAM?

PBXT is a transactional storage engine, but what does the design have in common with MyISAM?

I'll be answering this and other questions during my session at the MySQL Users Conference next week:


PrimeBase XT: Design and Implementation of a Transactional Storage Engine
Date: Wednesday, April 25
Time: 10:45am - 11:45am
Location: Ballroom F


So be sure to check it out! :)

Monday, April 09, 2007

PBXT 0.9.86 the MySQL Conference 2007 release!

I have just released PrimeBase XT 0.9.86 which will be my last release before the MySQL User Conference this year. The most significant change in this version is the reduction of the number of data logs used per table. This, and a number of other modifications, makes PBXT fit to handle databases with 1000's of tables.

If you would like to learn more about the development and design of PBXT then join me for my session at the MySQL User Conference in Santa Clara, on Wed, April 25, 10:45am - 11:45am, Ballroom F:

PrimeBase XT: Design and Implementation of a Transactional Storage Engine

Looking ahead, I would also like to invite all who are interested to the "Scalable BLOB Streaming Infrastructure" BoF, which is scheduled for Tuesday, April 24, 8:30pm - 9:30pm.

This will be an informal discussion of our plans for the implementation of scalable BLOB streaming in MySQL, and how this all fits together with PBXT and other engines.

We would like as much input as possible at this early stage, so I hope you can make it!

I will also be taking part in a number of lightning rounds, so check out this page for a complete list of my sessions.

I'm looking forward to seeing you all there!

Friday, March 23, 2007

PBXT and a scalable BLOB streaming infrastructure

As Kaj Arnö already mentioned in his blog, we have a vision for MySQL and it is called a scalable BLOB streaming infrastructure. Our plan is to build this into and around the MySQL architecture with the help of MySQL and the community.

It is a "big picture" idea and in this way a response to Robin Schumacher's question: The MySQL Vision - What do you see?. But at the same time it is very relevant and practical in the context of the Web 2.0 world.

The design of the system includes a client-side library which extends the existing MySQL client API, a stream based communications protocol and a scalable back-end which is (at least partially) linked into the MySQL server. In short, we want to make it possible to put BLOBs of any size in the database.

The system will be open, and available to all engines running under MySQL. Of course, I believe PBXT has the ideal design and performance characteristics to support this new infrastructure.

Many say that their programs run fine with the BLOBs stored in the file system. And this is true, but what about transactional consistency, backup and replication? And what about scale-out for the storage and delivery of this data? Simply put, we know from our customers in the print and web publishing business, who need this functionality, that these are significant issues.

And, when you think about it, media streaming has a very promising future. I believe it's becoming more important for an increasing number of developers. So this is more about opening up new opportunities and extending the reach of MySQL into new areas.

We have a pretty good idea of the basic design of the system, but this is just the starting point. The design must reflect the requirements of the users and developers, so our first step is to get as many as possible in the MySQL community together who have an interest in this technology.

We want to know what features and characteristics of such a system are important to you. What are your requirements of such a system?

To this end I am arranging a BoF on scalable BLOB streaming at the MySQL user conference next month. This will be a great opportunity to get together and discuss the subject with all who are interested, so I hope you will join me.

Of course you are welcome to contact me any time in this regard by e-mail at paul-dot-mccullagh-at-primebase-dot-com.

Monday, March 19, 2007

PBXT does Windows!

On Friday I posted PBXT version 0.9.85 which concludes my initial work on porting the engine to Windows NT/XP.

I have built a MySQL 5.1 executable (mysqld-nt.exe) to make it easy to try it out. The binary package can be downloaded from http://www.primebase.com/xt. Instructions on how to install are given in the README file.

If you would like to build it yourself, then download the source code from SourceForge.net, and follow the instructions in the windows-readme.txt file.

Unfortunately, on Windows MySQL 5.1 does not support runtime loading of storage engines, as it does on Linux and Mac OS X. Last I heard, this feature will also be a bit slow in coming, and is scheduled for 5.2. So if there is anybody else out there who would like to see this feature sooner, let your voices be heard!

On the whole the port to Windows was fairly straightforward. I miss atomic "seek-and-read/write" functions like pread and pwrite under Windows. But the only real source of problems was the pthread stuff. PBXT uses a bit more of the pthread library than the MySQL server, so I had to add implementations for several functions.

I would like to just link one of the available pthread libraries for Windows (such as pthreads-win32) but MySQL already implements some of the functions, which could cause link errors. In general, I think, MySQL should consider using a LGPL library such as pthreads-win32.

But this is a minor point; PBXT for Windows is running anyway...

Thursday, February 08, 2007

The first binary distribution of PBXT!

It's quite an effort to compile and test 3 versions of MySQL on 4 different machines, but that is what I have done for the first binary distribution of the PrimeBase XT pluggable storage engine.

Of course, it's worth the effort because this is the "holy grail" of the pluggable storage engine strategy: namely, binary plug-ins that work with the binary distributions prepared by MySQL and others in the community.

And by creating an installer script I have made it easier than ever to install the plug-in. So after you have downloaded the binary package from http://www.primebase.com/xt, all you have to do is:

1. Start your MySQL server.

2. Enter: sudo ./install

The installer will automatically determine the version of the server running and install the corresponding plug-in.

The package includes plug-ins for MySQL 5.1.14, 5.1.15 and 5.2.0 on Linux (32/64-bit) and Mac OS X (x86/PowerPC). Nevertheless you may wonder why the download is 22MB in size.

The reason is I have had to include patched versions of mysqld with the plug-ins for MySQL 5.1.14 and 5.2.0. The patch fixes a bug that causes MySQL to crash when restarted after a plug-in has been installed. The installer generates an uninstall script which restores the unpatched mysqld and removes the plug-in.

The bug is fixed in MySQL 5.1.15, so in the future the package should be much smaller, even when it includes more platforms.

Wednesday, January 31, 2007

PBXT 0.9.8 Beta with Referential Integrity released!

I have just released version 0.9.8 of the PrimeBase XT storage engine for MySQL 5.1. The major feature of this new version is foreign key support. As far as I know, this makes PBXT the first 3rd party storage engine to fully implement referential integrity, and only the second engine overall to do so, after InnoDB.

With this version PBXT is practically feature complete. Only CHECK TABLE remains to be implemented for the first GA release (see pbxt-to-do.txt for details). So my plan to complete the PBXT GA release on time for MySQL 5.1 GA looks quite doable (especially now that I have heard MySQL 5.1 GA has been pushed to 3rd quarter 2007).

The latest package can be downloaded from www.primebase.com/xt or sourceforge.net/projects/pbxt. I have updated the instructions on how to build and install PBXT as a pluggable storage engine: how-to-build.txt. This text also explains how to download and build MySQL 5.1.14.

Although building everything from source code is relatively straightforward, I plan to make binary versions of the plug-in available for MySQL 5.1.14 shortly. So a quick test drive of PBXT is about to become easier than ever...

Tuesday, January 23, 2007

PBXT Session at the 2007 MySQL Conference & Expo

Thanks to MySQL, and in particular Jay, for giving me the opportunity to talk about PBXT at the up-coming MySQL Conference in April. I will be presenting a session with the title: PrimeBase XT: Design and implementation of a transactional storage engine.

Topics include the design of XT and how it differs from conventional implementations, experience with implementing storage engines, surprises and gotchas, and performance. So I think this talk will be of interest to both end-users and other developers of storage engines.

See you there!