Appendix: ProsePoint Express conceptual architecture

This is a high-level description of the technology and conceptual architecture behind ProsePoint Express, for those who are interested. Because the service is under continual development and improvement, this description is subject to change at any time and may be out of date.

ProsePoint Express and its hosted website service comprise a number of technology components. From the outside, it may look like a conventional website, but underneath there are multiple parts working together to present a single, highly scalable service.

ProsePoint Express conceptual architecture diagram

Web and database server

The core of ProsePoint Express is the web server. This runs the application software which powers the service.

ProsePoint Express uses the Drupal / Pressflow content management framework on top of an Apache web server. Drupal was used during initial development, but it was replaced with Pressflow soon after deployment in order to improve performance and scalability.

The next layer on top of Drupal / Pressflow is Website Builder. This is a purpose-built collection of modules which implements a lightweight, yet massively multisite, generic content management system. Website Builder is the technology responsible for managing and operating your ProsePoint Express site.

Website Builder started out as a very basic but modular CMS. It was then extended with a number of plugins that add features useful to newspaper websites.

ProsePoint Express stores its content and configuration in a database on a database server. The database is currently MySQL.
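As a rough illustration of the multisite and plugin ideas above, the sketch below maps request hostnames to sites on one shared installation, with each site carrying its own set of enabled plugins. It is not Website Builder's actual code: the class names, hostnames and plugin names are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Site:
    """One hosted site: a name plus the plugins enabled for it (hypothetical)."""
    name: str
    plugins: Set[str] = field(default_factory=set)


class SiteRegistry:
    """Maps request hostnames to sites served by one shared installation."""

    def __init__(self) -> None:
        self._by_host: Dict[str, Site] = {}

    def register(self, hostname: str, site: Site) -> None:
        self._by_host[hostname.lower()] = site

    def resolve(self, host_header: str) -> Optional[Site]:
        # Drop an optional ":port" suffix and normalise case before lookup.
        # (IPv6 literals in the Host header are not handled in this sketch.)
        hostname = host_header.split(":", 1)[0].lower()
        return self._by_host.get(hostname)


# Hypothetical sites and plugin names, for illustration only.
registry = SiteRegistry()
registry.register("gazette.example.com", Site("gazette", {"articles", "editions"}))
registry.register("herald.example.com", Site("herald", {"articles"}))
```

A request handler would call `registry.resolve()` with the incoming Host header and then load only the plugins enabled for the site it gets back; an unknown hostname yields `None`.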

Search server

The search function within ProsePoint Express is handled by a separate search server for performance and scalability reasons. This uses the Apache Solr enterprise search platform, which powers many of the world's largest Internet sites.

Caching and acceleration layer

In front of the web server, between it and the general Internet, sits the HTTP caching and acceleration layer. This layer caches fully built HTML pages, images and files. If a visitor requests a page or file which is already in the cache, it is served from the cache and the request never reaches the web server.
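The hit-or-miss behaviour described above can be sketched in a few lines. This is only a toy in-memory model, not how Varnish works internally: the `PageCache` class, the fixed time-to-live and the `origin` function standing in for the web server are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Toy cache placed in front of an origin (web server) callable."""

    def __init__(self, origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._origin = origin
        self._ttl = ttl_seconds
        self._clock = clock
        # URL -> (expiry time, cached response body)
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in the cache: the origin is never contacted.
            self.hits += 1
            return entry[1]
        # Missing or expired: fetch from the origin and remember the result.
        self.misses += 1
        body = self._origin(url)
        self._store[url] = (now + self._ttl, body)
        return body


# Stand-in for the web server; records every request that reaches it.
origin_calls: List[str] = []


def origin(url: str) -> bytes:
    origin_calls.append(url)
    return ("<html>" + url + "</html>").encode()


fake_now = [0.0]  # controllable clock so the example is deterministic
cache = PageCache(origin, ttl_seconds=60.0, clock=lambda: fake_now[0])
first = cache.get("/news")   # miss: reaches the origin
second = cache.get("/news")  # hit: served from the cache
```

After the two calls, `origin_calls` contains a single entry, which is the point of the layer: repeat requests for the same page cost the web server nothing until the cached copy expires.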

This layer also makes more efficient use of resources by ensuring that a complete request has been received before passing it to the web server.

Adding a separate caching and acceleration layer in front of the web server has resulted in massive increases in performance and scalability. (The last time this was measured, the service was capable of 3000 requests per second. We haven't had a need to revisit it since.)

The HTTP caching and acceleration layer uses Varnish (and Pound for SSL traffic).
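To illustrate what "a complete request" means here, the sketch below checks whether a buffer of received bytes already holds a whole HTTP request, so that a slow client ties up only the front layer while its data trickles in. It is a simplified assumption of the idea, not the logic used by Varnish or Pound: the function name is hypothetical, only Content-Length bodies are handled (no chunked encoding), and a malformed Content-Length value would raise `ValueError`.

```python
from typing import Optional


def complete_request_length(buffer: bytes) -> Optional[int]:
    """Return the byte length of the first complete HTTP request in buffer,
    or None if more data is still needed."""
    header_end = buffer.find(b"\r\n\r\n")
    if header_end == -1:
        return None  # the header block has not fully arrived yet
    body_start = header_end + 4
    content_length = 0
    # Skip the request line, then look for a Content-Length header.
    for line in buffer[:header_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    total = body_start + content_length
    # Forward only once every byte of the declared body is present.
    return total if len(buffer) >= total else None


partial = b"POST /comment HTTP/1.1\r\nContent-Length: 5\r\n\r\nhel"
full = partial + b"lo"
```

With these values, `complete_request_length(partial)` returns `None` because two body bytes are still missing, while `complete_request_length(full)` returns `len(full)`, at which point the request could be handed to the web server in one piece.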

Content delivery network (CDN)

This section to be filled in.

(Note: As of writing, the CDN component has not yet been implemented.)


Backup server

Finally, there is a separate server used to store backups.

ProsePoint Express and all its sites are backed up daily to two geographically dispersed data centres: one in Newark, New Jersey, and the other in Fremont, California. Both locations are in the United States of America.

These backups cover ProsePoint Express and all its sites. They are not intended to be used for recovering individual pages of individual sites.