Frequently Asked Questions (FAQ)
What is TPC-D?

0. Introduction

1. General and Administrative Questions
1.1 What is the current revision of the Specification?
1.2 Who maintains the specification/FAQ?
1.3 How can I get a copy of the specification?

2. Rationale Questions
2.1 What does TPC-D model?
2.2 What does TPC-D mean by "Decision Support"?
2.3 How is TPC-D similar to or different from TPC-C?
2.4 Why does TPC-D model an environment with ad hoc queries as opposed to ones which are prewritten?
2.5 Why does TPC-D model a system which supports concurrent updates?

3. Database Design Questions
3.1 Do I need to use any particular products to implement TPC-D?
3.2 What does the schema look like?
3.3 Can I make any changes to the schema?
3.4 Why is the TIME table optional?
3.5 Does TPC-D require referential integrity on primary keys?
3.6 Does the database require indexes on primary and/or foreign keys?
3.7 Are the queries designed to work without indexes?
3.8 Are there any rules governing the creation of secondary access structures such as B-tree indexes?
3.9 Are implementations with different indexing strategies comparable?
3.10 Does TPC-D allow the definition of auxiliary objects such as views or indexes that represent pre- aggregated versions of the underlying table data?
3.11 Does TPC-D allow replication of database tables?

4. Query Questions
4.1 How were the queries chosen?
4.2 Why ANSI SQL?
4.3 What is the difference between a functional query definition and SQL?
4.4 Can I adapt the queries to the SQL required by my DBMS?
4.5 I can't run the queries in Section 2. Now what?
4.6 The query runs faster if I rewrite the select/from/where clause. Is that permissible?
4.7 Can I use a mix of variants, modifications, and functional query definitions?
4.8 How can I be sure the query is giving the right answer?

5. Update Questions
5.1 Why does TPC-D require update functions?
5.2 The update functions aren't SQL. How do I implement them?
5.3 Why are there varying degrees of flexibility in the rules regarding the implementation of the queries as opposed to the update functions?

6. ACID Questions
6.1 What does ACID stand for?
6.2 Why are the ACID tests part of a decision support benchmark?
6.3 Are the ACID tests run against the TPC-D schema?
6.4 What isolation level does TPC-D require?

7. Data Population Questions
7.1 Why were the permissible scale factors chosen?
7.2 How big is the data set?
7.3 How do I build a data set?
7.4 Where are the sources for DBGEN?
7.5 DBGEN doesn't run on my platform. Now what?
7.6 How do I know my data population is correct?
7.7 How is the data for the update functions generated?
7.8 How is DBGEN used in the database load test?

8. Metrics and Execution Rules Questions
8.1 What is the metric for the benchmark?
8.2 What is the "power" metric (QppD)?
8.3 What is the "throughput" metric (QthD)?
8.4 What is the "query per hour" rating (QphD)?
8.5 How is the price-performance metric calculated?
8.6 Why does TPC-D use a geometric mean for the power test?
8.7 Why are there two performance metrics?
8.8 What are the rules regarding fair published comparisons of TPC-D results?
8.9 Is this a multi-user benchmark? How many users do I have to run?
8.10 If the number of users is left up to the sponsor, how can I compare results with differing numbers of query streams?
8.11 Do all the queries have to be run? In what sequence?
8.12 How are the update streams run in the benchmark? Are they optional?
8.13 Does the benchmark have any rules to ensure that the reported measurements are reproducible?

9. Pricing questions
9.1 Why do you require that the number of supported terminals and users be at least equal to 10 times the number of query streams?
9.2 Why aren't the network and end-user devices such as terminals or PCs priced?
9.3 Why is the product required to be available within 6 months of the full disclosure report?
9.4 Why does TPC-D require a test sponsor to disclose system pricing in such detail?

10. Audit Questions
10.1 Must all benchmark results be independently audited?
10.2 How do I find a TPC certified auditor?
10.3 What happens during an audit?
10.4 How do I find additional information to prepare for an audit?
10.5 What recourse do I have if I disagree with my auditor?
10.6 How far in advance should I choose an auditor?
10.7 Must the auditor be present when the performance tests are executed?

11. Full Disclosure Report Questions
11.1 Is the FDR absolutely required by the time of publication or can I publish and give the FDR later?
11.2 Is there a particular format required for publishing the FDR?
11.3 Can I ask the auditor to write my FDR?
11.4 Why is there both an FDR and an Executive Summary?
11.5 Why do QppD and QthD appear on the executive summary, but not the composite metric, QphD?
11.6 Why does the number of streams in the throughput test NOT appear on the executive summary?
11.7 What is included in the database load time?

12. Miscellaneous Questions
12.1 I need answers to a question not listed here. What should I do?

0. Introduction

This is a list of Frequently Asked Questions concerning the TPC-D benchmark. It is not intended to answer every question about benchmarking in general, or even TPC-D in particular, but it should provide a good starting point in any effort to become more familiar with TPC-D. The questions are grouped into general topics which loosely follow the structure of the TPC-D specification. Please refer to Section 1 (General and Administrative Questions) to be sure that this document is current with the revision of the benchmark specification that you are interested in.

This document uses several terms and phrases which are specific to TPC benchmarks. It is assumed that the reader is familiar with basic TPC benchmark terminology, and has access to the TPC-D benchmark specification.

DISCLAIMER: This document is accurate to the best of our knowledge. Any discrepancy between this document and the specification represents an error in this document. The specification is the only definitive source of information, and supersedes any contradictory information found in this document.

1. General and Administrative Questions

1.1: What is the current revision of the Specification?

The current revision of the TPC-D Benchmark Specification is version 1.3.1, released to the public on 20 February 1998.

1.2: Who maintains the specification/FAQ?

The specification/FAQ is maintained (and periodically revised) by the Transaction Processing Performance Council (TPC). Please refer to the TPC's general FAQ for more information on the TPC. A good place to start is the TPC's home page (http://www.tpc.org). The TPC may also be reached at:

TPC
Presidio of San Francisco
Building 572B (surface)
P.O. Box 29920 (mail)
San Francisco, CA 94129-0920
Phone: (415) 561-6272
FAX: (415) 561-6120
email: admin@tpc.org

2. Rationale Questions

2.1: What does TPC-D model?

TPC-D models a decision support environment in which complex ad hoc business-oriented queries are submitted against a large database. The queries may access large portions of the database and typically involve one or more of the following characteristics:

multi-table joins
extensive sorting
grouping and aggregation
sequential scans

The purpose of TPC-D is to assess cost/performance of a particular system which supports this type of application. TPC-D can be scaled such that the size of the database can be tailored to the processing capacity of the system. The system should be available for queries on a 24-hour-per-day, 7-day-per-week basis and support background "trickle" or periodic batch updates to the database. In addition to a set of read-only queries, the benchmark also includes two update functions to model this database maintenance activity.

2.2: What does TPC-D mean by "Decision Support"?

Decision Support is a term used to describe the capability of a system to support the formulation of business decisions through complex queries against a database. It can also specifically refer to a database which is intended for this purpose, as opposed to one which primarily supports on-line transaction processing operations. Decision Support is different from OLTP. OLTP applications are update-intensive and generally consist of shorter transactions that access a small portion of a database, often through a primary key or index. Decision support applications typically consist of long and often complex read-only queries that access large portions of the database. Decision support databases are updated relatively infrequently, either by periodic batch runs, or by background "trickle" update streams. The database need not contain real-time or up-to-the-minute information, as decision support applications tend to process large amounts of data which usually would not be affected significantly by individual transactions.

2.3: How is TPC-D similar to or different from TPC-C?

General differences between TPC-D and TPC-C are highlighted in the following table:

*Benchmark*	TPC-D	TPC-C
Type of computing	Decision Support	On-line Transaction Processing (OLTP)
Application	Ad hoc business Questions, e.g. sales trends	Regular business Operations, e.g. order-entry processing
Interaction with DB	Dynamic SQL queries	3GL programs
Types of DB Accesses	Retrievals	Retrievals and Updates
Rows accessed per interaction	Many	Few
Concurrent Users	Few	Many
Performance metrics	(1) Single-user query processing power; (2) multi-user query throughput	New Order transactions
Network activity	Low	High
Front-end components priced?	No	Yes

TPC-D is similar to TPC-C concerning the business and execution environment which is modeled:

24x7 availability
Access from multiple users
Requires support of ACID properties of transaction processing

2.4: Why does TPC-D model an environment with ad hoc queries as opposed to ones which are prewritten?

Decision support queries can come either in an ad hoc or prewritten form because the business questions they intend to solve could be posed either dynamically or repeatedly. Modeling strictly ad hoc queries presupposes end-users with no special knowledge of the database or of optimizer strengths and weaknesses in a particular vendor's DBMS. TPC-D chooses to exclusively model ad hoc queries and enforces execution rules which are based upon this premise. Some examples are the restriction on how the queries may be phrased, the ban on optimizer hints, the requirement of a constant DBMS configuration throughout the performance tests, and the requirement of dynamic SQL as opposed to static SQL.

2.5: Why does TPC-D model a system which supports concurrent updates?

A pure decision support environment need not support concurrent updates because decision support applications themselves tend to be largely read-only. That TPC-D models a system supporting concurrent updates is one manifestation of the "worldwide" database model upon which TPC-D is based. The database is viewed as one which can support both queries and updates throughout a 24-hour period from end users all over the world, with no regular sustained quiescent period during which maintenance (for example, reorganizing or reloading) could be done. The major ramifications of this are that a DBMS must support database locking as well as the ACID properties of transaction processing systems.

3. Database Design Questions

3.1: Do I need to use any particular products to implement TPC-D?

A database management system with an SQL interface must be used so that the queries can be run using the syntax provided in the specification. No particular products must be used, but the system and its components (hardware and software) must be generally available.

3.2: What does the schema look like?

The schema consists of eight tables shown in Clause 1.2 of the specification. A ninth table, the TIME table, is optional. The schema models a database for a worldwide distributor that purchases parts from suppliers and sells them to customers. The two largest tables are the master-detail pair of Order and Lineitem, which together constitute about 85% of the database. The other tables describe the business's parts, suppliers and customers. All tables except the two small Nation and Region "code" tables scale linearly with the size of the database.

3.3: Can I make any changes to the schema?

An implementor of TPC-D is not allowed to change the schema. While it is recognized that in real-life situations an implementor would change the schema to optimize system performance, the benchmark is designed to exercise certain functions of a DBMS which may be avoided if changes to the database definition were allowed.

An implementor is allowed to change the ordering of columns in a table definition, relative to the specified table layouts indicated in Clause 1.4 of the specification.

3.4: Why is the TIME table optional?

The use of the TIME table is allowed for those SQL implementations which do not support arithmetic functions on dates and time intervals.

3.5: Does TPC-D require referential integrity on primary keys?

The specification notes primary and foreign keys in its table definitions purely to clarify relationships which are implicit in the data generation rules that TPC-D requires. The decision of whether or not to employ them in an implementation of TPC-D is left to the test sponsor.

3.6: Does the database require indexes on primary and/or foreign keys?

TPC-D as a benchmark does not require indexes (Clause 1.4), but it permits them (Clause 1.5).

3.7: Are the queries designed to work without indexes?

Only a small number of the queries, notably query 1, were designed to be run without indexes. Usually, some indexing will be needed to reduce the elapsed time of queries.

3.8: Are there any rules governing the creation of secondary access structures such as B-tree indexes?

TPC-D does not dictate a particular set of indexes nor does it place limits on the number and type of indexes used in a TPC-D implementation. The only requirement is that indexes (and other auxiliary data access structures) reflect update activity, be managed transparently and maintain ACID properties (see Clause 1.5).

3.9: Are implementations with different indexing strategies comparable?

Yes. TPC-D was designed to assign some cost or penalty for the creation and maintenance of auxiliary data structures so that test sponsors are not lured into creating huge numbers of indexes. There are four ways in which an implementor "pays" for indexing:
1. Reduced performance: The more indexes or other auxiliary data structures exist, the longer the insert (UF1) and delete (UF2) portions of the benchmark will take. UF1 and UF2 both contribute to the performance metrics.

2. Increased cost: Most auxiliary data structures such as indexes require disk storage. Since all disks must be priced, adding indexes will increase the overall cost of the configuration.

3. Data storage ratio: As additional storage is required the ratio between the total amount of priced disk space and the database size will increase. Since this ratio must be reported (Clause 8.1) and appears on the executive summary it has the effect of highlighting the expansion factor of the raw data.

4. Database load time: The time required to create an index affects the database load time since the creation of indices are considered as part of the database load test.

Clearly, indexes are not free. Selecting an appropriate set of indexes or other auxiliary data structures is an important aspect of tuning a TPC-D implementation. A common starting point is the logical set of primary and foreign keys shown in Clause 1.2 and selected columns frequently referenced in predicates.

3.10: Does TPC-D allow the definition of auxiliary objects such as views or indexes that represent pre-aggregated versions of the underlying table data?

There is no explicit discussion of specialized aggregation objects. However, like any other auxiliary data structure that is derived from the base table data, aggregation objects are a form of replication that must satisfy the requirements of Clause 1.5. Namely, they must be automatically managed by the SUT; the aggregation objects must be transparent to the query and may not be explicitly referenced; updates to the base table are reflected in the aggregation objects in the same transaction; and aggregation objects retain full ACID properties.

3.11: Does TPC-D allow replication of database tables?

TPC-D allows the logical replication of database objects subject to the set of rules described above. Indexes, auxiliary aggregation tables and replicated tables are all permitted. See Clause 1.5 for additional guidance on rules governing replication of database objects.

4. Query Questions

4.1: How were the queries chosen?

Member companies of the TPC-D subcommittee were all invited to submit queries for inclusion in TPC-D. Each query submission had to consist of 3 components: 1) A business question, written in English, to prove that the query was asking a sensible business question, 2) Standard SQL syntax that implemented the business question, and 3) A short overview of what SQL functionality was exercised.

After the submission period, the proposed queries were analyzed and compared to a survey of decision support customer's queries. Next, duplicates and queries deemed too simple for a complex query workload were dropped.

The remaining queries were rephrased from SQL-89 (using lots of views) to SQL-92 (using nested table expressions and CASE expressions). The result was the 17 Functional Query Definitions presented in Clause 2.

4.2: Why ANSI SQL?

ANSI SQL is the generally accepted standard upon which most vendors base their SQL implementations. The standard specifies both syntax and semantics, so that the meaning of the language and the functionality of an SQL statement may be commonly understood. Full SQL-92 was chosen as the reference version of ANSI SQL.

4.3: What is the difference between a functional query definition and SQL?

Functional Query Definitions detail the function(s) to be performed by a query, and the parameter substitution points. They are templates, based on the formal syntax of the ANSI's Structured Query Language (SQL). A Functional Query Definition (FQD) becomes valid SQL syntax when a permissible value is substituted for each parameter in the FQD, based on the constraints detailed in the Substitution Parameters section of each query definition. The QGEN program is provided to facilitate the generation of executable SQL syntax by supplying acceptable values for the substitution parameters for each query.

4.4: Can I adapt the queries to the SQL required by my DBMS?

The query definitions as provided are written according to the SQL-92 standard, but also include substitution parameters in which particular values are to be placed. A particular test sponsor's implementation may differ from the functional query definition; since SQL is continually evolving, such differences are acceptable, subject to constraints intended to ensure comparability among implementations. An implementor must write each query in the dialect of SQL offered by his DBMS, and provide values for the substitution parameters to form an executable SQL statement.

4.5: I can't run the queries in Section 2. Now what?

Because most vendors do not completely conform to the SQL-92 standard, the specification allows some flexibility in the precise phrasing of a query implementation as long as the query's function does not change. Allowable minor query modifications are documented in the specification. If the query still cannot be run, Appendix B contains a set of variants for most of the queries which avoid some common syntactic limitations. Try running one of the approved variants, either by using the SQL as is or by applying minor query modifications as before. If you are still unsuccessful, refer to Clause 2.2 for discussion on the allowance of new query variants, and attempt to rewrite the query to perform in a functionally equivalent manner. Submit the variant to the TPC-D Maintenance Subcommittee for approval.

4.6: The query runs faster if I rewrite the select/from/where clause. Is that permissible?

No. With the exception of minor query modifications (MQMs) to either the functional query definition (FQD) or to one of the approved variants, no modifications to the SQL syntax are allowed. The intent is to focus on the strength of the data manager and its associated query optimizer, rewarding those systems which perform well without requiring the end user to possess specialized knowledge of the unique strengths and weaknesses of a given DBMS. A list of allowable MQMs can be found in Clause 2.2.

4.7: Can I use a mix of variants, modifications, and functional query definitions?

Yes. See Section 2.2.4.3. The following are acceptable methods of implementing any query:

FQD as is
FQD + modification
Variant as is
Variant + modification

4.8: How can I be sure the query is giving the right answer?

The specification for the answer to each query using the qualification database is provided. For each query, values are supplied for each substitution parameter, and the answers are provided in the text for that particular query using those particular values against the qualification database. While this does not guarantee correct answers at other scale factors, the TPC-D rule is that correctness at the qualification scale factor indicates correctness at other scale factors.

5. Update Questions

5.1: Why does TPC-D require update functions?

TPC-D models an environment where the database is accessed by users worldwide. Thus, the database must be available essentially on a 24 x 7 basis. At the same time, the database must be kept reasonably in sync with its operational data source. The environment being modeled is one in which read-only "extract" databases would be impractical; because there is no lengthy "refresh window" available in which the database could be reloaded, it must be able to be updated in order to add new data and remove obsolete data.

5.2: The update functions aren't SQL. How do I implement them?

Because they are intended to reflect updates to a decision support system (using either periodic bulk refresh or an ongoing background task), the implementor is allowed significant flexibility in this area. The update functions delete old rows from, and insert new rows into the order and lineitem tables. Because various systems may facilitate this insertion or removal of rows in different ways, the insertions or deletions within each update function may be done either one at a time or in groups, providing that the relationship between orders and lineitems is maintained. SQL may be used but is not required. For example, a 3GL program may be used.

5.3: Why are there varying degrees of flexibility in the rules regarding the implementation of the queries as opposed to the update functions?

This is again a consequence of the environment being modeled. It is assumed that ad hoc queries are submitted by less sophisticated end users, probably using a front-end SQL query tool. The query tool generates fixed SQL text, and the users are assumed not to edit or modify the text before it is sent to the server. These assumptions motivate the SQL-as-written rules for queries in the benchmark.

In contrast, very few restrictions are placed on the implementation of the two update functions. It is assumed that these are implemented by "experts" and submitted by a very small group of sophisticated users or DBAs. The implementors of the updates are very familiar with the system at hand and can be expected to implement the updates for optimal performance. This may include using a pre-compiled program with embedded SQL statements, sorting the rows to be inserted or deleted in a particular order, or splitting the work of a given update function up among multiple processes.

6. ACID Questions

6.1: What does ACID stand for?

"ACID" is an acronym for the four properties of transaction-processing systems: Atomicity, Consistency, Isolation, Durability.

Atomicity refers to the principle that the update operations done by a transaction on a database are "atomic", i.e., that either all operations are done or that none of them are. The intent of this is that no partially-completed operations should leave any effects on the data.

Consistency is the property of the application that requires any execution of transactions to take the database from one consistent state to another. For example, in TPC-D, the total value of each order must be equal to the sum of its lineitems (ignoring taxes and discounts for simplicity).

Isolation pertains to the extent to which operations done upon data by one transaction are "seen" by or protected from a different concurrently-running transaction or query. For more information, see Clause 3.4 of the specification.

Durability is the property of a system to preserve the effects of committed transactions and insure database consistency after recovery from certain types of system failures (disk, memory, operating system, for example).

6.2: Why are the ACID tests part of a decision support benchmark?

The business and execution environment that TPC-D models, defined in Clause 0, is different from a periodic reload/rebuild read-only database environment that may also be associated with decision support. The database is accessed by users worldwide, thus there is no overnight window available for lengthy database maintenance. The database is seldom reloaded and updates must be performed on an ongoing basis. It is regularly the case that queries and updates are submitted concurrently.

This environment mandates general-purpose ACID properties. Atomicity and consistency are required because the database must not be corrupted by partially completed updates, since reloading is not a viable option for repair. Isolation is needed, albeit at a reduced level, because queries may be submitted while the database is being updated and should not see uncommitted updates. Durability is essential as well, because again, reloading the database and replaying updates manually is not a practical means of dealing with media or system failures in the timely manner required.

Even though the benchmark requires the database to have specified ACID properties, its focus is on decision support query performance. Thus, the ACID tests are not timed. While it is acknowledged that enabling the ACID properties may reduce the query performance measured by the TPC-D metrics, the intent of the benchmark is to focus on peak query performance. For this reason, systems for which query performance is further reduced if updates execute concurrently are allowed to schedule the required update activity in the throughput test outside of the time during which queries are running.

6.3: Are the ACID tests run against the TPC-D schema?

Of the tables in the TPC-D schema, only the order and lineitem tables are used in the ACID tests. Additionally, a history table is defined exclusively for the TPC-D ACID transaction. The tests are run against the qualification database.

6.4: What isolation level does TPC-D require?

TPC-D has three fundamental isolation level requirements as stated in Clause 3.4 of the specification:

Between any two ACID transactions, serializability is required.
Between any ACID transaction and an "arbitrary" transaction, repeatable reads are required but phantoms are allowed.
Between any TPC-D query (1 through 17) and any ACID transaction, the reading of committed data is required (i.e., dirty writes and reads are disallowed), but non-repeatable reads and phantoms are allowed.

7. Data Population Questions

7.1: Why were the permissible scale factors chosen?

The allowable scale factors (1 GB, 10 GB, 30 GB, 100 GB, 300 GB, 1000 GB) were chosen to provide a number of head to head comparison points between implementations, while allowing appropriate targeting of system configurations and market segments. Note that, with the exception of the missing 3 GB scale factor, the numbers constitute a series where each number is approximately three times larger than its predecessor.

7.2: How big is the data set?

The precise size of a data set will depend on the particular DBGEN command line options that are used to create it. For example, if the flat files that make up a data set use fixed length fields, the resulting files will be larger than those generated using variable length data. As a rule of thumb, the data set will consist of 1.2 GB of flat files at the 1 GB scale factor, with the lineitem table being the single largest file, at approximately 850MB per SF.

7.3: How do I build a data set?

The soft appendix to the TPC-D specification includes the source code for DBGEN, a portable data generation utility. The associated README file provides a complete explanation of the proper use of DBGEN, including the creation of all permissible TPC-D data sets.

7.4: Where are the sources for DBGEN?

DBGEN sources are available by either FTP or on MS-DOS diskettes. Contact the TPC for detailed instructions. Also refer to FAQs 1.2 and 1.3 in this document.

7.5: DBGEN doesn't run on my platform. Now what?

The DBGEN sources have been successfully ported to a broad variety of platforms. If your platform is not natively supported, it should be a simple task to mimic the portability conventions used in config.h and the makefile. For further clarification, refer to Porting.Notes in the soft appendix.

7.6: How do I know my data population is correct?

The soft appendix provides answer sets to the queries run against the qualification (SF 0.1 or 100MB) database; they represent a good test for data population correctness. Answer sets for data populations larger than the 100MB qualification database are not provided, but there are some simple data integrity tests that can be used to validate larger data volumes:

Row Counts: nation: 25, region: 5, time (if used): 2557, order: 1.5 million * SF, supplier: 10 thousand * SF, customer: 150 thousand * SF, part: 200 thousand * SF, partsupplier: 800 thousand * SF, lineitem: approximately 6 million * SF.
Column cardinality: Clause 4 provides detailed cardinality definitions for each attribute in the schema.

7.7: How is the data for the update functions generated?

DBGEN is responsible for generating the input files for both update functions. For UF1 (New Sales Update Function), DBGEN produces flat files similar to those used in the initial data load. For UF2 (Old Sales Update Function), DBGEN can produce a number of formats to allow a wide variety of implementation options.

7.8: How is DBGEN used in the database load test?

The database load test can be implemented using the flat files that DBGEN produces. An implementation is also allowed to add appropriate routines to the baseline DBGEN source code to present the output of DBGEN directly to the DBMS, provided these sources are disclosed as part of the full disclosure report.

8. Metrics and Execution Rules Questions

8.1: What is the metric for the benchmark?

TPC-D has three metrics--two performance metrics and one price-performance metric. The two performance metrics are the "power" metric (QppD@Size) and the "throughput" metric (QthD@Size). The price-performance metric is "Price-per-QphD@Size".

8.2: What is the "power" metric (QppD)?

The power metric is based on a geometric mean of the 17 TPC-D queries, the insert test and the delete test. It measures the ability of the system to give a single user the best possible response time by harnessing all available resources.

The inverse of the computed geometric mean is converted to hours from seconds and scaled by the database volume (SF) to obtain QppD.

Therefore, the complete formulation is:

1 * 3600
QppD = --------------------- * SF
GM (Q1..Q17,UF1,UF2)

where GM is the geometric mean.

8.3: What is the "throughput" metric (QthD)?

The throughput metric is a classical throughput measure characterizing the ability of the system to support a multi-user workload in a balanced way. A number of query users (S) is chosen, each of which execute the full set of 17 queries in a different order. In the background there is an update stream that runs a series of insert/delete operations (one pair for each query user). The choice of the number of users is at the discretion of the test sponsor.

The throughput metric is computed as the total amount of work (S*17), converted to hours from seconds (3600 seconds per hour), scaled by the database volume (SF) and divided by the total elapsed time (Ts) required between the first query starting and the last query or update function completing.

Therefore the complete formulation is:

S * 17 * 3600
QthD = --------------- * SF
Ts

Note that if the test sponsor chooses S=1 for the throughput test, for ease of benchmarking it is permissible to omit the throughput test and compute the throughput metric using timings obtained during the power test. Similarly, it is permissible to schedule the insert/delete activity for the throughput test after all the queries complete.

8.4: What is the "query per hour" rating (QphD)?

The query per hour rating is equal to the geometric mean (i.e., square root of the product) of the power and throughput metrics. Its primary purpose is to derive a single performance rating that can be used to compute price-performance.

As a result of combining the power metric (which due to its use of a geometric mean is sensitive to short queries) and the throughput metric (which due to its implicit use of the arithmetic mean is sensitive to long running queries) QphD places attention on both the ability to make simple queries run very fast, as well as on the ability to make the very long queries run faster. The use of either metric alone would place to much emphasis on either short or long-running queries.

8.5: How is the price-performance metric calculated?

The price-performance metric is the ratio of the total system price divided by composite query-per-hour rating QphD@Size.

Consider a system with the following price and TPC-D performance characteristics:

Total System Cost: $5,000,000.00
TPC-D Power Metric: 200.00 QppD @ 100 GB
TPC-D Throughput Metric: 100.00 QthD @ 100 GB

Based on the performance metrics, the composite query per hour rating QphD is equal to square_root (200.00 * 100.00), or 141.42 QphD @ 100 GB. Therefore, the price-performance metric is equal to the total system cost, $5 million, divided by this metric, or $35,335.34/QphD @ 100 GB.

8.6: Why does TPC-D use a geometric mean for the power test?

The power metric arose from the following goals:

Since TPC-D is a representation of a generalized workload, all queries should contribute equally to the power metric;
Different implementations find different queries more "difficult" than others;
The metric should not unduly reward the "over tuning" of a single query.

Because a metric based on an arithmetic mean of query execution times would be dominated by the longest running queries, the geometric mean was chosen for the power metric. To assure that an "over-tuned" query does not skew the results, the specification imposes a three-order-of-magnitude limit on the difference between the longest and shortest queries (Clause 5.4).

Finally, it should be noted that the throughput test uses an arithmetic mean. In the simplest case of using a single query stream, the weight of any particular query in determining the throughput metric is directly proportional to its execution time (this is less clear for a multiple-stream case because there are multiple queries running concurrently). As a result, the final QphD metric is based on a combination of both a geometric (QppD) and arithmetic (QthD) mean. As a result it is in the test sponsor's interest to balance the contributions of both the power and throughput metrics.

8.7: Why are there two performance metrics?

Systems today are used for both scale-up (supporting more users, i.e., higher throughput) and speedup (making a single task faster, i.e., reduced response time) of a workload. The power metric demonstrates the speedup while the throughput metric shows the scale-up capacity of the system.

8.8: What are the rules regarding fair published comparisons of TPC-D results?

There are two items regarding comparability of TPC-D results, as documented in Clause 5.4 of the specification:

The TPC discourages comparison of TPC-D benchmark results measured against databases of different sizes (i.e., scale factors). Database performance and capabilities, as well as price-performance ratios, may not scale proportionally with changes in database sizes; therefore, such comparisons may be misleading. Any printed communication which compares results measured against databases of different sizes must carry a specific disclaimer which states the TPC's discouragement of such comparisons.
Given that the database scale factors are the same, any TPC-D result is comparable to any other TPC-D result regardless of the number of query streams used.

By policy of the TPC, published comparisons of any TPC benchmark results must include all metrics for that particular benchmark. Therefore, the comparison of TPC-D results for two or more systems must include the power metric, the throughput metric, and the price-performance metric. While one metric may be emphasized more than another, the three metrics are considered a unit, and none may be omitted.

8.9: Is this a multi-user benchmark? How many users do I have to run?

TPC-D measures a system's ability to provide answers to complex decision support queries. Whether a given system is best utilized when supporting a single user or multiple concurrent users will depend on a broad range of system design and product engineering decisions. Accordingly, TPC-D consists of two tests: the power test, which reflects a system's ability to focus all of its resources on the needs of a single user, and the throughput test, which gauges how quickly the system can answer the demands of an arbitrary number of users, determined by the test sponsor. By mandating a single user test and allowing a configurable multi-user test, TPC-D provides a way to compare systems whose parallelism allows a single user to monopolize machine resources with those system which mandate that resources must be equitably shared among multiple users.

8.10: If the number of users is left up to the sponsor, how can I compare results with differing numbers of query streams?

During the Power test each of the update functions is executed once. During the Throughput test, the update functions are isolated from the query streams into a single update stream comprised of one execution of each update function for every query stream. As a result, the ratio of update activity to query activity is maintained regardless of the number of query streams executed in the throughput test, assuring comparability.

8.11: Do all the queries have to be run? In what sequence?

Yes, all 17 queries in a query set have to complete execution during the performance tests. The execution sequence of the queries in a set is also pre-defined by the benchmark.

8.12: How are the update streams run in the benchmark? Are they optional?

The term "update stream" is pertinent only to the throughput test of the benchmark, however the "update functions" (UF1: inserts; UF2: deletes) must be run in both the power and throughput tests of the benchmark.

The power test mandates that UF1 precedes the queries, and UF2 immediately follows the last query; however in this case the update functions are not considered as being part of an "update stream". The throughput test does require a separate update stream consisting of pairs of alternating UF1/UF2 executions, the number of pairs being equal to the number of query streams the test sponsor chooses to run. The inserts done within each execution of UF1 may be done in parallel (as can the deletes in UF2), but UF1/UF2 pairs must be executed sequentially, and within each pair, UF2 must not commence until UF1 is complete.

In the throughput test, the scheduling of the updates is not specified relative to the query streams. Thus, test sponsors may execute the updates concurrently with the query streams, or before or after the query streams complete. Note that there is a possible penalty, however, for executing the updates before or after the query streams, because the throughput test timing begins when the first query or update function begins and ends when the last query or update function ends.

8.13: Does the benchmark have any rules to ensure that the reported measurements are reproducible?

TPC-D metrics reported for a given system must represent a level of performance that is repeatable and reproducible. The benchmark specifies a test condition that must be met by the reported metrics. The test sponsor must:

do two consecutive performance tests, ensure that the resulting composite query-per-hour rating QphD@Size for the two tests are within 5% of each other and pick the smaller one for reporting, or
do three consecutive performance tests, ensure that the resulting composite query-per-hour rating QphD@Size for all three tests are within 10% of each other and pick the smallest one for reporting.

9. Pricing questions

9.1: Why do you require that the number of supported terminals and users be at least equal to 10 times the number of query streams?

In some parallel architectures best performance would be obtained with only 1 or 2 query streams. If one only needed to price a system that supported one terminal and user, "pricing specials" might be used which would be unrepresentative. Most decision support systems need to support many users, not one. With this requirement we can ensure that the priced system really does support multiple users.

9.2: Why aren't the network and end-user devices such as terminals or PCs priced?

User interaction rates of decision support do not stress communication devices and adding these to the pricing model would not have real value. Also, many purchasers of decision support systems likely already have PCs/terminals and network components. Because OLTP systems tend to be implemented first, adding a decision support system would not typically entail purchasing PCs/terminals and networks.

9.3: Why is the product required to be available within 6 months of the full disclosure report?

It is impossible to prove or disprove a vendor's claim when the product is not yet available to the general public, therefore the six-month availability requirement is in the interest of fair competition. This allows the test sponsor ample time to complete its product release, but also allows competitors to challenge the result within a reasonable amount of time as documentation for the released product becomes available. Allowing a vendor just a few months longer to release the product after publication of the FDR could give the vendor several months of advertising advantage before any competitor could gather sufficient data to challenge the FDR.

9.4: Why does TPC-D require a test sponsor to disclose system pricing in such detail?

Increasingly we were seeing 'package specials' where nearly the entire configuration was covered under one price. This obscured information which customers found useful. Customers of the benchmarks wanted to understand price performance at a finer level, i.e., dollars per GB of disk, etc. While we do not prohibit package pricing, we do require that the vendor disclose finer granularity pricing so that the benchmark consumer knows what an additional processor, or disk, or tape drive costs.

10. Audit Questions

10.1: Must all benchmark results be independently audited?

Yes, as for all benchmark results sanctioned by the TPC, an independent audit is required. This audit must be conducted by an auditor who has been certified for TPC-D audits by the TPC.

10.2: How do I find a TPC certified auditor?

The list of certified auditors can be obtained from the office of the TPC Administrator. See FAQ 1.2 within this document.

10.3: What happens during an audit?

During the course of an audit, the auditor reviews the implementation of the benchmark and verifies that it is in compliance with the requirements described in the specification. The auditor then verifies that the execution of the benchmark is also in compliance. Finally, the reporting of the benchmark results in the form of a Full Disclosure Report is reviewed and certified by the auditor.

10.4: How do I find additional information to prepare for an audit?

The specification outlines the areas that an auditor should concentrate on. Additional information about the actual audit process should be obtained directly from the auditor selected by the test sponsor.

10.5: What recourse do I have if I disagree with my auditor?

A process is in place to resolve disagreements between test sponsors and auditors. This process involves the Technical Advisory Board (TAB) and is described in the TPC policies.

10.6: How far in advance should I choose an auditor?

As soon as most elements of the benchmark implementation are in place an auditor should be contacted to review those elements. This first review can be useful to correct potential problems early in the benchmarking process and avoid spending time tuning a non-compliant implementation.

10.7: Must the auditor be present when the performance tests are executed?

The TPC audit policies describe various types of audit. Among them are two types of off-site audits which leverage the results from previous on-site audits. The "Intermediate Off-Site Audit", the only off-site audit suited for the verification of performance tests, assumes that the tests are highly automated and that no change has been made to the benchmark environment since the corresponding on-site audit. As a result, an "Intermediate Off-Site Audit" is characterized by the fact that the auditor is not present during the performance tests.

11. Full Disclosure Report Questions

11.1: Is the FDR absolutely required by the time of publication or can I publish and give the FDR later?

The FDR is absolutely required and the auditor must sign it off as part of the "auditor letter". The TPC will not publish a result without a complete and audited FDR.

11.2: Is there a particular format required for publishing the FDR?

There is no requirement concerning the format of the FDR; only its content is specified. It needs to be legible and contain the required sections. If people in your company have published TPC-C benchmark results you could use this as a starting point.

11.3: Can I ask the auditor to write my FDR?

The specification places no restriction on who may write an FDR. The auditor must see it and sign it off.

11.4: Why is there both an FDR and an Executive Summary?

They serve different purposes. The Executive Summary provides a quick overview of the benchmark results and therefore is formatted in a standard way to allow easy comparisons. By contrast, the FDR must contain all the implementation details so that anybody could rerun the benchmark on the same hardware and with the same software version and find the same results.

11.5: Why do QppD and QthD appear on the executive summary, but not the composite metric, QphD?

QppD and QthD have somewhat of a "physical" meaning and are good measures associated with two phases of the test and two important aspects of a decision support system. QphD is a composite which integrates the two tests in a balanced way but should be viewed mainly for comparison purposes and that is why it is presented only as the final score after the dollars have been factored in.

11.6: Why does the number of streams in the throughput test NOT appear on the executive summary?

The number of streams is built into the QthD metric but does not appear as a separate measure because it cannot be correlated, in isolation, to a measure of goodness. This benchmark encourages the reporting of the point which makes the best use of the machine, i.e., the one which maximizes the total number of queries per unit of time.

11.7: What is included in the database load time?

Generally speaking, the database load time includes loading of the data as well as other operations which are necessary to prepare the system for the performance test. Some of these activities are the creation of indexes, gathering of database statistics, and configuration of the system under test.

The database load test is considered to have begun when the test sponsor begins table creation, and to have ended when the SUT is capable of running the performance test without any further change. The database load may be decomposed into several phases. For example, a test sponsor may generate the data for one table into a flat file and then load it into a table, then may repeat the process for another table either immediately afterward or after some delay. The database load time is the sum of the elapsed time of all phases between the end points of the database load test.

Several types of operations may need to be performed which are not technically part of loading the database or preparing the system for the performance test, and therefore may be excluded from the load time:

The installation or removal of physical resources (e.g., CPU, memory, or disk) on the SUT that are not priced;
Any operation not affecting the state of the DBMS (e.g., data generation into flat files, permutation of the flat files, OS disk partitioning or configuration);
Any modification to the state of the DBMS which is not specific to the TPC-D workload (e.g., logical tablespace creation, database block formatting);
Backup of the database at the test sponsor's discretion.

12. Miscellaneous Questions

12.1: I need answers to a question not listed here. What should I do?

First, look at the latest revision of the specification. Many questions can be answered by reading through the relevant portions of the specification. If you still have not found an answer, try contacting system and/or database vendors who are members of TPC (the specification contains the TPC membership list) and ask them. Finally, if they aren't able to help, you can forward the question directly to the TPC (see section 1.0 in the FAQ for the address and phone number for the TPC). The majority of the time, the TPC office will not answer the question but will forward it onto the members of the TPC-D subcommittee. To make this process as efficient as possible, it is best to use email if it is available.

Frequently Asked Questions (FAQ)What is TPC-D?