|
Query 1, Pricing Summary Report
TPC-D Business Question: Provides a summary pricing report for all Lineitems shipped as of a given date, a date always within 180 days of the greatest shipdate contained in the database.
SELECT L_RETURNFLAG, L_LINESTATUS,
SUM(L_QUANTITY) (NAMED SUM_QTY),
SUM(L_EXTENDEDPRICE) (DECIMAL(18,2)) (NAMED SUM_BASE_PRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) (DECIMAL(18,2))
(NAMED SUM_DISC_PRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX))
(DECIMAL(18,2)) (NAMED SUM_CHARGE),
AVG(L_QUANTITY) (DECIMAL(15,2)) (NAMED AVG_QTY),
AVG(L_EXTENDEDPRICE) (DECIMAL(15,2)) (NAMED AVG_PRICE),
AVG(L_DISCOUNT) (DECIMAL(15,2)) (NAMED AVG_DISC),
COUNT(*) (NAMED COUNT_ORDER)
FROM LINEITEM
WHERE L_SHIPDATE <= ('1998-12-01' (date, format 'yyyy-mm-dd')) - 90 (date)
GROUP BY
L_RETURNFLAG, L_LINESTATUS
ORDER BY
L_RETURNFLAG, L_LINESTATUS;
|
Functionality: Query 1 performs multiple aggregations and summaries by reading and processing over 95% of the rows of the database's largest table. Only this single table is scanned with a very low number of rows being returned.
Relevance
Financial--Account Status and trends: The query is a summary rollup report similar to queries used on a daily basis in many banking insitutions. Such queries read each Account record and perform some extensive summarization on different aspects of the financial data in each account. They provide the financial analysts an immediately useful composite of the current status of accounts in general and trends in the customer base.
Telecommunications--Accounts Vulnerable to Competition:
Query 1 is representative of a single table view in a multi-step approach where the data warehouse detail is joined together to create a single de-normalized detail image that is exported to a business unit database. The query scans an intermediate work table created by joining a BILLED RECURRING CHARGES table (where each row represents a customer's monthly total bill amounts) and a SERVICE ORDER PRODUCT ACTIVITY table. From such a combination of billing and service data we could determine the total billed revenue per account where the account is vulnerable to competition.
One way to estimate the vulnerability score of an account is by categorizing an account as HIGH, MEDIUM, or LOW based on the accounts current propensity to change long distance providers, which shows up in the service order activity by account (this is one example of determining product churn). In trend analysis this could be used as a 180 day moving average score of the several levels of vulnerability. This table could then be summarized and exported to a workstation tool to capture a 180 day moving average type graph.
Query 2, Minimum Cost Supplier
TPC-D Business Question: Find, in a given Region, for each Part of a certain type and size, the Supplier who can supply it at minimum cost. If several suppliers in that region offer the desired part type and size at the same (minimum) cost, the query lists the Parts from Suppliers with the 100 highest account balances.
SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY, P_MFGR,
S_ADDRESS, S_PHONE, S_COMMENT
FROM PARTTBL, SUPPLIER, PARTSUPP, NATION, REGION
WHERE P_PARTKEY = PS_PARTKEY
AND S_SUPPKEY = PS_SUPPKEY
AND P_SIZE = 15
AND P_TYPE LIKE '%BRASS'
AND S_NATIONKEY = N_NATIONKEY
AND N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'EUROPE'
AND PS_SUPPLYCOST =
(SELECT MIN(PS_SUPPLYCOST)
FROM PARTSUPP, SUPPLIER, NATION, REGION
WHERE P_PARTKEY = PS_PARTKEY
AND S_SUPPKEY = PS_SUPPKEY
AND S_NATIONKEY = N_NATIONKEY
AND N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'EUROPE'
)
ORDER BY
S_ACCTBAL DESC, N_NAME, S_NAME, P_PARTKEY;
|
Functionality: Query 2 is a correlated subquery based on a 5-table join in both outer query and inner query. Close to 5% of the Supplier rows result from the selection criteria and query processing, but only the 100 Suppliers with the highest account balances are returned.
Relevance
Retail--Vendor Negotiations: This is a classic example of a retailer pitting one vendor against the other in terms of getting the best price. This could be the beginning of vendor analysis to determine if the retailer should continue to do business with certain suppliers (see also Query 15). The answer set also provides information on account balances so the retailer can possibly leverage volume order discounts as well.
Telecommunicatons--Gathering Competitive Data: For a geographic location (state, region, telephone area code) a query such as this finds the competitor who is offering the minimum price for a product. A product could be a package, such as an optional calling plan for toll calls. You could use the relation of products offered by competitors and the DEPLOYED PRODUCTS table to a BILLING ACCOUNT table relationship to find the top 100 revenue accounts who have that product currently being offered by the local telco.
Query 3, Shipping Priority
TPC-D Business Question: Retrieves the shipping priority and potential revenue of Orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped Orders exist, only the 10 Orders with the largest revenue are listed.
SELECT L_ORDERKEY,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)(DECIMAL(18,2))
(NAMED REVENUE),
O_ORDERDATE, O_SHIPPRIORITY
FROM CUSTOMER, ORDERTBL, LINEITEM
WHERE C_MKTSEGMENT = 'BUILDING'
AND C_CUSTKEY = O_CUSTKEY
AND L_ORDERKEY = O_ORDERKEY
AND O_ORDERDATE < '1995-03-15'
AND L_SHIPDATE > '1995-03-15'
GROUP BY
L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY
ORDER BY
REVENUE DESC, O_ORDERDATE;
|
Functionality: Query 3 performs a 3-table join on three of the larger tables in the database. Anywhere from 1/5 to _ of the rows of each of the 3 tables participate in the joins, with a final aggregation that produces a very high number of rows, in the millions for most volume points. Only the 10 Orders with the highest revenue are returned.
Relevance
Telecommunications--Revenue at Risk: Using the Billed Usage, the Service Order Activity, and the Billing Account table, you could ask "What is the current toll revenue at risk where the account has service order activity that had a completion date 30 or more days greater than the request date?"
Query 4, Order Priority Checking
TPC-D Business Question: Counts the number of Orders ordered in a given quarter of a given year in which at least one Lineitem was received by the Customer later than its committed date. The query lists the count of such Orders for each order priority sorted in ascending priority order.
SELECT O_ORDERPRIORITY, COUNT(*) (NAMED ORDER_COUNT)
FROM ORDERTBL
WHERE O_ORDERDATE >= '1993-07-01'
AND O_ORDERDATE < ADD_MONTHS('1993-07-01',3)
AND EXISTS ( SELECT *
FROM LINEITEM
WHERE L_ORDERKEY = O_ORDERKEY
AND L_COMMItdATE < L_RECEIPtdATE
)
GROUP BY
O_ORDERPRIORITY
ORDER BY
O_ORDERPRIORITY;
|
Functionality: Query 4 is a correlated subquery in which about 1/28 of the Lineitem rows are selected for evaluation based on order date. An aggregation that produces a count by priority produces 5 rows in the answer set.
Relevance
Retail--tracking Slippage: The problem of back-orders or late shipments can be a severe one for retailers and mail order houses. Many such organizations also like to try and guarantee delivery dates and thus an answer set like the one provided here are invaluable to identifying and providing corrective measures to the mailing process.
NCR actually has a car manufacturer that is trying to use just this type of query to allow buyers to custom-order a car at the dealership and be given a delivery date for the vehicle BEFORE IT IS EVEN BUILT. In order to accomplish this they have checkpoints in the staging process to assure the vehicle is being built according to plan. If any step of the process is late then the managing organization is notified and corrective action is taken immediately. It is this type of query that can allow for proactive rather than reactive steps to be taken.
Telecommunications--Measuring Location Revenue. A question such as this could be used on the business side to measure location revenue, for example by wire center. Using the Call Detail Record (CDR) table data you could show the total revenue from toll calls generated through that wire center for a given period, by joining the Billed Usage table to a wire center Location table, and grouping by NPA/NXX (area code and exchange), and selecting all locations where there existed call failures due to whatever reason(s).
Relating revenue, usage and maintenance to wire center locations is important for telcos as they look to acquire or sell off certain geographic locations for regulatory, revenue, and serviceability reasons.
Query 5, Local Supplier Volume
TPC-D Business Question: Lists for each Nation in a Region the revenue volume that resulted from Lineitem transactions in which the Customer ordering parts and the Supplier filling them were both within that Nation. The query considers only Parts ordered in a given year, displaying the Nations and revenue volume in descending order by revenue.
SELECT N_NAME,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT) (DECIMAL(15,2)))
(NAMED REVENUE)
FROM CUSTOMER, ORDERTBL, LINEITEM, SUPPLIER, NATION, REGION
WHERE C_CUSTKEY = O_CUSTKEY
AND O_ORDERKEY = L_ORDERKEY
AND L_SUPPKEY = S_SUPPKEY
AND C_NATIONKEY = S_NATIONKEY
AND S_NATIONKEY = N_NATIONKEY
AND N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'ASIA'
AND O_ORDERDATE >= '1994-01-01'
AND O_ORDERDATE < ADD_MONTHS('1994-01-01',12)
GROUP BY
N_NAME
ORDER BY
REVENUE DESC;
|
Functionality: This is a 5-table join of large and small tables, where the data aggregated is reduced down to 1/5 of the Customers and Suppliers (representing one Region out of five) and 1/7 of the Lineitems (one year out of seven). The largest detail table has no direct selection applied to it. Five rows are returned, constituting the revenue for each nation in the selected region.
Relevance
Telecommunications--Measuring Local Area Revenue: Using a state or NPA Location tables with the BILLED USAGE, Call Detail Record (CDR), or a Message Level toll table, you could measure the intra-LATA revenue and toll, versus long distance revenue for calls that originate and terminate in the same LATA. This would tell you the highest revenue local toll customers that will be at risk when the long distance providers are allowed to provide local toll.
Financial--Neighborhood Banking Volume: This query would be very useful for financial institutions to evaluate the impact of customer banking transactions that take place at a branch or ATM in the same geographic area or zip code as the home branch of that customer.
Query 6, Forecasting Revenue Change
Business Question: Considers all the Lineitems shipped in a given year with discounts between DISCOUNT - 0.01 and DISCOUNT + 0.01. The query lists the amount by which the total revenue would have increased if these discounts had been eliminated for Lineitems with L_QUANTITY less than an inputted quantity.
SELECT SUM(L_EXTENDEDPRICE*L_DISCOUNT (DECIMAL(15,2)))
(NAMED REVENUE)
FROM LINEITEM
WHERE L_SHIPDATE >= '1994-01-01'
AND L_SHIPDATE < ADD_MONTHS('1994-01-01',12)
AND L_DISCOUNT BETWEEN .06 - 0.01 AND .06 + 0.01
AND L_QUANTITY < 24;
|
Functionality: This query accesses the large detail table only (Lineitem) selecting about 12% of the rows, and returning a single column answer.
Relevance
Retail--Return-On-Investment Calculation: One of the banes of the data warehouse is the calculation of Return on Investment (ROI). In order to determine this, one has to question what WOULD HAVE HAPPENED IF.... We have seen many retailers perform this type of analysis and have shown that actions taken from summarized answer sets (small, medium tables) only provide $100-$1000 worth of value while actions taken from detailed answer sets (large, very large tables) provide between $15,000 - $50,000 worth of value.
Examples of real life use of this type of query would be the infamous sock sale at one of our very large retailers. In this scenario a buyer analyzed the effect of sales on socks if a 5 cent discount was taken. The dollar results achieved from a 4-query analysis was a $2,000,000 increase in sales of sport socks for the company.
Financial--What If Analysis: This is a common query to the Financial Industry. Retail banks frequently find it necessary to discount loans, checking accounts or ATM fees, for promotional or competitive reasons. This information from this query provides a look at revenue if the discount had not been offered.
Telecommunications--Predicting Revenue Generation: Using the Call Detail Record (CDR) table for RBOC's (regional phone companies) or IXC's (inter-exchange carriers), you could measure the amount of revenue that could be generated by not offering certain optional calling plans, OCPs, based on minutes used and origination - termination calling patterns.
Query 7, Volume Shipping
TPC-D Business Question: Finds, for two given Nations, the gross discounted revenues derived from Lineitems in which parts were shipped from a Supplier in either Nation to a Customer in the other Nation during 1995 and 1996. Two nations are given as input parameters.
SELECT N1.N_NAME, N2.N_NAME,
EXtrACT(YEAR FROM L_SHIPDATE)(NAMED "YEAR"),
SUM(L_EXTENDEDPRICE * (1-L_DISCOUNT) (DECIMAL(15,2)))
(NAMED REVENUE)
FROM SUPPLIER, LINEITEM, ORDERTBL, CUSTOMER,
NATION N1, NATION N2
WHERE S_SUPPKEY = L_SUPPKEY
AND O_ORDERKEY = L_ORDERKEY
AND C_CUSTKEY = O_CUSTKEY
AND S_NATIONKEY = N1.N_NATIONKEY
AND C_NATIONKEY = N2.N_NATIONKEY
AND ((N1.N_NAME = 'FRANCE' AND N2.N_NAME = 'GERMANY')
OR (N1.N_NAME = 'GERMANY' AND N2.N_NAME = 'FRANCE'))
AND L_SHIPDATE BETWEEN '1995-01-01' AND '1996-12-31'
GROUP BY
N1.N_NAME, N2.N_NAME, "YEAR"
ORDER BY
N1.N_NAME, N2.N_NAME, "YEAR";
|
Functionality: Query 7 is a 6-table join that requires a small 25-row table, NATION, to be aliased and processed as though it were two distinct look-up tables. A date constraint selects 2/7 of the Lineitem rows, while a join to the lookup tables result in 2/5 of the Customers and Suppliers being selected. Four rows are returned.
Relevance
Telecommunications--Customer Calling Patterns: Using the CDR table and the DEPLOYED PRODUCT table a telephone company (telco) could get a better customer profile of products and calling patterns of both ends of calls (originating and terminating) and could measure revenue generated from promotions offered at either location. The search criteria could be in either the originating or terminating NPA (area code). This query would allow a telco to better package products and services with "calling plans."
Query 8, National Market Shar
TPC-D Business Question: The market share for a given Nation within a given Region is defined as the fraction of the revenue from the products of a specified type in that Region that was supplied by Suppliers from the given Nation. The query determines this for the years 1995 and 1996.
SELECT EXtrACT(YEAR FROM O_ORDERDATE)
(NAMED "YEAR"),
SUM(CASE WHEN N2.N_NAME = 'BRAZIL'
THEN L_EXTENDEDPRICE*(1-L_DISCOUNT)
ELSE 0
END) / SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT))
(DECIMAL(15,2))
(NAMED MKT_SHARE)
FROM PARTTBL, SUPPLIER, LINEITEM, ORDERTBL, CUSTOMER,
NATION N1, NATION N2, REGION
WHERE P_PARTKEY = L_PARTKEY
AND S_SUPPKEY = L_SUPPKEY
AND L_ORDERKEY = O_ORDERKEY
AND O_CUSTKEY = C_CUSTKEY
AND C_NATIONKEY = N1.N_NATIONKEY
AND N1.N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'AMERICA'
AND S_NATIONKEY = N2.N_NATIONKEY
AND O_ORDERDATE BETWEEN '1995-01-01' AND '1996-12-31'
AND P_TYPE = 'ECONOMY ANODIZED STEEL'
GROUP BY
"YEAR"
ORDER BY
"YEAR";
|
Functionality: Query 8 is an 8-table join including the NATION table twice. CASE (if/else logic) is used to determine and return a percent value based on two different years. Selection constraints qualify 1/50 of the Part rows, and 1/25 of the Nations.
Relevance
Financial--Regional Rankings. As the Financial Industry constantly goes through mergers and acquisitions, market share is a constant worry. This query is an example of how bankers can compare their results with others for the same "area." Although "area" would only have applicability to the Financial Industry if area is defined as state, county, or metropolitan region. This is an industry where most institutions have word the "FIRST" somewhere in their name. This gives them a picture of where they rank, and if they are living up to their name. It also has applicability to expanding institutions looking for vulnerable banks to acquire.
Telecommunications--Competitive Market Share Analysis. Within a particular region an RBOC (regional phone company) knows how many homes and businesses he services. Using something like a postal address file, all addresses, old and new (even under construction) in that region can be determined. A query such as this could determine market share by joining PRODUCT DEPLOYED to a TELEPHONE NUMBER POSTAL DESCRIPTION table, a PRODUCT(part) table, a RETAILER (competitor) table, and Location (could be a geographic location or telephone wire center location) table. The results would allow the telco to assess their market share in light of their competitor's for current or past years.
Query 9, Product Type Profit Measure
TPC-D Business Question: Finds, for each nation and each year, the profit for all Parts ordered in that year which contain a specified substring in their part-names and which were filled by a Supplier in that nation.
SELECT N_NAME
,EXtrACT( YEAR FROM O_ORDERDATE) (NAMED "YEAR")
,SUM((L_EXTENDEDPRICE*(1-L_DISCOUNT) -PS_SUPPLYCOST*L_QUANTITY)
(DECIMAL(15,2)))
(NAMED SUM_PROFIT)
FROM PARTTBL, SUPPLIER, LINEITEM, PARTSUPP, ORDERTBL, NATION
WHERE S_SUPPKEY = L_SUPPKEY
AND PS_SUPPKEY = L_SUPPKEY
AND PS_PARTKEY = L_PARTKEY
AND P_PARTKEY = L_PARTKEY
AND O_ORDERKEY = L_ORDERKEY
AND S_NATIONKEY = N_NATIONKEY
AND P_NAME LIKE '%green%'
GROUP BY
N_NAME, "YEAR"
ORDER BY
N_NAME, "YEAR" DESC;
|
Functionality: Query 9 is a 6-table join with selection criteria applied to only one small table, Part. A text string search is done against the PART table, resulting in 5% of the Part rows being selected. Revenue is aggregated for each combination of Year and Nation, returning 175 rows.
Relevance
Financial Industry--Identifying Profitable and Non-Profitable Customers. This is the hottest of all historical data (Data Mart or Data Warehouse) projects in the Financial Industry today. tremendous efforts and dollars are being spent to identify profitable products and customers. Of more importance to some institutions is the identification of unprofitable customers and methods of dealing with them. A recently-issued mission statement from an established bank stated that their charter was to identify the profitable customers and cultivate those relationships and to "attrit" those who are unprofitable. This query is needed to accomplish that.
Telecommunications--Measuring Switch Level Toll Revenue:Within this industry there is a desire to measure profitability at a wire center level. This is currently difficult due to legacy data and incompatibilities of RBOC (regional phone company) systems. Once a telco shows for a given wire center and its switches what services/products are available on those switches, you could measure the switch level toll revenue and product usage across that switch, based on toll traffic. The LIKE clause specification could be looking for a string within switch capabilities, such as "switch generic" (meaning what OS is running on this switch and what features does it provide).
Query 10, Returned Item Reporting
TPC-D Business Question: Finds the top 20 Customers, in terms of their effect on lost revenue for a given quarter, who have returned Parts. The query considers only Parts that were ordered in the specified quarter. Customers are listed in descending order of lost revenue.
SELECT C_CUSTKEY, C_NAME,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT) (DECIMAL(15,2)))
(NAMED REVENUE),
C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDERTBL, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY
AND L_ORDERKEY = O_ORDERKEY
AND O_ORDERDATE >= '1993-10-01'
AND O_ORDERDATE < ADD_MONTHS('1993-10-01',3)
AND L_RETURNFLAG = 'R'
AND C_NATIONKEY = N_NATIONKEY
GROUP BY
C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY
REVENUE DESC;
|
Functionality: Query 10 is a 4-table join of three large tables and one look-up table. Customer detail is returned by this query, alongside of only one column of summarized data. 1/28 of the Order rows qualify based on date, while 1 in 4 Lineitems match the criteria of having a return flag of 'R'. Millions of rows are returned from this query, but the final sort determines which 20 rows are to be displayed in the answer set.
Relevance
Retail Inventory--Fraud tracking. While most retailers start with sales analysis and then move onto inventory or other operational types of applications, there are a few retailers that are seriously looking at returned items. Most notably, one large retailer we work with has a data warehouse project designed specifically to support returned item analysis.
Two examples of the value of this analysis show up in the sales and returns associated with the Super Bowl and the Prom. Women have frequently bought dresses for special occasions (Prom night, weddings) and then returned the dress after the event. Sales of binoculars increase in the city of major sporting events just prior to games such as the world series or super bowl. What has also been shown is a dramatic increase in returns of such items immediately following the sporting event. Queries such as the above can help retailers identify people who are "frequent returners" sequenced by their impact on the business, and can take appropriate actions to reduce this type of fraud and abuse.
Financial--Default Credit Report. This query, by changing the column name from RETURNFLAG to DEFAULTFLAG has great applicability to finance. Instead of returning an item, customers default on loans, overdraft DDA accounts, prematurely exit from CD+s and other commitments. This query identifies the top 20 offenders, in terms of lost revenue.
Telecommunications--Identifying Sales Potential. Could be used to determine the top 20 customers (or billing accounts) likely to purchase additional phone lines based on their "Product Need Score" who have had order activity (they show up in the SERVICE ORDER PRODUCT ACTIVITY table) within a recent time period. Using a geographic location table such as a wire center switch table you could find telco locations by phone number designations (using NPA and NXX, the area code and exchange).
Query 11, Important Stock Identification
TPC-D Business Question: Finds, from scanning the available stock of Suppliers in a given Nation, all the Parts that represent a significant percentage of the total value of all available Parts. The query displays the part number and the value of those Parts in descending order of value.
SELECT PS_PARTKEY, SUM(PS_SUPPLYCOST * PS_AVAILQTY (DEC(18,2))
(NAMED VALUE)
FROM PARTSUPP, SUPPLIER, NATION
WHERE PS_SUPPKEY = S_SUPPKEY
AND S_NATIONKEY = N_NATIONKEY
AND N_NAME = 'GERMANY'
GROUP BY
PS_PARTKEY HAVING SUM(PS_SUPPLYCOST * PS_AVAILQTY) >
(SELECT SUM(PS_SUPPLYCOST * PS_AVAILQTY (FLOAT))*0.0000001000
FROM PARTSUPP, SUPPLIER, NATION
WHERE PS_SUPPKEY = S_SUPPKEY AND
S_NATIONKEY = N_NATIONKEY AND
N_NAME = 'GERMANY')
ORDER BY
VALUE DESC;
|
Functionality: Query 11 is a subquery with both the inner and outer query composed of a 3-table join of the same 3 moderate and small tables. A HAVING clause associates the two sub-queries. 1/25 of these tables' rows are selected, with several million rows being returned in the final answer set.
Relevance
Telecommunications--Product Forecasting: This is actually similar to a product forecasting query run in a regional telco system. For a region (such as city, NPA/NXX, or wire center) find the product penetration from that growth estimate. The HAVING clause would find all products that have a certain level of penetration by showing the percentage of current households divided by the total households. The outer query would list all products and a quantity (for example at the wire center level you could show available capacity of "twisted pairs" for additional lines).
Financial--Identifying Outstanding Loans: This query is equivalent to one performed in a loan office which scans all loans to find the highest dollar figure loans that represent some percent or more of all commercial loan dollars outstanding in a specific geographic region.
Query 12, Shipping Modes and Order Priority
TPC-D Business Question: Counts, by shipping mode, for Lineitems actually received by Customers in a given year, the number of Lineitems belonging to Orders for which the Receiptdate exceeds the Commitdate for 2 different specified shipping modes. Only Lineitems that were actually shipped before the Commitdate are considered. Late Lineitems are partitioned into two groups, those with priority Urgent or High, and those with a priority other than Urgent or High.
SELECT L_SHIPMODE (NAMED SHIPMODE),
SUM(CASE WHEN O_ORDERPRIORITY = '1-URGENT'
OR O_ORDERPRIORITY = '2-HIGH'
THEN 1
ELSE 0
END) (NAMED HIGH_LINE_COUNT) ,
SUM(CASE WHEN O_ORDERPRIORITY <> '1-URGENT'
AND O_ORDERPRIORITY <> '2-HIGH'
THEN 1
ELSE 0
END) (NAMED LOW_LINE_COUNT)
FROM ORDERTBL, LINEITEM
WHERE O_ORDERKEY = L_ORDERKEY
AND L_SHIPMODE IN ('MAIL' , 'SHIP')
AND L_COMMItdATE < L_RECEIPtdATE
AND L_SHIPDATE < L_COMMItdATE
AND L_RECEIPtdATE >= '1994-01-01'
AND L_RECEIPtdATE < ADD_MONTHS('1994-01-01',12)
GROUP BY
L_SHIPMODE
ORDER BY
L_SHIPMODE;
|
Functionality: Query 12 is a two-table join of the two largest tables, with 1/7 of the Lineitems being selected based on date, and all Orders participating in the join. CASE functionality is used to allow a single processing of both tables in order to count the qualifying rows for two different sets of priority criteria. Two rows are returned from the aggregation.
Relevance
Telecommunications--Service Order Activity Evaluation. Using the SERVICE ORDER PRODUCT ACTIVITY table and a table associating products/services with a calculated "need score" of those products to customers, this query could prove very useful for a telco. The CASE function could aggregate on a level of need, "HIGH - MEDIUM - LOW", where the service order completed date was within a certain range and greater than the order date. Or, in the event products are supplied by alternate sales channels, that information would be available in the service order activity while the SHIPMODE criteria could represent certain sales channels.
Financial--Late Payment Assessment. This query could represent a count of all late payments to a bank made by customers, grouped by market segment, but only for accounts in 2 of 8 distinct categories.
Query 13, Sales Clerk Performance
TPC-D Business Question: Computes the total loss of revenue on Orders placed by a given Order clerk due to Parts being returned by Customers. The query groups and orders the results by the year in which the Parts were ordered.
SELECT EXtrACT( YEAR FROM O_ORDERDATE) (NAMED "YEAR"),
SUM(L_EXTENDEDPRICE * (1-L_DISCOUNT) (DECIMAL(15,2)))
(NAMED REVENUE)
FROM LINEITEM, ORDERTBL
WHERE O_ORDERKEY = L_ORDERKEY
AND O_CLERK = 'Clerk#000000088'
AND L_RETURNFLAG = 'R'
GROUP BY
"YEAR"
ORDER BY
"YEAR";
|
Functionality: Query 13 is a short-running query, as each clerk has only 1500 Orders no matter what the volume of data in the database. The join between the two largest tables involves a comparatively small amount of data, as only the Lineitems that have been returned for those 1500 Orders are processed. As there are seven years in the database, seven or fewer rows will be returned, one per year.
Relevance
Finance--Loan Officer Performance: By changing the column names from "O_Clerk" and "L_ReturnFlag" to "O_LoanOfficer" and "L_DefaultFlag" respectively, this query has applicability to the Financial Industry and the government auditing agencies. This would identify the loan officers that have had the largest volume of defaults in their practice. In correlation with Query 10 a bank examiner could make determinations of fraud or illegal intent. At a minimum the bank executives would certainly identify candidates for additional training, or modification to bank policies.
Telecommunications--Evaluating Failed Call Incidents: This could be a call center type request, call center meaning a customer service contact with a customer, where account information was being retrieved while the service representative was handling the initial customer inquiry.
The application would be able to provide the number of successful calls for this account (BTN) as well as the failed attempts, due to busy signal, line error, or other causes. This would require the RBOC (regional phone company) measured toll CDR (call detail) table to include all attempts, not just the rated calls. This would allow the customer service rep to offer a better, guaranteed connection as a product if the business sees a need for reducing the failed calls to and from that location.
Query 14, Promotion Effect
TPC-D Business Question: Determines what percentage of the revenue in a given year and month was derived from promotional Parts. The query considers only Parts actually shipped in that month and gives the percentage.
SELECT 100.00*SUM(CASE WHEN P_TYPE LIKE 'PROMO%'
THEN L_EXTENDEDPRICE*(1-L_DISCOUNT)
ELSE 0
END (DECIMAL(15,0)))
/ SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT) (DECIMAL(15,0)))
(NAMED PROMO_REVENUE)
FROM LINEITEM, PARTTBL
WHERE L_PARTKEY = P_PARTKEY
AND L_SHIPDATE >= '1995-09-01'
AND L_SHIPDATE < ADD_MONTHS('1995-09-01',1);
|
Functionality: Query 14 calculates the numerator and the denominator of a fraction at the same time by using the CASE syntax to scan and process the data in one single sweep. A two-table join is required, with only 1/84th (one month) of the largest table qualifying for the join based on a date.
Relevance
Retail--Sales Analysis, Promotion Analysis. A universal application of a data warehouse, regardless of industry, is to track the effectiveness of promotions. All companies want to know whether or not their ads or targeted marketing efforts are helping or not.
One example in retailing is a mail order catalog company that created a special mailing for petite size clothing. After mailing over 400,000 of the catalogues to women who requested the special mailings, the company was surprised to find less than a 3% reply rate. Using detailed data it was shown that the people requesting the catalogue traditionally ordered large sizes and wanted the petite catalogues to provide incentive to diet. A new mailing list was created based on past orders and a total of 200,000 mailings were sent out with a reply rate over 20%. This relates to both reduced mailing costs as well as increased orders.
Telecommunications--Promotion Effect: Data from a PROMOTIONS, PRODUCTS/SERVICES, and LOCATION tables could be de-normalized into an Event table which tracks all contacts with all customers. In a query such as the one above, data from this Event (customer contact) table could be joined to the SERVICE ORDER PRODUCT ACTIVITY table allowing you to determine for products that have promotions in certain locations during certain periods what revenue was derived, expressed as a percentage of revenue from all products.
Query 15, Top Supplier
TPC-D Business Question: Finds the Supplier who contributed the most to the overall revenue for Parts shipped during a given quarter of a given year. In case of a tie, the query lists all Suppliers whose contribution was equal to the maximum, presented in Supplier number order.
CREATE VIEW REVENUE0 (SUPPLIER_NO, TOTAL_REVENUE) AS
SELECT L_SUPPKEY,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT))
FROM LINEITEM
WHERE L_SHIPDATE >= '1996-01-01'
AND L_SHIPDATE < ADD_MONTHS('1996-01-01',3)
GROUP BY
L_SUPPKEY;
SELECT S_SUPPKEY, S_NAME, S_ADDRESS, S_PHONE, TOTAL_REVENUE
FROM SUPPLIER, REVENUE0
WHERE S_SUPPKEY = SUPPLIER_NO
AND TOTAL_REVENUE = (SELECT MAX(TOTAL_REVENUE) FROM REVENUE0)
ORDER BY
S_SUPPKEY;
|
Functionality: Query 15 requires either a view with aggregates or a temporary table. In the view, 1/28th of the Lineitems (3 months) are selected based on date and aggregated on Supplier. Close to all the Suppliers in the database will participate as output from this first step aggregation. The query itself returns details from the Supplier row which represents the maximum revenue. One row is returned from this query.
Relevance
Retail--Vendor Negotiations, Vendor Performance. In combination with Query 2, this can be part of a vendor analysis. Here the retailer is trying to determine the suppliers that provide the greatest percent to total revenue. Taking this a step further the buyer can add cost to this query and get profitability for each vendor.
Another important aspect of these types of queries is the retailers push to allow the vendors and suppliers access to the data within the warehouse themselves. Some large retail accounts have given access to their vendors in order to streamline the process of ordering and delivery in a timely manner. This can only be accomplished through the analysis of detailed level data.
Telecommunications--Top Services and Products. De-regulation of the communication industry could allow a single customer view across currently unavailable product/service combinations. A query such as this could answer what SERVICEs/PRODUCTs, instead of SUPPLIER, that are offered across multiple lines of business. Currently these could be cellular, regular telephone as we know it - POTS (Plain Old Telephone Service), wireless, cable and all their associated products and services. The query would indicate which provided the most revenue and which other products were growing fastest.
Query 16, Parts/Supplier Relationship
TPC-D Business Question: Counts the number of Suppliers who can supply Parts that satisfy a particular customer's requirements. The Customer is interested in Parts of eight different sizes as long as they are not a given type, not of a given brand, and not from a Supplier who has had complaints registered at the Better Business Bureau. Results must be presented in descending count and ascending brand, type and size.
SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY)
(NAMED SUPPLIER_CNT)
FROM PARTSUPP, PARTTBL
WHERE P_PARTKEY = PS_PARTKEY
AND P_BRAND <> 'Brand#45'
AND P_TYPE NOT LIKE 'MEDIUM POLISHED%'
AND P_SIZE IN (49, 14, 23, 45, 19, 3, 36, 9)
AND PS_SUPPKEY NOT IN
( SELECT S_SUPPKEY FROM SUPPLIER
WHERE S_COMMENT LIKE '%Better Business Bureau%Complaints%')
GROUP BY
P_BRAND, P_TYPE, P_SIZE
ORDER BY
SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE;
|
Functionality: Query 17 uses correlated subquery functionality to perform what-if analysis. A join of Lineitem and Part take place in both inner and outer queries. 1/1000th of the rows in the Part table qualify based on the selection criteria. The item quantity that represents 20% of the average item quantity is calculated in the subquery with the AVERAGE aggregation.
Relevance
Telecommunications--Cost of Dropping Products. Using a PRODUCT DEPLOYED table to a product table relationship you could simulate this same request, to determine the loss in revenue if products below a certain average revenue were no longer offered. A more interesting query, and one that accesses the very large call detail (CDR) table would be to calculate the measured toll amount that a product is associated with and what would be the impact to toll revenue of not offering that product. An example of products in this case could be an OCP, optional calling plan.
|