What’s your fraud IQ?
Trying to uncover evidence of fraud in a data set of millions of
records is somewhat akin to searching for a needle in a haystack.
Fortunately, the successful employment of data analysis techniques can
clear away most of the “hay” and leave the fraud examiner or auditor
with a much smaller stack to dig through. Do you know how to
effectively analyze data for the red flags of fraud? Are you making
the most of data analysis in your fraud detection efforts? Take this
Fraud IQ quiz and find out.
1. One of the main benefits of using data analysis techniques to
detect fraud is that:
a. They can provide insight into the details of how a fraud occurred.
b. They can be used to establish predication for a full fraud examination.
c. They are easily performed using off-the-shelf tools that enable
anyone to undertake an in-depth analysis without specific
d. They can take the place of the fraud risk assessment in
identifying key areas of fraud risk within the organization.
2. Which is the most effective order of steps in the data
analysis process?
a. Build a profile of potential frauds; obtain the data; verify the
data; cleanse the data; analyze the data.
b. Obtain the data; cleanse the data; analyze the data; verify the
data; build a profile of potential frauds.
c. Obtain the data; analyze the data; cleanse the data; verify the
data; build a profile of potential frauds.
d. Build a profile of potential frauds; verify the data; obtain the
data; cleanse the data; analyze the data.
3. Mitch, a CPA, is attempting to sort through a very large
amount of production data for his organization’s various manufacturing
plants to identify anomalies that might indicate fraudulent activity.
Which of the following techniques would be most helpful for Mitch in
establishing expected values for data in a population?
a. Fuzzy logic matching.
b. Gap testing.
c. Compliance verification.
d. Regression analysis.
4. The audit department of a large financial institution
received a tip that a few employees have been colluding to siphon off
customer data and sell it to an organized crime ring for use in
identity theft schemes. To help identify whether such a scheme is
occurring and who might be involved, the internal audit team decides
to apply textual analytics techniques to interemployee
communications. Which of the following combinations of keywords or
phrases likely would be the most helpful in identifying communications
between the data thieves regarding their scheme?
a. “Confidential” and “customer information.”
b. “Confidential”; “unauthorized copying”; and “nobody will notice.”
c. “Nobody will notice” and “not hurting anyone.”
d. “Unauthorized copying” and “customer information.”
5. Diana, a CPA, is the controller for Square Box Co. She receives a
call from Joshua, the accounts receivable manager of Circle Corp., one
of Square Box’s vendors, regarding numerous double payments received
from Square Box during the past six months. Joshua says he does not
understand why the invoices are being paid twice and that he would
like to get the situation straightened out to avoid having to continue
issuing refund checks to Square Box. He also says that he usually
deals with Amanda, Square Box’s accounts payable manager, but that she
has not been able to curb the situation, so he thought he would try
taking it to her boss. After hanging up the phone, Diana pulls up
Circle Corp.’s accounts payable history to try to figure out what’s
going on, but she sees no sign of duplicate payments or refunds.
Growing concerned, she decides to run some data analytics tests on
payments to vendors to see if she can find any other anomalies. Which
of the following fields would be LEAST helpful in searching for clues
regarding duplicate payments to vendors in Square Box’s accounting system?
a. Vendor address.
b. Vendor number.
c. Invoice number.
d. Payment amount.
6. For which of the following data sets would a Benford’s Law
analysis be LEAST appropriate?
a. Employee hourly wage rates.
b. Customer balances.
c. Expense reimbursement claims.
d. Inventory prices.
7. Analyzing data using Robert Gunning’s Fog Index is most
useful in uncovering which of the following fraud schemes?
a. Kickbacks paid to overseas vendors.
b. Financial statement manipulation.
c. Theft of proprietary information.
d. Skimming of incoming cash receipts.
8. Link analysis and geospatial analysis can be particularly
useful in uncovering red flags of which of the following types of
c. False billing.
d. Financial statement manipulation.
9. The audit team for Shady Business Inc. is applying data
analytics techniques to help identify areas where fraud might be
occurring. In which of the following situations would examining the
ratio of maximum values to minimum values within a data set be most useful?
a. Amount of raw materials on hand by part number.
b. Net payroll check by employee.
c. Unit prices paid for a product by purchase transaction.
d. Total quantity of products purchased by customer.
10. The results of the fraud risk assessment for XYZ Corp.
indicate that the risk of fraud in the company’s purchasing function
is high due to frequent turnover in the department and several other
control weaknesses. In response, XYZ’s audit team decides to employ
data analysis tests on the purchasing system to identify internal
control breaches and anomalies that might indicate fraud. The audit
team begins by extracting payments to vendors that lack required
information in the vendor master file; this test returns thousands of
transactions as exceptions. Which of the following procedures would be
LEAST helpful in reducing the number of false positives included in
the analysis results?
a. Combining multiple data analysis tests and weighting the results
based on number of tests for which each record returns an exception.
b. Supplementing the results with consideration of known behavioral
red flags, such as financial difficulties or unusually close
associations with vendors, displayed by particular employees in the
c. Combining the data in the system with data from outside sources,
such as industry codes of vendors.
d. Filtering the results to include only those transactions recorded
during normal business hours by employees with access to the
1. (b) Data analysis techniques provide a means to
explore specific areas for evidence of potential fraud without
undertaking formal investigation procedures. As such, these techniques
are an effective way to help establish—or disprove—predication for a
fraud examination. (Predication is the totality of
circumstances that would lead a reasonable, prudent, and
professionally trained person to believe a fraud has occurred, is
occurring, or will occur.)
For example, a hotline tip is received from an employee claiming
that another employee, with whom the caller has a history of personal
conflict, is embezzling company funds. Provided the company has
granted the auditors (or whoever is in charge of responding to the
tip) access to its financial records, data analysis procedures
generally can be used to explore the financial data relative to the
reported embezzlement without alerting anyone involved. If, as a
result of the analysis, anomalies are detected and predication is
confirmed, more formal investigation procedures, such as interviews,
can follow.
However, undertaking data analysis requires careful consideration of
several issues. Those involved in the process must have a
thorough understanding of the data itself and the software involved in
housing and analyzing it. This often requires working closely with
information technology experts to ensure that the data is acquired and
analyzed in a sound manner. Additionally, data analysis procedures
should be closely tied to the results of the organization’s fraud risk
assessment to ensure that the approach is efficient and based on the
organization’s true risks and operations. Even with this foundational
knowledge, however, those performing data analysis engagements must
know that the anomalies identified do not, in themselves, indicate
fraudulent activity. Instead, they illuminate outliers in the data
that might—or might not—be the result of fraud but that should be
followed up with additional procedures to determine their legitimacy.
2. (a) Although the core of data analysis involves
running targeted tests on data to identify anomalies, the ability of
such tests to help detect fraud depends greatly on what the fraud
examiner does before and after actually performing the data analysis
techniques. Consequently, to ensure the most accurate and meaningful
results, examiners should employ a formal data analysis process that
begins several steps before the tests are run and concludes with
active and ongoing review of the data. While the specific process will
vary based on the realities and needs of the organization, the
following approach contains steps that should be considered and
implemented, to the appropriate extent, in searching for anomalies
that might indicate fraud:
A. Planning phase:
i. Understand the data and the data environment.
iii. Build a profile of potential frauds.
iv. Determine whether predication exists.
B. Preparation phase:
i. Identify the relevant data.
ii. Obtain the data.
iii. Verify the data.
iv. Cleanse and transform the data.
C. Testing and interpretation phase:
i. Analyze the data.
D. Post-analysis phase:
i. Respond to the analysis findings.
ii. Monitor the data.
3. (d) Regression analysis, also called correlation
analysis, is a statistical technique that uses a series of records to
create a model relationship between a dependent variable and one or
more independent variables. For example, regression analysis could be
used to model and predict the number of widgets manufactured based on
amounts of materials and labor used, maintenance costs, utilities, and
other related factors. Mitch then could use the resulting model to
identify anomalies in the data. A period or facility in which reported
production output is significantly lower or higher than predicted
based on this model would merit further examination.
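As a rough illustration of the approach, the Python sketch below fits a
straight line to production data and flags the facilities whose reported
output sits far from the prediction. The figures are invented, and the
single predictor (labor hours), the function names, and the cutoff of two
residual standard deviations are all assumptions made for this example;
a real model would likely use several independent variables.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def flag_outliers(xs, ys, z_cutoff=2.0):
    """Return the positions whose residual exceeds z_cutoff standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if spread and abs(r) > z_cutoff * spread]

# Hypothetical data: one row per plant. Output tracks labor hours closely,
# except for the plant at position 4, which reports far fewer widgets.
labor_hours = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280]
widgets = [500, 600, 700, 800, 600, 1000, 1100, 1200, 1300, 1400]

flagged = flag_outliers(labor_hours, widgets)
print(flagged)  # [4]
```

A flagged position is only a prompt for follow-up; the shortfall could
just as easily reflect downtime or a data entry error as fraud.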
4. (c) As with other forms of data analysis, the
objective of using textual analysis on nonstructured data, such as
emails and other text, is to narrow down an enormous data population
to a smaller group that meets the specified criteria and can be
examined further for signs of fraud. Consequently, in determining
keywords to use, the audit team must avoid words that would result in
a huge number of false positives. For example, the words
“confidential” and “unauthorized copying” appear in many individuals’
email signatures as part of a standard warning about unintended
recipients. Consequently, including such terms in a search likely
would result in thousands of emails—or more—that are unrelated to any
fraud. Similarly, the term “customer information” likely also will
appear in a vast number of legitimate communications. Instead,
focusing on words that might indicate the mindset or motives of a
scheme—terms such as “nobody will notice” or “not hurting anyone” and
variations of those phrases—will yield fewer and more meaningful
search results. Identifying patterns within those communications that
contain such phrases, such as employees with an unusually high
occurrence of the keywords or particular dates and times when the use
of the words or phrases appears to spike, can help the audit team
direct its subsequent examination activities.
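A minimal sketch of such a search, in Python, appears below. The message
list, the sender names, and the function name are hypothetical, and the
search matches only the exact phrases (ignoring case and line breaks);
catching the variations mentioned above would require a broader pattern
list.

```python
import re
from collections import Counter

# Phrases that suggest rationalization rather than routine business language.
MINDSET_PHRASES = ["nobody will notice", "not hurting anyone"]

def count_hits_by_sender(messages, phrases=MINDSET_PHRASES):
    """Count phrase occurrences per sender; messages are (sender, body) pairs."""
    patterns = [re.compile(re.escape(p), re.IGNORECASE) for p in phrases]
    hits = Counter()
    for sender, body in messages:
        # Collapse whitespace so a line break cannot hide a phrase.
        text = " ".join(body.split())
        found = sum(len(p.findall(text)) for p in patterns)
        if found:
            hits[sender] += found
    return hits

# Hypothetical sample of interemployee email.
messages = [
    ("alice", "This email is CONFIDENTIAL. Customer information attached."),
    ("bob", "Relax, nobody will\nnotice. We're not hurting anyone."),
    ("bob", "Like I said: Nobody will notice."),
    ("carol", "Lunch at noon?"),
]

hits = count_hits_by_sender(messages)
print(hits.most_common())
```

Note that the message containing "confidential" and "customer
information" produces no hit, which is the point: the boilerplate terms
are left out of the search entirely.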
5. (b) Diana did not see any signs of duplicate
payments on Circle Corp.’s vendor account in the company’s system,
indicating that the duplicate payments likely were made under another
vendor’s account. However, the payments were made for the same
amounts, received at the same address, and noted by Circle Corp. as
being applied to the same invoice as the legitimate payments on Circle
Corp.’s account. Consequently, searching for duplicates in each of
these fields, as well as the vendor name field, should provide some
information on how these payments were recorded in Square Box’s
accounting system. To help address the risk of slight variations of
duplicate information, fuzzy logic searching (which can help identify
records with similar or potentially duplicate—though not
identical—values, such as First Street, First St., and 1st St.) should
be used. Conversely, Square Box’s system, like most accounting
systems, does not allow for duplicates in the primary key
fields—which, for vendor records, is the vendor number. So running a
duplicate check on this field is unlikely to yield useful information.
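The test described above might look something like the following Python
sketch, which pairs up payments that share an invoice number and amount
but sit under different vendor numbers, and then compares addresses with
a similarity score. The field names, the sample records, and the 0.6
similarity threshold are assumptions for illustration; `difflib` is used
here as a simple stand-in for a dedicated fuzzy matching tool.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity between two strings, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def possible_duplicates(payments, min_address_similarity=0.6):
    """Pairs with the same invoice and amount, different vendors, similar addresses."""
    pairs = []
    for p, q in combinations(payments, 2):
        same_invoice = p["invoice"] == q["invoice"]
        same_amount = p["amount_cents"] == q["amount_cents"]
        different_vendor = p["vendor_no"] != q["vendor_no"]
        if same_invoice and same_amount and different_vendor:
            if similarity(p["address"], q["address"]) >= min_address_similarity:
                pairs.append((p["payment_id"], q["payment_id"]))
    return pairs

# Hypothetical payment records.
payments = [
    {"payment_id": "P1", "vendor_no": "V100", "invoice": "INV-881",
     "amount_cents": 425000, "address": "12 First Street"},
    {"payment_id": "P2", "vendor_no": "V733", "invoice": "INV-881",
     "amount_cents": 425000, "address": "12 First St."},
    {"payment_id": "P3", "vendor_no": "V100", "invoice": "INV-902",
     "amount_cents": 131000, "address": "12 First Street"},
    {"payment_id": "P4", "vendor_no": "V205", "invoice": "INV-881",
     "amount_cents": 425000, "address": "900 Harbor Blvd"},
]

duplicates = possible_duplicates(payments)
print(duplicates)  # [('P1', 'P2')]
```

Comparing every pair of records is fine for a small sample; on a full
payment history, grouping by invoice number or amount first would keep
the number of comparisons manageable.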
6. (a) Benford’s Law states that within many large
data sets, such as corporate sales statistics or U.S. city
populations, the distribution of digits follows an unequal but
consistent pattern. For example, the first digit of a multidigit
number is 1 approximately 30% of the time, far more than the one-in-nine
(about 11%) frequency that an even spread of digits would produce. The
likelihood decreases with each successive digit through 9, which is the
first digit only about 4.6% of the time.
Predictable patterns also occur in the second and third digits of
multidigit numbers. Applying Benford’s Law to a data set can help
identify numbers that have been manipulated as part of a fraud scheme,
as most fraudsters’ concealment efforts will result in the data not
conforming to the law’s expected digit distribution. However,
Benford’s Law applies only to naturally occurring numeric data, that
is, amounts that arise from transactions rather than being assigned or
capped. Consequently, such an analysis will not yield meaningful results
if used on data with preassigned digits (such as invoice numbers) or data
confined to a predetermined range (such as hourly wage rates).
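The expected first-digit frequencies come from the formula
log10(1 + 1/d), which gives about 30.1% for 1 and about 4.6% for 9. The
Python sketch below computes those expectations and compares them with
the first digits observed in a list of amounts. The sample amounts and
the function names are hypothetical, and a sample this small is far too
short for a meaningful test; it only shows the mechanics.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected share of numbers whose first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

def first_digit(value):
    """Leading nonzero digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(value):.15e}"[0])

def first_digit_deviation(amounts):
    """Observed minus expected share for each first digit, skipping zero amounts."""
    counts = Counter(first_digit(a) for a in amounts if a)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total - benford_expected(d)
            for d in range(1, 10)}

# Hypothetical expense claims clustered just under a 5,000 approval limit.
amounts = [4900, 4950, 4800, 4999, 120, 4700]

deviation = first_digit_deviation(amounts)
for digit, gap in deviation.items():
    print(digit, f"{gap:+.3f}")
```

A large positive gap for one digit, as for the digit 4 here, marks the
range of amounts that deserves a closer look.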
7. (b) The notes to a company’s financial
statements are notoriously difficult to decipher, particularly for
readers without a financial or accounting background or education.
Consequently, the notes can be an excellent candidate for fraudulent
manipulation by management. One tool for assessing the readability of
the notes to financial statements is the Fog Index developed by Robert
Gunning. The Fog Index uses an algorithm to measure the readability of
a sample of English writing; the score that results from the
calculation represents the number of years of formal education needed
to understand the text upon an initial reading. Because notes to
financial statements are inherently complex, it is not surprising that
many receive a Fog Index score well beyond what would be considered
easily readable by almost anyone—including their intended audience.
(Test the Fog Index at gunning-fog-index.com.)
Therefore, a high Fog Index alone is not necessarily an indicator of
fraudulent activity. The real value in applying the Fog Index to
financial statement fraud detection lies in using the index to make
comparisons between particular notes within the same period, to
similar notes in other periods, or to the notes of other organizations
in the same industry. Any significant changes or deviations in a Fog
Index score that are highlighted by these types of comparisons could
indicate fraudulent activity and warrant a closer look.
8. (a) Link analysis and geospatial analysis can
both be particularly useful in uncovering corruption schemes, such as
bribery and conflicts of interest—schemes that often involve off-book
aspects that typically make them among the most difficult frauds to
detect. Link analysis provides visual representations (such as charts
with lines showing connections) of data from multiple sources to
discover communications, locations, patterns, trends, associations,
relationships, and hidden networks. For example, link analysis can be
used to demonstrate complex networks of parties and uncover indirect
relationships, including those connected through several
intermediaries. Similarly, geospatial analysis provides a visual model
of the geographical locations of transactions, assets, customers,
vendors, or other data. Using such an analysis to examine cash
disbursements in certain regions where bribery is prevalent can
provide insight into potential corruption schemes. (Transparency
International’s Corruption Perceptions Index is a good source for
determining such regions.)
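Underneath the charts, link analysis amounts to treating people,
addresses, and companies as points and the known relationships between
them as connections, then searching for paths. The Python sketch below
finds the shortest chain of connections between an employee and a
vendor. The entities, the links, and the function names are all
hypothetical; in practice the links would be drawn from sources such as
personnel files, vendor master data, and public records.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Turn (a, b) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical links gathered from several data sources.
links = [
    ("employee: J. Doe", "address: 4 Elm Rd"),    # personnel file
    ("address: 4 Elm Rd", "person: R. Roe"),      # public record
    ("person: R. Roe", "vendor: Acme Supply"),    # corporate registry
    ("employee: K. Poe", "address: 9 Oak Ave"),   # personnel file
]

graph = build_graph(links)
path = shortest_path(graph, "employee: J. Doe", "vendor: Acme Supply")
print(path)
```

Here the employee and the vendor are connected through two
intermediaries, a relationship that no single data source would reveal.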
9. (c) Calculating the ratio of maximum values to
minimum values can provide quick, high-level visibility into a data
set for which a small range of values would be expected. Specifically,
a maximum-to-minimum ratio close to 1 indicates that there is not much
variance between the highest and lowest number in a data set. Such a
calculation would be useful in examining unit prices for product
purchases, since large ratios would highlight large variations in the
price paid for the product—and possibly instances of being
overcharged, which might be the result of a kickback scheme involving
a purchasing employee.
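The calculation itself is simple, as the Python sketch below suggests.
The products, the prices, and the review threshold of 1.5 are
hypothetical; an appropriate threshold depends on how much legitimate
price variation the organization expects.

```python
from collections import defaultdict

def max_min_ratios(purchases):
    """Map each product to (highest unit price / lowest unit price)."""
    prices = defaultdict(list)
    for product, unit_price in purchases:
        prices[product].append(unit_price)
    # Skip products with a zero or negative price to avoid a meaningless ratio.
    return {p: max(v) / min(v) for p, v in prices.items() if min(v) > 0}

# Hypothetical purchase transactions: (product, unit price paid).
purchases = [
    ("bolt", 2.00), ("bolt", 2.10), ("bolt", 2.05),
    ("valve", 40.00), ("valve", 41.00), ("valve", 95.00),
]

ratios = max_min_ratios(purchases)
REVIEW_THRESHOLD = 1.5  # assumed cutoff for follow-up
to_review = sorted(p for p, r in ratios.items() if r > REVIEW_THRESHOLD)
print(to_review)  # ['valve']
```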
10. (d) Limiting the number of false positive
results is one of the biggest challenges in effectively designing and
using data analysis techniques to detect fraud. A test that provides
thousands of exceptions can be useful in identifying control
weaknesses or unenforced policies, but is less helpful in detecting
specific transactions that are part of a fraud scheme. To reduce the
need to sift through a huge number of transactions, the data analysis
team can combine several analysis techniques—e.g., payments to vendors
with incomplete profiles, payments just below approval thresholds, and
payments made unusually soon after the invoice date—and weight the
results by the number of exceptions each record shows. In this
situation, a transaction showing as an exception for all three tests
would merit closer scrutiny than a transaction that appeared in the
results of just one of these tests. Similarly, examining a transaction
through the lens of its circumstances—particularly, who is involved
and what nondata red flags might be present—can also be helpful. If
the data analysis team can identify employees with known fraud risk
factors (such as purchasing employees who have unusually close
relationships with vendors or who are known to live beyond their
means), giving close attention to their role in any transactions that
come up during the data analysis process can help focus the team’s
efforts on areas with increased risk. Further, combining the data in
the purchasing system with external sources of information—such as
vendor industry codes to identify payments outside legitimate purchase
categories or maps to identify vendors with addresses in residential
areas—can also be helpful.
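The weighting idea can be sketched in Python as follows, using the three
tests named above. The field names, the sample payments, the 5,000
approval limit, the 5% "just below" band, and the two-day "unusually
soon" window are all assumptions chosen for the example.

```python
from datetime import date

APPROVAL_LIMIT = 5000.00  # hypothetical approval threshold
NEAR_LIMIT_BAND = 0.05    # "just below" means within 5% under the limit
FAST_PAY_DAYS = 2         # "unusually soon" means paid within two days

def incomplete_vendor(p):
    return not (p["vendor_tax_id"] and p["vendor_address"])

def just_below_limit(p):
    return APPROVAL_LIMIT * (1 - NEAR_LIMIT_BAND) <= p["amount"] < APPROVAL_LIMIT

def paid_too_fast(p):
    return (p["paid_on"] - p["invoiced_on"]).days <= FAST_PAY_DAYS

TESTS = [incomplete_vendor, just_below_limit, paid_too_fast]

def exception_score(p):
    """Number of tests on which this payment shows up as an exception."""
    return sum(1 for test in TESTS if test(p))

def rank_by_score(payments):
    """Payments with at least one exception, highest score first."""
    scored = [(exception_score(p), p["payment_id"]) for p in payments]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))

# Hypothetical payments.
payments = [
    {"payment_id": "A", "vendor_tax_id": "", "vendor_address": "",
     "amount": 4990.00, "invoiced_on": date(2024, 3, 1), "paid_on": date(2024, 3, 2)},
    {"payment_id": "B", "vendor_tax_id": "12-345", "vendor_address": "",
     "amount": 1200.00, "invoiced_on": date(2024, 3, 1), "paid_on": date(2024, 3, 30)},
    {"payment_id": "C", "vendor_tax_id": "98-765", "vendor_address": "7 Pine Ct",
     "amount": 800.00, "invoiced_on": date(2024, 3, 1), "paid_on": date(2024, 4, 1)},
]

ranking = rank_by_score(payments)
print(ranking)  # [(3, 'A'), (1, 'B')]
```

Payment A fails all three tests and goes to the top of the review list,
while payment B, which fails only the vendor profile test, ranks lower.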
In contrast, filtering the transactions to include only those
recorded by expected employees during the expected times would not
reduce false positives in any useful way; it would discard the records
most likely to signal a problem. Running tests specifically to identify
transactions recorded during nonbusiness hours (e.g., nights, weekends,
and holidays) or recorded by employees who should not be involved in
such transactions can help illuminate internal control breaches and
potentially fraudulent transactions.
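A test for nonbusiness hours could be as simple as the Python sketch
below. The 8 a.m. to 6 p.m. business day, the holiday list, and the
function name are assumptions; each organization would substitute its
own calendar.

```python
from datetime import date, datetime

BUSINESS_START, BUSINESS_END = 8, 18  # assumed business hours, 08:00 to 18:00

def recorded_off_hours(timestamp, holidays=frozenset()):
    """True if the entry was recorded on a weekend, a holiday, or outside hours."""
    is_weekend = timestamp.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
    is_holiday = timestamp.date() in holidays
    outside_hours = not (BUSINESS_START <= timestamp.hour < BUSINESS_END)
    return is_weekend or is_holiday or outside_hours

# Hypothetical holiday calendar and entry timestamps.
holidays = {date(2024, 7, 4)}
entries = [
    datetime(2024, 3, 5, 14, 30),  # Tuesday afternoon
    datetime(2024, 3, 5, 23, 15),  # Tuesday, late at night
    datetime(2024, 3, 9, 11, 0),   # Saturday morning
    datetime(2024, 7, 4, 10, 0),   # holiday
]

off_hours = [recorded_off_hours(t, holidays) for t in entries]
print(off_hours)  # [False, True, True, True]
```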
If you answered nine or 10 questions correctly, congratulations.
Your solid knowledge of data analysis will assist you in detecting
fraud for your clients or employer.
If you answered seven or eight questions correctly, you’re on the
right track. Continue to build your knowledge of data analytics to
help uncover the red flags of fraud in the data.
If you answered fewer than seven questions correctly, consider
strengthening your understanding of data analysis and fraud detection
concepts to help ensure that you have what it takes to stay one step
ahead of fraud perpetrators.
Andi McNeal is director of research for the Association of Certified Fraud Examiners.