Tuesday, January 28, 2014

Solving the Puzzle

Expanding the answer choices
The body of a logic puzzle question contains a
(unique) wh-term (typically “which of the following”),
a modality (such as “must be true” or
“could be true”), and (possibly) an added condition.
Each answer choice is expanded by substituting
its SL form for the wh-term in the question
body. For example, the expansion for answer
choice (A) of question 1 in Figure 1 would
be the SL form corresponding to: “If sculpture
D is exhibited . . . , then [Sculpture C is exhibited in room 1] must be true”.
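A minimal sketch of this expansion step (ours, not the paper's code; SL forms are stand-in strings and the wh-term is a placeholder token):

def expand_answer_choices(question_body, choices, wh_marker="WH"):
    # Substitute each answer choice's SL form for the wh-term.
    return {label: question_body.replace(wh_marker, sl_form)
            for label, sl_form in choices.items()}

body = "if exhibit(D, room3) then must(WH)"
choices = {"A": "exhibit(C, room1)"}
print(expand_answer_choices(body, choices)["A"])
# if exhibit(D, room3) then must(exhibit(C, room1))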
Translating SL to FOL
To translate an SL representation to pure FOL, we eliminate event variables by replacing an SL form ∃e.P(e) ∧ R1(e, t1) ∧ ... ∧ Rn(e, tn) with the FOL form P(t1, ..., tn). An ordering is imposed on role names to guarantee that arguments are always used in the same order in relations. Numeric quantifiers are encoded in FOL in the obvious way, e.g., Q(≥2, x, φ, ψ) is translated to ∃x1∃x2. x1 ≠ x2 ∧ (φ∧ψ)[x1/x] ∧ (φ∧ψ)[x2/x].
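As a worked illustration, the at-least-n encoding can be sketched as follows (our code, not the system's; formulas are plain strings and substitution is naive textual replacement, adequate only for this toy case):

import itertools

def at_least_n(n, var, phi, psi):
    # Fresh variables x1..xn, pairwise distinct, each satisfying phi and psi.
    fresh = [f"{var}{i}" for i in range(1, n + 1)]
    body = [f"({phi} & {psi})".replace(var, v) for v in fresh]
    distinct = [f"{a} != {b}" for a, b in itertools.combinations(fresh, 2)]
    quants = "".join(f"exists {v}. " for v in fresh)
    return quants + " & ".join(distinct + body)

print(at_least_n(2, "x", "room(x)", "large(x)"))
# exists x1. exists x2. x1 != x2 & (room(x1) & large(x1)) & (room(x2) & large(x2))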
Each expanded answer choice contains one
modal operator. Modals are moved outward of negation as usual, and outward of conditionals by changing A → □B to □(A → B) and A → ◇B to ◇(A ∧ B). A modal operator in the outermost scope can then be interpreted as a directive to the reasoning module to test either entailment (□) or consistency (◇) between the preamble and the expanded answer choice.
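The conditional rewrite can be sketched with formulas as nested tuples (an illustrative representation of ours):

def push_modal_out(formula):
    # A -> box(B) becomes box(A -> B); A -> diamond(B) becomes diamond(A & B).
    if formula[0] == "implies":
        _, a, b = formula
        if isinstance(b, tuple) and b[0] == "box":
            return ("box", ("implies", a, b[1]))
        if isinstance(b, tuple) and b[0] == "diamond":
            return ("diamond", ("and", a, b[1]))
    return formula

f = ("implies", "exhibit(D, room3)", ("box", "exhibit(C, room1)"))
print(push_modal_out(f))
# ('box', ('implies', 'exhibit(D, room3)', 'exhibit(C, room1)'))

Once the modal is outermost, a box dispatches an entailment test and a diamond a consistency test against the preamble.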
Using FOL reasoners
There are two reasons for using both theorem
provers and model builders. First, they
are complementary reasoners: while a theorem
prover is designed to demonstrate the inconsistency
of a set of FOL formulas, and so can
find the correct answer to “must be true” questions
through proof by contradiction, a model
builder is designed to find a satisfying model,
and is thus suited to finding the correct answer
to “could be true” questions.7 Second, a
reasoner may take a very long time to halt on
some queries, but the complementary reasoner
may still be used to answer the query in the
context of a multiple-choice question through
a process of elimination. Thus, if the model
builder is able to show that the negations of four
choices are consistent with the preamble (indicating
they are not entailed), then it can be
concluded that the remaining choice is entailed
by the preamble, even if the theorem prover has
not yet found a proof.
We use the Otter 3.3 theorem prover and
the MACE 2.2 model builder (McCune, 1998).8
The reasoning module forks parallel subprocesses,
two per answer choice (one for Otter,
one for MACE). If a reasoner succeeds for an answer
choice, the choice is marked as correct or
incorrect, and the dual sub-process is killed. If
all answer choices but one are marked incorrect, the remaining choice is marked correct even if its sub-processes have not yet terminated.
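The racing scheme might look roughly like this (multiprocessing is our stand-in for the actual subprocess management; prover and builder are caller-supplied callables, not Otter/MACE bindings):

from multiprocessing import Process, Queue

def run_reasoner(name, reasoner, query, out):
    out.put((name, reasoner(query)))  # reasoner: prover or model builder

def judge_choice(prover, builder, query, timeout=60):
    out = Queue()
    procs = [Process(target=run_reasoner, args=(n, r, query, out))
             for n, r in (("prover", prover), ("builder", builder))]
    for p in procs:
        p.start()
    name, verdict = out.get(timeout=timeout)  # first reasoner to finish wins
    for p in procs:
        p.terminate()                         # kill the dual sub-process
    return name, verdict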

Filling Information Gaps


To find a unique answer to every question of a
puzzle, background information is required beyond
the literal meaning of the text. In Question
1 of Figure 1, for example, without the constraint
that a sculpture may not be exhibited in
multiple rooms, answers B, D and E are all correct.
Human readers deduce this implicit constraint
from their knowledge that sculptures are
physical objects, rooms are locations, and physical
objects can have only one location at any
given time. In principle, such information could
be derived from ontologies. Existing ontologies,
however, have limited coverage, so we also plan
to leverage information about expected puzzle
structures.
Most puzzles we collected are formalizable
as constraints on possible tuples of objects.
The crucial information includes: (a)
the object classes; (b) the constants naming
the objects; and (c) the relations used to
link objects, together with their arguments’
classes. For the sculptures puzzle, this information is: (a) the classes are sculpture and room; (b) the constants are C, D, E, F, G, H for sculpture and 1, 2, 3 for room; (c) the relation is exhibit(sculpture, room). This information is obtainable from the parse trees and SL formulas.
Within this framework, implicit world knowledge
can often be recast as mathematical properties
of relations. The unique location constraint
on sculptures, for example, is equivalent
to constraining the mapping from sculptures to
rooms to be injective (one-to-one); other cases
exist of constraining mappings to be surjective
(onto) and/or total. Such properties can be obtained
from various sources, including cardinality
of object classes, pure lexical semantics, and
even through a systematic search for sets of implicit
constraints that, in combination with the
explicitly stated constraints, yield exactly one
answer per question. Figure 3 shows the number of possible models for the sculptures puzzle as affected by explicit and implicit constraints in the preamble.

Figure 3: Effect of explicit and implicit constraints on the number of possible models.
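The unique-location constraint itself, for instance, can be stated as a single implicit FOL axiom (our reconstruction in the paper's notation, not a formula quoted from it):

∀x∀y1∀y2. sculpture(x) ∧ exhibit(x, y1) ∧ exhibit(x, y2) → y1 = y2

Adding this axiom to the preamble rules out choices D and E in Question 1, leaving B as the unique correct answer.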

Reference Resolution


SL is not convenient for representing directly the meaning of referring expressions because (as in FOL) the scope of a quantifier in a formula cannot easily be extended to span variables in subsequent formulas. We therefore use Discourse
Logic (DL), which is SL extended with
DRSes and α-expressions as in (Blackburn and
Bos, 2000) (which is based on Discourse Representation
Theory (Kamp and Reyle, 1993) and
its recent extensions for dealing with presuppositions).6 This approach (like other dynamic semantics
approaches) supports the introduction
of entities that can later be referred back to,
and explains when indefinite NPs should be interpreted as existential or universal quantifiers
(such as in the antecedent of conditionals). The
reference resolution framework from (Blackburn
and Bos, 2000) provides a basis for finding all
possible resolutions, but does not specify which
one to choose. We are working on a probabilistic
reference-resolution module, which will pick
from the legal resolutions the most probable one
based on features such as: distance, gender, syntactic
place and constraints, etc.
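Such a module might score candidates along these lines (the features and the linear scoring rule are illustrative assumptions, not the actual model):

def score(antecedent, anaphor, weights):
    feats = {
        "distance": -abs(anaphor["sentence"] - antecedent["sentence"]),
        "gender_match": 1.0 if antecedent["gender"] == anaphor["gender"] else 0.0,
        "is_subject": 1.0 if antecedent["role"] == "subj" else 0.0,
    }
    return sum(weights[f] * v for f, v in feats.items())

def resolve(anaphor, legal_antecedents, weights):
    # Pick the most probable of the legal resolutions.
    return max(legal_antecedents, key=lambda a: score(a, anaphor, weights))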
5E.g. there is a strong preference for ‘each’ to take
wide scope, a moderate preference for the first quantifier
in a sentence to take wide scope, and a weak preference
for a quantifier of the grammatical subject to take wide
scope.
6Thus, the URs calculated from parse trees are actually
URs of DL formulas. The scope resolution phase
resolves the URs to explicit DL formulas, and the reference
resolution phase converts these formulas to SL
formulas.

Scope Resolution


One way of dealing with scope ambiguities is by
using underspecified representations (URs). A
UR is a meta-language construct, describing a
set of object-language formulas.3 It describes
the pieces shared by these formulas, but possibly
underspecifies how they combine with each
other. A UR can then be resolved to the specific
readings it implicitly describes.
We use an extension of Hole Semantics
(Blackburn and Bos, 2000)4 for expressing URs
and calculating them from parse trees (modulo
the modifications in §5). There are several advantages
to this approach. First, it supports
the calculation of just one UR per sentence in
a combinatorial process that visits each node of
the parse tree once. This contrasts with approaches
such as Categorial Grammars (Carpenter,
1998), which produce explicitly all the
scopings by using type raising rules for different
combinations of scope, and require scanning the
entire parse tree once per scoping.
Second, the framework supports the expression
of scoping constraints between different
parts of the final formula. Thus it is possible
to express hierarchical relations that must exist
between certain quantifiers, avoiding the problems
of naive approaches such as Cooper storage
(Cooper, 1983). The expression of scoping
constraints is not limited to quantifiers and is
applicable to all other operators as well. Moreover,
it is possible to express scope islands by
constraining all the parts of a subformula to be
outscoped by a particular node.
Another advantage is that URs support efficient
elimination of logically-equivalent readings.
Enumerating all scopings and using
a theorem-prover to determine logical equivalences
requires O(n²) comparisons for n scopings.
Instead, filtering methods (Chaves, 2003)
can add tests to the UR-resolution process,
disallowing certain combinations of operators.
Thus, only one ordering of identical quantifiers
is allowed, so “A man saw a woman” yields
only one of its two equivalent scopings. We also
filter ∀□ and ∃◇ combinations, allowing only the equivalent □∀ and ◇∃. However, numeric
quantifiers are not filtered (the two scopings of
“Three boys saw three films” are not equivalent).
Such filtering can result in substantial
speed-ups for sentences with a few quantifiers
(see (Chaves, 2003) for some numbers).
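The effect of such filtering can be sketched by enumerating quantifier orderings and keeping only one ordering of identical quantifiers (a simplification of ours; the real filtering operates during UR resolution rather than on finished permutations):

from itertools import permutations

def scopings(quantifiers):
    seen = set()
    for order in permutations(range(len(quantifiers))):
        labels = tuple(quantifiers[i] for i in order)
        if labels in seen:   # identical quantifiers commute:
            continue         # keep only one of their orderings
        seen.add(labels)
        yield [quantifiers[i] for i in order]

# "A man saw a woman": two existentials, one scoping survives.
print(list(scopings(["exists", "exists"])))       # [['exists', 'exists']]
print(len(list(scopings(["forall", "exists"]))))  # 2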
Finally, our true goal is determining the correct
relative scoping in context rather than enumerating
all possibilities. We are developing
a probabilistic scope resolution module that
learns from hand-labeled training examples to
predict the most probable scoping, using features
such as the quantifiers’ categories and
their positions and grammatical roles in the sentence.

Combinatorial Semantics


Work in NLP has shifted from hand-built grammars
that need to cover explicitly every sentence
structure and that break down on unexpected
inputs to more robust statistical parsing.
However, grammars that involve precise semantics
are still largely hand-built (e.g. (Carpenter,
1998; Copestake and Flickinger, 2000)). We aim to extend this robustness trend to semantics.
We start with the compositional semantics
framework of (Blackburn and Bos, 2000; Bos,
2001) and modify it to achieve greater robustness
and coverage.2
One difference is that our lexicon is kept
very small and includes only a few words with
special semantic entries (like pronouns, connectives,
and numbers). Open-category words
come with their part-of-speech information in
the parse trees (e.g. (NN dog)), so their semantics
can be obtained using generic semantic templates
(but cf. §3.5).
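A template table keyed on POS tags might look like this (a sketch; the template strings are illustrative):

TEMPLATES = {
    "NN": lambda stem: f"lambda x. {stem}(x)",  # common noun
    "JJ": lambda stem: f"lambda x. {stem}(x)",  # adjective
    "VB": lambda stem: f"lambda e. {stem}(e)",  # verb: exports an event variable
}

def semantics(pos, word):
    return TEMPLATES[pos](word.lower())

print(semantics("NN", "dog"))      # lambda x. dog(x)
print(semantics("VB", "exhibit"))  # lambda e. exhibit(e)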
In classic rule-to-rule systems of semantics
like (Blackburn and Bos, 2000), each syntactic
rule has a separate semantic combination rule,
and so the system completely fails on unseen
syntactic structures. The main distinguishing
goal of our approach is to develop a more robust
process that does not need to explicitly specify
how to cover every bit of every sentence. The
system incorporates a few initial ideas in this
direction.
First, role and argument-structure information
for verbs is expensive to obtain and unreliable
anyway in natural texts. So to deal with
verbs and VPs robustly, their semantics in our
system exports only an event variable rather
than variables for the subject, the direct object,
etc. VP modifiers (such as PPs and ADVPs)
combine with the VP by being applied to the exported event variable. NP modifiers (including the sentence subject) are combined with the event
variable through generic roles: subj, np1, np2,
etc. The resulting generic representations are
suitable in the puzzles domain because usually
only the relation between objects is important
and not their particular roles in the relation.
This is true for other tasks as well, including
some broad-coverage question answering.
All NPs are analyzed as generalized quantifiers,
but a robust compositional analysis for
the internal semantics of NPs remains a serious
challenge. For example, the NP “three rooms”
should be analyzed as Q(num(3), x, room(x), ..),
but the word “three” by itself does not contribute
the quantifier – compare with “at least
three rooms” Q(≥3, x, room(x), ..). Yet another case is “the three rooms” (which presupposes a group g such that g ⊆ room ∧ |g| = 3). The system currently handles a number of NP structures by scanning the NP left-to-right to identify important elements. This may make it easier than a strictly compositional analysis to extend the coverage to additional cases.

2Our system uses a reimplementation in Lisp rather than their Prolog code.
All other cases are handled by a flexible combination
process. In case of a single child, its
semantics is copied to its parent. With more
children, all combinations of applying the semantics
of one child to its siblings are tried,
until an application does not raise a type error
(variables are typed to support type checking).
This makes it easier to extend the coverage
to new grammatical constructs, because usually
only the lexical entry needs to be specified, and
the combination process takes care to apply it
correctly in the parse tree.
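A sketch of that combination step (ours; Python's TypeError stands in for the system's typed-variable check):

def combine(children):
    if len(children) == 1:
        return children[0]            # single child: copy semantics upward
    for i, functor in enumerate(children):
        args = children[:i] + children[i+1:]
        try:
            result = functor
            for arg in args:
                result = result(arg)  # try applying this child to its siblings
            return result
        except TypeError:             # type clash: try the next child
            continue
    raise ValueError("no type-consistent combination found")

# Example: a determiner-like functor applied to a noun denotation.
noun = "sculpture"
det = lambda restr: f"Q(forall, x, {restr}(x), BODY)"
print(combine([det, noun]))  # Q(forall, x, sculpture(x), BODY)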

Morpho-Syntactic Analysis


While traditional hand-built grammars often include
a rich semantics, we have found their
coverage inadequate for the logic puzzles task.
For example, the English Resource Grammar
(Copestake and Flickinger, 2000) fails to parse
any of the sentences in Figure 1 for lack of coverage
of some words and of several different syntactic
structures; and parsable simplified versions
of the text produce dozens of unranked
parse trees. For this reason, we use a broad-coverage
statistical parser (Klein and Manning,
2003) trained on the Penn Treebank. In addition
to robustness, treebank-trained statistical
parsers have the benefit of extensive research
on accurate ambiguity resolution. Qualitatively,
we have found that the output of the parser on
logic puzzles is quite good (see §10). After parsing,
each word in the resulting parse trees is
converted to base form by a stemmer.
A few tree-transformation rules are applied
on the parse trees to make them more convenient
for combinatorial semantics. Most of them
are general, e.g. imposing a binary branching
structure on verb phrases, and grouping expressions
like “more than”. A few of them correct
some parsing errors, such as nouns marked as
names and vice-versa. There is growing awareness
in the probabilistic parsing literature that
mismatches between training and test set genre
can degrade parse accuracy, and that small
amounts of correct-genre data can be more important
than large amounts of wrong-genre data
(Gildea, 2001); we have found corroborating evidence
in misparsings of noun phrases common
in puzzle texts, such as “Sculptures C and E”,
which do not appear in the Wall Street Journal
corpus. Depending on the severity of this problem,
we may hand-annotate a small amount of
puzzle texts to include in parser training data.

Challenges

Combinatorial Semantics
The challenge of combinatorial semantics is to
be able to assign exactly one semantic representation
to each word and sub-phrase regardless
of its surrounding context, and to combine
these representations in a systematic way until
the representation for the entire sentence is obtained.
There are many linguistic constructions
in the puzzles whose compositional analysis is
difficult, such as a large variety of noun-phrase
structures (e.g., “Every sculpture must be exhibited
in a different room”) and ellipses (e.g.,
“Brian saw a taller man than Carl [did]”).
Scope Ambiguities
A sentence has a scope ambiguity when quantifiers
and other operators in the sentence can
have more than one relative scope. E.g., in constraint
(4) of Figure 1, “each room” outscopes
“at least one sculpture”, but in other contexts,
the reverse scoping is possible. The challenge
is to find, out of all the possible scopings, the
appropriate one, to understand the text as the
writer intended.
Reference Resolution
The puzzle texts contain a wide variety of
anaphoric expressions, including pronouns, definite
descriptions, and anaphoric adjectives. The
challenge is to identify the possible antecedents
that these expressions refer to, and to select
the correct ones. The problem is complicated
by the fact that anaphoric expressions interact
with quantifiers and may not refer to any particular
context element. E.g., the anaphoric expressions
in “Sculptures C and E are exhibited
in the same room” and in “Each man saw a different
woman” interact with sets ({C,E} and
the set of all men, respectively).
Plurality Disambiguation
Sentences that include plural entities are potentially
ambiguous between different readings:
distributive, collective, cumulative, and combinations
of these. For example, sentence 1 in
Figure 1 says (among other things) that each
of the six sculptures is displayed in one of the
three rooms – the group of sculptures and the
group of rooms behave differently here. Plurality
is a thorny topic which interacts in complex
ways with other semantic issues, including
quantification and reference.
Lexical Semantics
The meaning of open-category words is often
irrelevant to solving a puzzle. For example,
the meaning of “exhibited”, “sculpture”, and
“room” can be ignored because it is enough to
understand that the first is a binary relation
that holds between elements of groups described
by the second and third words.1 This observation provides the potential for a general system
that solves logic puzzles.
Of course, in many cases, the particular
meaning of open-category words and other expressions
is crucial to the solution. An example
is provided in question 2 of Figure 1: the system
has to understand what “a complete list”
means. Therefore, to finalize the meaning computed
for a sentence, such expressions should be
expanded to their explicit meaning. Although
there are many such cases and their analysis is
difficult, we anticipate that it will be possible to
develop a relatively compact library of critical
puzzle text expressions. We may also be able
to use existing resources such as WordNet and
FrameNet.
Information Gaps
Natural language texts invariably assume some
knowledge implicitly. E.g., Figure 1 does not explicitly
specify that a sculpture may not be exhibited
in more than one room at the same time.
Humans know this implicit information, but a
computer reasoning from texts must be given
it explicitly. Filling these information gaps is
a serious challenge; representation and acquisition
of the necessary background knowledge are
very hard AI problems. Fortunately, the puzzles
domain allows us to tackle this issue, as
explained in §8.
Presuppositions and Implicatures
In addition to its semantic meaning, a natural
language text conveys two other kinds of content.
Presuppositions are pieces of information assumed
in a sentence. Anaphoric expressions
bear presuppositions about the existence of entities
in the context; the answer choice “Sculptures
C and E” conveys the meaning {C,E},
but has the presupposition sculpture(C) ∧ sculpture(E); and a question of the form A → B, such as question 1 in Figure 1, presupposes
that A is consistent with the preamble.
Implicatures are pieces of information suggested
by the very fact of saying, or not saying,
something. Two maxims of (Grice, 1989)
dictate that each sentence should be both consistent
and informative (i.e. not entailed) with
respect to its predecessors. Another maxim dictates
saying as much as required, and hence the
sentence “No more than three sculptures may be
exhibited in any room” carries the implicature
that in some possible solution, three sculptures
are indeed exhibited in the same room.
Systematic calculation of presuppositions and
implicatures has been given less attention in
NLP and is less understood than the calculation
of meaning. Yet computing and verifying
them can provide valuable hints to the system
whether it understood the meaning of the text
correctly.

System Overview


This section explains the languages we use to
represent the content of a puzzle. Computing
the representations from a text is a complex process
with several stages, as shown in Figure 2.
Most of the stages are independent of the puzzles
domain. Section 3 reviews the main challenges
in this process, and later sections outline
the various processing stages. More details of
some of these stages can be found at (Stanford
NLP Group, 2004).
First-Order Logic (FOL)
An obvious way of solving logic puzzles is to
use off-the-shelf FOL reasoners, such as theorem
provers and model builders. Although most
GRE logic puzzles can also be cast as constraint-satisfaction
problems (CSPs), FOL representations
are more general and more broadly applicable
to other domains, and they are closer
to the natural language semantics. GRE logic
puzzles have finite small domains, so it is practicable
to use FOL reasoners.
The ultimate representation of the content of
a puzzle is therefore written in FOL. For example,
the representation for the first part of
constraint (4) in Figure 1 is: ∀x.room(x) → ∃y.sculpture(y) ∧ exhibit(y, x). (The treatment
of the modal ‘must’ is explained in §9.2).
Semantic Logic (SL)
Representing the meaning of natural language
texts in FOL is not straightforward because
human languages employ events, plural entities,
modal operations, and complex numeric
expressions. We therefore use an intermediate
representation, written in Semantic Logic
(SL), which is intended to be a general-purpose
semantic representation language. SL extends
FOL with event and group variables, the modal operators □ (necessarily) and ◇ (possibly), and Generalized Quantifiers (Barwise and Cooper, 1981) Q(type, var, restrictor, body), where type can be ∀, ∃, at-least(n), etc. To continue the example, the intermediate representation for the constraint is:
□Q(∀, x1, room(x1), Q(≥1, x2, sculpture(x2), ∃e.exhibit(e) ∧ subj(e, x2) ∧ in(e, x1)))
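To make the shape of such terms concrete, here is one possible encoding (our illustration; the actual system is a Lisp reimplementation, see footnote 2):

from dataclasses import dataclass

@dataclass
class Q:          # generalized quantifier Q(type, var, restrictor, body)
    qtype: str
    var: str
    restrictor: object
    body: object

@dataclass
class Box:        # "necessarily"
    body: object

# First part of constraint (4): every room exhibits at least one sculpture.
constraint = Box(
    Q("forall", "x1", "room(x1)",
      Q(">=1", "x2", "sculpture(x2)",
        "exists e. exhibit(e) & subj(e, x2) & in(e, x1)")))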
Non-determinism
Although logic puzzles are carefully designed
to reduce ambiguities to ensure that there
is exactly one correct answer per question,
there are still many ambiguities in the analysis,
such as multiple possibilities for syntactic
structures, pronominal reference, and quantifier
scope. Each module ranks possible output representations;
in the event that a later stage reveals
an earlier choice to be wrong (it may be
inconsistent with the rest of the puzzle, or lead
to a non-unique correct answer to a question),
the system backtracks and chooses the next-best
output representation for the earlier stage.
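The backtracking regime can be sketched as a depth-first search over ranked candidates (a simplification of ours; a stage that detects an inconsistency simply yields no candidates, sending the search back to an earlier stage):

def solve(stages, chosen=()):
    # stages: one function per module, mapping the choices made so far
    # to a ranked list of candidate outputs (best first). Returns the
    # first complete, consistent sequence of choices, or None.
    if len(chosen) == len(stages):
        return chosen
    for candidate in stages[len(chosen)](chosen):
        result = solve(stages, chosen + (candidate,))
        if result is not None:
            return result
    return None  # all candidates failed: caller tries its next-best choice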

Why Logic Puzzles?


Logic puzzles have a number of attractive characteristics
as a target domain for research placing
a premium on precise inference.
First, whereas for humans the language understanding
part of logic puzzles is trivial but
the reasoning is difficult, for computers it is
clearly the reverse. It is straightforward for a
computer to solve a formalized puzzle, so the
research effort is on the NLP parts rather than
a difficult back-end AI problem. Moreover, only
a small core of world knowledge (prominently,
temporal and spatial entailments) is typically
crucial to solving the task.
Second, the texts employ everyday language:
there are no domain-restrictions on syntactic
and semantic constructions, and the situations
described by the texts are diverse.
Figure 1: A typical puzzle text (preamble and questions).

Preamble: Six sculptures – C, D, E, F, G, and H – are to be exhibited in rooms 1, 2, and 3 of an art gallery. The exhibition conforms to the following conditions:
(1) Sculptures C and E may not be exhibited in the same room.
(2) Sculptures D and G must be exhibited in the same room.
(3) If sculptures E and F are exhibited in the same room, no other sculpture may be exhibited in that room.
(4) At least one sculpture must be exhibited in each room, and no more than three sculptures may be exhibited in any room.
Question 1: If sculpture D is exhibited in room 3 and sculptures E and F are exhibited in room 1, which of the following may be true?
(A) Sculpture C is exhibited in room 1.
(B) No more than 2 sculptures are exhibited in room 3.
(C) Sculptures F and H are exhibited in the same room.
(D) Three sculptures are exhibited in room 2.
(E) Sculpture G is exhibited in room 2.
Question 2: If sculptures C and G are exhibited in room 1, which of the following may NOT be a complete list of the sculpture(s) exhibited in room 2?
(A) Sculpture D (B) Sculptures E and H (C). . .

Third, and most crucial, answers to puzzle questions never explicitly appear in the text and
must be logically inferred from it, so there is
very little opportunity to use existing superficial
analysis methods of information-extraction and
question-answering as a substitute for deep understanding.
A prerequisite for successful inference
is precise understanding of semantic phenomena
like modals and quantifiers, in contrast
with much current NLP work that just ignores
such items. We believe that representations
with a well-defined model-theoretic semantics
are required.
Finally, the task has a clear evaluation metric
because the puzzle texts are designed to yield
exactly one correct answer to each multiple-choice
question. Moreover, the domain is another
example of “found test material” in the
sense of (Hirschman et al., 1999): puzzle texts
were developed with a goal independent of the
evaluation of natural language processing systems,
and so provide a more realistic evaluation
framework than specially-designed tests such as
TREC QA.
While our current system is not a real world
application, we believe that the methods being
developed could be used in applications such as
a computerized office assistant that must understand
requests such as: “Put each file containing
a task description in a different directory.”

Solving Logic Puzzles:

Traditional approaches to natural language understanding
(Woods, 1973; Warren and Pereira,
1982; Alshawi, 1992) provided a good account
of mapping from surface forms to semantic representations,
when confined to a very limited
vocabulary, syntax, and world model, with the resulting low levels of syntactic/semantic ambiguity.
It is, however, difficult to scale these
methods to unrestricted, general-domain natural
language input because of the overwhelming
problems of grammar coverage, unknown words,
unresolvable ambiguities, and incomplete domain
knowledge. Recent work in NLP has
consequently focused on more robust, broadcoverage
techniques, but with the effect of
overall shallower levels of processing. Thus,
state-of-the-art work on probabilistic parsing
(e.g., (Collins, 1999)) provides a good solution
to robust, broad coverage parsing with automatic
and frequently successful ambiguity resolution,
but has largely ignored issues of semantic
interpretation. The field of Question Answering
(Pasca and Harabagiu, 2001; Moldovan et al.,
2003) focuses on simple-fact queries. And so-called
semantic parsing (Gildea and Jurafsky,
2002) provides as end output only a flat classification
of semantic arguments of predicates,
ignoring much of the semantic content, such as
quantifiers.
A major research question that remains unanswered
is whether there are methods for getting
from a robust “parse-anything” statistical
parser to a semantic representation precise
enough for knowledge representation and automated
reasoning, without falling afoul of the
same problems that stymied the broad application
of traditional approaches. This paper
presents initial work on a system that addresses
this question. The chosen task is solving logic
puzzles of the sort found in the Law School Admission
Test (LSAT) and the old analytic section
of the Graduate Record Exam (GRE) (see
Figure 1 for a typical example). The system integrates
statistical parsing, “on-the-fly” combinatorial
synthesis of semantic forms, scope- and
reference-resolution, and precise semantic representations
that support the inference required
for solving the puzzles. Our work complements
research in semantic parsing and TREC-style
Question Answering by emphasizing complex
yet robust inference over general-domain
NL texts given relatively minimal lexical and
knowledge-base resources.

Monday, January 27, 2014

Limitations of Analytic Results


Any study of this nature is necessarily limited in several
respects. First of all, the survey approach used here is but one of
several that can be used to inform decisions about extending the
measurement of analytical abilities. Tucker’s (1985) results provide
useful information from different perspectives--those of cognitive
psychologists and philosophers. Other approaches that might also be
informative include the methods of cognitive psychology, which could
be used not only to supplement but also to extend the survey results
reported here. These methods would seem especially appropriate
because they relate more directly to actual skills and abilities than
to perceptions.
Second, the diversity that characterizes graduate education
renders the results of this study incomplete. Some clues have been
gained as to similarities and differences among a limited sample of
graduate fields. However, the substantial differences found among
fields are a source of concern, since we cannot be certain whether or
not some other sample of fields might exhibit even greater variation.
Finally, as several survey respondents pointed out, many of the
reasoning skills about which we asked are expected to, and do, improve
as the result of graduate study. In some sense these skills may
represent competencies that differ from, say, the verbal skills
measured by the GRE General Test in the respect that these analytical
skills may develop much more rapidly. A question of interest, then,
is how to accommodate the measurement of these skills in the context
of graduate admissions testing, which currently focuses on the
predictive effectiveness of abilities that are presumed to develop
slowly over a significant period of time.
Future Directions
The study suggested several possible future directions. Because
of the substantial variation among fields, one possibility would
involve extending the survey to include additional fields of graduate
study. Some refinements could now be made on the basis of past
experience. For example, ratings of the frequency with which skills
are used, as well as the frequencies of errors and critical incidents,
could probably be omitted without much loss of information. On the
other hand, it would seem desirable to add categories allowing ratings
of the differential importance of various reasoning skills at
different stages of graduate education, ranging from entry level to
dissertation writing.
Finally, based on the reasoning skills identified as most
important, criterion tasks might be developed against which the
validity of the current GRE analytical measure could be gauged. This
strategy would make especially good sense for those important skills
that may not be measurable in an operational test like the GRE General
Test, but which might correlate highly with the abilities now measured
by the test. One specific possibility would be the development of
rating forms, which could be used by faculty to rate the analytical
abilities of their students. These ratings could then be used as a
criterion against which GRE analytical scores could be judged.

Implications of Analytic Results


In providing some information on faculty perceptions of the
involvement of various reasoning skills in their disciplines, the
study has, we hope, implications for developing future versions of the
GRE analytical ability measure. Converting this information to
operational test items will represent a significant step, however, and
it is not crystal clear at this stage exactly how helpful these
results may be eventually. Nonetheless, the findings do seem to
contain several useful bits of information:
1. Among the specific reasoning skills perceived as the most
important were several, e.g., "deducing new information from
a set of relationships" and "understanding, evaluating, and
analyzing arguments," that seem well represented in the two
item types (analytical reasoning and logical reasoning)
currently included in the analytical section of the General
Test. This suggests that these item types should continue to
play a role in future editions of the GRE General Test.
2. Some skills that are not measured by the current version of
the analytical measure were rated as very important.
"Reasoning or problem solving in situations in which all the
needed information is not known" was among the skills rated
as most important in each discipline, but currently
unmeasured, at least in any explicit manner, by the
analytical measure. In this regard, however, the previous
GRE-sponsored work of Ward, Carlson, and Woisetschlager
(1983) is noteworthy. These investigators studied
“ill-structured” problems, i.e., problems that do not provide
all the information necessary to solve the problem, and noted
the resemblance of these problems to one variant of the
logical reasoning item type used in the analytical measure.
They concluded that there was no indication that “ill-structured”
problems measure different aspects of analytical
ability than do “well-structured” problems, and therefore
that “ill-structured” problems could not be expected to
extend the range of cognitive skills already measured by the
GRE General Test. They did note, however, that the
“ill-structured” item type could be used to increase the
variety of items types in the test. The findings of the
current study suggest that the inclusion of this item type
would probably meet with faculty approval in most fields of
study.
3. With respect to their perceived importance, skills involving
the generation of hypotheses/alternatives/explanations tended
to cluster together, and the inability to generate hypotheses
independently was one of the incidents rated consistently as
having a substantial effect on faculty perceptions of
students’ analytical abilities.
A number of years ago the GRE Board sponsored a series
of studies (Frederiksen & Ward, 1978; Ward, Frederiksen, &
Carlson, 1978; Ward & Frederiksen, 1977; Frederiksen & Ward,
1975) that explored the development and validation of tests
of scientific thinking, including one especially promising
item type called “Formulating Hypotheses,” which required
examinees to generate hypotheses. Although the research
suggested that this item type complemented the GRE verbal and
quantitative measures in predicting success in graduate
school, the work was discontinued, largely because of
problems in scoring items that require examinees to
construct, not merely choose, a correct response. Carlson
and Ward (1986) have proposed to renew work on the
“Formulating Hypotheses” item type in light of recent
advances in evaluating questions that involve constructed
responses. The results of the faculty survey reported here
would appear to support this renewal.
4. Some of the highly important skills that are currently
well represented in the analytical measure are viewed as more
important for success in some disciplines than in others.
For example, “understanding, analyzing, and evaluating
arguments” was seen as more important in English than in
computer science. However, some skills seen as highly
important in some disciplines but not in others may not be as
well represented currently. For example, “breaking down
complex problems into simpler ones” was perceived as
extremely important in computer science and engineering but
not at all important in English. This would suggest,
perhaps, the need to balance the inclusion of items
reflecting particular skills, so that skills thought to be
important (or unimportant) in particular disciplines are
neither over- nor underrepresented.
5. The several dimensions that appear to underlie clusters of
reasoning skills may provide an appropriate way to extend the
current test specifications for the analytical measure,
especially if new item types are developed to represent some
of these dimensions.
6. The reasoning skills that were rated as very important, and
consistently so, across disciplines point to a potential
common core of skills that could be appropriately included in
an “all-purpose” measure like the GRE General Test. Other
skills judged to be very important in only a few disciplines
might best be considered for extending the measurement of
reasoning skills in the GRE Subject Tests. Faculty comments
about the difficulty in separating reasoning from subject
matter knowledge would seem to support this strategy.

Other Comments from Respondents


A number of general comments were made about the study--some
positive and some negative. The study was described alternately as
“very well done” and “interesting,” but also, by one respondent, as a “complete waste of time.” Most of the comments were positive,
however, and many pertained more specifically to the kinds of
questions that were asked. The consensus seemed to be that the
questionnaire was not easy to complete. Moreover, faculty in the
several disciplines sometimes had different ideas as to what kinds of
questions would have been appropriate. For example, one English
faculty member noted the lack of questions on the use of language in
critical writing, and a computer science faculty member observed that
questions on abilities involved in formulating proofs, which are vital
to success in computer science, were only partially covered in the
questionnaire. An education faculty member noted that the survey did
a better job of assessing skills associated with hypothesis-testing
than with other research skills.
Along these same lines, a number of other respondents also
believed that the questions were more relevant to other disciplines
than to theirs. Several computer science professors, for example,
characterized the questions as oriented more toward argument than
problem solving, in which they had greater interest. An engineering
professor said that some of the questions were more pertinent to
educational research than to scientific or technical research, and one
English faculty found that questions seemed “geared to the hard
sciences. ” Finally, some noted ambiguities or redundancies, or
lamented that the questions were “too fine.” Even with these
difficulties, however, most of the comments about questions were
positive: “Items seem especially well chosen,” “questions are appropriate,” “questions were quite thorough,” “a good set of questions,” “topics covered are critical,” and “your lists are right on target.” The majority of comments, therefore, suggested that the
questionnaire was pitched at about the right level and included
appropriate kinds of reasoning skills.
A number of comments were made about the relationship between
subject matter and analytical skills, e.g., that successful problem
solving is predicated on having specific knowledge in a field. One
respondent believed that the questionnaire downplayed the importance
of “context effects” in favor of “strict reasoning ability,” and
another noted that the measurement of analytical abilities is quite
discipline specific. Another commented on the difficulty of measuring
analytical ability without regard to the amount of knowledge
available.
Several faculty commented on the development of analytical skills
in graduate school and on the differential importance of these skills
at various stages of graduate education. One respondent said, “I
rated entering behavior or behavior across the entire program
(courses, internships, dissertation). If I were to rate the
dissertation experience alone, the ratings would have been much
higher.” Many noted that by the end of their programs, skills would
be expected to increase and various reasoning errors could be expected
to occur less frequently: “Entering students are more likely to make
these errors and graduates to make far fewer.” Another said, “In some
sense, the essence of graduate training is analytical skills.” “These are skills which students acquire. When they enter they make most of the mistakes you mentioned. If they can’t learn, they leave the program.” Another said, “I’m more concerned about the presence of
these behaviors after my course than before it. One simply does not
harshly judge a beginning student who makes an error, but one could be
very critical of a student about to finish a Ph.D. thesis....”

Factor Analytic Results


To condense the many questions into a more manageable form,
factor analyses were computed for each section of the questionnaire.
For the section on reasoning skills, only the importance ratings were
analyzed because they were so highly correlated with frequency
ratings. Because frequency ratings were slightly less correlated with
ratings of seriousness and criticality in the other two sections, they
too were analyzed for the questionnaire sections on reasoning errors
and critical incidents.
The reader should bear in mind that the factors resulting from
this analysis should not be construed as representing dimensions of
analytical ability, but rather only as reflecting the dimensions that
underlie faculty perceptions of analytical abilities. These
dimensions merely reflect the extent to which graduate faculty tended
to rate certain skills as about equally important (or equally
unimportant), not the degree to which these dimensions represent
“factors of the mind.” Thus, the results presented below are intended
to provide a parsimonious representation of faculty perceptions rather
than a basis for postulating distinct analytical abilities.
Reasoning skills. For the ratings of importance of reasoning
skills, the largest eigenvalues were 16.4, 3.9, 2.4, 1.6, and 1.1, and
the application of a scree test (Cattell, 1966) suggested the
appropriateness of a five-factor solution, which was then rotated
according to the varimax criterion (Kaiser, 1958). The five-factor
varimax rotation accounted for 80% of the common variance. The factor
loadings and communalities are given in Appendix B. Table 7
summarizes the variables that were most instrumental in defining each
factor.
Factor I, which accounted for about a third of the common
variance, was characterized by highest loadings, generally, from
skills involving arguments. Thus, Factor I seems to involve a kind of
critical thinking related to argumentation.
Factor II accounted for about 29% of the common variance, and was
defined primarily by variables related to the drawing of conclusions,
e.g., generating valid explanations, supporting conclusions with
sufficient data, and drawing sound inferences from observations. The
conclusion-oriented skills that define this second critical thinking
factor would seem to be of a more active or productive nature,
involving the construction of inferences or conclusions, rather than
evaluating the soundness of arguments or inferences, as is the case
for Factor I.
Factors III-V each accounted for a somewhat smaller proportion of
common variance (10% - 15%) than did Factors I and II. Factor III is
best defined by skills related to defining and setting up problems or
analyzing their components as a prelude to solving them. Factor IV is
best characterized by inductive reasoning skills, i.e., the drawing of
conclusions that have some evidential support, but not enough to
indicate logical necessity. Factor V is somewhat difficult to define,
but, by virtue of its two highest loadings, it seems to reflect an
ability to generate alternatives.

Reasoning errors. For the ratings of seriousness of reasoning
errors, the largest eigenvalues were 6.5 and 1.1, and the two factors
accounted for 96% of the common variance. (Frequency ratings were
also factor analyzed and are presented in Appendix B. Because the
results were so similar to the analysis of seriousness ratings, they
are not discussed here.) As shown in Table 8, Factor I, which
explained about 52% of the common variance, was characterized by
loadings from errors involved in the evaluation of evidence, e.g.,
offering irrelevant evidence to support a point. Factor II, on the
other hand, seemed to involve more formal logical errors, particularly
as related to reasoning with more statistically oriented material--for
example, failing to take account of a base rate, failing to recognize
differences between populations and samples, and confusing correlation
with causation.

Item-level Results


Tables 1-3 show the mean ratings by discipline for each question
included in the survey instrument. The numbers in the total column
are the grand means for all disciplines. Numbers under each
discipline represent for each item the deviations from these means.
The F tests in the right-most column indicate whether the means are
significantly different among the six disciplines. Because the
average ratings, over all respondents, for “frequency of use” and
“importance for success” correlated .99, only the importance ratings
are presented for reasoning skills. Likewise, only the “seriousness”
ratings are presented for reasoning errors, since their correlation
with frequency ratings was .98, and, for critical incidents, only the
average “effect” ratings are presented, since their correlation with
frequency ratings was .94.
Tables 1-3 show a substantial number of significant differences
among disciplines with respect to the importance placed on various
reasoning skills (Table 1), the seriousness with which they regard
particular kinds of reasoning errors (Table 2), and the impact that
various critical incidents have on the estimation of students’
analytical abilities (Table 3). Table 4, showing only the very
highest rated skills and most critical errors and incidents, gives a
flavor of the differences among these six disciplines. For example,
chemistry faculty placed a high premium on being able to generate
hypotheses, questions, or experiments, to draw sound inferences from
observations, and to analyze and evaluate previous research. English
faculty, on the other hand, saw greater importance in skills involving
argumentation-- being able to understand, evaluate, analyze, elaborate,
recognize, and support aspects of an argument.
Faculty in the six disciplines also appeared to have quite
different views as to the numbers of skills that were important in
their respective disciplines. The numbers of reasoning skills that
received average ratings of 4.0 or higher varied markedly by
discipline as follows: 23 for chemistry, 5 for computer science, 27
for education, 22 for engineering, 29 for English, and 26 for
psychology. These differences may have arisen, for example, from our
particular choice of questions, from differences in standards among disciplines, or from some other factor(s).
It can be seen, even from Table 4, however, that some skills were viewed as very important by several disciplines. For example, “breaking down complex problems into simpler ones” was rated as the
single most important skill (of the 56 skills listed) in both computer
science and engineering. “Determining whether conclusions are
logically consistent with, and adequately supported by, the data” was
rated as one of the three most important skills by both education and
psychology faculty; “drawing sound inferences from observations” was
the highest rated skill in chemistry and nearly the highest in
education.
The extent to which faculty in different disciplines agreed on
the importance of various skills, errors, or incidents can be examined
in a slightly different manner. To get some idea of the skills,
errors, and incidents that were viewed as relatively important, and
for which average ratings did not differ significantly across
disciplines, Table 5 was prepared. This table shows only those skills
that received average ratings of importance of more than 3.5 over all
six disciplines combined, and for which analyses of variance did not
detect any significant differences among disciplines.
“Reasoning or problem solving in situations in which all the
needed information is not known” was the skill rated as most important
overall. Such skills as “detecting fallacies and logical
contradictions in arguments,” “deducing new information from a set of
relationships, ” and “recognizing structural similarities between one
type of problem or theory and another” were the next most highly rated
skills. These were followed closely by “taking well-known principles
and ideas from one area and applying them to a different specialty,”
“monitoring one’s own progress in solving problems,” and “deriving
from the study of single cases structural features or functional
principles that can be applied to other cases.”
Table 6 lists the reasoning errors and critical incidents that
were judged overall to be the most serious or to have the most effect
on the estimation of students’ abilities. Three errors/incidents were
judged to be most serious or critical: “accepting the central
assumptions in an argument without questioning them,” “being unable to
integrate and synthesize ideas from various sources,” and “being
unable to generate hypotheses independently.”
It should be noted that there are many other decision rules,
based on average ratings and differences among disciplines, that could
have been used here to form a “common core” of skills or errors/
incidents. Tables 1-3 could be consulted to apply alternative rules.

Data Analysis


Means and standard deviations were calculated for each question
by academic field of study, and analyses of variance were run for each
question to assess differences among the six fields. The various
ratings were correlated within questionnaire sections. For example,
within the section on reasoning skills, the ratings of frequency and importance were correlated; within the section on reasoning errors, the ratings of frequency and seriousness were correlated.
Finally, within each section (and for each kind of rating), the
data were factor analyzed to effect some reduction in the large number
of questions. A principal axis factoring, with squared multiple
correlations as the initial estimates of communalities, was used to
determine the number of factors to be retained for each section,
according to both the magnitude of the eigenvalues and the breaks in
their size. (Our inclination was to err on the side of retaining too
many factors at this exploratory stage.) Various numbers of factors
were then rotated according to the varimax criterion. Although other, oblique rotations could also have been used, it was felt that the detection of uncorrelated factors would best serve the objectives of further test development.
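For readers interested in the mechanics, the extraction-and-rotation step can be sketched as follows (a NumPy simplification of ours: a principal-components-style extraction rather than true principal axis factoring, with a standard varimax iteration):

import numpy as np

def retain_factors(corr, threshold=1.0):
    # Keep factors whose eigenvalues exceed a size threshold.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    keep = [i for i in order if eigvals[i] > threshold]
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated loadings

def varimax(loadings, iters=100, tol=1e-6):
    # Rotate loadings to maximize the variance of squared loadings.
    p, k = loadings.shape
    rot = np.eye(k)
    var_old = 0.0
    for _ in range(iters):
        lam = loadings @ rot
        u, s, vt = np.linalg.svd(
            loadings.T @ (lam**3 - lam @ np.diag((lam**2).sum(axis=0)) / p))
        rot = u @ vt
        if s.sum() - var_old < tol:
            break
        var_old = s.sum()
    return loadings @ rot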
The Sample
A total of 165 chairpersons (65% of those contacted) nominated a
total of 297 faculty members, of whom 255 (86%) returned usable
questionnaires. The response rates across fields were generally
comparable.
Full professors constituted a slight majority of the responding
sample (51%); associate professors made up the next largest proportion
(34%). About 13% were assistant professors, and the remaining small
proportion were deans, associate deans, or lecturers.

Questionnaire Development


Initially, 30 department chairs (in English, education,
engineering, chemistry, computer science, or psychology) were
contacted in 30 graduate institutions, and asked to identify three
faculty members in their departments who would be willing to provide
their insights into the analytical or reasoning skills that are most
critical for successful performance in graduate school. These 30
institutions were chosen from the GRE Directory of Graduate Programs
in such a way as to ensure some degree of geographical representation.
All of these departments require or recommend that applicants submit GRE General Test scores; it was felt that these departments might be
more interested than nonrequiring departments in efforts to improve
the GRE General Test.
At this preliminary stage, faculty members were informed of the
purpose of the project and asked to give, in an open-ended fashion,
examples of:
(a) the analytical, reasoning, or thinking skills they
perceived as most important for successful graduate
study in their fields (e.g., identifying assumptions
on which an argument is based), particularly as these
skills differentiate successful from marginal students
(b) specific critical incidents related to thinking or reasoning
that caused them to either raise or lower their estimation
of a student's analytical ability (e.g., failing to qualify
a conclusion as appropriate)
(c) particular reasoning or thinking "flaws" they have observed
in their students (e.g., using the conclusion as the premise
of an argument).
Useable responses were obtained from 33 faculty members, who
suggested a total of 138 important reasoning or thinking skills, 86
critical incidents, and 75 reasoning "flaws." Some of these responses
were duplicates. Several other respondents did not specify discrete
skills or errors but chose rather to send helpful discursive replies
to our inquiry. All responses were condensed and edited, and
generally evaluated with respect to whether they should be included in
the larger, more structured questionnaire that was planned. Some
responses constituted usable questionnaire items essentially as stated
by respondents (e.g., "the ability to break complex problems into
simpler components"). Other responses were revised or eliminated
because they were too general (e.g., "the ability to think
independently"), and others because they were too specific or applied
only to a particular field (e.g., "the ability to resolve into
enthymemic form any argumentative work" or "the ability to take
ecological validity into account").
The structured questionnaire was constructed on the basis of this
preliminary survey, on a review of relevant literature (Duran, Powers,
& Swinton, in press) and on a number of additional books or texts on
reasoning (e.g., Campbell, 1974; Fischer, 1970; Johnson & Blair, 1983;
Kahane, 1984; Nosich, 1982; Salmon, 1984; Scriven, 1976; Toulmin, Rieke, & Janik, 1984; Wason & Johnson-Laird, 1972; and Weddle, 1978).
Several other articles, e.g., a seminal work by Ennis (1962) and a
list of skills by Arons (1979), proved especially useful. Various
issues of CT News, published by the Critical Thinking Project at
California State University at Sacramento, were also perused.
Previous work on critical incidents in graduate student performance
(Reilly, 1974a, 1974b) was also consulted, and several of the
incidents related to critical facility were included in the present
study. Finally, the list generated by Tucker (1985), who gathered the
impressions of ETS test development staff, philosophers, and cognitive
psychologists, also proved to be a valuable resource.
The final questionnaire (see Appendix A) was structured to
include questions about the importance and frequency of various
reasoning skills, of commonly observed errors in reasoning, and of
specific incidents that may have led faculty to adjust their estimates
of students' analytical abilities. Questions were grouped under
several headings, mainly to give respondents some sense of their
progress in responding to the rather lengthy questionnaire.
The Sample
Six academic fields (English, education, psychology, chemistry,
computer science, and engineering) were included in the final survey.
These fields were thought to represent the variety of fields of
graduate study and the variation in the kinds of reasoning abilities
involved in graduate education. Using the data tapes of the Higher
Education General Information Survey (HEGIS), nonoverlapping samples
of 64 graduate institutions with doctoral programs were drawn for each
of the six graduate fields. A random sampling procedure was used such
that eight institutions from each of the eight HEGIS geographic
regions were selected for each field. This sampling was greatly
facilitated by the work of Oltman (1982). The admission requirements
of these institutions were determined from the Directory of Graduate
Programs (GRE/CGS, 1983), and only those that either required or
recommended GRE General Test scores were included in the sample. In
this manner, 40 institutions were selected for the final sample for
each field. In addition, one institution with a relatively large
proportion of Black students and one with a relatively large
percentage of Hispanic students were included in the samples for each
field, thus raising the total number of institutions to 42 per field.
Letters were then sent to departmental chairpersons, who were asked to
nominate two faculty members who would be willing to complete the
questionnaire. Respondents were paid $25 for their participation.

Analytical Reasoning Skills Involved in Graduate Study


Perceptions of Faculty in Six Fields
Despite the complexity of human cognitive abilities, standardized
admissions tests have tended to focus almost exclusively on the
measurement of broadly applicable verbal and quantitative aptitudes.
One criticism of such omnibus verbal and quantitative ability measures
is that they provide only limited descriptions of students' academic
strengths and weaknesses, and that they do not therefore adequately
reflect test takers' differential development in other important
cognitive areas.
In 1974 the GRE Board approved a plan to restructure the GRE
Aptitude Test in order to allow examinees to demonstrate a broader
range of academic skills (Altman, Carlson, & Donlon, 1975). A survey
of constituents revealed that, of several possible new areas of
measurement (e.g., abstract reasoning, scientific thinking, and study
skills), graduate faculty, administrators, and students were most
receptive to assessing analytical or abstract reasoning skills
(Miller & Wild, 1979). Developmental activities then followed and,
after careful psychometric study of several alternative analytical
item types, four distinct kinds of items were selected for the new
analytical section of the GRE Aptitude Test, which was introduced
operationally in the 1977-78 testing year. Graduate institutions were
cautioned against using the scores from the new analytical section
until further evidence could be generated on the validity of the new
measure. Subsequently, the administration of the new measure to large
numbers of examinees under operational conditions enabled the further
collection of information about it.
Some research strongly suggested the promise of the analytical
section: it appeared to measure an ability that was distinguishable
from the verbal and quantitative abilities measured by the test
(Powers & Swinton, 1981), and the score derived from it was related to
successful performance in graduate school (Wilson, 1982).
Unfortunately, however, further research suggested serious problems
with the two item types (analysis of explanations and logical
diagrams) that comprised the bulk of the analytical section.
Performance on these item types was shown to be extremely susceptible
to special test preparation (Swinton & Powers, 1983; Powers & Swinton,
1984) and to within-test practice (Swinton, Wild, & Wallmark, 1983).
Consequently, in 1981 the two problematic item types were deleted from
the test, and additional numbers of analytical reasoning and logical
reasoning items, which constituted a very small part of the original
analytical measure, were inserted.
The most recent research on the General Test (Stricker & Rock,
1985; Wilson, 1984) has given us some reason to question both the
convergent and the discriminant validity of the two remaining item
types. Moreover, the two currently used GRE analytical item types reflect only a limited
portion of the reasoning skills that are required of graduate
students. The most notable omission is the assessment of inductive
reasoning skills, i.e., reasoning from incomplete knowledge, where the
purpose is to learn new subject matter, to develop hypotheses, or to
integrate previously learned materials into a more useful and
comprehensive body of information. Thus, it seemed, the analytical
ability measure of the GRE General Test might be improved through
further effort.
The objective of the study reported here was to generate
information that might guide the development of future versions of the
GRE analytical measure. More specifically, the intention was to gain
a better understanding of what reasoning (or analytical) skills are
involved in successful academic performance at the graduate level, and
to determine the relative importance of these skills or abilities both
within and across academic disciplines. It was thought that this
information might be especially useful for developing additional
analytical item types.
As mentioned earlier, the initial version of the GRE analytical
ability measure was developed after a survey had suggested the
importance of abstract reasoning to success in graduate education.
This survey, however, was not designed to provide any detailed
information on the importance of specific analytical skills, as was
the intention here.

Saturday, January 25, 2014

SOLVING THE PEOPLE PUZZLE


JUST HOW POWERFUL IS MOTIVATION, REALLY?
Our goal at ABA is to provide our clients with solid business counsel which will help them outwit
their competition, grow their business, and create wealth. In so doing, it is our desire that
they use a portion of that wealth to “give back” to their communities and to worthwhile causes
wherever in the world they exist. In the last two editions of the monthly ABA Insider we have
been offering practical advice on a subject which has great power and, yet, seems to be one
of the greatest mysteries to business leaders. It represents perhaps the area of greatest
positive potential for improving the performance of your business. That subject is BUILDING
YOUR TEAM and the challenges which surround MANAGING it. We have called our series,
“Solving the People Puzzle.” Some people really enjoy working on and solving puzzles, and
some really find it exhausting and somewhat maddening. But, like it or not, your team is the
key to your success as a business, and even more so during this challenging business
climate. Most leaders who find this part of running a business the most difficult feel that
their skills are inadequate in this area, and/or have never learned the principles which
define how to build and manage a successful and growing team. We will continue to discuss some of
those principles in this article.
TEAM BUILDING’S MOST CRITICAL COMPONENTS
Previously, we have tackled the issues of Attitudes and how profoundly they affect your
ability to build and manage a winning team which sets your company apart from your
competition. Another important building block which must be in place if you are to achieve
optimal performance from your team is Clarity of Expectations for every member. They
must each fully understand exactly what good behavior looks like for the position they fill.
Additionally, they must have been given all the tools and resources they need in order to be
successful. One such tool is Empowerment, which allows people to do their best work and
leverage their strengths for the benefit of the company as well as themselves. There are
others like: Mutual Trust, Alignment with Mission, and Leadership by Example.
WHAT’S NEXT?
The mortar which holds all of the building blocks of successful behavior together is
motivating people to produce.
1. Defining Motivation.
Before you can motivate people to produce at high levels, you must know exactly
what it is. First of all, the definition of motivation comes primarily from the two
words inside it. They are motive and action. Therefore, motivation is the
creation of desires and situations which move people to action.
All human activities (or actions) are undertaken as a direct effort to meet some
need. In other words, everything we do, from the simplest act to very complicated
tasks, we do for a reason, and that reason is based around a need that we have.
That need has to do with either something in our lives, or the lives of those we love.
It may be something we want to acquire (psychologists call this “approach”), or
something in our lives, or the lives of those we love, that we do not want
(“avoidance”).
2. Types of Motivation.
The two basic types of motivation are fear motivation and incentive motivation. Obviously, the
former has to do with avoiding the things we do not want to happen and the latter has to do
with the things we desire for us and for those we love. Both types of motivation can be
effective tools in moving us to action, but they each have specific limitations and guidelines for
their use. In terms of needs which we want to fulfill or avoid in our lives, there are both
biological needs and psychological needs. To design an environment of motivation for
your team, you must understand which needs are the strongest for which members on
the team, and then proceed to set up a system of cause and effect which allows them to
meet those needs. Though this is not a simple task, the rewards of doing so are huge.
For when you create a clear path for a person to meet their needs, whether they are biological
or psychological, you ignite a very powerful cache of human dynamite.
3. “Win-Win”.
If an effective system of motivation is to be established, all team members must be committed
to the principle that everyone in the equation must win. They must perceive that no one in
the company or who is touched by the company will win at the expense of anyone else. Any
and all systems of motivation which do not have this win-win component are doomed to
failure. Although we cannot treat everyone the same in our company, we must establish
systems, policies, and procedures which are fair.
Volumes could be written regarding motivation, and still everything would not have been covered. But
we at ABA do know from experience how incredibly valuable it is to understand its principles. That
understanding has revolutionized the business of many of our clients. To create an effective system of
motivation you must know your team members and what’s important to them. Additionally, you must
really care about them, and they must feel that caring attitude from their leadership. Finally, you must
understand exactly what are the needs, goals, and resources of your organization, so that you can meld
the two agendas effectively. Once you have done that, hang on for the ride, for it will be an exciting one
both for you and your team.

Wednesday, January 22, 2014

Evaluating the reliability of evidence


For each of the following passages, evaluate the reliability and plausibility of the evidence
presented, and make a judgement as to what conclusions, if any, can be drawn from it.
1 The missing money
Eight-year-olds Jane, Lucy and Sally were playing in the back garden of Jane’s home,
which is in a quiet residential street. When they heard the bell of the ice-cream van,
they ran around to the front, and Mr Black, Jane’s father, gave them money to buy
ice creams. He left his wallet on a table just inside the front door, and went back to
tidying the attic. When he came down an hour later, he found the front door open,
although he was sure he had closed it. The evening newspaper was on the doormat,
and his wallet was where he had left it, but the two £10 notes it had contained were
gone, although his credit cards remained.
The girls all said that they hadn’t been at the front of the house since they bought
the ice creams, and had not seen or heard anyone come to the house. When Lucy
and Sally had gone home, Jane told her father that whilst the girls were playing
hide-and-seek, Lucy had hidden around the side of the house, out of view of the
back garden. Jane said, ‘Lucy could have gone into the house then and taken
the money. I wouldn’t be surprised if she did. She doesn’t get much pocket money,
and once she took some money out of another girl’s purse.’
Mr Black talked to Sally’s mother who asked Sally whether any of the girls had gone
into the house that afternoon. Sally said that Jane had gone into the house through
the side door, saying that she needed to use the bathroom. Sally also said that Lucy
had been out of sight for only a minute or two before they found her. Mr Black
asked his daughter if she had gone indoors. She said she had done so to use the
bathroom next to the side door, but had not gone into the hall, so had not seen
whether the front door was open.
The boy who delivered the newspaper lived a few doors away from Jane’s house. Mr
Black asked him if he had seen anything suspicious. He said that the door was open
when he arrived, and he just threw the paper onto the doormat, instead of putting it
through the letter box, as he usually did. He claimed that he did not notice the
wallet, and that he had not seen anyone else at the front of the house.
Mr Black decided not to talk to Lucy’s family about the disappearance of the money,
and not to report the apparent theft to the police, because the amount stolen was
relatively small.
2 Do gun laws have an impact on the rate of gun crime?
The following extracts on the topic of gun laws and/or crime are from a number of sources,
as indicated. Consider the reliability and plausibility of each of the sources in turn; then
decide whether you can draw any conclusion, based on all the evidence. An internet search
may help you to make judgements about the reliability of the authors.
(i) Below are two extracts from ‘Banning guns has backfired’, written in 2004 by John R.
Lott Jr., a resident scholar at the American Enterprise Institute, and author of a book, The
Bias Against Guns. Various websites report that John Lott has claimed that in 98 per cent
of instances of defensive gun use, the defender merely has to brandish the gun to stop an
attack. Other academic writers dispute this finding.
The government recently reported that gun crime in England and Wales nearly
doubled in the four years from 1998–99 to 2002–03.
Crime was not supposed to rise after handguns were banned in 1997. Yet, since
1996 the serious violent crime rate has soared by 69%: robbery is up by 45% and
murders up by 54%. Before the law, armed robberies had fallen by 50% from 1993
to 1997, but as soon as handguns were banned the robbery rate shot back up, almost
back to their 1993 levels.
The 2000 International Crime Victimization Survey, the last survey done, shows
the violent crime rate in England and Wales was twice the rate in the U.S. When the
new survey for 2004 comes out, that gap will undoubtedly have widened even
further as crimes reported to British police have since soared by 35%, while declining
6% in the U.S.
. . .
Britain is not alone in its experience with banning guns. Australia has also seen its
violent crime rates soar to rates similar to Britain’s after its 1996 Port Arthur gun
control measures. Violent crime rates averaged 32% higher in the six years after the
law was passed (from 1997 to 2002) than they did the year before the law in 1995.
The same comparisons for armed robbery rates showed increases of 74%.
During the 1990s, just as Britain and Australia were more severely regulating guns,
the U.S. was greatly liberalizing individuals’ abilities to carry guns. Thirty-seven of
the 50 states now have so-called right-to-carry laws that let law-abiding adults carry
concealed handguns once they pass a criminal background check and pay a fee. Only
half the states require some training, usually around three to five hours’ worth. Yet
crime has fallen even faster in these states than the national average. Overall, the
states that have experienced the fastest growth rates in gun ownership during the
1990s have experienced the biggest drops in murder rates and other violent crimes.
(ii) The following table is from ‘Some facts about guns’, which can be found on the website
for Gun Control Network, an organisation which campaigns for tighter controls on guns
of all kinds.

EVALUATING EVIDENCE AND DRAWING CONCLUSIONS


Chapter 3 contained exercises in drawing conclusions on the basis of information assumed
to be true. The following examples illustrate some of the circumstances in which one
may attempt to draw conclusions on the basis of consideration of the reliability and the
plausibility of evidence. Example 1 is an imaginary scenario in which the reliability of
the evidence provided by witnesses or participants must be assessed. Someone engaged in
police or legal work, and people who are members of juries at criminal trials, would have
to make these kinds of assessments, and draw conclusions from them. Examples 2 and 3
present extracts from newspaper articles on topical issues. You may often read such articles
and wonder how reliable is the evidence they contain, and what conclusions can be drawn
from it.
Example 1: Matt in the night club

Last Friday night there was an alleged incident in the Jazza club. Matt, a university
student, claims that at about 10.30 pm he was head-butted on the dance floor of
the club, thereby receiving an injury to his nose. A visit to the A and E Department
later that evening confirmed that his nose was broken. He said that because it was
dark, and because he had just turned to walk off the dance floor, he did not get a
clear view of the person who attacked him. Matt’s friend Joe told the management
of the club that a group of four young men left the dance floor immediately after
Matt stepped off, and that one of them was Carl, who had been known to be
involved in attacks on university students, including Joe himself.
A girl who had been talking to Matt and Joe earlier in the evening told the management
that she knew Carl, that he had always been pleasant towards her, and that she
was sure that he wouldn’t do anything violent. She insisted that she was not his
girlfriend, although he had asked her for a date recently, which she didn’t accept
because she was still with her last boyfriend.
Two bouncers at the club were in the room at the time, but said they had seen
nothing of the incident. One of them said, ‘People are always claiming that somebody
has injured them, but it usually turns out that they were drunk and either fell
over or bumped into something.’
Carl was identified from a CCTV video which showed him and his friends leaving
the club at 10.30 pm, apparently laughing and joking. After Matt had reported the
incident to the police, they interviewed Carl and his friends, all of whom denied any
knowledge of such an incident, and claimed never to have noticed Matt at the club.
Let us consider first the plausibility of the allegation. Matt’s claim is plausible, in that a
blow on the nose from someone’s skull could cause a fracture of the bone, and it would
have been possible, in the darkness and confusion, for him to have been unaware who
attacked him. He hasn’t accused any particular person, so his allegation is simply that
someone struck him on the nose, causing a fracture, and this is the kind of thing that could
happen. From the information given, we have no reason to think that Matt was lying, but
he may have been confused as a result of drinking alcohol. However, it is unlikely that he
would have sustained this kind of injury from falling or bumping into something.
What can we conclude from Joe’s evidence? Given his previous experience of Carl, he may
have been more likely to notice him than to notice others who had been close to where
Matt was standing, and who could have attacked him. It is also possible that Joe simply
wanted to get Carl into trouble, and that Carl, though he was certainly in the club, was not
close enough to Matt to have caused the injury.
The girl’s evidence seems to give support to Carl, though she is not an eye witness. She is
not his girlfriend, so her statement cannot be discounted on the grounds that she has a
personal interest in protecting him. On the other hand it is not clear how well she knows
him, and her comments that she turned him down when he asked for a date suggest that
Carl may have had a motive to attack Matt, if he had seen her talking to Matt and Joe.
The bouncers’ evidence is not conclusive. Matt could have been attacked without their
noticing it, and given that it is part of their job to minimise trouble in the club, they would
want to play down the possibility of any such incident.
The fact that Carl and his friends were leaving the club, apparently laughing and joking,
at around the time the incident is said to have occurred proves nothing. Their leaving at
this relatively early time in the evening could have been for reasons other than wanting to
avoid accusations. The denial of involvement by Carl and his friends can be discounted,
because if they had been involved they would not have admitted to it, and in this case
corroboration does not confirm the reliability of their evidence, because either group
loyalty or possibly threats from Carl may influence their statements.
The most that it is reasonable to conclude, without evidence from other witnesses, is that
Matt probably was injured as a result of being head-butted by someone, but there is
insufficient evidence to conclude that Carl, or any one of his friends, is the guilty person.
Example 2: Is homework for school children necessary or desirable?

PLAUSIBILITY OF CLAIMS


Another important question to ask when evaluating evidence is ‘how plausible is this
claim, or piece of evidence?’
We need to clarify what is meant by ‘plausible’ in this context, in particular to make
clear the difference between questions about plausibility of evidence and questions about
reliability of evidence. We have already used the word ‘plausible’ in relation to
explanations. In that context its meaning was ‘possibly correct, or likely to be correct’.
Sometimes in everyday speech the word plausible is applied to someone’s manner. For
example, members of a jury may judge a defendant ‘plausible’ and therefore be inclined to
believe their statements, on the basis of their speech, facial expression and body language,
rather than on the basis of a process of reasoning. It is possible that human beings are very
good at making accurate judgements of character on this basis, but they are not using
critical thinking skills when they do this. Of course, it is also possible that someone could
use a process of reasoning to judge plausibility in this sense, if there were well-established
criteria which indicated when individuals were lying. So in critical thinking texts and
examinations, a question could in principle be asked about the plausibility of a particular
person, based on evidence of their behaviour.
However, in general when questions about plausibility of evidence are asked in critical
thinking texts, the question refers to the plausibility of what is said or claimed, and not the
plausibility of the person who says it. Thus the question ‘Is Ms Brown’s evidence plausible?’
should be interpreted as meaning ‘Is Ms Brown’s claim the kind of thing that could
be true, or could have happened?’

Factors affecting someone’s judgement


Someone who aims to tell the truth, and who is in a position to have the relevant knowledge,
may nevertheless be unreliable because of circumstances which interfere with the
accuracy of his or her judgement. For example, emotional stress, drugs and alcohol can
affect our perceptions. We can be distracted by other events which are happening concurrently.
A parent with fractious children in the car may notice less about a road accident
than someone who is travelling alone. We can forget important aspects of what has
happened, particularly if some time elapses before we report an incident. In the case of
people gathering and assessing evidence, for example scientists and psychologists, the
accuracy of their observations and interpretations can be affected by their strong
expectation of a particular result, or their strong desire to have a particular theory confirmed.
Expectation and desire can also play a part in evidence provided by people who
have prejudices against particular groups or individuals, so we need to be aware that
prejudice may influence someone’s belief as to what they saw or what happened.
Corroboration
Sometimes when we have evidence from more than one source, we find that two (or more)
people agree in their descriptions of events – that is to say, their evidence corroborates
the statements of others. In these circumstances, unless there is any reason to think that
the witnesses are attempting to mislead us, or any reason to think that one witness has
attempted to influence others, we should regard corroboration as confirming the reliability
of evidence.
Summary: assessing reliability of evidence/authorities
Here is a summary of the important questions to ask yourself about the reliability
of evidence and of authorities.
1 Is this person likely to be telling a lie, to be failing to give full relevant
information, or to be attempting to mislead?
• Do they have a record of being untruthful?
• Do they have a reason for being untruthful?
• Would they gain something very important by deceiving me?
• Would they lose something very important by telling me the truth?
2 Is this person in a position to have the relevant knowledge?
• If expert knowledge is involved, are they an expert, or have they been
informed by an expert?
• If first-hand experience is important, were they in a position to have that
experience?
• If observation is involved, could they see and/or hear clearly?
3 Are there any factors which would interfere with the accuracy of this person’s
judgement?
• Was, or is, the person under emotional stress?
• Was, or is, the person under the influence of alcohol or drugs?
• Was the person likely to have been distracted by other events?
• Does the person have a strong desire or incentive to believe one version of
events, or one explanation, rather than another?
• Does the person have a strong prejudice that may influence their beliefs
about events?
• In the case of first-hand experience of an event, was information received
from the person immediately following the event?
4 Is there evidence from another source that corroborates this person’s
statement?
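Purely as an illustration (this encoding is not part of the original text), the
four questions above can be read as a small decision procedure. The Python
sketch below makes that reading explicit; the field names are hypothetical and
serve only to tie each flag to one of the four questions.

    from dataclasses import dataclass

    @dataclass
    class Testimony:
        record_of_untruthfulness: bool = False  # question 1
        vested_interest: bool = False           # question 1
        in_position_to_know: bool = True        # question 2
        judgement_impaired: bool = False        # question 3: stress, alcohol,
                                                # distraction, desire, prejudice
        corroborated: bool = False              # question 4

    def reliability_concerns(t):
        # Collect the reasons, if any, to discount this piece of evidence.
        concerns = []
        if t.record_of_untruthfulness or t.vested_interest:
            concerns.append("may be lying or attempting to mislead")
        if not t.in_position_to_know:
            concerns.append("not in a position to have the relevant knowledge")
        if t.judgement_impaired:
            concerns.append("accuracy of judgement may be compromised")
        if not t.corroborated:
            concerns.append("no corroboration from another source")
        return concerns

On this sketch, a witness with nothing else against them but no corroborating
source still returns one concern, reflecting question 4.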

RELIABILITY OF AUTHORITIES


We can approach the assessment of reliability by thinking about the characteristics of the
person who is giving information. We have to think about the circumstances which could
make it likely that what someone said was untrue. Before reading further, try to write a
list of characteristics or tendencies of other people which would make you think that the
information they were giving you was not reliable.
Reputation
If one of your acquaintances has a record of being untruthful, then you are much more
cautious about accepting their statements as true than you would be about believing
someone whom you thought had never lied to you. For example, if someone who always
exaggerates about his success with women tells you that at last night’s disco several women
chatted him up, you will be inclined to be sceptical. The habitual liar is an obvious case of
someone whose statements are unreliable.
Vested interest
Of course, people who are not habitual liars may deceive others on occasions. They may do
so because they stand to lose a great deal – money, respect or reputation – by telling the
truth. So when we have to make judgements about the reliability of people we know to
be generally truthful, and about people with whom we are not acquainted, we should bear
this consideration in mind. That is not to say that we should assume people are being
untruthful, simply because it would be damaging to them if others believed the opposite
of what they say. But when we have to judge between two conflicting pieces of information
from two different people, we should consider whether one of those people has a
vested interest in making us believe what they say. For example, if an adult discovers two
children fighting, then each child has a vested interest in claiming that the other started
the fight. But the evidence of a third child who observed the fight, but knows neither of
the protagonists, could be taken to be more reliable in these circumstances.
Relevant experience or expertise
If someone was not in a position to have the relevant knowledge about the subject under
discussion, then it would be merely accidental if their statements about the subject were
true. There are a number of circumstances which prevent people from having the relevant
knowledge. The subject under discussion may be a highly specialised subject which is
understood only by those who have had appropriate education or training. We would not
expect reliable information on brain surgery to be given by people who have had absolutely
no medical training. This is why in many areas of knowledge we have to rely on what
experts say. It is important to note, however, that being an expert, no matter how eminent,
in one field, does not confer reliability on topics beyond one’s area of expertise.
People who are not experts can read about specialised subjects, and pass on information
to us about such subjects, so we do not have to disbelieve people simply because they are
not experts. But we would be wise to ask the source of their information. For example, if
someone told us that they had read that a new car had better safety features than any other
model, we should regard the information as more reliable if it came from a consumer
magazine or a motoring association than if it was a report of a comment made by a famous
person who owned such a car.
Another circumstance in which someone would not be in a position to have the relevant
knowledge would be where eye-witness testimony was crucial, and the person could not
have seen clearly what happened – perhaps because of poor eyesight, or perhaps because
he or she did not have a clear line of vision on the incident. In the case of a road accident,
for example, we would expect to get a more accurate account of what happened from
someone with good vision who was close to the accident and whose view was not obscured
in any way, than from someone with poor eyesight, or who was at some distance from the
accident, or who was viewing it from an angle, or through trees. Similar considerations
would apply in the case of information dependent upon hearing rather than vision.

Sunday, January 19, 2014

Recognising and applying principles


Arguments which rely on general principles have implications beyond their own subject
matter, because it is in the nature of a general principle that it is applicable to more than
one case. A piece of reasoning may use such a principle without explicitly describing it as a
general principle, so we need to be alert to the fact that some of the statements in an
argument may apply to cases other than the one under discussion. There can be many
kinds of principle, for example, legal rules, moral guidelines, business practices, and so on.
Principles may function in an argument as reasons, as conclusions or as unstated assumptions.
So, when we are going through the usual process of identifying reasons, conclusions
and assumptions, we should ask ourselves whether any of them is a statement with general
applicability.
The skill of identifying principles is valuable, because sometimes the application of a
principle to other cases – that is to say, the further implications of a principle – may show
us that the principle needs to be modified, or maybe even rejected. Suppose, for example,
someone wants to argue against the use of capital punishment, and offers as a reason
‘Killing is wrong’. This principle, stated as it is without any qualification, obviously has
very wide applicability. It applies to all cases of killing. So, if we are to accept it as a
principle to guide our actions, it means that killing in wartime is wrong, and killing in self-defence
is wrong. If we are convinced that killing in self-defence cannot be wrong, then we
have to modify our original principle in order to take account of exceptions to it. Applying
principles involves being consistent in our reasoning, recognising all the implications of
our own and others’ reasoning.
Another example is offered by a debate in the sphere of medical ethics. It has been
suggested that when the demand for treatment for illness exceeds the resources available,
and thus decisions have to be made about priorities, one type of illness which should come
very low on the list of priorities for treatment is illness which individuals bring upon
themselves by their actions or lifestyles. Such illness can be described as ‘self-inflicted’.
Most doctors would not take the view that self-inflicted illness should not be treated, but it
is an issue which is often mentioned when public opinion is consulted about how best to
use the resources available for health care. For example, someone may say, ‘We should not
give high priority to expensive heart treatments for smokers, because they have brought
their illness on themselves’.
Clearly the principle underlying this is that ‘We should not give high priority to the
treatment of self-inflicted illness’, and it is a principle with wider applicability. But in order
to understand to which cases of illness it properly applies, we need to be clearer about
what exactly is meant by ‘self-inflicted illness’. At the very least it must mean an illness
which has been caused by the actions or behaviour of the person who is ill. On this
definition, the principle would apply to a very wide range of illnesses – for example,
smoking-related diseases, alcohol- and drug-related diseases, diseases caused by unsuitable
diet, some sports injuries, some road accident injuries, some cases of sexually transmitted
disease. However, it may be claimed that one cannot properly be said to have inflicted a
disease on oneself unless one knew that the action or behaviour would cause the illness, or
it may be claimed that a disease cannot properly be said to be self-inflicted, if the action
which caused the disease was carried out under some kind of compulsion or addiction.
So, perhaps one would wish to modify the definition of ‘self-inflicted illness’ to read, ‘an
illness which has knowingly been caused by the deliberate and free action of an individual’.
This definition would give the principle narrower applicability. For example, it would not
be applicable to diseases caused by bad diet when the individual did not know the effects
of a bad diet. Nor would it apply to cases of illness caused by addiction. But we may still
find that those cases to which it did apply – for example, a motor-cyclist injured in a road
accident through not wearing a crash helmet – suggested to us that there was something
wrong with the principle.
Applying and evaluating principles
For each of the following principles, think of a case to which it applies, and consider
whether this particular application suggests to you that the principle should be modified
or abandoned. This exercise would work well as the basis for a class discussion.
1 No one should have to subsidise, through taxation, services which they themselves
never use.
2 We should not have laws to prevent people from harming themselves, provided their
actions do not harm others.
3 There should be absolute freedom for the newspapers to publish anything they wish.
4 Doctors should be completely honest with their patients.
5 You should never pass on information which you have promised to keep secret.