Sunday, March 16, 2014

Common Mistakes Companies Make With Social Media

Now that we've talked about social media, it's important to note what social media is not. This is best answered by looking at the common mistakes companies make with social media. These mistakes fall into three categories: mistakes with strategy, mistakes with tools, and mistakes with messaging. Most of these mistakes are easily avoidable if a company is willing to take the time to understand users' wants and needs on each medium.
Common Mistakes Companies Make With Strategy
Some of the most common mistakes companies make with social media revolve around decisions that don't reflect good business sense. Because social media tools are free, some companies take the pasta approach: throwing noodles at the wall to see what sticks. Here are some of the most common mistakes to avoid with social media strategy:
Not developing a social media strategy
Because social media is the hottest trend in marketing, companies assume that all they have to do is set up a Twitter account and a Facebook fan page. This is the equivalent of pulling random magazines off the rack, purchasing a full-page color ad in each one, and then throwing together a quick-and-dirty PowerPoint flyer to run. Just like any other communication medium, social media requires a well-thought-out marketing plan.
Perfecting a social media strategy
Even though a social media strategy is important, don't wait for the strategy before setting up your company's accounts. Reserving your company's name on various social media sites is of the utmost importance. Furthermore, because it takes time to build social media accounts, every minute you spend not being there is followers you could be losing.
Gathering followers rather than building a network
There are no shortcuts in social media; the bottom line is that companies have to build relationships with their customers before they can sell anything. Social media may seem free, but the hidden time cost of building those relationships is real. Social media is not a quick way to make more sales; in fact, it adds cycle time to the sales process. Just like any other process, a company must consider how much of its resources to invest.
Putting all eggs in one basket
It's exciting to see extraordinary results on one form of social media, and tempting to invest all your resources into what's working. Try to resist. With the speed at which technology changes, social media is starting to look like the fashion cycle: one day you're in, the next day you're out. Tools fall in and out of fashion all the time – remember Friendster and, more recently, MySpace? Companies that build all their equity on one tool will find themselves with nothing if that tool loses popularity.
Putting the cart before the horse
Another cliché is the company that doesn't follow a logical process with social media and then wonders why it isn't seeing results. Common sense comes in handy here. For example, consider a company that doesn't currently have many customers but creates a Facebook fan page and starts promoting it with Facebook ads. The keyword is “fan”; people who haven't experienced the product are not likely to join a fan club for it. Make sure your company follows a logical customer acquisition process by thinking about social media from the user's perspective.

Social Media: A Publishing Technology for Everyone


Social media is unlike any other technology in history. It has created a modern-day renaissance for several reasons, which are broken down here:
Social media is online
Social media is something that takes place online. It is a type of communication that happens outside of in-person meetings, phone calls, or foot traffic. That means social media is location-independent, which makes it a valuable part of any company's business strategy.
Social media is user-generated
Content used to be something that very few people created. Reporters, TV anchors, movie directors, authors, radio DJs, and magazine editors created content, and everyone else consumed it. Now, everyone is a publisher, and the people who use the content are also the ones who create it.
Social media is highly accessible and scalable
Social media is highly accessible and scalable, which means it has lots of users and offers plenty of opportunity for companies. The tools are easy and intuitive enough for the average person to use. Even if you don't use social media now, there's no reason not to jump in!
Social Media: A Way to Diffuse Information
No other existing medium is capable of diffusing information faster than social media. Here's why:
Social media is a shift in how people discover, read, and share news, media, and content
Television and newspapers are no longer king when it comes to filtering and sharing news. People are more likely to get their news by reading Trending Topics on Twitter, and they are more likely to share a link to a friend's blog post than to MSN's homepage.
Social media is a fusion of sociology and technology
Social media is user-controlled, which means sociological factors play a large role in any company's social media business strategy. The limits of social media are set only by the limits of the technology behind social media tools.
Social media is a dialogue
At one time, companies had a monologue with their customers. A company put out television commercials or print ads about its products and waited to see whether sales rolled in to determine success. Now, social media allows companies to have a dialogue with their customers and gain valuable feedback and input as they create the message.
Social media is the democratization of information
Information and messaging for a company was once controlled by its marketing and sales departments. Now, with the democratization of information, no one owns the message about a product or company. Every company must become part of the conversation or risk letting users become the voice of the company.
Social Media: A Way to Build Relationships by Sharing
Humans are interesting in that they build relationships through sharing. It may be sharing something that happened in their personal lives, or sharing something funny they saw on TV. Sharing is an essential component of social media, so let's break it down into parts:
Social media is people connecting with others
Social media allows each person to connect with others, which means most of the messaging a person receives comes from his or her network. It's essential for companies to learn to network with their customers in order to promote the company message.
Social media is content readers becoming content publishers
Content readers are no longer only consumers. Social media allows them to become content publishers, sharing content with their own networks of followers by publishing or republishing the message in their own words.
Social media is forming relationships for personal, political, and business use
Social media is not just about content or messaging in a different format. Social media is about relationships. For companies, social media is about creating a more personal relationship with end-consumers to build a network around a service or product.
It is clear that there are many components to social media, but the best way to understand it is to just try it. I hope by now you're excited to see what social media can do for your business!

What is Social Media?


I know that you're ready to jump right in and start building a social media strategy. Before we do, it is essential that you understand what social media is and why so many people are using it. Let's start with a definition. According to Wikipedia,
“Social media is online content created by people using highly accessible and scalable publishing technologies. At its most basic sense, social media is a shift in how people discover, read and share news, information and content. It's a fusion of sociology and technology, transforming monologues (one to many) into dialogues (many to many) and is the democratization of information, transforming people from content readers into publishers. Social media has become extremely popular because it allows people to connect in the online world to form relationships for personal, political and business use.”
Wow, that's a lot of information! So let's break it down into its three main components: publishing, information diffusion, and relationship building.
Social media can be leveraged to create wonderful marketing masterpieces. Big name companies like Skittles and Dell have successfully used social media to increase their sales, brand, and the community around their products. Small companies like Kogi BBQ are using social media to increase their sales and dominate the late night food craze in Los Angeles. No matter your company size, social media can be used to start a conversation with your target market and elevate your brand.
SEOP's Social Media Consulting Team has successfully worked for clients and built strong campaigns that drive traffic and build community. Through our experience, we have developed the proprietary 5 Pillar Model that teaches our clients how to use social media for business and how to execute the strategies that we develop together.
This eBook is your guide to the 5 Pillars of social media marketing and how you can leverage social media for your company's success:
Pillar 1: What is Social Media? – We break down social media marketing into its core components so that you can understand the fundamentals. Social media is about building a conversation with your clients and consumers.
Pillar 2: Common Mistakes Companies Make With Social Media – Though you may understand social media and have a solid foundation to build on, it is still possible to fall into the common pitfalls that trip up most companies. Avoid the crucial mistakes and you'll be well positioned for social media success.
Pillar 3: The Different Ways Companies Use Social Media – Now that you understand the common pitfalls, it's time to take a look at the companies that get it right. Companies are successfully using social media to drive sales, build traffic, hire employees, build a community, and create a positive, well-known brand.
Pillar 4: A Framework for Developing a Social Media Strategy – The strategy development portion of a social media campaign is crucial to a company's success. You must ask and answer the relevant questions to develop the right campaign: What social sites is your target market currently using? How much time is needed to consistently interact with your social community?
Pillar 5: How to Measure Return on Investment – Strategy and research are of course only half the battle. Once your strategy is set, it's time to execute by building a team, training your current team, or hiring outside consultants to implement it. This is by far the most important part of social media marketing, so it is of the utmost importance that you get it right.

Friday, February 7, 2014

IQ Questions & Answers 16-20


17. Which is the odd one out?
femur, mandible, fibula, tibia, patella
18. My watch was correct at noon, after which it started to lose 17 minutes per
hour until, six hours ago, it stopped completely. It now shows the time as 2.52 pm.
What time is it now?

Answers:
16. D;
17. mandible: it is the jaw bone, the rest are bones in the leg;
18. 10 pm: 12 noon = 12 noon, 1 pm = 12.43, 2 pm = 1.26, 3 pm = 2.09, 4 pm = 2.52, +6 hours = 10 pm;
19. OPTICAL ILLUSION;
20. A: each line across and down
contains five black dots and four white dots;
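Answer 18 can be double-checked with a few lines of arithmetic (a Python sketch; the key observation is that the watch advances only 60 − 17 = 43 watch-minutes per real hour):

```python
# The watch advances 60 - 17 = 43 watch-minutes per real hour.
watch_minutes_shown = 2 * 60 + 52          # 2.52 pm = 172 minutes past noon
real_hours_until_stop = watch_minutes_shown / 43
stopped_at = 12 + real_hours_until_stop    # real clock time when it stopped (4 pm)
now = stopped_at + 6                       # it stopped six hours ago
print(now)  # 22.0, i.e. 10 pm
```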

IQ Questions & Answers 11-16

11. Which word in brackets is closest in meaning to the word in capitals?
MONITOR (observe, order, meddle, intrude, conclude)
 13. Which two words are most opposite in meaning?
liberty, frivolity, chastity, sobriety, irrationality, polarity
15. The following clue leads to which pair of rhyming words?
measure bulk of grass fodder
Answers:
11. observe;
15. weigh hay;

IQ Test 06-10

6. Solve the anagrams to find a well-known saying. The number of letters in each
word is shown.
(**** ** ********)(**** *******)
(asserting craft)(hint of antic)

8. 0, 4, 2, 6, 3, 7, 3.5, ?
What number should replace the question mark?
9. Identify two words (one from each set of brackets) that have a connection
(analogy) with the words in capitals and relate to them in the same way.
LONGITUDE (degree, tropics, meridian)
LATITUDE (parallel, line, equinox)
Answers:
6. fact is stranger than fiction;
7. B: black objects
turn to white and vice versa;
8. 7.5: the sequence proceeds +4, ÷2, +4, etc;
9. meridian, parallel;
10. B: in lines and columns, add the first three numbers to
arrive at the fourth number;

IQ Questions & Answers 01-05


2. Which four-letter word, when placed in the brackets, will complete a word on
the left and start another word on the right?
RAM (****) RIDGE
Answers:
1. E: the number of white dots is increased by one each time, both vertically and
horizontally, and all white dots are connected;
2. PART: RAMPART and PARTRIDGE;
3. B: lines across proceed +2, –3, +2. Lines down proceed –3, +2, –3;
4. glass;
5. 5: (8 + 7) × 5 = 75;

IQ Questions & Answers 36-40





Answers:
36. D: so that one dot appears in the triangle and one
circle; and the other dot appears in the triangle and three circles;
37. HANG GLIDER;
38. G;
39. 0: looking at lines of numbers from the top: 9 × 8 = 72; 72 × 8 =
576; 576 × 8 = 4608;
40. C: each opposite corner block of four squares are identical.

IQ Questions & Answers

32. How many minutes is it before 12 noon if nine minutes ago it was twice as
many minutes past 10 am?
33. Which two words are closest in meaning?
conclave, medley, theme, conglomeration, dissertation, augury
34. broke rage prose cute dared ?
Which word is missing?
palm hymn evil snow take
35. Find five consecutive numbers below that total 22.
7 3 9 6 4 1 3 7 9 3 5 4 1 7 6 5
Answers:

32. 37 minutes: 12 noon less 37 minutes = 11.23, 11.23 less nine minutes = 11.14.
10 am plus 74 minutes (2 × 37) = 11.14;
33. medley, conglomeration;
34. evil: when joined together each pair of words forms another word – brokerage, prosecute, daredevil;
35. 93541;
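Answer 32 can be verified with a short brute-force search over the possible values (Python, for illustration):

```python
# Let x be the number of minutes before 12 noon now. Nine minutes ago the
# time was (120 - x) - 9 minutes past 10 am, and this must equal 2x.
solutions = [x for x in range(1, 120) if (120 - x) - 9 == 2 * x]
print(solutions)  # [37]
```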

Thursday, February 6, 2014

Test One: Questions 21-30

Work clockwise round the circles to spell out two eight-letter words that are
synonyms. Each word commences in a different circle, and you must find the
starting point of each. Every letter is used once each and all letters are
consecutive.
24. 10, 30, 32, 96, 98, 294, 296, ?, ?
What two numbers should replace the question marks?
25. able, rot, son, king
Which word below shares a common feature with all the words above?
line, sit, take, hope, night
26. Identify two words (one from each set of brackets) that have a connection
(analogy) with the words in capitals and relate to them in the same way.
SEA (wet, swimmer, ship)
SNOW (mountain, ice, skier)
27. Which word meaning LOCALITY becomes a word meaning TEMPO when a
letter is removed?
28. Alf has four times as many as Jim, and Jim has three times as many as Sid.
Altogether they have 192. How many has each?
29. Which is the only one of the following that is not an anagram of a word
meaning out of this world?
flow under
sexed Utah
enviable blue
icier blend
30. A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17
identical red. The lights are fused and he is completely in the dark. How many
socks must he take out to make 100 per cent certain he has a pair of black socks?
Answers:

Test One: Questions 13-20

13. Which is the odd one out?
heptagon, triangle, hexagon, cube, pentagon
Switch A turns lights 1 and 2 on/off or off/on
Switch B turns lights 2 and 4 on/off or off/on
Switch C turns lights 1 and 3 on/off or off/on
16. Which word in brackets is closest in meaning to the word in capitals?
BRUNT (dull, edifice, impact, tawny, nonsense)
17. Which of the following is not an anagram of a type of food?
PAST EIGHT
I CAN ROAM
WIN BOAR
CAN PEAK
COOL CHEAT
Answers:

Test One: Questions 07-12

8. Identify two words (one from each set of brackets) that have a connection
(analogy) with the words in capitals and relate to them in the same way.
GRAM (energy, weight, scales)
KNOT (water, rope, speed)



11. How many minutes is it before 12 noon, if 48 minutes ago it was twice as many
minutes past 9 am?
12. Complete the five words below in such a way that the two letters that end the
first word also start the second word, and the two letters that end the second
word also start the third word etc. The same two letters that end the fifth word
also start the first word, to complete the cycle.
** IV **
** OT **
** IC **
** NG **
** RA **
Answer:
7. C: in all the others the black circle is connected to three white circles. In C it is only
connected to two white circles; 8. weight, speed;
10. B: the rest are the same figure rotated;
11. 44 minutes: 12 noon less 44 minutes =
11.16, 11.16 less 48 minutes = 10.28, 9 am plus 88 minutes (44 × 2) = 10.28;
12. SHIVER,
EROTIC, ICICLE, LENGTH, THRASH;

Test One: Questions 01-06

2. Which word in brackets is most opposite to the word in capitals?
PROSCRIBE (allow, stifle, promote, verify, indict)
3. 0, 1, 2, 4, 6, 9, 12, 16, ?
What number should replace the question mark?
4. Which number is the odd one out?
9678 4572 5261 5133 3527 6895 7768
5. Isotherm is to temperature as isobar is to: atmosphere, wind, pressure, latitude,
current
Answer:
1. B;
2. allow;
3. 20: add 1, 1, 2, 2, 3, 3, 4, 4;
4. 3527: in the others the sum of the first two
numbers is equal to the sum of the second two numbers, for example 5 + 2 = 6 + 1;
5. pressure;

Intelligence Quotient

Of the different methods that purport to measure intelligence, the most famous is
the IQ (Intelligence Quotient) test, which is a standardized test designed to measure
human intelligence as distinct from attainments.
Intelligence quotient is an age-related measure of intelligence level. The word
quotient means the result of dividing one quantity by another, and one definition of
intelligence is mental ability or quickness of mind.
Usually, IQ tests consist of a graded series of tasks, each of which has been standardized
with a large representative population of individuals in order to establish
an average IQ of 100 for each test.
It is generally accepted that a person’s mental ability develops at a constant rate
until about the age of 13, after which development has been shown to slow down,
and beyond the age of 18 little or no improvement is found.
When the IQ of a child is measured, the subject attempts an IQ test that has been
standardized, with an average score recorded for each age group. Thus a 10-year-old
child who scored the result that would be expected of a 12-year-old would have
an IQ of 120, or 12/10 × 100.
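The ratio calculation described above can be written out as a tiny function (Python, with illustrative names):

```python
def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

# A 10-year-old scoring at the level expected of a 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```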
Because after the age of 18 little or no improvement is found, adults have to be
judged on an IQ test whose average score is 100, and the results graded above and
below this norm according to known test scores.
Like so many distributions found in nature, the distribution of IQ takes the form
of a fairly regular bell curve (see Figure 0.1 below) in which the average score is 100
and similar proportions occur both above and below this norm.
There are a number of different types of intelligence tests, for example Cattell,
Stanford-Binet and Wechsler, and each has its own different scales of intelligence.
The Stanford-Binet is heavily weighted with questions involving verbal abilities
and is widely used in the United States. The Wechsler scales consist of two separate
verbal and performance sub-scales each with its own IQ rating. On the Stanford-
Binet scale half the population fall between 90 and 110 IQ, half of them above 100
and half of them below; 25 per cent score above 110; 11 per cent above 120; 3 per cent
above 130 and 0.6 per cent above 140. At the other end of the scale the same kind of
proportion occurs.
Although it is IQ tests that we are specifically concerned with in this book it
should be pointed out that IQ tests are just one part of what is generally referred to
as psychometric testing. Such test content may be addressed to almost any aspect of
our intellectual or emotional make-up, including personality, attitude, intelligence
or emotion. Psychometric tests are basically tools used for measuring the mind; the
word metric means measure and the word psycho means mind. There are two types
of psychometric tests that are usually used in tandem by employers. These are
aptitude tests, which assess your abilities, and personality questionnaires, which
assess your character and personality.

Tuesday, January 28, 2014

Solving the Puzzle

Expanding the answer choices
The body of a logic puzzle question contains a
(unique) wh-term (typically “which of the following”),
a modality (such as “must be true” or
“could be true”), and (possibly) an added condition.
Each answer choice is expanded by substituting
its SL form for the wh-term in the question
body. For example, the expansion for answer
choice (A) of question 1 in Figure 1 would
be the SL form corresponding to: “If sculpture
D is exhibited . . . , then [Sculpture C is exhibited
in room 1 ] must be true”.
Translating SL to FOL
To translate an SL representation to pure FOL, we eliminate event variables by
replacing an SL form ∃e.P(e) ∧ R1(e, t1) ∧ … ∧ Rn(e, tn) with the FOL form
P(t1, …, tn). An ordering is imposed on role names to guarantee that arguments
are always used in the same order in relations. Numeric quantifiers are encoded
in FOL in the obvious way, e.g., Q(≥2, x, φ, ψ) is translated to
∃x1∃x2. x1 ≠ x2 ∧ (φ ∧ ψ)[x1/x] ∧ (φ ∧ ψ)[x2/x].
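The event-elimination step can be sketched roughly as follows (a Python miniature; the predicate and role names are invented for illustration, not taken from the actual system):

```python
# Hypothetical miniature of event elimination: an SL event form is a predicate
# name plus a role -> term mapping. Sorting the role names imposes the fixed
# argument order mentioned in the text.
def eliminate_event(pred, roles):
    ordered_terms = [roles[r] for r in sorted(roles)]
    return f"{pred}({', '.join(ordered_terms)})"

# exists e. exhibit(e) & np1(e, C) & np2(e, room1)  ==>  exhibit(C, room1)
print(eliminate_event("exhibit", {"np2": "room1", "np1": "C"}))  # exhibit(C, room1)
```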
Each expanded answer choice contains one modal operator. Modals are moved
outward of negation as usual, and outward of conditionals by changing
A → □B to □(A → B) and A → ◇B to ◇(A ∧ B). A modal operator in the
outermost scope can then be interpreted as a directive to the reasoning module
to test either entailment (□) or consistency (◇) between the preamble and the
expanded answer choice.
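The conditional rewrites can be sketched on a toy tuple-based AST (the tuple encoding is an assumption of this sketch, not the paper's actual data structure):

```python
# Rewrite A -> BOX(B) as BOX(A -> B), and A -> DIA(B) as DIA(A & B),
# lifting the modal operator to the outermost scope.
def lift_modal(formula):
    if isinstance(formula, tuple) and formula[0] == "->":
        _, a, b = formula
        if isinstance(b, tuple) and b[0] == "BOX":
            return ("BOX", ("->", a, b[1]))
        if isinstance(b, tuple) and b[0] == "DIA":
            return ("DIA", ("&", a, b[1]))
    return formula

print(lift_modal(("->", "A", ("BOX", "B"))))  # ('BOX', ('->', 'A', 'B'))
```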
Using FOL reasoners
There are two reasons for using both theorem
provers and model builders. First, they
are complementary reasoners: while a theorem
prover is designed to demonstrate the inconsistency
of a set of FOL formulas, and so can
find the correct answer to “must be true” questions
through proof by contradiction, a model
builder is designed to find a satisfying model,
and is thus suited to finding the correct answer
to “could be true” questions. Second, a
reasoner may take a very long time to halt on
some queries, but the complementary reasoner
may still be used to answer the query in the
context of a multiple-choice question through
a process of elimination. Thus, if the model
builder is able to show that the negations of four
choices are consistent with the preamble (indicating
they are not entailed), then it can be
concluded that the remaining choice is entailed
by the preamble, even if the theorem prover has
not yet found a proof.
We use the Otter 3.3 theorem prover and
the MACE 2.2 model builder (McCune, 1998).
The reasoning module forks parallel subprocesses,
two per answer choice (one for Otter,
one for MACE). If a reasoner succeeds for an answer
choice, the choice is marked as correct or
incorrect, and the dual sub-process is killed. If
all answer-choices but one are marked incorrect,
the remaining choice is marked correct even if
its sub-processes did not yet terminate.
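The process-of-elimination logic can be sketched as follows (Python; the verdict bookkeeping is an invented simplification of the forked sub-processes):

```python
# Each choice's verdict is True (correct), False (incorrect), or None
# (its reasoners have not halted yet). If exactly one choice is undecided
# and every decided choice is incorrect, the undecided one must be correct.
def conclude(verdicts):
    undecided = [c for c, v in verdicts.items() if v is None]
    decided_incorrect = all(v is False for v in verdicts.values() if v is not None)
    if len(undecided) == 1 and decided_incorrect:
        verdicts[undecided[0]] = True
    return verdicts

print(conclude({"A": False, "B": False, "C": None, "D": False, "E": False}))
# {'A': False, 'B': False, 'C': True, 'D': False, 'E': False}
```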

Filling Information Gaps


To find a unique answer to every question of a
puzzle, background information is required beyond
the literal meaning of the text. In Question
1 of Figure 1, for example, without the constraint
that a sculpture may not be exhibited in
multiple rooms, answers B, D and E are all correct.
Human readers deduce this implicit constraint
from their knowledge that sculptures are
physical objects, rooms are locations, and physical
objects can have only one location at any
given time. In principle, such information could
be derived from ontologies. Existing ontologies,
however, have limited coverage, so we also plan
to leverage information about expected puzzle
structures.
Most puzzles we collected are formalizable
as constraints on possible tuples of objects.
The crucial information includes: (a)
the object classes; (b) the constants naming
the objects; and (c) the relations used to
link objects, together with their arguments’
classes. For the sculptures puzzle, this information
is: (a) the classes are sculpture and
room; (b) the constants are C,D,E, F, G,H for
sculpture and 1, 2, 3 for room; (c) the relation
is exhibit(sculpture, room). This information is
obtainable from the parse trees and SL formulas.
Within this framework, implicit world knowledge
can often be recast as mathematical properties
of relations. The unique location constraint
on sculptures, for example, is equivalent
to constraining the mapping from sculptures to
rooms to be injective (one-to-one); other cases
exist of constraining mappings to be surjective
(onto) and/or total. Such properties can be obtained
from various sources, including cardinality
of object classes, pure lexical semantics, and
even through a systematic search for sets of implicit
constraints that, in combination with the
explicitly stated constraints, yield exactly one
answer per question. Figure 3 shows the number of possible models for the
sculptures puzzle as affected by explicit and implicit constraints in the
preamble.

[Figure 3: Effect of explicit and implicit constraints on the number of
possible models]
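The unique-location constraint can be illustrated with a small check (a Python sketch over an assumed list of (sculpture, room) pairs; not the system's actual representation):

```python
# A sculpture may not be exhibited in multiple rooms: viewed as a mapping
# from sculptures to rooms, each exhibited sculpture gets exactly one room.
def unique_location(exhibit_pairs):
    rooms_of = {}
    for sculpture, room in exhibit_pairs:
        rooms_of.setdefault(sculpture, set()).add(room)
    return all(len(rooms) == 1 for rooms in rooms_of.values())

print(unique_location([("C", 1), ("D", 2)]))            # True
print(unique_location([("C", 1), ("C", 2), ("D", 2)]))  # False
```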

Reference Resolution


SL is not convenient for representing directly
the meaning of referring expressions because (as
in FOL) the extent of a quantifier in a formula
cannot be extended easily to span variables in
subsequent formulas. We therefore use Discourse
Logic (DL), which is SL extended with
DRSes and α-expressions as in (Blackburn and
Bos, 2000) (which is based on Discourse Representation
Theory (Kamp and Reyle, 1993) and
its recent extensions for dealing with presuppositions).[6] This approach (like other dynamic semantics
approaches) supports the introduction
of entities that can later be referred back to,
and explains when indefinite NPs should be in-
terpreted as existential or universal quantifiers
(such as in the antecedent of conditionals). The
reference resolution framework from (Blackburn
and Bos, 2000) provides a basis for finding all
possible resolutions, but does not specify which
one to choose. We are working on a probabilistic
reference-resolution module, which will pick
from the legal resolutions the most probable one
based on features such as: distance, gender, syntactic
place and constraints, etc.
[5] E.g. there is a strong preference for ‘each’ to take wide scope, a moderate
preference for the first quantifier in a sentence to take wide scope, and a weak
preference for a quantifier of the grammatical subject to take wide scope.
[6] Thus, the URs calculated from parse trees are actually URs of DL formulas.
The scope resolution phase resolves the URs to explicit DL formulas, and the
reference resolution phase converts these formulas to SL formulas.

Scope Resolution


One way of dealing with scope ambiguities is by
using underspecified representations (URs). A
UR is a meta-language construct, describing a
set of object-language formulas. It describes
the pieces shared by these formulas, but possibly
underspecifies how they combine with each
other. A UR can then be resolved to the specific
readings it implicitly describes.
We use an extension of Hole Semantics
(Blackburn and Bos, 2000) for expressing URs
and calculating them from parse trees (modulo
the modifications in §5). There are several advantages
to this approach. First, it supports
the calculation of just one UR per sentence in
a combinatorial process that visits each node of
the parse tree once. This contrasts with approaches
such as Categorial Grammars (Carpenter,
1998), which produce explicitly all the
scopings by using type raising rules for different
combinations of scope, and require scanning the
entire parse tree once per scoping.
Second, the framework supports the expression
of scoping constraints between different
parts of the final formula. Thus it is possible
to express hierarchical relations that must exist
between certain quantifiers, avoiding the problems
of naive approaches such as Cooper storage
(Cooper, 1983). The expression of scoping
constraints is not limited to quantifiers and is
applicable to all other operators as well. Moreover,
it is possible to express scope islands by
constraining all the parts of a subformula to be
outscoped by a particular node.
Another advantage is that URs support efficient
elimination of logically-equivalent readings.
Enumerating all scopings and using
a theorem-prover to determine logical equivalences
requires O(n²) comparisons for n scopings.
Instead, filtering methods (Chaves, 2003)
can add tests to the UR-resolution process,
disallowing certain combinations of operators.
Thus, only one ordering of identical quantifiers
is allowed, so “A man saw a woman” yields
only one of its two equivalent scopings. We also
filter ∀□ and ∃◇ combinations, allowing only
the equivalent □∀ and ◇∃. However, numeric
quantifiers are not filtered (the two scopings of
“Three boys saw three films” are not equivalent).
Such filtering can result in substantial
speed-ups for sentences with a few quantifiers
(see (Chaves, 2003) for some numbers).
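The identical-quantifier filter can be illustrated with a toy deduplication (this illustrates the idea only; it is not the actual UR-resolution test):

```python
# A scoping is an ordered tuple of (quantifier-kind, variable) pairs. When all
# quantifiers are of the same kind, their relative order does not matter, so
# we canonicalize by sorting before deduplicating.
def dedupe_scopings(scopings):
    seen, kept = set(), []
    for s in scopings:
        key = tuple(sorted(s)) if len({kind for kind, _ in s}) == 1 else tuple(s)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept

# "A man saw a woman": the two orderings of two existentials are equivalent.
two = [(("exists", "x"), ("exists", "y")),
       (("exists", "y"), ("exists", "x"))]
print(len(dedupe_scopings(two)))  # 1
```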
Finally, our true goal is determining the correct
relative scoping in context rather than enumerating
all possibilities. We are developing
a probabilistic scope resolution module that
learns from hand-labeled training examples to
predict the most probable scoping, using features
such as the quantifiers’ categories and
their positions and grammatical roles in the sentence.

Combinatorial Semantics


Work in NLP has shifted from hand-built grammars
that need to cover explicitly every sentence
structure and that break down on unexpected
inputs to more robust statistical parsing.
However, grammars that involve precise semantics
are still largely hand-built (e.g. (Carpenter,
1998; Copestake and Flickinger, 2000)). We aim
at extending the robustness trend to the semantics.
We start with the compositional semantics
framework of (Blackburn and Bos, 2000; Bos,
2001) and modify it to achieve greater robustness
and coverage.[2]
One difference is that our lexicon is kept
very small and includes only a few words with
special semantic entries (like pronouns, connectives,
and numbers). Open-category words
come with their part-of-speech information in
the parse trees (e.g. (NN dog)), so their semantics
can be obtained using generic semantic templates
(but cf. §3.5).
In classic rule-to-rule systems of semantics
like (Blackburn and Bos, 2000), each syntactic
rule has a separate semantic combination rule,
and so the system completely fails on unseen
syntactic structures. The main distinguishing
goal of our approach is to develop a more robust
process that does not need to explicitly specify
how to cover every bit of every sentence. The
system incorporates a few initial ideas in this
direction.
First, role and argument-structure information
for verbs is expensive to obtain and unreliable
anyway in natural texts. So to deal with
verbs and VPs robustly, their semantics in our
system exports only an event variable rather
than variables for the subject, the direct object,
etc. VP modifiers (such as PPs and ADVPs)
combine to the VP by being applied on the exported
event variable. NP modifiers (including
the sentence subject) are combined to the event
variable through generic roles: subj, np1, np2,
etc. The resulting generic representations are
suitable in the puzzles domain because usually
only the relation between objects is important
and not their particular roles in the relation.
This is true for other tasks as well, including
some broad-coverage question answering.
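This exported-event-variable scheme can be illustrated with a minimal sketch (hypothetical helper names of our own invention, not the system's code): a verb contributes only a predicate over an event variable, and NPs and PP modifiers add conjuncts over that variable through generic roles.

```python
# Hypothetical sketch (invented helper names, not the system's code): the
# semantics of a verb exports only an event variable; NPs and PP modifiers
# add conjuncts over that variable via generic roles.

def verb(pred):
    # A verb contributes just a predicate over the event variable.
    return lambda e: [f"{pred}({e})"]

def with_role(vp, role, var):
    # Attach an NP (subject, np1, np2, ...) through a generic role.
    return lambda e: vp(e) + [f"{role}({e}, {var})"]

def with_pp(vp, prep, var):
    # A PP modifier is likewise applied to the exported event variable.
    return lambda e: vp(e) + [f"{prep}({e}, {var})"]

# "x2 is exhibited in x1"
sem = with_pp(with_role(verb("exhibit"), "subj", "x2"), "in", "x1")
print(" ∧ ".join(sem("e")))  # exhibit(e) ∧ subj(e, x2) ∧ in(e, x1)
```

The point of the design is that no argument-structure lexicon is needed: any number of modifiers can attach in any order, since each one only conjoins another condition on the same event variable.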
All NPs are analyzed as generalized quantifiers,
but a robust compositional analysis for
the internal semantics of NPs remains a serious
challenge. For example, the NP “three rooms”
should be analyzed as Q(num(3), x, room(x), ..),
but the word “three” by itself does not contribute
the quantifier – compare with “at least
three rooms” Q(≥3, x, room(x), ..). Yet another
case is “the three rooms” (which presupposes
a group g such that g ⊆ room ∧ |g| = 3). [Footnote
2: Our system uses a reimplementation in Lisp rather
than their Prolog code.] The
system currently handles a number of NP structures
by scanning the NP left-to-right to identify
important elements. This may make it easier
than a strictly compositional analysis to extend
the coverage to additional cases.
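The left-to-right scanning idea can be sketched as a toy (illustrative only; `scan_np` and its tiny coverage are our invention, not the system's actual rules):

```python
# Toy sketch of left-to-right NP scanning (invented function, tiny coverage):
# pick out determiner, quantity modifier, and head noun without a separate
# compositional rule for each NP shape.

NUMS = {"one": 1, "two": 2, "three": 3}

def scan_np(tokens):
    quant, presupposed_group = "exists", False
    i = 0
    if tokens[i] == "the":
        presupposed_group = True        # "the three rooms": group presupposed
        i += 1
    if tokens[i:i + 2] == ["at", "least"]:
        i += 2
        quant = f">={NUMS[tokens[i]]}"  # "at least three" -> >=3
        i += 1
    elif tokens[i] in NUMS:
        quant = f"num({NUMS[tokens[i]]})"  # bare "three" -> exactly 3
        i += 1
    return ("Q", quant, tokens[i], presupposed_group)

print(scan_np(["at", "least", "three", "rooms"]))  # ('Q', '>=3', 'rooms', False)
```

Adding a new NP shape means adding one more clause to the scanner, rather than a new syntactic rule paired with a new semantic combination rule.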
All other cases are handled by a flexible combination
process. In case of a single child, its
semantics is copied to its parent. With more
children, all combinations of applying the semantics
of one child to its siblings are tried,
until an application does not raise a type error
(variables are typed to support type checking).
This makes it easier to extend the coverage
to new grammatical constructs, because usually
only the lexical entry needs to be specified, and
the combination process takes care to apply it
correctly in the parse tree.
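A minimal sketch of this try-until-no-type-error combination step (the classes and type names here are illustrative, not the actual implementation):

```python
# Illustrative sketch (invented types, not the actual system) of the
# combination step: try applying one child's semantics to a sibling until
# an application passes the type check.

class Sem:
    def __init__(self, typ, fn=None, val=None):
        self.typ, self.fn, self.val = typ, fn, val  # typ: base or (arg, res)

def apply_sem(f, a):
    # Type-checked application; a mismatch raises TypeError.
    if not (isinstance(f.typ, tuple) and f.typ[0] == a.typ):
        raise TypeError("type mismatch")
    return f.fn(a)

def combine(children):
    if len(children) == 1:
        return children[0]              # single child: copy to parent
    for i, f in enumerate(children):    # try all application orders
        for j, a in enumerate(children):
            if i != j:
                try:
                    return apply_sem(f, a)
                except TypeError:
                    pass
    raise ValueError("no type-correct combination")

# (NN dog) gets a generic template; "every" has a lexical entry.
dog = Sem("et", val="dog")
every = Sem(("et", "q"), fn=lambda n: Sem("q", val=f"∀x. {n.val}(x) → ..."))
print(combine([dog, every]).val)  # works in either child order
```

Because the search ranges over all orders, the grammar writer specifies only lexical entries; which child applies to which is discovered at combination time.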

Morpho-Syntactic Analysis


While traditional hand-built grammars often include
a rich semantics, we have found their
coverage inadequate for the logic puzzles task.
For example, the English Resource Grammar
(Copestake and Flickinger, 2000) fails to parse
any of the sentences in Figure 1 for lack of coverage
of some words and of several different syntactic
structures; and parsable simplified versions
of the text produce dozens of unranked
parse trees. For this reason, we use a broad-coverage
statistical parser (Klein and Manning,
2003) trained on the Penn Treebank. In addition
to robustness, treebank-trained statistical
parsers have the benefit of extensive research
on accurate ambiguity resolution. Qualitatively,
we have found that the output of the parser on
logic puzzles is quite good (see §10). After parsing,
each word in the resulting parse trees is
converted to base form by a stemmer.
A few tree-transformation rules are applied
on the parse trees to make them more convenient
for combinatorial semantics. Most of them
are general, e.g. imposing a binary branching
structure on verb phrases, and grouping expressions
like “more than”. A few of them correct
some parsing errors, such as nouns marked as
names and vice-versa. There is growing awareness
in the probabilistic parsing literature that
mismatches between training and test set genre
can degrade parse accuracy, and that small
amounts of correct-genre data can be more important
than large amounts of wrong-genre data
(Gildea, 2001); we have found corroborating evidence
in misparsings of noun phrases common
in puzzle texts, such as “Sculptures C and E”,
which do not appear in the Wall Street Journal
corpus. Depending on the severity of this problem,
we may hand-annotate a small amount of
puzzle texts to include in parser training data.
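One of the general transformations, grouping expressions like "more than" into a single node, might look like the following sketch over plain list-structured trees (our own illustration; the real rules operate on the parser's tree objects):

```python
# Illustrative tree-transformation rule (our own sketch over plain
# [label, children...] lists): merge adjacent "more" + "than" leaves
# into a single grouped node.

def group_more_than(tree):
    if isinstance(tree, str):
        return tree
    children = [group_more_than(c) for c in tree[1:]]
    out, i = [], 0
    while i < len(children):
        c = children[i]
        nxt = children[i + 1] if i + 1 < len(children) else None
        if (isinstance(c, list) and c[1:] == ["more"]
                and isinstance(nxt, list) and nxt[1:] == ["than"]):
            out.append(["QP", "more than"])  # grouped expression
            i += 2
        else:
            out.append(c)
            i += 1
    return [tree[0]] + out

tree = ["NP", ["JJR", "more"], ["IN", "than"], ["CD", "three"], ["NNS", "rooms"]]
print(group_more_than(tree))
```

After grouping, the combination stage can treat "more than" as one lexical item with a single semantic entry instead of composing its parts.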

Challenges

Combinatorial Semantics
The challenge of combinatorial semantics is to
be able to assign exactly one semantic representation
to each word and sub-phrase regardless
of its surrounding context, and to combine
these representations in a systematic way until
the representation for the entire sentence is obtained.
There are many linguistic constructions
in the puzzles whose compositional analysis is
difficult, such as a large variety of noun-phrase
structures (e.g., “Every sculpture must be exhibited
in a different room”) and ellipses (e.g.,
“Brian saw a taller man than Carl [did]”).
Scope Ambiguities
A sentence has a scope ambiguity when quantifiers
and other operators in the sentence can
have more than one relative scope. E.g., in constraint
(4) of Figure 1, “each room” outscopes
“at least one sculpture”, but in other contexts,
the reverse scoping is possible. The challenge
is to find, out of all the possible scopings, the
appropriate one, to understand the text as the
writer intended.
Reference Resolution
The puzzle texts contain a wide variety of
anaphoric expressions, including pronouns, definite
descriptions, and anaphoric adjectives. The
challenge is to identify the possible antecedents
that these expressions refer to, and to select
the correct ones. The problem is complicated
by the fact that anaphoric expressions interact
with quantifiers and may not refer to any particular
context element. E.g., the anaphoric expressions
in “Sculptures C and E are exhibited
in the same room” and in “Each man saw a different
woman” interact with sets ({C,E} and
the set of all men, respectively).
Plurality Disambiguation
Sentences that include plural entities are potentially
ambiguous between different readings:
distributive, collective, cumulative, and combinations
of these. For example, sentence 1 in
Figure 1 says (among other things) that each
of the six sculptures is displayed in one of the
three rooms – the group of sculptures and the
group of rooms behave differently here. Plurality
is a thorny topic which interacts in complex
ways with other semantic issues, including
quantification and reference.
Lexical Semantics
The meaning of open-category words is often
irrelevant to solving a puzzle. For example,
the meaning of “exhibited”, “sculpture”, and
“room” can be ignored because it is enough to
understand that the first is a binary relation
that holds between elements of groups described
by the second and third words.1 This observation
provides the potential for a general system
that solves logic puzzles.
Of course, in many cases, the particular
meaning of open-category words and other expressions
is crucial to the solution. An example
is provided in question 2 of Figure 1: the system
has to understand what “a complete list”
means. Therefore, to finalize the meaning computed
for a sentence, such expressions should be
expanded to their explicit meaning. Although
there are many such cases and their analysis is
difficult, we anticipate that it will be possible to
develop a relatively compact library of critical
puzzle text expressions. We may also be able
to use existing resources such as WordNet and
FrameNet.
Information Gaps
Natural language texts invariably assume some
knowledge implicitly. E.g., Figure 1 does not explicitly
specify that a sculpture may not be exhibited
in more than one room at the same time.
Humans know this implicit information, but a
computer reasoning from texts must be given
it explicitly. Filling these information gaps is
a serious challenge; representation and acquisition
of the necessary background knowledge are
very hard AI problems. Fortunately, the puzzles
domain allows us to tackle this issue, as
explained in §8.
Presuppositions and Implicatures
In addition to its semantic meaning, a natural
language text conveys two other kinds of content.
Presuppositions are pieces of information assumed
in a sentence. Anaphoric expressions
bear presuppositions about the existence of entities
in the context; the answer choice “Sculptures
C and E” conveys the meaning {C,E},
but has the presupposition sculpture(C) ∧
sculpture(E); and a question of the form A →
B, such as question 1 in Figure 1, presupposes
that A is consistent with the preamble.
Implicatures are pieces of information suggested
by the very fact of saying, or not saying,
something. Two maxims of (Grice, 1989)
dictate that each sentence should be both consistent
and informative (i.e. not entailed) with
respect to its predecessors. Another maxim dictates
saying as much as required, and hence the
sentence “No more than three sculptures may be
exhibited in any room” carries the implicature
that in some possible solution, three sculptures
are indeed exhibited in the same room.
Systematic calculation of presuppositions and
implicatures has been given less attention in
NLP and is less understood than the calculation
of meaning. Yet computing and verifying
them can provide valuable hints to the system
whether it understood the meaning of the text
correctly.

System Overview


This section explains the languages we use to
represent the content of a puzzle. Computing
the representations from a text is a complex process
with several stages, as shown in Figure 2.
Most of the stages are independent of the puzzles
domain. Section 3 reviews the main challenges
in this process, and later sections outline
the various processing stages. More details of
some of these stages can be found at (Stanford
NLP Group, 2004).
First-Order Logic (FOL)
An obvious way of solving logic puzzles is to
use off-the-shelf FOL reasoners, such as theorem
provers and model builders. Although most
GRE logic puzzles can also be cast as constraint-satisfaction
problems (CSPs), FOL representations
are more general and more broadly applicable
to other domains, and they are closer
to the natural language semantics. GRE logic
puzzles have finite small domains, so it is practicable
to use FOL reasoners.
The ultimate representation of the content of
a puzzle is therefore written in FOL. For example,
the representation for the first part of
constraint (4) in Figure 1 is: ∀x.room(x) →
∃y.sculpture(y) ∧ exhibit(y, x). (The treatment
of the modal ‘must’ is explained in §9.2.)
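Since the domains are small and finite, a formalized puzzle can even be checked by exhaustive model enumeration. The following sketch (our own illustration, standing in for an off-the-shelf FOL reasoner) encodes the Figure 1 preamble and tests which answer to Question 1 "may be true":

```python
from itertools import product

SCULPTURES = "CDEFGH"

def ok(m):
    # m maps each sculpture to a room; check preamble constraints (1)-(4).
    rooms = {r: [s for s in SCULPTURES if m[s] == r] for r in (1, 2, 3)}
    return (m["C"] != m["E"]                                      # (1)
            and m["D"] == m["G"]                                  # (2)
            and (m["E"] != m["F"] or len(rooms[m["E"]]) == 2)     # (3)
            and all(1 <= len(rs) <= 3 for rs in rooms.values()))  # (4)

models = []
for assignment in product((1, 2, 3), repeat=6):
    m = dict(zip(SCULPTURES, assignment))
    if ok(m):
        models.append(m)

# Question 1: add its supposition, then ask which choice MAY be true,
# i.e. holds in at least one remaining model.
q1 = [m for m in models if m["D"] == 3 and m["E"] == 1 and m["F"] == 1]
choices = {
    "A": lambda m: m["C"] == 1,
    "B": lambda m: sum(m[s] == 3 for s in SCULPTURES) <= 2,
    "C": lambda m: m["F"] == m["H"],
    "D": lambda m: sum(m[s] == 2 for s in SCULPTURES) == 3,
    "E": lambda m: m["G"] == 2,
}
print([c for c, holds in choices.items() if any(holds(m) for m in q1)])
# prints ['B']
```

Exactly one choice survives, matching the unique-correct-answer design of the puzzles; the hard part, of course, is producing the constraints from the text rather than checking them.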
Semantic Logic (SL)
Representing the meaning of natural language
texts in FOL is not straightforward because
human languages employ events, plural entities,
modal operations, and complex numeric
expressions. We therefore use an intermediate
representation, written in Semantic Logic
(SL), which is intended to be a general-purpose
semantic representation language. SL extends
FOL with event and group variables, the modal
operators □ (necessarily) and ◇ (possibly), and
Generalized Quantifiers (Barwise and Cooper,
1981) Q(type, var, restrictor, body), where type
can be ∀, ∃, at-least(n), etc. To continue the example,
the intermediate representation for the
constraint is:
□Q(∀, x1, room(x1), Q(≥1, x2, sculpture(x2),
∃e.exhibit(e) ∧ subj(e, x2) ∧ in(e, x1)))
Non-determinism
Although logic puzzles are carefully designed
to reduce ambiguities to ensure that there
is exactly one correct answer per question,
there are still many ambiguities in the analysis,
such as multiple possibilities for syntactic
structures, pronominal reference, and quantifier
scope. Each module ranks possible output representations;
in the event that a later stage reveals
an earlier choice to be wrong (it may be
inconsistent with the rest of the puzzle, or lead
to a non-unique correct answer to a question),
the system backtracks and chooses the next-best
output representation for the earlier stage.
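The backtracking regime can be summarized by a small control-flow sketch (hypothetical stage functions of our own; the real modules rank full semantic representations):

```python
# Sketch (our illustration, not the system's code) of backtracking over
# ranked candidate outputs: each stage maps the choices made so far to a
# ranked list of candidates; the first globally consistent path wins.

def interpret(stages, chosen, consistent):
    if not stages:
        return chosen if consistent(chosen) else None
    for candidate in stages[0](chosen):          # best-ranked first
        result = interpret(stages[1:], chosen + [candidate], consistent)
        if result is not None:
            return result                        # success: stop searching
    return None                                  # fail: backtrack to caller

# Toy stages: scope resolution prefers "each>one", but reference
# resolution has no candidate for that reading, forcing backtracking.
scope = lambda prev: ["each>one", "one>each"]
ref = lambda prev: ["it=sculpture"] if prev[-1] == "one>each" else []
print(interpret([scope, ref], [], lambda c: True))
# prints ['one>each', 'it=sculpture']
```

The depth-first order means the system always commits to the highest-ranked reading that some complete, consistent interpretation can extend.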

Why Logic Puzzles?


Logic puzzles have a number of attractive characteristics
as a target domain for research placing
a premium on precise inference.
First, whereas for humans the language understanding
part of logic puzzles is trivial but
the reasoning is difficult, for computers it is
clearly the reverse. It is straightforward for a
computer to solve a formalized puzzle, so the
research effort is on the NLP parts rather than
a difficult back-end AI problem. Moreover, only
a small core of world knowledge (prominently,
temporal and spatial entailments) is typically
crucial to solving the task.
Second, the texts employ everyday language:
there are no domain-restrictions on syntactic
and semantic constructions, and the situations
described by the texts are diverse.
Figure 1: A typical GRE logic puzzle.
Preamble: Six sculptures – C, D, E, F, G, and H
– are to be exhibited in rooms 1, 2, and 3 of an art
gallery. The exhibition conforms to the following
conditions:
(1) Sculptures C and E may not be exhibited in
the same room.
(2) Sculptures D and G must be exhibited in the
same room.
(3) If sculptures E and F are exhibited in the same
room, no other sculpture may be exhibited in that
room.
(4) At least one sculpture must be exhibited in each
room, and no more than three sculptures may be
exhibited in any room.
Question 1: If sculpture D is exhibited in room
3 and sculptures E and F are exhibited in room 1,
which of the following may be true?
(A) Sculpture C is exhibited in room 1.
(B) No more than 2 sculptures are exhibited in
room 3.
(C) Sculptures F and H are exhibited in the same
room.
(D) Three sculptures are exhibited in room 2.
(E) Sculpture G is exhibited in room 2.
Question 2: If sculptures C and G are exhibited
in room 1, which of the following may NOT be a
complete list of the sculpture(s) exhibited in room
2?
(A) Sculpture D (B) Sculptures E and H (C). . .

Third, and most crucial, answers to puzzle
questions never explicitly appear in the text and
must be logically inferred from it, so there is
very little opportunity to use existing superficial
analysis methods of information-extraction and
question-answering as a substitute for deep understanding.
A prerequisite for successful inference
is precise understanding of semantic phenomena
like modals and quantifiers, in contrast
with much current NLP work that just ignores
such items. We believe that representations
with a well-defined model-theoretic semantics
are required.
Finally, the task has a clear evaluation metric
because the puzzle texts are designed to yield
exactly one correct answer to each multiple-choice
question. Moreover, the domain is another
example of “found test material” in the
sense of (Hirschman et al., 1999): puzzle texts
were developed with a goal independent of the
evaluation of natural language processing systems,
and so provide a more realistic evaluation
framework than specially-designed tests such as
TREC QA.
While our current system is not a real world
application, we believe that the methods being
developed could be used in applications such as
a computerized office assistant that must understand
requests such as: “Put each file containing
a task description in a different directory.”

Solving Logic Puzzles:

Traditional approaches to natural language understanding
(Woods, 1973; Warren and Pereira,
1982; Alshawi, 1992) provided a good account
of mapping from surface forms to semantic representations,
when confined to a very limited
vocabulary, syntax, and world model, and resulting
low levels of syntactic/semantic ambiguity.
It is, however, difficult to scale these
methods to unrestricted, general-domain natural
language input because of the overwhelming
problems of grammar coverage, unknown words,
unresolvable ambiguities, and incomplete domain
knowledge. Recent work in NLP has
consequently focused on more robust, broad-coverage
techniques, but with the effect of
overall shallower levels of processing. Thus,
state-of-the-art work on probabilistic parsing
(e.g., (Collins, 1999)) provides a good solution
to robust, broad coverage parsing with automatic
and frequently successful ambiguity resolution,
but has largely ignored issues of semantic
interpretation. The field of Question Answering
(Pasca and Harabagiu, 2001; Moldovan et al.,
2003) focuses on simple-fact queries. And so-called
semantic parsing (Gildea and Jurafsky,
2002) provides as end output only a flat classification
of semantic arguments of predicates,
ignoring much of the semantic content, such as
quantifiers.
A major research question that remains unanswered
is whether there are methods for getting
from a robust “parse-anything” statistical
parser to a semantic representation precise
enough for knowledge representation and automated
reasoning, without falling afoul of the
same problems that stymied the broad application
of traditional approaches. This paper
presents initial work on a system that addresses
this question. The chosen task is solving logic
puzzles of the sort found in the Law School Admission
Test (LSAT) and the old analytic section
of the Graduate Record Exam (GRE) (see
Figure 1 for a typical example). The system integrates
statistical parsing, “on-the-fly” combinatorial
synthesis of semantic forms, scope- and
reference-resolution, and precise semantic representations
that support the inference required
for solving the puzzles. Our work complements
research in semantic parsing and TREC-style
Question Answering by emphasizing complex
yet robust inference over general-domain
NL texts given relatively minimal lexical and
knowledge-base resources.

Monday, January 27, 2014

Limitations of Analytic Results


Any study of this nature is necessarily limited in several
respects. First of all, the survey approach used here is but one of
several that can be used to inform decisions about extending the
measurement of analytical abilities. Tucker’s (1985) results provide
useful information from different perspectives--those of cognitive
psychologists and philosophers. Other approaches that might also be
informative include the methods of cognitive psychology, which could
be used not only to supplement but also to extend the survey results
reported here. These methods would seem especially appropriate
because they relate more directly to actual skills and abilities than
to perceptions.
Second, the diversity that characterizes graduate education
renders the results of this study incomplete. Some clues have been
gained as to similarities and differences among a limited sample of
graduate fields. However, the substantial differences found among
fields are a source of concern, since we cannot be certain whether or
not some other sample of fields might exhibit even greater variation.
Finally, as several survey respondents pointed out, many of the
reasoning skills about which we asked are expected to, and do, improve
as the result of graduate study. In some sense these skills may
represent competencies that differ from, say, the verbal skills
measured by the GRE General Test, in that these analytical
skills may develop much more rapidly. A question of interest, then,
is how to accommodate the measurement of these skills in the context
of graduate admissions testing, which currently focuses on the
predictive effectiveness of abilities that are presumed to develop
slowly over a significant period of time.
Future Directions
The study suggested several possible future directions. Because
of the substantial variation among fields, one possibility would
involve extending the survey to include additional fields of graduate
study. Some refinements could now be made on the basis of past
experience. For example, ratings of the frequency with which skills
are used, as well as the frequencies of errors and critical incidents,
could probably be omitted without much loss of information. On the
other hand, it would seem desirable to add categories allowing ratings
of the differential importance of various reasoning skills at
different stages of graduate education, ranging from entry level to
dissertation writing.
Finally, based on the reasoning skills identified as most
important, criterion tasks might be developed against which the
validity of the current GRE analytical measure could be gauged. This
strategy would make especially good sense for those important skills
that may not be measurable in an operational test like the GRE General
Test, but which might correlate highly with the abilities now measured
by the test. One specific possibility would be the development of
rating forms, which could be used by faculty to rate the analytical
abilities of their students. These ratings could then be used as a
criterion against which GRE analytical scores could be judged.

Implications of Analytic Results


In providing some information on faculty perceptions of the
involvement of various reasoning skills in their disciplines, the
study has, we hope, implications for developing future versions of the
GRE analytical ability measure. Converting this information to
operational test items will represent a significant step, however, and
it is not yet clear exactly how helpful these
results may eventually be. Nonetheless, the findings do seem to
contain several useful bits of information:
1. Among the specific reasoning skills perceived as the most
important were several, e.g., "deducing new information from
a set of relationships" and "understanding, evaluating, and
analyzing arguments," that seem well represented in the two
item types (analytical reasoning and logical reasoning)
currently included in the analytical section of the General
Test. This suggests that these item types should continue to
play a role in future editions of the GRE General Test.
2. Some skills that are not measured by the current version of
the analytical measure were rated as very important.
"Reasoning or problem solving in situations in which all the
needed information is not known" was among the skills rated
as most important in each discipline, but currently
unmeasured, at least in any explicit manner, by the
analytical measure. In this regard, however, the previous
GRE-sponsored work of Ward, Carlson, and Woisetschlager
(1983) is noteworthy. These investigators studied
“ill-structured” problems, i.e., problems that do not provide
all the information necessary to solve the problem, and noted
the resemblance of these problems to one variant of the
logical reasoning item type used in the analytical measure.
They concluded that there was no indication that “ill-structured”
problems measure different aspects of analytical
ability than do “well-structured” problems, and therefore
that “ill-structured” problems could not be expected to
extend the range of cognitive skills already measured by the
GRE General Test. They did note, however, that the
“ill-structured” item type could be used to increase the
variety of items types in the test. The findings of the
current study suggest that the inclusion of this item type
would probably meet with faculty approval in most fields of
study.
3. With respect to their perceived importance, skills involving
the generation of hypotheses/alternatives/explanations tended
to cluster together, and the inability to generate hypotheses
independently was one of the incidents rated consistently as
having a substantial effect on faculty perceptions of
students’ analytical abilities.
A number of years ago the GRE Board sponsored a series
of studies (Frederiksen & Ward, 1978; Ward, Frederiksen, &
Carlson, 1978; Ward & Frederiksen, 1977; Frederiksen & Ward,
1975) that explored the development and validation of tests
of scientific thinking, including one especially promising
item type called “Formulating Hypotheses,” which required
examinees to generate hypotheses. Although the research
suggested that this item type complemented the GRE verbal and
quantitative measures in predicting success in graduate
school, the work was discontinued, largely because of
problems in scoring items that require examinees to
construct, not merely choose, a correct response. Carlson
and Ward (1986) have proposed to renew work on the
“Formulating Hypotheses” item type in light of recent
advances in evaluating questions that involve constructed
responses. The results of the faculty survey reported here
would appear to support this renewal.
4. Some of the highly important skills that are currently
well represented in the analytical measure are viewed as more
important for success in some disciplines than in others.
For example, “understanding, analyzing, and evaluating
arguments” was seen as more important in English than in
computer science. However, some skills seen as highly
important in some disciplines but not in others may not be as
well represented currently. For example, “breaking down
complex problems into simpler ones” was perceived as
extremely important in computer science and engineering but
not at all important in English. This would suggest,
perhaps, the need to balance the inclusion of items
reflecting particular skills, so that skills thought to be
important (or unimportant) in particular disciplines are
neither over- nor underrepresented.
The several dimensions that appear to underlie clusters of
reasoning skills may provide an appropriate way to extend the
current test specifications for the analytical measure,
especially if new item types are developed to represent some
of these dimensions.
The reasoning skills that were rated as very important, and
consistently so, across disciplines point to a potential
common core of skills that could be appropriately included in
an “all-purpose” measure like the GRE General Test. Other
skills judged to be very important in only a few disciplines
might best be considered for extending the measurement of
reasoning skills in the GRE Subject Tests. Faculty comments
about the difficulty in separating reasoning from subject
matter knowledge would seem to support this strategy.

Other Comments from Respondents


A number of general comments were made about the study--some
positive and some negative. The study was described alternately as
“very well done” and “interesting,” but also, by one respondent, as a
“complete waste of time.” Most of the comments were positive,
however, and many pertained more specifically to the kinds of
questions that were asked. The consensus seemed to be that the
questionnaire was not easy to complete. Moreover, faculty in the
several disciplines sometimes had different ideas as to what kinds of
questions would have been appropriate. For example, one English
faculty member noted the lack of questions on the use of language in
critical writing, and a computer science faculty member observed that
questions on abilities involved in formulating proofs, which are vital
to success in computer science, were only partially covered in the
questionnaire. An education faculty member noted that the survey did
a better job of assessing skills associated with hypothesis-testing
than with other research skills.
Along these same lines, a number of other respondents also
believed that the questions were more relevant to other disciplines
than to theirs. Several computer science professors, for example,
characterized the questions as oriented more toward argument than
problem solving, in which they had greater interest. An engineering
professor said that some of the questions were more pertinent to
educational research than to scientific or technical research, and one
English faculty member found that questions seemed “geared to the hard
sciences.” Finally, some noted ambiguities or redundancies, or
lamented that the questions were “too fine.” Even with these
difficulties, however, most of the comments about questions were
positive: “Items seem especially well chosen,” “questions are
appropriate,” “questions were quite thorough,” “a good set of
questions,” “topics covered are critical,” and “your lists are right
on target.” The majority of comments, therefore, suggested that the
questionnaire was pitched at about the right level and included
appropriate kinds of reasoning skills.
A number of comments were made about the relationship between
subject matter and analytical skills, e.g., that successful problem
solving is predicated on having specific knowledge in a field. One
respondent believed that the questionnaire downplayed the importance
of “context effects” in favor of “strict reasoning ability,” and
another noted that the measurement of analytical abilities is quite
discipline specific. Another commented on the difficulty of measuring
analytical ability without regard to the amount of knowledge
available.
Several faculty commented on the development of analytical skills
in graduate school and on the differential importance of these skills
at various stages of graduate education. One respondent said, “I
rated entering behavior or behavior across the entire program
(courses, internships, dissertation). If I were to rate the
dissertation experience alone, the ratings would have been much
higher.” Many noted that by the end of their programs, skills would
be expected to increase and various reasoning errors could be expected
to occur less frequently: “Entering students are more likely to make
these errors and graduates to make far fewer.” Another said, “In some
sense, the essence of graduate training is analytical skills.” “These
are skills which students acquire. When they enter they make most of
the mistakes you mentioned. If they can’t learn, they leave the
program.” Another said, “I’m more concerned about the presence of
these behaviors after my course than before it. One simply does not
harshly judge a beginning student who makes an error, but one could be
very critical of a student about to finish a Ph.D. thesis....”

Factor Analytic Results


To condense the many questions into a more manageable form,
factor analyses were computed for each section of the questionnaire.
For the section on reasoning skills, only the importance ratings were
analyzed because they were so highly correlated with frequency
ratings. Because frequency ratings were slightly less correlated with
ratings of seriousness and criticality in the other two sections, they
too were analyzed for the questionnaire sections on reasoning errors
and critical incidents.
The reader should bear in mind that the factors resulting from
this analysis should not be construed as representing dimensions of
analytical ability, but rather only as reflecting the dimensions that
underlie faculty perceptions of analytical abilities. These
dimensions merely reflect the extent to which graduate faculty tended
to rate certain skills as about equally important (or equally
unimportant), not the degree to which these dimensions represent
“factors of the mind.” Thus, the results presented below are intended
to provide a parsimonious representation of faculty perceptions rather
than a basis for postulating distinct analytical abilities.
Reasoning skills. For the ratings of importance of reasoning
skills, the largest eigenvalues were 16.4, 3.9, 2.4, 1.6, and 1.1, and
the application of a scree test (Cattell, 1966) suggested the
appropriateness of a five-factor solution, which was then rotated
according to the varimax criterion (Kaiser, 1958). The five-factor
varimax rotation accounted for 80% of the common variance. The factor
loadings and communalities are given in Appendix B. Table 7
summarizes the variables that were most instrumental in defining each
factor.
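For readers unfamiliar with the procedure, the eigenvalue and scree computation can be illustrated with a few lines of numpy on synthetic data (a toy, not the survey's ratings); a varimax rotation would then be applied to the retained factors, e.g. via a dedicated factor-analysis package.

```python
import numpy as np

# Toy illustration (synthetic data, not the survey's): eigenvalues of a
# correlation matrix, sorted for a scree plot, plus explained-variance
# shares. A varimax rotation would be applied to the retained factors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))   # 200 "respondents", 6 "skill ratings"
X[:, 1] += X[:, 0]                  # induce one correlated pair
R = np.corrcoef(X, rowvar=False)    # 6 x 6 correlation matrix

eig = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending eigenvalues
share = eig / eig.sum()                     # proportion of variance
print(np.round(eig, 2))
```

A scree test looks for the point where successive eigenvalues level off; factors before that elbow are retained, as with the five-factor solution reported here.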
Factor I, which accounted for about a third of the common
variance, was characterized by highest loadings, generally, from
skills involving arguments. Thus, Factor I seems to involve a kind of
critical thinking related to argumentation.
Factor II accounted for about 29% of the common variance, and was
defined primarily by variables related to the drawing of conclusions,
e.g., generating valid explanations, supporting conclusions with
sufficient data, and drawing sound inferences from observations. The
conclusion-oriented skills that define this second critical thinking
factor would seem to be of a more active or productive nature,
involving the construction of inferences or conclusions, rather than
evaluating the soundness of arguments or inferences, as is the case
for Factor I.
Factors III-V each accounted for a somewhat smaller proportion of
common variance (10% - 15%) than did Factors I and II. Factor III is
best defined by skills related to defining and setting up problems or
analyzing their components as a prelude to solving them. Factor IV is
best characterized by inductive reasoning skills, i.e., the drawing of
conclusions that have some evidential support, but not enough to
indicate logical necessity. Factor V is somewhat difficult to define,
but, by virtue of its two highest loadings, it seems to reflect an
ability to generate alternatives.

Reasoning errors. For the ratings of seriousness of reasoning
errors, the largest eigenvalues were 6.5 and 1.1, and the two factors
accounted for 96% of the common variance. (Frequency ratings were
also factor analyzed and are presented in Appendix B. Because the
results were so similar to the analysis of seriousness ratings, they
are not discussed here.) As shown in Table 8, Factor I, which
explained about 52% of the common variance, was characterized by
loadings from errors involved in the evaluation of evidence, e.g.,
offering irrelevant evidence to support a point. Factor II, on the
other hand, seemed to involve more formal logical errors, particularly
as related to reasoning with more statistically oriented material--for
example, failing to take account of a base rate, failing to recognize
differences between populations and samples, and confusing correlation
with causation.

Item-level Results


Tables 1-3 show the mean ratings by discipline for each question
included in the survey instrument. The numbers in the total column
are the grand means for all disciplines. Numbers under each
discipline represent, for each item, the deviations from these means.
The F tests in the right-most column indicate whether the means are
significantly different among the six disciplines. Because the
average ratings, over all respondents, for “frequency of use” and
“importance for success” correlated .99, only the importance ratings
are presented for reasoning skills. Likewise, only the “seriousness”
ratings are presented for reasoning errors, since their correlation
with frequency ratings was .98, and, for critical incidents, only the
average “effect” ratings are presented, since their correlation with
frequency ratings was .94.
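The decision to present only one scale from each pair of ratings rests on the near-perfect correlations between the paired scales. A minimal Pearson correlation sketch, using hypothetical rating vectors rather than the survey data, illustrates the computation behind that decision:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-item averages on two scales for the same items:
importance = [4.2, 3.8, 4.5]
frequency = [4.1, 3.7, 4.6]
r = pearson_r(importance, frequency)  # near 1.0 for these data
```

When the correlation between two scales approaches 1.0, as it did here (.99, .98, and .94 for the three instruments), the scales rank the items essentially identically, and reporting both would be redundant.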
Tables 1-3 show a substantial number of significant differences
among disciplines with respect to the importance placed on various
reasoning skills (Table 1), the seriousness with which they regard
particular kinds of reasoning errors (Table 2), and the impact that
various critical incidents have on the estimation of students’
analytical abilities (Table 3). Table 4, showing only the very
highest rated skills and most critical errors and incidents, gives a
flavor of the differences among these six disciplines. For example,
chemistry faculty placed a high premium on being able to generate
hypotheses, questions, or experiments, to draw sound inferences from
observations, and to analyze and evaluate previous research. English
faculty, on the other hand, saw greater importance in skills involving
argumentation--being able to understand, evaluate, analyze, elaborate,
recognize, and support aspects of an argument.
Faculty in the six disciplines also appeared to have quite
different views as to the numbers of skills that were important in
their respective disciplines. The numbers of reasoning skills that
received average ratings of 4.0 or higher varied markedly by
discipline as follows: 23 for chemistry, 5 for computer science, 27
for education, 22 for engineering, 29 for English, and 26 for
psychology. These differences may have arisen, for example, from our
particular choice of questions, from differences in standards among
disciplines, or from some other factor(s).
It can be seen, even from Table 4, however, that some skills were
viewed as very important by several disciplines. For example,
“breaking down complex problems into simpler ones” was rated as the
single most important skill (of the 56 skills listed) in both computer
science and engineering. “Determining whether conclusions are
logically consistent with, and adequately supported by, the data” was
rated as one of the three most important skills by both education and
psychology faculty; “drawing sound inferences from observations” was
the highest rated skill in chemistry and nearly the highest in
education.
The extent to which faculty in different disciplines agreed on
the importance of various skills, errors, or incidents can be examined
in a slightly different manner. To get some idea of the skills,
errors, and incidents that were viewed as relatively important, and
for which average ratings did not differ significantly across
disciplines, Table 5 was prepared. This table shows only those skills
that received average ratings of importance of more than 3.5 over all
six disciplines combined, and for which analyses of variance did not
detect any significant differences among disciplines.
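The decision rule behind Table 5, namely retaining an item whose grand mean exceeds 3.5 and whose one-way analysis of variance across the six disciplines is not significant, can be sketched as follows. The F computation is standard one-way ANOVA; the critical value `f_crit` and the sample data are illustrative assumptions, and in practice the tabled F value for the actual degrees of freedom would be used.

```python
def anova_f(groups):
    """One-way ANOVA F statistic across rating groups
    (one list of ratings per discipline)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def common_core(items, threshold=3.5, f_crit=2.3):
    """Items with a grand mean above `threshold` and no significant
    discipline effect; `f_crit` is illustrative only."""
    core = []
    for name, groups in items.items():
        ratings = [x for g in groups for x in g]
        if sum(ratings) / len(ratings) > threshold and anova_f(groups) < f_crit:
            core.append(name)
    return core
```

The rule deliberately combines two filters: the mean cutoff selects items judged important overall, while the non-significant F restricts the list to items on which the disciplines agree.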
“Reasoning or problem solving in situations in which all the
needed information is not known” was the skill rated as most important
overall. Such skills as “detecting fallacies and logical
contradictions in arguments,” “deducing new information from a set of
relationships,” and “recognizing structural similarities between one
type of problem or theory and another” were the next most highly rated
skills. These were followed closely by “taking well-known principles
and ideas from one area and applying them to a different specialty,”
“monitoring one’s own progress in solving problems,” and “deriving
from the study of single cases structural features or functional
principles that can be applied to other cases.”
Table 6 lists the reasoning errors and critical incidents that
were judged overall to be the most serious or to have the most effect
on the estimation of students’ abilities. Three errors/incidents were
judged to be most serious or critical: “accepting the central
assumptions in an argument without questioning them,” “being unable to
integrate and synthesize ideas from various sources,” and “being
unable to generate hypotheses independently.”
It should be noted that there are many other decision rules,
based on average ratings and differences among disciplines, that could
have been used here to form a “common core” of skills or errors/
incidents. Tables 1-3 could be consulted to apply alternative rules.