Analysis MCQ: June 2018

Archive for June 2018

What does 1 - r^2 measure?

a. The relative importance of all other possible predictor variables on y.
b. The percentage of points that are on the regression line.
c. The percentage of points that are off the regression line

Answer: a. The relative importance of all other possible predictor variables on y.

Which of the following statements is not correct regarding the correlation?

a. It can range from -1 to 1.
b. Its square is the percentage of variance accounted for.
c. It measures the percent of variation explained.
d. It is a measure of the association between two variables

Answer: c. It measures the percent of variation explained.

The primary purpose of a regression equation is to

a. measure the association between two variables.
b. estimate the value of the dependent variable based on the independent variable.
c. estimate the value of the independent variable based on the dependent variable.
d. estimate the percentage of variance accounted for.

Answer: b. estimate the value of the dependent variable based on the independent variable.

The ratio of SS(Regression) divided by the SS(Total) is also called the

a. sum of squares due to regression
b. percentage of variance accounted for
c. standard error
d. coefficient of correlation.

Answer: b. percentage of variance accounted for

Which of the following is not based on a correlation or a regression line relating y to x?

a. The standard error
b. The percentage of variance accounted for
c. SS(Total)
d. (a) and (c) only
e. (a), (b) and (c)

Answer: c. SS(Total)

The percentage of variance accounted for

a. is the square of the coefficient of correlation.
b. cannot be negative.
c. gives the percent of the variation in the dependent variable explained by the independent variable.
d. all of the above

Answer: d. all of the above

The standard error is

a. computed from squared deviations from the regression line.
b. may be negative
c. is given in squared units of the independent variable.
d. all of the above

Answer: a. computed from squared deviations from the regression line.

A regression equation is used to predict women's Weight (in pounds) from their Height (in inches). The correlation between Weight (W) and Height (H) turned out to be 0.70 . The average H was 64 inches and the average W was 119.6 pounds.

The regression equation turned out to be:

W(hat) = 30 + 1.4H

Now consider the regression equation to predict H from W using the same data set:

H(hat) = int + slope * W

where "int" and "slope" are some numbers. Find the value of "slope".

a. 0.00
b. 0.35
c. 0.55
d. 0.25
e. 0.71

Answer: b. 0.35
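
One way to check this answer: in simple linear regression, the slope for predicting W from H multiplied by the slope for predicting H from W equals r^2. A minimal Python sketch of that check follows; the variable names are chosen here for illustration only.

```python
# In simple regression: slope_w_on_h * slope_h_on_w = r**2
r = 0.70             # correlation between Weight and Height
slope_w_on_h = 1.4   # slope from W(hat) = 30 + 1.4H

# Solve the identity for the unknown slope of H on W
slope_h_on_w = r**2 / slope_w_on_h
print(round(slope_h_on_w, 2))
```

The means (64 and 119.6) are not needed for the slope; they would only matter for finding "int".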

In a regression problem, n=207 and SS(Residual)=0. Find the correlation coefficient.

a. 0.207
b. 0.00
c. 1.00
d. -1.00
e. either +/- 1.00

Answer: e. either +/- 1.00

What is the formula for the correlation coefficient in terms of the cov?

The covariance, denoted cov, is computed from:

cov = 1/(n-1) * Σ (X - X(bar))(Y - Y(bar))

a. cov / s(subx)
b. cov / (s(subx) * s(suby))
c. cov / s(subx)^2
d. cov / s(suby)^2
e. (s(suby) / s(subx)) * cov

Answer: b. cov / (s(subx) * s(suby))
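
The formula can be tried on a small data set. The sketch below uses made-up numbers (the x and y lists are hypothetical, not from any question here) and computes cov, the two sample standard deviations, and r = cov / (s(subx) * s(suby)) by hand.

```python
import math

# Hypothetical illustration data, not taken from the quiz
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: 1/(n-1) * sum of cross-products of deviations
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations use the same n-1 divisor
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = cov / (s_x * s_y)
print(round(cov, 4), round(r, 4))
```

Dividing by both standard deviations removes the units, which is why r always lands between -1 and 1.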

The coefficient of correlation was computed to be -0.60. This means

a. the slope and intercept of the regression line are both negative
b. as x increases, y decreases.
c. x and y are both 0.
d. the percentage of variance accounted for equals sqrt(0.6)

Answer: b. as x increases, y decreases.

We obtained the following regression equation: y(hat) = 3.5 + 2.1x. Which of the following statements are correct?

a. The dependent variable is predicted to increase by 2.1 for each increase of 1 unit in X.
b. The equation crosses the y-axis at 3.5.
c. If x = 5, then y = 14.
d. If x = 5, then y(hat) = 14.
e. (a), (b) and (d) only

Answer: e. (a), (b) and (d) only

The coefficient of correlation.

a. Has the same sign as the slope
b. Can range from -1.00 to 1.00
c. Is also called the percentage of variance accounted for.
d. (a) and (b) only
e. none of the above

Answer: d. (a) and (b) only

In a regression problem, n=52, SS(Total)=400, and r = -0.8367 Find the Standard Error.

a. 2.40
b. 0.30
c. 1.55
d. -1.55
e. -2.40

Answer: c. 1.55
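
The arithmetic behind this answer can be sketched as follows, assuming the usual simple-regression definitions: SS(Residual) = (1 - r^2) * SS(Total), and the standard error is the square root of SS(Residual) / (n - 2).

```python
import math

n = 52
ss_total = 400.0
r = -0.8367

# Unexplained variation is the share of SS(Total) not accounted for by r^2
ss_residual = (1 - r**2) * ss_total

# Simple regression estimates two parameters, so divide by n - 2
std_error = math.sqrt(ss_residual / (n - 2))
print(round(std_error, 2))
```

The sign of r drops out because it is squared, and a square root cannot be negative, which rules out options d and e.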

Given the following information: r = 0.60

     Mean    Standard Deviation
X    40      4
Y    45      6

Find the intercept and slope of the regression line to predict Y from X.

a. intercept=9.0 slope=0.9
b. intercept=0.9 slope=9.0
c. intercept=9.0 slope=9.0
d. intercept=0.9 slope=0.9

Answer: a. intercept=9.0 slope=0.9
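
A short sketch of the calculation, using the standard relations slope = r * s(suby) / s(subx) and intercept = Y(bar) - slope * X(bar). The variable names are illustrative.

```python
r = 0.60
x_mean, x_sd = 40, 4
y_mean, y_sd = 45, 6

# Slope rescales r by the ratio of the standard deviations
slope = r * y_sd / x_sd

# The regression line always passes through (x_mean, y_mean)
intercept = y_mean - slope * x_mean
print(round(intercept, 1), round(slope, 1))
```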

The correlation between X=person's weight and Y=person's height is 0.70 . What is the correlation for the same data set if we had used X=person's height and Y=person's weight?

a. 0.00
b. 0.70
c. -0.70
d. Need the original data so r can be recomputed

Answer: b. 0.70

Compute the correlation r, given: ΣX=40, ΣY=20, ΣXY=300, ΣX^2=580, ΣY^2=400, n=10.

a. 0.0000
b. 0.2540
c. -0.5658
d. -0.2540
e. 0.5658

Answer: e. 0.5658
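
The answer can be reproduced with the computational formula for r from raw sums. This is a sketch with variable names picked for readability.

```python
import math

n = 10
sum_x, sum_y = 40, 20
sum_xy = 300
sum_x2, sum_y2 = 580, 400

# r = (n*SumXY - SumX*SumY) / sqrt((n*SumX2 - SumX^2) * (n*SumY2 - SumY^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))

r = numerator / denominator
print(round(r, 4))
```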

An Inverse Relationship means the trendline will have a _______ slope.

a. positive
b. negative
c. zero

Answer: b. negative

If all the points are on the regression line, then

a. the value of the slope is 0.
b. the value of the intercept is 0.
c. the correlation coefficient is 0.
d. the standard error is 0.
e. both (c) and (d)

Answer: d. the standard error is 0.

A correlation of 0.02 would indicate:

a. a very strong direct relationship
b. a very weak direct relationship
c. a very strong inverse relationship
d. a very weak inverse relationship
e. a computational error had been made.

Answer: b. a very weak direct relationship

In a regression problem, the slope = 0.40.
The mean and standard deviation of the X variable are both 100.
The mean and standard deviation of the Y variable are both 200.
Find the correlation r.

a. -0.20
b. 0.20
c. 0.80
d. -0.80
e. 0.40

Answer: b. 0.20
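
This is the slope formula run backwards: since slope = r * s(suby) / s(subx), it follows that r = slope * s(subx) / s(suby). A minimal sketch:

```python
slope = 0.40
x_sd = 100
y_sd = 200

# Rearranged from slope = r * y_sd / x_sd
r = slope * x_sd / y_sd
print(round(r, 2))
```

The means are not used; only the two standard deviations enter the relation.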

Here is an Excel printout of a regression problem. Use this for the following 4 questions.

Regression Statistics
Multiple R           0.2288
R Square             0.0524
Adjusted R Square    0.0415
Standard Error       2.5166
Observations         89

ANOVA
            df    SS        MS       F       p
Regress.     1    30.45     30.45    4.81    0.0310
Residual    87    550.99     6.33
Total       88    581.44

            Coefficients    Standard Error    t Stat     P-value
Intercept   33.12           27.6000            1.2000    0.2334
X           -2.56            1.1675           -2.1927    0.0310

Find the "percentage of variance accounted for".

a. 22.88%
b. 5.24%
c. 4.15%
d. 2.5166%

Answer: b. 5.24%

What is the value of r?

a. -0.2288
b. 0.2288
c. 0.0524
d. -0.0524
e. 1.0000

Answer: a. -0.2288
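
Both of the last two answers can be checked from the ANOVA rows of the printout. The sketch below assumes R Square = SS(Regression) / SS(Total) and that r carries the sign of the slope, since "Multiple R" is reported without a sign.

```python
import math

ss_regression = 30.45
ss_total = 581.44
slope = -2.56   # coefficient of X on the printout

r_squared = ss_regression / ss_total

# Attach the slope's sign to the unsigned square root
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r_squared, 4), round(r, 4))
```

The values come out at about 0.0524 and -0.2288, matching the printout up to rounding.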

What is the predicted value for X=10?

a. 25.60
b. 31.20
c. 7.52
d. 58.72

Answer: c. 7.52
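
The prediction is just the fitted line evaluated at X = 10, using the Intercept and X coefficients from the printout. A small sketch (the function name is hypothetical):

```python
intercept = 33.12
slope = -2.56

def predict(x):
    """Evaluate the fitted line y(hat) = intercept + slope * x."""
    return intercept + slope * x

y_hat = predict(10)
print(round(y_hat, 2))
```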

Find a 95% C.L.I. for the answer to the previous problem, where x=10. (y(hat) denotes the answer to previous question.)

a. y(hat) +/- 2.5166
b. y(hat) +/- 1.960
c. y(hat) +/- 16.07
d. y(hat) +/- 4.933

Answer: d. y(hat) +/- 4.933
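
The listed half-width appears to come from multiplying the Standard Error on the printout by 1.960, the large-sample 95% multiplier. That is an assumption about how the answer key was derived; an exact prediction interval would use a t multiplier with 87 degrees of freedom and an extra factor that depends on how far x is from the mean of X.

```python
z_95 = 1.960          # large-sample 95% multiplier (assumed, see note above)
std_error = 2.5166    # "Standard Error" from the Regression Statistics block

half_width = z_95 * std_error
print(round(half_width, 3))
```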

What is the difference between (beta)1 and b1 ?

a. none; exactly the same; slope of regression line.
b. (beta)1 is the unknown population value, while b1 is its estimate from the data.
c. b1 is the unknown population value, while (beta)1 is its estimate from the data.

Answer: b. (beta)1 is the unknown population value, while b1 is its estimate from the data.

R-Square values can range from _ to __.

a. 0 1
b. -1 1
c. -1 0
d. -100 100

Answer: a. 0 1

If the p-value (for the slope) on a regression printout = 0.00001 then

a. p < 0.05, so it looks like we have a good predictor of y; at least we can safely conclude that the correlation shown in the slope of the regression line is not sampling error.
b. p < 0.05, so it looks like we have a poor predictor of y.

Answer: a. p < 0.05, so it looks like we have a good predictor of y; at least we can safely conclude that the correlation shown in the slope of the regression line is not sampling error.

If X= the month you were born in (1,...,12) and Y=your height (inches), what r would I get if I collected data and computed the correlation?

a. -1
b. exactly 0
c. very close to 0, but not necessarily exactly 0
d. 1

Answer: c. very close to 0, but not necessarily exactly 0

In this regression equation: y(hat) = 10 + 4x suppose X increases from 25 to 28. How much will the predicted Y change?

a. 3
b. 1/3
c. 10
d. 12
e. 22

Answer: d. 12
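
The change in the prediction depends only on the slope and the change in X; the intercept cancels. A quick sketch with an illustrative helper function:

```python
def predict(x):
    """Fitted line from the question: y(hat) = 10 + 4x."""
    return 10 + 4 * x

# Difference of two predictions equals slope * (28 - 25)
change = predict(28) - predict(25)
print(change)
```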

I want to use a regression line to predict the Sales (Y) of ice cream cones at the Baskin-Robbins in Galveston on Saturdays in summer. The X variable I will use will be the High Temperature for the day. In this way I can see what portion of the sales are due to fluctuations in the temperature. Certainly sales will fluctuate for other reasons, like if there is some special event in town or if it rains a lot. So I collect data: Sales and High Temperature for each of the 8 Saturdays in the summer so far.

I want to predict the sales on Saturday August 16 and I want a CLI interval prediction. What formula? (The subscript in the following is the number of degrees of freedom.)

a. y(hat) +- t(sub6) (std error)
b. y(hat) +- t(sub7) (std error)
c. y(hat) +- t(sub8) (std error)

Answer: a. y(hat) +- t(sub6) (std error)
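
The subscript follows from the degrees of freedom in simple regression: two parameters (intercept and slope) are estimated, so the error term has n - 2 degrees of freedom. A tiny sketch, with a function name made up for this note:

```python
def residual_df(n_observations: int) -> int:
    """Degrees of freedom for the error term in simple linear regression."""
    # Intercept and slope each use up one degree of freedom
    return n_observations - 2

print(residual_df(8))   # 8 Saturdays of data
```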

If the __________ of the computed regression line is 0 then we can conclude that r = 0.

a. intercept
b. slope
c. p-value

Answer: b. slope

The simple linear regression (least squares method) minimizes:

A. SS(x)
B. SSE
C. The explained variation
D. SS(y)
E. Total variation

Answer: B. SSE

Regression analysis
r                 0.873
r squared         0.762
Standard error    11.547
n                 7

ANOVA
              SS
Regression    2,133.3333
Residual        666.6667
Total         2,800.0000

Regression output
               Coefficients    p-value
Intercept      63.3333         .0005
Advertising     6.667          .0103
The local grocery store wants to predict the daily sales in dollars. The manager believes that the amount of newspaper advertising significantly affects the store sales. He randomly selects 7 days of data consisting of daily grocery store sales (in thousands of dollars) and advertising expenditures (in thousands of dollars). The Excel output given above summarizes the results of the regression model.

If the manager decides to spend $3000 on advertising, based on the simple linear regression results given above, the estimated sales are:

A. 83,333
B. 70,000
C. 68,333
D. 20,064,333
E. 20,063.33

Answer: A. 83,333

Be sure to use the number 3 instead of 3000.
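
A sketch of the calculation, keeping everything in thousands of dollars as the data were recorded. With the rounded coefficients from the output the result lands very close to 83.33, which corresponds to option A.

```python
intercept = 63.3333
slope = 6.667
advertising = 3   # $3000 expressed in thousands of dollars

# Estimated daily sales, also in thousands of dollars
estimated_sales = intercept + slope * advertising
print(round(estimated_sales, 2))
```

Plugging in 3000 instead of 3 gives a figure near 20,064, which is how the distractor options D and E arise.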

The strength of the relationship between two quantitative variables can be measured by:

A. The Y intercept of the simple linear regression equation
B. The slope of a simple linear regression equation
C. The coefficient of correlation

Answer: C. The coefficient of correlation

The following results were obtained from a simple regression analysis:

y-hat = 37.2895 - (1.2024)X
r^2 = .6744
What is the y-intercept of the linear regression equation?

A. .6774
B. .2934
C. -1.2024
D. 37.2895

Answer: D. 37.2895

The following results were obtained as a part of simple regression analysis:

r^2 = .9162
p-value = .000
The null hypothesis of no linear relationship between the dependent variable and the independent variable

A. is not an appropriate null hypothesis for this situation
B. is rejected
C. is not rejected
D. cannot be tested with the given information

Answer: B. is rejected

A simple regression analysis with 20 observations would yield ________ degrees of freedom.

A. 18
B. 20
C. 19
D. 1

Answer: A. 18

In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes for a unit change in X (independent variable) is called the __________

A. Coefficient of determination
B. Y intercept of the regression line
C. Slope of the regression line
D. Standard Error
E. Correlation Coefficient

Answer: C. Slope of the regression line

The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line.

A. Never
B. Sometimes
C. Always

Answer: C. Always

The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line.

Answer: True

The slope of the simple linear regression equation represents the average change in the value of the dependent variable per unit change in the independent variable (X).

Answer: True

If r = -1, then we can conclude that there is a perfect relationship between X and Y.

Answer: True

When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.

Answer: False

The dependent variable is the variable that is being described, predicted, or controlled.

Answer: True

A ____ table shows a logical structure, with all possible combinations of conditions and resulting actions.

a. pseudo
b. logic
c. decision
d. validity

Answer: C

A ____ description documents the details of a functional primitive, which represents a specific set of processing steps and business logic.

a. logic
b. primitive
c. process
d. function-based

Answer: C

In the accompanying figure, the sequence structure is the completion of ____.

a. one or more process steps based on the results of a test or condition
b. steps in a chronological order, one after another
c. a process step that is repeated until a specific condition changes
d. a specific condition that is repeated until a process changes

Answer: B

____ is based on combinations of the three logical structures, or control structures (one of which is shown in the accompanying figure), which serve as building blocks for the process.

a. Modular design
b. General design
c. Global design
d. Total design

Answer: A

In a data dictionary, some data elements have ____ rules, such as an employee's salary must be within the range defined for the employee's job classification.

a. domain
b. range
c. validity
d. mastered

Answer: C

A data dictionary specifies a data element's ____, which is the set of values permitted for the data element.

a. range
b. domain
c. array
d. any of the above

Answer: B

In a data dictionary, ____ is the maximum number of characters for an alphabetic or character data element or the maximum number of digits and number of decimal positions for a numeric data element.

a. domain
b. valence
c. length
d. index

Answer: C

In a data dictionary, ____ refers to whether the data element contains numeric, alphabetic, or character values.

a. value
b. type
c. valence
d. domain

Answer: B

In a data dictionary, any name other than the standard data element name is called a(n) ____.

a. clone
b. cipher
c. alias
d. index

Answer: C

The data dictionary usually records and describes a default value, which is the ____.

a. specification of the set of values permitted for the data element
b. identification of the user(s) responsible for changing values for the data element
c. specification for the origination point for the data element's value
d. value for the data element if a value otherwise is not entered for it

Answer: D

In a data dictionary, data elements are combined into ____, which are meaningful combinations of data elements that are included in data flows or retained in data stores.

a. fields
b. columns
c. records
d. decimals

Answer: C

In a data dictionary, a(n) ____ is the smallest piece of data that has meaning within an information system.

a. field
b. index
c. record
d. pixel

Answer: A

A data ____ is a central storehouse of information about a system's data.

a. glossary
b. knowledgebase
c. content bank
d. repository

Answer: D

Balancing ____.

a. uses a series of increasingly detailed DFDs to describe an information system
b. ensures that the input and output data flows of the parent DFD are maintained on the child DFD
c. uses a series of increasingly sketchy DFDs to describe an information system
d. ensures that the input and output data flows of the child DFD are maintained on the parent DFD

Answer: B

Using ____, an analyst starts with an overall view, which is a context diagram with a single process symbol, and then the analyst creates diagram 0, which shows more detail.

a. balancing
b. indexing
c. exploding
d. leveling

Answer: D

____ maintains consistency among DFDs by ensuring that input and output data flows align properly.

a. Balancing
b. Indexing
c. Leveling
d. Exploding

Answer: A

____ is the process of drawing a series of increasingly detailed DFDs, until all functional primitives are identified.

a. Leveling
b. Balancing
c. Indexing
d. Exploding

Answer: A

Leveling ____.

Because diagram 0 is a(n) ____ version of process 0, it shows considerably more detail than a context diagram.

a. contracted
b. exploded
c. condensed
d. extrapolated

Answer: B

If processes must be performed in a specific sequence, you document the information in the ____.

a. leveling guide
b. process descriptions
c. data dictionary
d. DFD

Answer: B

____ is/are logically impossible in a DFD because a process must act on input, shown by an incoming data flow, and produce output, represented by an outgoing data flow.

a. Spontaneous combustion
b. Gray matter
c. Black holes
d. Black boxes

Answer: C

A gray hole is a process that has ____.

a. no input
b. at least one output and one input, but the output obviously is insufficient to generate the input shown
c. no output
d. at least one input and one output, but the input obviously is insufficient to generate the output shown

Answer: D

A black hole is a process that has ____.

A spontaneous generation process is a process that has ____.

A DFD shows ____.

a. how data are related
b. what key fields are stored in the system
c. how a system transforms input data into useful information
d. what data is stored in the system

Answer: C

DFD symbols are referenced by using all ____ letters for the symbol name.

a. capital
b. lowercase
c. italicized
d. boldfaced

Answer: A

In data and process modeling, a(n) ____ model shows what the system must do, regardless of how it will be implemented physically.

a. operational
b. physical
c. logical
d. relational

Answer: C
