128 Cards in this Set
- Front
- Back
inference
|
the process of learning about a population by studying a sample
|
|
sample regression
|
estimates the association between x and y in the entire population
|
|
regression line
|
an estimate from a sample trying to describe the true regression line from the population
|
|
observational study
|
a statistical study in which the subjects are not modified (just observed) so that researchers can measure and record certain characteristics
|
|
experiment (experimental study)
|
A statistical study in which a "treatment" is applied to the subjects (i.e. they are modified) and researchers measure the effect of the treatment
|
|
lurking variable (confounding variable)
|
-other variables that may influence the response that are not studied
|
|
explanatory variable
|
variable that explains or causes the differences in another variable ("x", or independent variable)
|
|
response variable
|
variable which is thought to depend on the value of the explanatory variable, ("y", dependent variable)
|
|
study question
|
the question about the population that the study is attempting to answer
|
|
population
|
the complete set of all individuals/objects the study is attempting to answer a question about, the whole group of individuals we are interested in
|
|
study subjects
|
the individuals actually measured in the study (i.e. the selected sample of individuals/objects from the population)
|
|
treatment
|
what the researcher does/gives to some or all of the study subjects; the factor whose effect is under study; also called the explanatory variable
|
|
response variable
|
the quantity or characteristic that is measured to determine the treatment effect
|
|
control group
|
group of subjects that have the same sources of variability as those receiving the treatment but do NOT receive the treatment; sometimes called the placebo group
|
|
confounding factor
|
any factor other than the experimental treatment that can affect the response variable in the experiment
|
|
completely randomized design
|
a design in which the treatments in the experiment are randomly assigned to the experimental units without using matched pairs or blocks
|
|
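A minimal sketch of the random assignment step in a completely randomized design, assuming two treatments and eight numbered subjects. The function name `completely_randomized_assignment` and the example labels are illustrative, not from the card set.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign treatments to units in (nearly) equal-sized groups.

    No blocks or matched pairs are used: every unit is shuffled together.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

# Hypothetical example: subjects 1..8 split between treatment and control.
assignment = completely_randomized_assignment(
    range(1, 9), ["treatment", "control"], seed=0
)
```

Fixing `seed` only makes the example repeatable; in a real study the assignment would come from an unseeded random source.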
researchers
|
people who make measurements
|
|
single blinding
|
subject doesn't know if he/she is in the treatment or control group
|
|
double blinding
|
neither RESEARCHERS nor SUBJECTS know where the participants are assigned between the control and treatment group
|
|
matched pair design
|
makes two measures on each subject
|
|
blocking design
|
-extension of completely randomized design
- put similar subjects into blocks, expect the blocks to differ with respect to the response variable -then do a completely randomized experiment within each block |
|
block
|
a group of subjects that are similar in some way
|
|
"blocks" refers to ...
|
individuals
|
|
"experimental units" refers to...
|
repeated time periods in which the blocks receive the varying treatments
|
|
scatter plot
|
used to compare variables
-must measure two variables on a common individual (an individual can be a person, place, or even time) -then plot the two variables |
|
positive association
|
this type of association occurs when the value of one variable tends to increase as the value of the other variable increases
|
|
negative association
|
this type of association occurs when the value of one variable tends to decrease as the value of the other variable tends to increase
|
|
non-linear association
|
this type of association occurs when the relationship between two variables follows a curved pattern rather than a straight line
|
|
correlation
|
a number that indicates the strength and the direction of a straight-line relationship between two quantitative variables
|
|
strength of correlation
|
determined by the absolute value of the correlation, indicates the overall closeness of the points to a straight line
|
|
direction of the correlation
|
determined by the sign of the correlation
|
|
magnitude of r
|
absolute value of r, indicates the strength of the relationship
|
|
r = 1 or r = -1
|
indicates that there is a perfect linear relationship and all data points fall on a straight line
|
|
squared correlation, r²
|
this is the proportion of variation in the response variable that is explained by the explanatory variable. It is always between 0 and 1.
Referring to a correlation |
|
r
|
correlation coefficient, used to measure linear relationship between x and y
|
|
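The correlation cards above can be made concrete with a short sketch that computes r and r² from paired data. The function name `correlation` and the sample values (hours studied, exam scores) are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)   # sign gives direction, |r| gives strength
r_squared = r ** 2               # proportion of variation in y explained by x
```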
the line of best fit
|
-this estimates the average value of y when you know x; individual values will vary around the predicted value
- can be used to give a prediction of a value of y, given a specific value of x |
|
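As a sketch of how a line of best fit is found and then used for prediction, the snippet below applies the usual least-squares formulas. The name `least_squares_line` and the data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
intercept, slope = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])

# Predicted average y for a new x value of 5.
predicted = intercept + slope * 5
```

With real data the points scatter around the line, so `predicted` is an estimate of the average y at that x, not a guarantee for any one individual.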
randomization test
|
a test on two groups when paired data is NOT available
|
|
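One common form of a randomization test compares the means of two independent groups by reshuffling the group labels many times. The sketch below assumes that form; the function name, the number of shuffles, and the data are illustrative choices, not part of the card.

```python
import random
import statistics

def randomization_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Approximate two-sided p-value for a difference in group means.

    Labels are reshuffled at random; the p-value is the share of shuffles
    whose mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical, clearly separated groups.
p_separated = randomization_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])

# Identical groups: every shuffle is "at least as extreme" as a difference of 0.
p_identical = randomization_test([1, 2, 3], [1, 2, 3])
```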
sampling frame
|
a list of all individuals in the population
|
|
in hypothesis testing, population parameter =
|
null value
|
|
null hypothesis
|
-the statement being tested
-a statement that describes some aspect of the statistical behavior of a set of data -this statement is treated as valid unless the actual behavior of the data contradicts this assumption |
|
null value
|
-the specific # the parameter equals if the null hypothesis is true
- value of population parameter being tested in the null hypothesis |
|
alternative hypothesis
|
- a statement that something is happening
- researchers want to prove this - it may be a statement that the assumed status quo is false, or that there is a relationship, or there is a difference |
|
two types of alternative hypothesis
|
one sided test, two sided test
|
|
one-sided test
|
when Ha specifies a single direction
|
|
two-sided test
|
when Ha includes values in both directions
|
|
p-value
|
the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming Ho is true
|
|
level of significance
|
(α) is the border line for deciding that the p-value is low enough to justify choosing the alternative hypothesis
|
|
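To illustrate how a p-value is compared with the level of significance, here is a sketch that assumes the test statistic follows a standard normal distribution when Ho is true (a z-test). The function name and the example statistic of 2.5 are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming Ho is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                      # level of significance
p_value = two_sided_p_value(2.5)  # hypothetical observed test statistic
reject_null = p_value < alpha     # low p-value: evidence for Ha
```

For a one-sided test only one tail would be counted, so the factor of 2 would be dropped.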
hypothesis testing about paired differences
|
matched pairs design
|
|
matched pairs design
|
taking two measures on the same subject to see if there is a difference between the two measurements
|
|
paired t-test
|
a one-sample t-test used on the sample of differences to examine whether the sample mean difference is significantly different from 0
|
|
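A sketch of the paired t-test calculation: take the differences, then run a one-sample t statistic on them against 0. The function name and the before/after values are made up for illustration, and turning the statistic into a p-value (which needs a t distribution with n - 1 degrees of freedom) is left out.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for the mean of paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: s_d / sqrt(n).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1

# Hypothetical measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_stat, df = paired_t_statistic(before, after)
```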
sampling distribution
|
-describes the possible values the statistic might have when random samples are taken from a population
the distribution of statistics ("xbar" or "p hat") for all possible samples from the same population of a given sample size (n) |
|
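A sampling distribution can be approximated by simulation: draw many random samples of the same size and record the statistic each time. The sketch below assumes a population that is uniform on the interval from 0 to 1 (mean 0.5) and uses the sample mean as the statistic; all names and sizes are illustrative.

```python
import random
import statistics

def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

sample_means = simulate_sample_means(sample_size=25, n_samples=2000)

# Center and spread of the simulated sampling distribution of "xbar".
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```

The center should land near the population mean, and the spread should be close to the population standard deviation divided by the square root of the sample size.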
statistical inference
|
gives us methods for drawing conclusions about a population based on data from samples
|
|
confidence interval
|
an interval of values computed from sample data that is likely to include the true population parameter
|
|
standard error
|
is the estimated standard deviation of the sampling distribution of the statistic
|
|
confidence level
|
proportion of samples for which the confidence interval will capture the true parameters, % of time we expect the procedure to work, determines how frequently the observed interval contains the parameter
|
|
standard error of sample mean
|
s/√n, where (s) is the sample standard deviation and n is the sample size
|
|
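The standard error and confidence interval cards can be tied together in a short sketch. It assumes the standard error of the sample mean is s/√n and, for simplicity, uses a normal critical value; with small samples a t critical value would usually be used instead. The function name and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high, standard_error) for the population mean.

    Assumption: normal (z) critical value, not the t distribution.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return xbar - z * se, xbar + z * se, se

# Hypothetical sample of five measurements.
low, high, se = mean_confidence_interval([4, 8, 6, 5, 7])
```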
statistic
|
a number that summarizes some characteristic of the sample data, computed from the sample values, a known value that varies from sample to sample
|
|
is the distribution of possible values of the statistic for repeated samples of the same size taken from the same population
|
sampling distribution
|
|
mean of a sampling distribution
|
the average of all possible values of the statistic for repeated samples of the same size from a population
|
|
the standard deviation(SD) of a sampling distribution
|
measures the average distance of the possible values of the statistic from the mean of the sampling distribution, roughly speaking
|
|
there is a difference between N and n!
N= n= |
n= sample size (number of values in one sample/subgroup)
N= number of samples (number of subgroups) |
|
Law of Large Numbers (LLN)
|
as you average more observations, sample mean settles down at population mean
|
|
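The Law of Large Numbers is easy to see in a simulation. This sketch assumes a fair six-sided die, whose population mean is 3.5, and tracks the running average of the rolls; the function name and number of rolls are illustrative.

```python
import random

def running_mean_of_die_rolls(n_rolls, seed=0):
    """Return the running sample mean after each roll of a fair die."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

means = running_mean_of_die_rolls(100_000)

# Early averages bounce around; later ones settle near 3.5.
early, late = means[9], means[-1]
```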
graphs used for categorical variables
|
1. pie chart
2. bar graph |
|
graphic representations for quantitative variables
|
1. histogram
2. stem-and-leaf plot 3. box plot |
|
standard deviation
|
a value that measures the variability (spread) of data.
|
|
density curve
|
the outline of the histogram which approximates the overall pattern of a distribution
1. It is always on or above the horizontal axis 2. It has an area of exactly 1 underneath it |
|
standard normal distribution
|
-this is a normal distribution with a mean of 0 and a standard deviation of 1
-all other normal distributions are compared to this |
|
z-score
|
(a standardized value) that is the distance between a specified value and the mean, measured in number of standard deviations
|
|
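A one-line calculation shows what a z-score measures. The values below (a score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

z = z_score(130, mean=100, sd=15)        # two SDs above the mean
z_below = z_score(85, mean=100, sd=15)   # negative: below the mean
```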
observation (individual)
|
an individual or the value of a single measurement
|
|
variable
|
a characteristic that can differ from one individual to the next
|
|
categorical variables
|
the observational units are being divided into categories, there is no special ordering of the categories
|
|
ordinal variables
|
the observational units are being divided into categories which have an order
basically a categorical variable with ordered categories |
|
quantitative variables
|
-variables that take numerical values
- you should be able to do mathematical operations with these numbers such as adding, multiplying, etc. (A social security number would not be one of these) |
|
graphs for quantitative variables
|
1. Histogram
2. Stem-and-Leaf Plot 3. Dot Plot |
|
Pie Chart
|
each slice of a pie corresponds to a category and the size of the angle of the slice shows the percentage of the individuals in the corresponding category
|
|
Bar Graph
|
-each category is presented as a bar
- the height of the bar represents the number (or percentage) of individuals in the corresponding category |
|
range
|
highest value minus the lowest value
|
|
histogram
|
a bar graph for a quantitative variable in which the range of possible values is broken into intervals
|
|
frequency
|
actual number of individuals who fall into each interval (of a histogram)
|
|
relative frequency
|
proportion or percentage that are in an interval (of a histogram)
|
|
stem and leaf plot
|
every individual data value is shown
|
|
dot plot
|
display a dot for each observation along a number line
|
|
distribution
|
the overall pattern of how often the possible values occur
|
|
shape of a distribution
|
shows how values are distributed in a distribution
|
|
center
|
location, average, mean and median measure this
|
|
outlier
|
unusual values that do not fit with the rest of the pattern
(may be due to data entry errors or may be actual unusual values) |
|
symmetric distribution
|
one half of the distribution is the mirror image of the other (bell shape)
|
|
bimodal distributions
|
has two peaks which can be caused by two or more groups of values in the sample
|
|
multimodal distribution
|
distribution with several peaks
|
|
median
|
the middle number of the data when it is ordered, 50% of the data is above it and 50% of the data is below it
|
|
two measures of the center
|
mean and median
|
|
symmetric distribution
(mean ? median) |
mean = median
|
|
right skewed distribution
(mean ? median) |
mean>median
mean is greater than median |
|
left skewed distribution
(mean ? median) |
mean<median
mean is less than median |
|
First Quartile (Q1)
|
25% of the data is at or below this number
|
|
Third Quartile (Q3)
|
75% of the data is at or below this number
|
|
Inter-Quartile Range (IQR)
|
A value describing the spread over approximately the middle 50% of the data
|
|
the five number summary includes
|
1) maximum
2) minimum 3) Q1 4) median 5) Q3 |
|
boxplot
|
a graphical representation of the 5 number summary
|
|
1.5*IQR Rule
|
an outlier is any value that lies more than one and a half times the length of the box (the IQR) below Q1 or above Q3
|
|
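The five number summary, the IQR, and the 1.5*IQR rule can be sketched together. Textbooks compute quartiles in slightly different ways; this example assumes the "inclusive" method of Python's `statistics.quantiles`, and the data set is made up.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def outliers_by_iqr_rule(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
summary = five_number_summary(data)
outliers = outliers_by_iqr_rule(data)
```

A boxplot draws exactly these five numbers, usually marking any flagged outliers as separate points.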
variance
|
measures the average squared distance of the observations from the mean
|
|
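A worked sketch of the variance and standard deviation, assuming the sample versions that divide by n - 1. The function name and data are illustrative.

```python
import math

def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)
standard_deviation = math.sqrt(variance)  # back in the original units
```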
strata
|
sub groups of population which might have different responses to the question of interest
|
|
stratified sample
|
is a collection of samples taken in each stratum of the population
|
|
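A minimal sketch of drawing a stratified sample, assuming a simple random sample of the same size is taken within each stratum. The strata names, members, and function name are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population split into two strata by class year.
strata = {
    "first_year": ["A1", "A2", "A3", "A4", "A5"],
    "final_year": ["B1", "B2", "B3", "B4"],
}
sample = stratified_sample(strata, n_per_stratum=2, seed=0)
```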
cluster samples
|
sampling technique used when natural groups are evident in a statistical population
|
|
systematic samples
|
select every k-th individual from the sampling frame
|
|
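A systematic sample can be sketched as picking a random starting point among the first k entries of the sampling frame and then taking every k-th individual after it. The random start and the function name are assumptions made for this example.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th individual, starting at a random position below k."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

# Hypothetical sampling frame of 100 numbered individuals.
frame = list(range(100))
chosen = systematic_sample(frame, k=10, seed=0)
```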
under coverage
|
sampling frame does not include all the population
|
|
over coverage
|
sampling frame includes individuals who are not in the population being examined
|
|
data entry errors
|
person recording the data makes mistakes
|
|
question wording error
|
the set up of the question can have a big influence on the answers
|
|
definition of statistics
|
a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty
|
|
individuals
|
the objects described by the data set
(each student in the class is an observational unit or individual) |
|
variables
|
characteristics of the individuals
(max speed, sex of the students, height, time of sleep) |
|
sample
|
subgroup of the population examined to measure the variables and gather information
|
|
parameter
|
a number that describes a characteristic of the population. It is mostly a summary of a population. Its value is usually unknown.
|
|
statistic
|
summary of a sample, the value of this is usually known
|
|
census
|
taken to measure ALL individuals in the population
|
|
selection bias
|
this method of selection of participants favors a particular outcome
|
|
non response bias
|
some part of the individuals in the sample cannot be reached or do not respond, this creates a bias because respondents may differ in meaningful ways from non-respondents.
|
|
response bias
|
participants give incorrect information
|
|
response rate
|
the proportion of the sample that responded to the question
|
|
non-response rate
|
the proportion of the sample that didn't respond to the question
|
|
convenience samples
|
investigators choose individuals that are easy to reach
|
|
volunteer response samples
|
individuals decide whether to answer the questions or not
|
|
simple random sample
|
definition?
|
|
statistical significance
|
a result is unlikely to have occurred just by chance
|
|
practical significance
|
the difference from the claimed value we observe is actually meaningful
|
|
numbers in "stem" column of stem and leaf plot
|
first digit of each number in the data set
|
|
numbers in "leaf" column of stem and leaf plot
|
contains only the last digit of the # regardless of whether it falls before or after the decimal point
|