Math 109 Project Part 1 Description

For this project, you will collect and compare quantitative data from two populations. The project will be completed in 2 parts, in which you will use statistical methods to answer the question, “Is there a difference between the means of the two populations?” You will present your results and conclusions for this first part of the project in an essay format that is at least 1000 words long. (For details, see Project Formatting below.) Do NOT include definitions or state how to calculate results in your project. State your findings and conclusions in your own words.

Phase 1: Choose your populations and variable, and collect your data

For Check Point 1:
• You will need to pick two populations, each containing at least 250 individuals or objects, and one quantitative variable to study.
• The variable should be something you would like to compare between your two populations. Your variable must be quantitative (numerical); it cannot be qualitative (categorical).

You will collect data for your variable from 25 randomly selected individuals in each population. In total, you will collect 50 observations.

Phase 2: Exploratory Analysis

For Check Point 2:
• Enter your collected data in StatCrunch and save your data. Label your columns appropriately, because any output you generate will automatically use the column titles to label your graphics.
• Generate summary statistics for both samples and save the output.
• Create a histogram for each of your samples and save the graphs.
• Create a single graph containing both boxplots (be sure to select the “Use fences to identify outliers” option) and save the graph.
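The “Use fences to identify outliers” option flags any observation that falls outside the boxplot fences. The minimal Python sketch below assumes the usual 1.5 × IQR convention (lower fence Q1 - 1.5 × IQR, upper fence Q3 + 1.5 × IQR) and uses the "inclusive" quartile method of Python's statistics.quantiles; StatCrunch may compute quartiles slightly differently, so fence values for small samples can differ a little. It is only a way to double-check your StatCrunch output, not something to include in your write-up, and the sample values are hypothetical.

```python
import statistics

def fence_outliers(data, k=1.5):
    """Return (lower fence, upper fence, outliers) using the k * IQR rule."""
    # "inclusive" is one of several quartile conventions; StatCrunch's
    # quartiles may differ slightly on small samples.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Any observation strictly outside the fences is flagged.
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical sample of 25 observations (not real project data).
sample = [12, 14, 15, 15, 16, 17, 17, 18, 18, 19, 19, 20, 20,
          21, 21, 22, 22, 23, 24, 25, 26, 27, 28, 30, 48]

lower, upper, outliers = fence_outliers(sample)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
# prints "fences: [6.5, 34.5]  outliers: [48]"
```

Here Q1 = 17 and Q3 = 24, so IQR = 7 and only the value 48 lies beyond the upper fence.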

Phase 3: Project write-up

In your project, discuss the following for both samples:

Discussion of Topic (Total: 6 pts)
• Clearly describes the quantitative variable and two populations studied: What is your project about? What are you studying, and what populations did you select? Remember that you need to choose your populations carefully, making sure that your data is actually representative of the populations you chose.
• Discusses why the selected topic was chosen: What piqued your interest about this topic? Is it related to your major in some way, or is it just something you thought was interesting?

Maybe you already had a preconceived notion of how the project would turn out.

Discussion of Data Collection (Total: 4 pts)
• Clearly describes how data was collected: Tell me how you collected your data. Who did you ask, how did you decide who to ask, and where?
• …and decides whether the collection method meets the requirements of a random sample: You probably did not meet the requirements of a random sample. If there was any bias at all in choosing who you asked and where, then you don't have a random sample. That's fine; you just need to say so.
• …if not, gives a clear and accurate discussion of how to correct the issues: Generally, the reason students don't have random samples is a lack of resources. You don't have the time or ability to ask thousands of people a question and then randomly choose 50 of them.

Imagine that you did, though. What are some ways that you could actually get a random sample? The textbook offers some great examples.

Discussion of Histograms (Total: 6 pts)
• Clearly and correctly discusses the shape of both distributions based on the histograms: Is the distribution right-skewed, left-skewed, or symmetric? How can you tell? Is the distribution multimodal, bimodal, or unimodal? How can you tell?
• Correctly determines whether there are any outliers present in the histograms: DO NOT LOOK AT YOUR BOXPLOT. According to your histogram, are there any outliers in the distribution?
• …and explains how they were identified: In a histogram, how do you identify outliers?

Do not think about the formula for outliers, and do not look at your boxplot. There is only one way to identify potential outliers in a histogram. What is it?

Discussion of Outliers in Boxplot (Total: 4 pts)
• Correctly and clearly discusses whether there are any outliers present in the boxplots: Now you can look at your boxplot, which should very clearly show you the outliers, if you have any. This answer may not match your answer to the question regarding outliers in your histogram, and that is fine. Do not go back and change your answer.
• …and explains how they were identified: Yes, the answer to this question is that obvious. You have to say it anyway, because that is what proves that you used your boxplots to answer the question.

Discussion of Visual Sample Comparison (Total: 8 pts)
• Correctly and clearly discusses whether the centers of the two samples are different from each other using both the histograms…: DO NOT LOOK AT YOUR BOXPLOTS. DO NOT LOOK AT YOUR SUMMARY STATISTICS.

Using your histograms alone, make a rough estimate of where the “center” of each distribution is. Then compare them. Are they basically the same, or is one bigger than the other?
• …and boxplots: Now answer the exact same question, but look only at your boxplots. For each graph, what is the “center” that represents the typical value of your data? Are the centers basically the same, or is one of them noticeably bigger?
• Discusses reasoning: Here, talk about the methods you used to identify where the “center” of your histograms and boxplots was. Note that the MODE of a distribution is not the same thing as the center, so do not argue that a value is the center because it has a really high frequency.

That is not an appropriate defense for that value being the center. The mode and the center could have the same value, but for different reasons.
• …and correctly identifies implications for the two populations: All of the information you just studied and gathered pertains only to your samples, but we use that information to make inferences about the populations those samples came from. Basically, this just means: what is the answer to the overarching question you had to identify in the first paragraph?

Discussion of Measures of Center and Spread
• Correctly and clearly describes the two samples using the two measures of center and spread: Talk about the values of the mean and median, and of the standard deviation and IQR.
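If you want to double-check the two measures of center (mean and median) and the two measures of spread (standard deviation and IQR) that StatCrunch reports, a minimal Python sketch could look like the following. It assumes the sample standard deviation (dividing by n - 1) and the "inclusive" quartile method of statistics.quantiles, which may differ slightly from StatCrunch's quartiles. The function name describe and the two samples are hypothetical placeholders; substitute your own 25 observations per group, and remember that the calculations themselves do not belong in your write-up.

```python
import statistics

def describe(data):
    """Two measures of center and two measures of spread for one sample."""
    # Quartiles via the "inclusive" method; other software may differ slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation (n - 1)
        "iqr": q3 - q1,
    }

# Hypothetical placeholder samples (not real project data).
samples = {
    "Sample A": [2, 4, 4, 4, 5, 5, 7, 9],
    "Sample B": [1, 2, 3, 4, 5],
}

for name, data in samples.items():
    summary = describe(data)
    # Round for display only; keep full precision for any further work.
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Returning a dictionary keeps the four measures labeled, which makes it easy to line them up against the summary statistics table for each sample.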

Discussion of Appropriate Measures to Use (Total: 4 pts)
• Correctly determines which measures of center and spread would be appropriate to use when comparing the two samples: Use what you know about skewness and outliers to decide which pair is better for your data sets: the mean and the standard deviation OR the median and the IQR. You will need to decide for each data set individually. However, if one of your data sets uses the median and IQR but the other uses the mean and standard deviation, you will move forward with your comparison using the median and IQR for BOTH DATA SETS. If this is the case, please state this in your paper.
• Correctly supports reasoning for the decision: Why did you choose what you chose for each data set?

Discussion of Samples Using Chosen Center and Spread Measurement (Total: 6 pts)
• Clearly and correctly compares the two samples using the chosen measure of center and spread: First, think about the measure of center you chose for comparison.

Which group has a higher measure of center? That group has a higher typical value. Now think about the measure of spread you chose for comparison. Which group has a greater measure of spread? That group has more variability. Having more variability and having a higher typical value are not the same thing: more variability DOES NOT MEAN that one group has a higher typical value.
• Correctly discusses what the results suggest about the two populations the samples were taken from: Which group had the higher typical value? Then, what is the answer to the overarching question you had to ask in the beginning? Now, which group had greater variability? What does that mean about the populations?

Does it affect the answer to the overall question in any way? It may or may not; it depends on your data.

Discussion of Expectations and Limitations (Total: 4 pts)
• Clearly discusses whether the analysis of the samples matched initial expectations for the two populations: What did you think the results would be in the beginning, before collecting your data? Were you right or wrong?
• Clearly and correctly discusses the implications of the analysis for the main question it is trying to answer: One more time, answer the overarching project question and give a brief summary of how you arrived at that conclusion.

Project Formatting
1. On the first page: Before the title of the assignment, include the following information in the upper left-hand corner: your name, course name, and section.

2. Text: Align text left so that it is flush with the left margin.
3. Spacing: Double-space your paragraphs.
4. Margins: Use 1-inch margins.
5. Graphics: Include all of your output from StatCrunch. Place graphs and results next to the relevant discussions. You will be penalized if any graphics are missing. This includes:
   • Summary statistics for both samples
   • Both histograms
   • Both boxplots on one graph
6. Data: Include a table with your data at the end of your project. You will be penalized if your data is missing.
7. Length: You should have 1000 words of meaningful text, not including the StatCrunch output or data. (Spelling out numbers will be heavily penalized.)

If you are using a word count feature, make sure your word processor is not counting the StatCrunch output or data.
8. General: Since this project is in essay format, be sure to include an introduction and a conclusion. Quality of writing does count, so review your paper for obvious grammatical errors.

