The Mathematics of Data Analysis



One of the key tools used in data analysis is graphing. Graphs allow one not only to visualize patterns in data but also to develop mathematical equations which relate the different variables one is considering. The following information will help both teachers and students develop the skills needed to use graphing techniques effectively to analyze a wide range of data.




Where to Start: Finding the Slope of a Line

Before you can use a graph with any degree of success you must first understand some basic concepts. The first relates to the steepness or slope of a line. To visualize this concept let's consider the following scenario.

Scenario: You are riding your bike to the local hang-out. There are two different routes that you can take (see the two figures below). You don't want to show up all sweaty so you would like to take the less steep road, but you cannot tell which road is the one to take. You could measure the angle of the road, but you left your protractor in your other pants. Luckily, you did bring a measuring stick. You measure both the horizontal and vertical distances you would have to travel for both routes. Now, to determine which route is steepest, you need to compare the slopes of the two roads.

Slope: The slope of a straight line is the ratio of the change in the vertical distance to the change in the horizontal distance (note triangle shape in figures 1 and 2).

Thus, in the case of the two routes we have:

The change in vertical distance = 10 ft

The change in horizontal distance = 50 ft

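Thus,

slope = (change in vertical distance) / (change in horizontal distance) = 10 ft / 50 ft = 1/5

Road DE has a slope of 3/5. Since 3/5 > 1/5, we know for sure that road DE is the steeper of the two, since its slope is the greater.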

In Practice: Using Data to Find Slope

Now that we understand the concept of the slope of a road, we can carry this concept of steepness or slope of a road over to a graph and see how it can be used in data analysis. The following scenario should help illustrate this point:

Scenario: Let's say you go on a winter vacation and return home after a week to find the house freezing cold. You quickly turn on the heat and at the same time begin to record the temperature of the living room every minute to see how long it takes to warm up. You record the readings and produce the following table after five minutes.

TIME (min.)    TEMPERATURE (deg. F)
0              45
1              50
2              55
3              60
4              65
5              70

Now, let's graph this data with the Temperature on the vertical axis (y) and Time on the horizontal (x) axis.

Now, how do we find the slope of the line on the graph? Well, imagine again that this line represents the road from before. Recall that the slope is the ratio of the change in the vertical distance to the change in the horizontal distance.

So, what is the change in vertical "distance" of the line on the graph?

Well, it went from 45 F to 70 F. To find the "distance" we subtract 45 from 70:

70 - 45 = 25

And the change in horizontal "distance"?

It went from 0 minutes to 5 minutes, thus,

5 - 0 = 5

Then to find the slope, we need to find the ratio of these two numbers:

slope = 25 deg. F / 5 min. = 5 deg. F/min.

So what?

It is very important to remember the physical quantities we were measuring: Time and Temperature of the living room. So, in this case the slope does not just tell us how steep the line on the graph is (that would be fairly useless) BUT it also tells us that the living room was warming up at a rate of 5 degrees every minute or 5 degrees per minute.

This example illustrates a very important point: whenever you are using a graph to plot physical data, the slope of the line you obtain tells you how much your y-variable has changed compared to how much your x-variable has changed. In other words, it gives the rate of change of the y-variable with respect to the x-variable.

Now, to simplify things we can write an equation which we can use on any line graph to find the slope:

Pick any two points on the line and call them A and B, where point A is located at (X1, Y1) and point B is located at (X2, Y2). Then the ratio of the change in y to the change in x is:

slope = (Y2 - Y1) / (X2 - X1)     (1)
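If you would like to try the two-point slope calculation on a computer, here is a minimal Python sketch. The function name `slope` and the list name `readings` are made up for this illustration; the numbers are the living room readings from the table.

```python
# Readings from the living room table: (time in minutes, temperature in deg. F).
readings = [(0, 45), (1, 50), (2, 55), (3, 60), (4, 65), (5, 70)]


def slope(point_a, point_b):
    """Return the change in y divided by the change in x between two points."""
    x1, y1 = point_a
    x2, y2 = point_b
    if x2 == x1:
        raise ValueError("the two points must have different x values")
    return (y2 - y1) / (x2 - x1)


# First and last readings: (70 - 45) / (5 - 0)
print(slope(readings[0], readings[5]))  # 5.0

# Any other pair of points on the same straight line gives the same slope.
print(slope(readings[1], readings[3]))  # 5.0
```

Because the data fall on a straight line, it does not matter which two points you pick; the ratio always comes out to 5 degrees per minute.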

Now that we have done all this work, what do we have to show for it? Well, we can now accurately tell the future! Since the slope tells you the rate at which the living room temperature is changing, we can now stop measuring the temperature and use the slope to predict how hot it will get. For example, at the end of five minutes the room was at 70 degrees F. Let's say we wanted to know how hot it would be after ANOTHER five minutes. Well, we now know that the temperature will rise five degrees each minute, thus in five minutes it will rise 25 degrees. Therefore, the final temperature should be 95 degrees F provided the heat was left on for 10 minutes.

Where Are We Starting From: Initial Conditions

We just concluded that the final temperature should be 95 degrees F provided the heat was left on for 10 minutes. However, this statement is not exactly true because it is missing a very important condition. It should have read: "The final temperature should be 95 degrees F provided the heat was left on for 10 minutes and started at 45 degrees initially." Why did we need to add this last part? Well, the slope does not give us all the information we need to tell the future. We also need the temperature or condition at which the entire "experiment" started. To illustrate this point, imagine that you knew the slope was 5 deg./min. but you did not know what temperature the house was at initially. Could you find the temperature after 10 minutes? No.
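To make the point concrete, here is a small Python sketch. The function name `temperature_after` is invented for this example, and the 60 degree starting temperature is a hypothetical value, not part of the experiment. The same slope gives different answers depending on where you start.

```python
RATE = 5  # deg. F per minute, the slope found from the table


def temperature_after(initial_temperature, minutes, rate=RATE):
    """Start at initial_temperature and add `rate` degrees for each minute."""
    return initial_temperature + rate * minutes


print(temperature_after(45, 10))  # 95, the room from the scenario
print(temperature_after(60, 10))  # 110, a hypothetical warmer starting point
```

Without the first argument there is nothing to add the 50 degree rise to, which is exactly why the slope alone cannot tell the future.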

This starting point is often referred to as the initial condition of the experiment and is a key to developing an equation which we can use for any straight line graph. Before we move on it is important to note that our line on the Temperature vs. Time graph intersects the y-axis (vertical axis) at exactly 45 degrees F which is our initial condition. This happens because the y-axis (vertical axis) passes through the zero mark on the x-axis, which corresponds to "time zero" or the beginning of the experiment. Thus, the point at which the line passes through the y-axis is the initial condition or in our case the initial temperature at which the experiment started. We call this point on the y-axis (vertical) the y-intercept (the letter 'b' is often used as a symbol for the y-intercept) and it is the last key we need to develop an equation for any straight line graph.

Putting it All Together: Creating an Equation for a Line

As we have seen, there are two pieces of information one needs to predict the future based on graphical information. One is the rate at which the phenomena one is studying are changing, which we call the slope, and the other is the place where one started, or the initial condition (y-intercept). Now, wouldn't it be nice to have an equation which tied all these concepts together? Well, we have one and it is called the equation of a straight line. This equation takes on the following form:

Y = mX + b

Where 'm' stands for the slope and 'b' stands for the y-intercept.

Thus, from a straight line on a graph we can first calculate the slope from equation (1) and then the y-intercept by observing where the line crosses the y-axis. Recall, for our living room experiment, the slope was 5 deg./min and the y-intercept was 45 deg. F. Thus, the equation for this straight line would look like this:

Y = 5(X) + 45

Now we can substitute the variables we were measuring for X and Y in the above equation and we get:

Temperature = 5(Time) + 45

Finally, we now have one equation which incorporates both the slope and the initial condition or y-intercept which we can use to predict the living room temperature for any time in the future. For example, what will the temperature be in 10 minutes, in 15, or in 20?

At Time = 10 minutes,

Temperature = 5(10) + 45 = 50 + 45 = 95 deg. F (what we found before!)

At Time = 15 minutes,

Temperature = 5(15) + 45 = 75 + 45 = 120 deg. F

Can you calculate the temperature in 20 min?
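If you want to check your answers, the whole procedure can be sketched in a few lines of Python: find the slope from two readings, read the y-intercept off at time zero, and then evaluate Temperature = m(Time) + b. The names `readings`, `m`, `b`, and `temperature` are chosen just for this illustration.

```python
readings = [(0, 45), (1, 50), (2, 55), (3, 60), (4, 65), (5, 70)]

# Slope m from the first and last readings (change in y over change in x).
(x1, y1), (x2, y2) = readings[0], readings[-1]
m = (y2 - y1) / (x2 - x1)

# y-intercept b: the temperature recorded at time zero.
b = dict(readings)[0]


def temperature(time_min):
    """Evaluate Temperature = m * Time + b."""
    return m * time_min + b


for time_min in (10, 15):
    print(time_min, temperature(time_min))
# 10 95.0
# 15 120.0
```

Calling `temperature(20)` lets you compare the computer's answer with your own for the 20 minute question.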

One good way to check your work

In the above examples, I did not carry the units through the calculations. It is often a good idea to do this since your answer will automatically have the correct units. It is also a good way to check your work because if you finish your calculations and find that your units do not come out correctly you know you made a mistake someplace. The following is an example of how to carry the units through your calculations.
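As one possible illustration (a sketch that reuses the 10 minute calculation from above, not necessarily the example originally shown here), carrying the units through might look like this:

```latex
\begin{align*}
\text{Temperature}
  &= \left(5\ \frac{\text{deg. F}}{\text{min}}\right)(10\ \text{min}) + 45\ \text{deg. F} \\
  &= 50\ \text{deg. F} + 45\ \text{deg. F} \\
  &= 95\ \text{deg. F}
\end{align*}
```

The minutes cancel in the first term, leaving degrees F, so both terms being added have the same units and the answer comes out in degrees F as it should.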