## Predicted Value Y-hat

Y-hat (ŷ) is the symbol for the predicted value of y given by the line of best fit in linear regression.

The good news is that symbols like y-hat can be created in Unicode, but it's quirky. The trick here is to forget math and think phonetics. For HTML, I recommend inputting the base letter (x or p, for example), then the appropriate numeric escape code for the combining diacritic. Relatively few fonts support combining diacritics well; not even math fonts handle them well. Those that do are phonetics-oriented.
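As a minimal sketch of the base-letter-plus-combining-diacritic approach, the Python snippet below builds a few such symbols together with the matching HTML numeric escapes. The helper name `with_diacritic` is hypothetical, and using U+0302 (combining circumflex) for the hat and U+0304 (combining macron) for the bar is an assumption about which marks are wanted.

```python
import html
import unicodedata

HAT = "\u0302"  # combining circumflex accent (assumed choice for a "hat")
BAR = "\u0304"  # combining macron (assumed choice for a "bar")

def with_diacritic(base, mark):
    """Return (text, html_escape) for a base letter followed by a combining mark."""
    text = base + mark
    escape = f"{base}&#{ord(mark)};"  # decimal numeric character reference
    return text, escape

y_hat, y_hat_html = with_diacritic("y", HAT)
p_hat, p_hat_html = with_diacritic("p", HAT)
x_bar, x_bar_html = with_diacritic("x", BAR)

for text, escape in [(y_hat, y_hat_html), (p_hat, p_hat_html), (x_bar, x_bar_html)]:
    mark_name = unicodedata.name(text[1])
    # Decoding the HTML escape should give back the same two code points.
    round_trips = html.unescape(escape) == text
    print(text, escape, mark_name, round_trips)
```

How well the result renders still depends on the font, as noted above.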

Not a very pretty solution at the moment. And now the fun really begins. You can input these characters in other programs, but editing them will be odd. When you edit, you will discover that sometimes you delete the accent, and sometimes you delete the letter beneath the accent (very entertaining).
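The odd editing behaviour comes from the symbol being stored as two separate code points. The Python sketch below (variable names are illustrative only) shows what a delete acting on a single code point does, and that Unicode normalization can change how many code points a symbol occupies. Real editors differ in whether they delete one code point or the whole visible character.

```python
import unicodedata

x_bar = "x\u0304"  # base letter x + combining macron
y_hat = "y\u0302"  # base letter y + combining circumflex accent

# Deleting the last code point removes the accent;
# deleting the first removes the letter beneath it.
accent_deleted = x_bar[:-1]
letter_deleted = x_bar[1:]

# NFC merges a pair only when Unicode defines a precomposed character for it.
x_bar_nfc = unicodedata.normalize("NFC", x_bar)
y_hat_nfc = unicodedata.normalize("NFC", y_hat)

print("x-bar code points:", len(x_bar), "after NFC:", len(x_bar_nfc))
print("y-hat code points:", len(y_hat), "after NFC:", len(y_hat_nfc))
```

Y-hat happens to have a precomposed form, so it can collapse to one code point, while x-bar stays as a letter plus a combining mark.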

This modelling process is termed regression (more colloquially known as "line fitting" or "curve fitting"). This section describes linear least-squares regression, which fits a straight line to data. If a variable y is linearly related to x, then we use the formula for a line:

$$\hat{y} = b_0 + b_1 x$$

The symbol $\hat{y}$ is pronounced "y-hat" and means it is our estimated value of y. We need this symbol because it is extremely rare for actual data points to fall exactly on a line.

For an individual observation $i$, we write:

$$y_i = b_0 + b_1 x_i + e_i$$

The term $e_i$ is the residual for observation $i$; that is, the difference between the prediction and the observed value, $e_i = y_i - \hat{y}_i$. Note that the 'hat' is gone for the y. We take $b_0$ and $b_1$ to be estimates of two abstract parameters in the equation:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

The larger our sample size, the more confident we are about our function, as long as the assumptions are correct. Once we know the mean and standard deviation for x and y, and the correlation coefficient r between them, we can calculate $b_0$ and $b_1$:

$$b_1 = r \, \frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Note that both $b_1$ and $b_0$ have units associated with them; they are not dimensionless.
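To make the calculation concrete, here is a minimal Python sketch that computes the slope and intercept from the means, standard deviations, and correlation coefficient, using the standard relations $b_1 = r \, s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$. The function name `fit_line` and the data are made up for illustration and are not from the text.

```python
import statistics

def fit_line(xs, ys):
    """Return (b0, b1) computed from means, standard deviations, and r."""
    n = len(xs)
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # Sample correlation coefficient r (sample covariance over s_x * s_y).
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)
    b1 = r * s_y / s_x         # slope, in y-units per x-unit
    b0 = y_mean - b1 * x_mean  # intercept, in y-units
    return b0, b1

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
y_hats = [b0 + b1 * x for x in xs]                       # predicted values
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]  # observed minus predicted

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

The residuals of a fitted line with an intercept sum to zero (up to rounding), which is a quick sanity check on the arithmetic.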

The slope will be in units of y-units divided by x-units, and the intercept will be in y-units. Also note that r must have the same sign as $b_1$. In other words, if r is positive, the slope must be positive, and if r is negative, the slope must be negative. Sometimes computer programs provide you only with R and not r. Performing the appropriate calculations gives the fitted regression function for the invertebrate example. The figure displays the data (green crosses), as well as the fitted regression line through the red squares, our "y-hats".

Note that the line fits the data fairly well. However, given that our sample is incomplete (we have not collected data from all possible lakes), the data have a little bit of randomness involved with them.

We can therefore never say the line we find is the "real" or "true" line. (This is akin to the reason we can never truly accept the null hypothesis.) However, we can say that the line is our best hypothesis of the relationship between x and y, given the assumption that the variables are linearly related.

Often relationships are not linear; in this case more sophisticated techniques are needed. We know in this particular case that the assumption is technically incorrect, since the model predicts ... But in most cases, we are not too concerned with misbehavior of the model outside the range of our data.

How is the calculated line our "best" hypothesis? The technique chooses the one line, out of all possible lines, which is closest to all the data points.

In particular, the sum of the squares of the vertical distances between the line and the data points is as small a number as possible:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Each vertical distance $y_i - \hat{y}_i$ is a residual. The residual is taken as a measure of the abstract parameter $\varepsilon_i$, or true error, mentioned previously. Of course, 'error' is not to be interpreted to imply mistakes or sloppiness, though such is not to be ruled out entirely, either.
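The "smallest sum of squares" property can be checked numerically. The sketch below fits a line with the usual closed-form least-squares formulas and then compares its sum of squared residuals against a handful of nearby lines. The data, function names, and perturbation sizes are all invented for illustration.

```python
def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form least-squares intercept and slope for a straight line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(b0, b1, xs, ys)

# Nudge the intercept and slope; every nudged line should fit worse.
nearby = [
    sum_squared_residuals(b0 + db0, b1 + db1, xs, ys)
    for db0 in (-0.5, 0.0, 0.5)
    for db1 in (-0.2, 0.0, 0.2)
    if (db0, db1) != (0.0, 0.0)
]

print("fitted line:", round(best, 4))
print("best of the nearby lines:", round(min(nearby), 4))
```

A spot check like this does not prove minimality, but it illustrates what "closest to all the data points" means in practice.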

Variables are often in different units, so how can they be compared?

Ranks - Data are sorted by value, and the values are replaced by the order in which the data points are listed.

Logarithms - A constant unit on the logarithmic scale equals a constant multiple on the arithmetic scale. For example, a tenfold difference in length will be the same distance apart as a tenfold difference in dollars. The difference between 1 g and 30 g will be the same distance as the difference between 1 ton and 30 tons.
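Both ideas are easy to try out. In the Python sketch below, `ranks` is a hypothetical helper that replaces values by their sorted position, and the second part checks that a thirtyfold difference spans the same distance on a base-10 log scale whether the masses are in grams or tons. The sample values and the metric-ton conversion factor are assumptions made for the example.

```python
import math

def ranks(values):
    """Replace each value by its 1-based position in sorted order.

    Tied values get consecutive ranks in input order (no averaging).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical measurements for illustration only.
lengths_cm = [12.0, 3.5, 40.0, 7.2]
length_ranks = ranks(lengths_cm)
print(length_ranks)

# A thirtyfold difference on a log scale, in grams and in tons.
GRAMS_PER_TON = 1_000_000  # metric ton assumed
gap_grams = math.log10(30) - math.log10(1)
gap_tons = math.log10(30 * GRAMS_PER_TON) - math.log10(1 * GRAMS_PER_TON)
print(round(gap_grams, 6), round(gap_tons, 6))
```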

