The formula for the standard deviation of a sample of data looks complex – ugly, even. But understanding the notion of variation (e.g., standard deviation) is fundamental to statistical thinking.

At its core, the computation for sample standard deviation is very sensible (like so many other complex-appearing mathematical formulas, e.g., the distance formula). Essentially, the formula computes a typical deviation – on “average”, how far the data are from the mean. Hence the reason for computing the difference from each data point to the mean. Squaring (and later square rooting) is important: without it, the sum of all the deviations (which include positive and negative values) is always zero – a useless calculation. [Absolute values can be used instead of squares – i.e., the mean absolute deviation – though squaring is often preferred because, in contrast to absolute value, squaring is a smooth function with nice derivatives.]
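A minimal numerical sketch of that point, using made-up data: the raw deviations always cancel to zero, while squaring them first produces a usable spread measure.

```python
# Hypothetical sample data.
data = [4, 7, 8, 10, 11]
n = len(data)
mean = sum(data) / n  # 8.0

# The raw deviations from the mean always sum to (numerically) zero.
deviations = [x - mean for x in data]
print(sum(deviations))  # 0.0

# Squaring before averaging fixes the cancellation; dividing by n-1
# and square-rooting gives the sample standard deviation.
s = (sum(d ** 2 for d in deviations) / (n - 1)) ** 0.5
print(s)
```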

Yet, if the goal is an “average” deviation, why divide by *n-1* (and not *n*)? When you compute the mean, or average, you sum all the values and divide by *n*, so why does the sample standard deviation divide by *n-1*? This is an important question – one which mathematics and statistics educators need to be ready to answer.

Some might say something to the effect of, well it has to do with the “degrees of freedom” – there are *n-1* degrees of freedom, so we divide by *n-1*. This is, to some degree, explanatory. But for the most part, this response feels like a smoke screen – it masks the explanation with sophisticated jargon, to cover up the fact that, well, it’s complicated. It’s a torero holding out a muleta as though the real deal were just behind it. Others might say that dividing by *n-1* results in a slightly larger value than dividing by *n*, which provides a little bit of “wiggle room” for describing a typical deviation (e.g., data are unpredictable, so we need some buffer built into our statistics). Again, partly true. But then why not *n-2*? Might that be even better? Or why not *n-1.5* (which is, in fact, at times better)? The real truth about variance and standard deviation comes from a fundamental distinction between *descriptive* and *inferential* statistics.

*Descriptive statistics* attempt to describe a dataset. The mean, median, and mode can each be regarded as a typical value – a measure of central tendency. In this sense, the notion of standard deviation acts as a description of how spread out (varied) the data are from the mean. It’s a measure of spread – and one in which the units are the same as the data. With data that have a normal distribution, “fatter” distributions have more variance and thus a larger standard deviation, and “skinnier” distributions have less variance and thus a smaller standard deviation. It is a measure of a *typical* deviation from the mean, and as such plays a descriptive role – describing a feature (spread) of the dataset’s distribution. But if this is the case, why don’t we divide by *n*? Well, in fact, we might – if the data comprised an entire population, and not just a sample.

*Inferential statistics* attempt to infer information from a smaller sample to an entire population. They are a “best guess”. And this, in fact, is the most common explanation for why the standard deviation formula divides by *n-1*: the value has less to do with describing the current dataset and more to do with inferring something about the population. If you really only want to describe the current dataset, then a true “average” deviation (dividing by *n*) would likely be better. However, dividing by *n-1* instead of *n* gives very similar numbers (often within hundredths of each other) – so much so that some argue for just dividing by *n* in most cases – and so the corrected sample standard deviation (dividing by *n-1*) also serves as a near-descriptive statistic for a dataset. But its main purpose is inferential. Based on the arithmetic of expected values, the square of the sample standard deviation, *s*^2 (and not the square of the true “average” deviation), is an unbiased estimator for the population parameter, variance. In general, standard deviation is referred to more often than variance – because it is simpler to grasp conceptually (i.e., it has the same units as the data) – but its calculation derives primarily from the fact that computing variance in this way (dividing by *n-1*) gives an unbiased estimate of the population’s variance. (The same is not quite true for standard deviation itself: the corrected sample standard deviation (dividing by *n-1*) gives a *better* estimate of the population parameter than the uncorrected sample standard deviation (dividing by *n*), though not a completely unbiased one.) Briefly, we note that there are times when other estimators may be preferable; the maximum likelihood estimator for variance, for example, divides by *n* and has a lower mean squared error.
The primary point, however, is that the statistics we use are often selected for their inferential ability to estimate, not just their descriptive power.
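The bias is easy to see empirically. A small simulation sketch (with an arbitrary seeded normal population, so numbers are illustrative only): averaged over many small samples, the divide-by-*n* estimator systematically undershoots the population variance, while the divide-by-*(n-1)* estimator lands near it.

```python
import random
import statistics

# Hypothetical population; its variance is computed with the
# "divide by N" (population) formula.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]
true_var = statistics.pvariance(population)

# Draw many small samples and average the two competing estimators.
n, trials = 5, 20_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / n          # divide by n
    unbiased_sum += ss / (n - 1)  # divide by n-1

# The divide-by-n average underestimates true_var by roughly (n-1)/n;
# the divide-by-(n-1) average sits much closer to it.
print(true_var, biased_sum / trials, unbiased_sum / trials)
```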

Sample standard deviation is prevalent in statistics for a variety of reasons. Firstly, it is rare that one ever has an entire population’s data; it is much more common to have a sample. And secondly, standard deviation is linked to one of the fundamental theorems in probability and statistics: the Central Limit Theorem (CLT). The CLT indicates that regardless of the underlying distribution of a population’s data, the distribution of the mean of *n*-sized (random) samples – with *n* sufficiently large (approximately greater than 30) and the observations independent and identically distributed with mean *mu* and variance *sigma*^2 – will be approximately normal, N(*mu*, *sigma*^2/*n*). Given that the square of the sample standard deviation, *s*^2, is an unbiased estimator of *sigma*^2 (variance), and as a result of the CLT, the computation *s*/sqrt(*n*) is frequently used to provide confidence intervals for the true mean of a population. It is fairly incredible that from a single sample of, say, 100 people, we can provide with relatively high confidence (most frequently, 95% is used) a range that contains the true population mean.
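A sketch of that last computation, using fabricated data (a seeded uniform sample standing in for the 100 people) and the usual 1.96 z-value for 95%:

```python
import math
import random

# Hypothetical sample of 100 values drawn uniformly from (0, 10);
# the true mean of that distribution is 5.
random.seed(1)
sample = [random.uniform(0, 10) for _ in range(100)]

n = len(sample)
mean = sum(sample) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# CLT-based 95% confidence interval: mean ± 1.96 * s / sqrt(n).
margin = 1.96 * s / math.sqrt(n)
print(mean - margin, mean + margin)
# About 95% of such intervals, over repeated samples, contain the true mean.
```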

In closing, it is important to recognize that statistics are not always meant to be descriptive; they often serve a primarily inferential role. But this difference is not always intuitive, for students or for teachers (e.g., Casey & Wasserman, 2015), and the distinction must be made clearer. Although the issues relating standard deviation, variance, bias, and estimators are more nuanced, the broader idea that statistics are computed to have inferential meanings, and not just descriptive ones, is critical. Such key understandings must guide our instruction. Otherwise we, as educators, risk providing students with smoke screens as a substitute for real reasoning and understanding.

Reference: Casey, S., & Wasserman, N. (2015). Teachers’ knowledge about informal line of best fit. *Statistics Education Research Journal, 14*(1), pp. 8-35.

For school mathematics teachers: Is this fact important to know? Is it important to be able to prove it? Does it come up in classrooms? Do students care? If it is important to know, why? What is important to know about it, if anything at all? As a professor and teacher educator interested in teachers’ knowledge – particularly in ways that more advanced mathematics becomes productive for teachers – I wrestle with these kinds of questions regularly. The proof of this fact, 0.9999… = 1, can come in a variety of forms, but it draws on notions of infinity and limits. It can be proved through arguments from analysis about convergent series (the sum over *k* of the infinite geometric series 9·(1/10)^k), algebraic techniques (e.g., let x=0.9999…, then 10x=9.9999…, etc.), or by computational arguments (e.g., 1/3=0.3333…, multiply both sides by 3…). But does this constitute important knowledge for teachers? And if so, why?

Simon’s (2006) notion of a Key Developmental Understanding (KDU) has become one way that I have begun to consider such questions. Simon describes a KDU as “a change in a student’s ability to think about and/or perceive particular mathematical relationships” (p. 362). In this case, the “students” being referred to are teachers. Simon emphasizes that KDUs are not *missing* pieces of information, but rather key understandings that foster one’s ability to think about and perceive mathematical ideas and relationships. They represent ontological shifts and transformations in teachers’ available assimilatory structures – mathematical ideas that have been re-understood, re-organized, re-structured, etc. For my own interests specifically, in what ways can mathematics that is not in a local neighborhood of the content a teacher teaches influence their understanding of and/or perceptions about the content they teach?

So what key understanding is gained from knowing that 0.9999…=1? To me, one of the understandings that is important is about the structure of the real numbers. One of the reasons that students (and teachers) like decimals – as opposed to fractions – is that equivalent fractions all produce the same decimal expansion. In other words, a conceptually difficult issue with fractions – that 1/4=2/8=3/12=… (an infinite number of equivalent representations) – seems to get resolved with decimals. Type 1/4, 2/8, 3/12, etc. on the calculator and they all produce 0.25. So decimals seem easier or more consistent in some ways. But does the issue really get resolved? In fact, using decimal expansions – which necessitates infinite decimal expansions (e.g., 1/3=0.3333…) – comes with its own set of conceptually difficult issues. Namely, if we agree to an infinite decimal expansion, we have to deal with infinity. Which means we have to grapple with the odd conclusion that 0.9999… is, in fact, equal to 1. And so, in reality, decimals have equivalence classes in the same way that fractions do. And not just the seemingly trivial ones, such as 0.25 = 0.2500; but, additionally, that 0.25 must also equal 0.249999… . Indeed, any terminating decimal will have such an equivalence class; repeating decimals and irrational numbers will not. For me, this is perhaps one of the fundamental understandings that comes from knowing 0.9999…=1: that the set of real numbers has a structure with both similarities to and differences from the set of fractions. Decimal representations, like fractions, have equivalence classes, albeit different in nature than those for the set of fractions. And this is something to be grappled with. Although decimals have some advantages in terms of understanding the relative size of fractions, they, too, come with conclusions that we must accept if we are to truly understand them.
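The series argument can be made concrete with exact arithmetic. A small sketch using Python's `fractions`: each partial sum 0.9, 0.99, 0.999, … falls short of 1 by exactly 1/10^k, a gap that shrinks to 0 – which is the geometric-series sense in which 0.9999… = 1.

```python
from fractions import Fraction

# Exact partial sums of 9/10 + 9/100 + 9/1000 + ...
for k in range(1, 6):
    partial = sum(Fraction(9, 10 ** i) for i in range(1, k + 1))
    # The gap to 1 is exactly 1/10^k.
    print(partial, 1 - partial)
```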

Reference: Simon, M.A. (2006). Key developmental understandings in mathematics: A direction for investigating and establishing learning goals. *Mathematical Thinking and Learning, 8*(4), pp. 359-371.

Two of perhaps the most current, publicized, political, and divisive issues around math education in the United States are the Common Core Mathematics Standards (CCSS-M) and Khan Academy. Both have certainly entered my everyday conversations lately.

The CCSS-M standards are currently being blamed for all that is wrong with mathematics education today – at least by adults who had a different experience of mathematics during their own childhood education. But this is flawed for at least two reasons. First, standards do not dictate how mathematics is taught; they represent, rather, what the goals of its teaching should be. In the CCSS-M, these rightly include conceptual and procedural aspects. Whether a student learns to understand these mathematical ideas by the rote repetition and algorithms familiar to many adults of today, or by the “loosey-goosey” “let the student reinvent everything” approach of more recent reforms, the standard remains. Standards do not dictate pedagogy: they outline the *goal*, **not** the *process*, of mathematics education – much to the chagrin of politicians and parents who vent, blame, and claim otherwise. Second, the mathematics that is more familiar to parents (e.g., practicing the multiplication algorithm repeatedly), which certainly has value, also has serious drawbacks (e.g., do you really understand why the “0” gets added in subsequent lines?). Many of these claims revolve around a sense that practicing a problem the “correct” way builds numeracy. Indeed, lots of practice with almost anything helps build ability and fluency. But to say that the “old math” teaching approach was better at producing numerically literate citizens than the “new math” reformed approach is equally incorrect. This issue is illustrated perhaps no better than by the following Khan Academy video. (Type in the password: mathematicalmusings)

In the video, Sal Khan walks students through several addition problems, one of which is the sum 99+88. This morning happened to be the first time I ever watched one of the elementary videos that are a part of Khan Academy with my 2nd-grade daughter; this nine-minute video was the first. As we had done with the previous problems, I paused the video to let my daughter try to answer the problem first. As we played the video again, we talked through the “addition algorithm” Sal Khan had been using in the problems – a notation that was unfamiliar to my daughter, until today. When we got to the problem 99+88, my daughter quickly said, “Oh that’s easy. 187.” I replied, “Great. Now remember your answer. He’s going to take a long time to solve the same problem.” We proceeded to watch the laborious and insidious application of the standard addition algorithm to produce this answer. Watching Khan talk through the algorithm was like watching someone bring a bazooka to shoot a fly. Totally. Unnecessary. This, in fact, is the problem: nothing demonstrates a lack of numeracy more than applying the algorithm to solve that addition problem. It is unnecessary, impractical, and, in fact, undesirable. So while my daughter might have taken more time to complete some of the previous addition problems, with a mistake or two mixed in, I at least have some confidence that she is, in fact, developing a degree of numeracy in the process.

This also brings me to the second issue, of Khan Academy. Sal Khan is perhaps one of the most influential teachers of our time, at least by the metric of the millions of students and people who have watched his videos. And Khan Academy has noble goals: a world-class education, free, for everyone (with a computer and Internet). He is a pioneer, exploring the possibilities of digital education at large scale – one that uses technology, diagnostic assessments, and gaming techniques, amongst others, to accomplish that end. And the intent is powerful: trying to individualize instruction for each learner to meet them where they are, mathematically. With that said, however, these “educational lecture videos”, a static form of instruction, are worrisome – they do not allow for interaction. (Not to mention that they boil mathematics down to a learned procedure.) Take a look at another portion of the same video. (Type in the password: mathematicalmusings)

About midway through his explanation, my daughter interrupted and looked at me: “Dad, that’s wrong. Right?” In fact she was right. (8:20 mark) They did not sum to “9”. (“9 is just 9 pennies, no dimes”, to be precise.) They summed to 900. And this was extremely evident to my daughter – despite the additional notation on the right. She had a sense of place value to justify her thoughts. And she was right. But, for any child, an instructor is an authority figure. So she wasn’t sure. Without me being there – and immediately understanding her issue – she might have believed she was in the wrong. This is one of the fundamental issues of “static” education: it cannot respond to students, no matter how many times a video is rewound or replayed. The explanation doesn’t change. And she was right. But she needed clarification. The educator inside me cursed Khan for so stupidly writing 1+3+5 and not 100+300+500 – amongst other things. This may perhaps be part of the point, but the larger point is one of the paradoxes of such efforts: the goal is to individualize the learning and mathematical experience for each of millions of students, yet we simultaneously try to accomplish this goal through one standardized educational program with fixed components – one that is static and non-responsive to those millions of students, or their questions.

We have lots to learn about education. But claiming that the CCSS-M are responsible for the demise of mathematics education in our country, or that Khan Academy is providing a world-class mathematics education, is troubling. That’s not to say the CCSS-M are perfect, or that Khan Academy has no value or educational potential. In fact, to me, both of these point to the same place: teachers. They will bring life (or not) to our country’s mathematics standards, and provide the interactions (or not) that help our children learn. Specifically, we need to place appropriate value on preparing educators who are mathematically ready for and able to take on the challenges of educating our youth in the classroom.

So let’s take a look at another problem. (Note: this problem was inspired by conversations with a colleague, Bill Zahner.) Let’s assume that we are going to construct an isosceles triangle. The base length is 10 *in*. The lengths of the other two sides will be a randomly chosen real number between 5 and 10 *in* (note: a side length greater than 5 *in* guarantees forming a triangle, and anything under 10 *in* makes it non-equilateral). What is the probability that the resultant triangle is acute? obtuse? right? The figure below provides some insight – where the apex falls along the perpendicular bisector of the base determines the classification of the triangle.

So, using geometric probability, we could determine the likelihood of forming an obtuse triangle by computing the ratio of lengths: namely, the length up to the semicircle, which is 5 *in*, divided by the entire length, which is 5√3. So P(obtuse)=5/(5√3) ≈ 0.577. Similarly, P(acute)=(5√3 – 5)/(5√3) ≈ 0.423, and P(right)=0. Dealing with the probability of forming a right triangle being zero, despite in fact being possible, is difficult enough; but dealing with the fact that long-term simulations of the problem put the probability of being obtuse as *less* than the probability of being acute – indicating that these probabilities are, in fact, incorrect – takes additional insight into and understanding about the underlying assumptions of geometric probabilities.
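The naive ratio computation is a one-liner, which makes the later disagreement with simulation all the more striking:

```python
import math

# Naive geometric-probability ratios: the apex lies on a segment of
# length 5*sqrt(3); heights below 5 (the semicircle) give an obtuse triangle.
total = 5 * math.sqrt(3)
p_obtuse = 5 / total
p_acute = (total - 5) / total
print(round(p_obtuse, 3), round(p_acute, 3))  # 0.577 0.423
```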

Using geometric probability *assumes* that every point in the space is equally likely – in other words, that the distribution of outcomes is uniform. In this case, the “space” is the line segment (along the perpendicular bisector) for where the third vertex of the triangle could be. Although it is possible for the third vertex to fall anywhere along this line, how these points fall on that line is, in fact, not uniformly distributed. The video below models the situation in motion, indicating that the third vertex being near the base is much less likely than other places – in other words, the possible vertex points are not uniformly distributed. Using geometric measurements to determine the probability is inappropriate.

http://vimeo.com/105967203 (password: mathematicalmusings)

So, how do you determine the real probability? There are a few ways, one of which uses the random (uniform) distribution of the side lengths between 5 and 10 *in*; another is to determine the probability density function for the height (*x*) of the third vertex – which we know is not uniform – namely, *x*/(5√(*x*^2+25)) for 0<*x*<5√3. The image below shows the plot of the actual density function (also comparing it to a uniform distribution) – the unlikelihood of the third vertex being near the base (x≈0) is evident. Determining probabilities from a density function amounts to computing the area under the curve, i.e., integrating. As it turns out, the actual probabilities are nearly swapped: P(obtuse) ≈ 0.414 and P(acute) ≈ 0.586. (P(right) is still zero.)
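A simulation sketch of the actual construction confirms this (the seed and trial count are arbitrary choices): drawing the equal side length uniformly and checking the resulting apex height gives an obtuse fraction near √2 − 1 ≈ 0.414, not 0.577.

```python
import math
import random

# The two equal sides have length L drawn uniformly from (5, 10); the apex
# height is sqrt(L^2 - 25), and the triangle is obtuse exactly when that
# height is below 5 (equivalently, when L < 5*sqrt(2)).
random.seed(0)
trials = 200_000
obtuse = 0
for _ in range(trials):
    L = random.uniform(5, 10)
    height = math.sqrt(L * L - 25)
    if height < 5:
        obtuse += 1

print(obtuse / trials)  # close to sqrt(2) - 1 ≈ 0.414
```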

So what do we make of this? One thing to understand is that using geometric measurements to determine probabilities has one *major* assumption: that points are uniformly distributed in the space. When this assumption is not met, using a simple ratio of geometric measurements to determine probability is inappropriate.

Looking back at our original two examples, one begins to wonder whether the probabilities we computed would, in fact, be justified…


**“What you do to the top, you do to the bottom.”** This mantra is often repeated to help students generate equivalent fractions (something they notoriously have difficulty with). Unfortunately, the adage, while true for multiplication, does not work for addition – which can be a source of confusion.

**“Just add a zero when you multiply by 10.”** Indeed, multiplying by ten in our base-ten system is frequently easier than multiplying by other numbers. However, while it is often true, the result of multiplying by 10 is not always simply adding a 0 to the end of a number – 3.2 x 10, for example, is 32.

**“Multiplying makes bigger.”** In elementary mathematics, with natural numbers, this is often the case (not with 0 or 1, though). However, this notion may make students’ future work with multiplication of fractions more difficult, since this idea does not necessarily hold. (Example from McCrory et al., 2012.)

**“You can’t subtract a larger number from a smaller one.”** True enough within the whole numbers, but once students encounter integers, 3 – 5 = -2 makes perfect sense – and the earlier “rule” becomes something to unlearn.

**“Anything to the zero power is 1.”** When students first have to expand their understanding of exponents to broader number sets – in particular to those that don’t make intuitive sense as “repeated multiplication” – there are many ways teachers try to help students learn these ideas. Why we define 5^0 as 1 takes some genuine work. And while any nonzero number to the zero power is one, both 0^0 and ∞^0 are indeterminate forms, with real implications in developing calculus ideas from limits.

**“Perimeter is just the sum of all the sides.”** This idea works really well with polygons. However, for circles it makes no sense. Perimeter is the distance around a two-dimensional object, which may or may not be composed of straight sides. In fact, this may be part of the difficulty transitioning students to understanding how we calculate the circumference of a circle – the relationship is a multiplicative one (a comparison), not an additive one, where one can find the total from summing smaller lengths.

**“There are half as many even numbers as whole numbers.”** While it is true that half of the whole numbers are even and half are odd, comparing the relative size of infinite sets is less obvious. In fact, based on bijective mappings, what we find is that the set of even numbers has the same cardinality as the set of whole numbers. In fact, it has the same cardinality as the set of integers, and even the set of rational numbers.

**“If it fails the vertical line test, it’s not a function.”** This is certainly true for graphs of functions on a Cartesian coordinate system. However, move to polar coordinates and functions start having very interesting shapes (lemniscates, cardioids, limaçons, rose curves), very few of which pass the ‘vertical line test’. Functions have to do with every input having a unique output – on a Cartesian coordinate system, your inputs are points on a number line (not, say, angles), which makes the vertical line test useful.

These are just some examples – probably only the tip of the iceberg. The larger point is that part of the mathematical work of teaching revolves around knowing the ways the ideas we talk about as teachers get complicated in further developments, so that we give proper attention to how we describe and conceptualize them for students – making the overarching idea explicit rather than over-relying on mnemonic devices. If you have other examples to share, post a comment!

Reference: McCrory, Floden, Ferrini-Mundy, Reckase, & Senk (2012). Knowledge of algebra for teaching: A framework of knowledge and practices. *Journal for Research in Mathematics Education, 43*(5), pp. 584-615.

Counting problems, while often simple to state, can span the spectrum of difficulty – from ridiculously easy to insanely complex. Often, however, one of the tools of the trade in counting problems is counting an “analogous” problem. This process, however, requires functions, and in particular, bijections: if there is a bijective function between two sets, then the two sets have the same cardinality (or size). A few thoughts about how understanding more abstract sets and classifications for functions can be important are discussed below.

1. The handshake problem. In a group of 10 people, if everyone shakes hands with everyone else, how many total handshakes are given?

This is a familiar problem. Having done this with high school students, one of the common ways of approaching this problem is to physically model the handshakes. Done in a systematic way, this often results in the series 9+8+7+…+3+2+1 = 45. This is a nice way to solve the problem, and it can be a nice way of introducing arithmetic series. However, in the context of combinatorics, one very powerful way is to model this problem differently. In particular, it is to create a bijective function, mapping every “physical” handshake to an ordered pair. If we assume the first 10 letters of the alphabet represent the 10 people in the room, then every handshake involves two people, and we can list the handshake between A and F as (A,F). (In particular, since we only want “one” handshake, we only want, say, ordered pairs in alphabetical order, so that (A,F) is counted, but not (F,A).) We can verify that this is a bijective function by clarifying that every handshake will map to an ordered alphabetical pair, and every ordered alphabetical pair will be mapped to from some handshake. Indeed, this is the case. The power in doing so is that we no longer have to think about the physical activity of shaking hands. By counting a different problem – which began by mapping the objects we wanted to count to a set of more countable objects: how many ordered alphabetical pairs are there with the first ten letters of the alphabet? – we can make a conclusion about the total number of handshakes. Since there are 10C2 such ordered pairs, there must be 10C2 (or 10·9/2! = 45) handshakes.
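The bijection above can be sketched in a few lines: `itertools.combinations` produces exactly the alphabetically ordered pairs, so its count is the handshake count.

```python
from itertools import combinations

# Each handshake corresponds to exactly one ordered (alphabetical) pair,
# e.g. ('A', 'F') but never ('F', 'A'), so counting pairs counts handshakes.
people = "ABCDEFGHIJ"  # 10 people
handshakes = list(combinations(people, 2))
print(len(handshakes))  # 45, matching both 9+8+...+1 and 10C2
```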

2. How do we know that 7C2 is the same as 7C5 (without relying on the formulas)? Or, more generally, why is nCk = nC(n-k)?

Based on the formula for a combination, it is easy to conclude that these two quantities are the same size. However, it may be less obvious why there are the same number of ways to form a “pair” as a “group of 5” from a room with seven people. There are analogies that can help explain, but the analogies often make use of bijective functions. The set of “pairs” of people looks like: {(A,B), (A,C), (A,D), …(F,G)}. The set of “groups of 5” looks like: {(A,B,C,D,E), (A,B,C,D,F), … (C,D,E,F,G)}. How do we know there are the same number of elements in each set? The easiest way is to find a bijective function between these two sets of objects. Indeed, while there are many, perhaps the easiest to justify is to map (A,B) -> (C,D,E,F,G), (A,C) -> (B,D,E,F,G), etc., effectively mapping the “pair” to the “remaining people left”. Indeed, every pair formed has exactly 5 people not chosen, which means every “pair” will map to some element in the “group of 5” set; similarly, every “group of 5” formed will be mapped to, because every “group of 5” has exactly 2 people not chosen, which will be the “pair” that maps to it. Thus, this mapping between these sets of objects can be used to verify that the sets of 7C2 and 7C5 have the same cardinality. (The analogy being that for every group of 2 selected, there are 5 not selected – thus they must be the same size sets.) The more general argument concluding nCk=nC(n-k) follows naturally.
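The complement map is easy to check exhaustively in code – a small sketch verifying that sending each pair to the five people not chosen hits every “group of 5” exactly once:

```python
from itertools import combinations

# Send each pair to its complement (the 5 people not chosen); if the
# image is exactly the set of all 5-person groups, the map is a bijection.
people = set("ABCDEFG")
pairs = [frozenset(c) for c in combinations(sorted(people), 2)]
fives = {frozenset(c) for c in combinations(sorted(people), 5)}

images = {frozenset(people - pair) for pair in pairs}
print(len(pairs), len(fives), images == fives)  # 21 21 True
```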

3. How many different subsets can be formed from a set of n elements?

One way of answering this problem is to say that each element can either be “in” or “out” of the subset, thereby creating 2^n subsets. Another argument, an inductive one, is to show that if there are 2^n subsets with n elements, there will be 2^(n+1) subsets with n+1 elements. Again, this involves a bijective mapping. Say we have all the subsets with “n” elements listed – what does adding the element “n+1” do? Well, all of the subsets with “n” elements are also subsets of this new set. Additionally, there are subsets with the element “n+1” that we need to count. Considering this set, there is a bijective function between the subsets with the element “n+1” and the (2^n) subsets with “n” elements. Each subset with “n+1”, for example (1, 3, 5, 7, n+1), can be mapped to the subset without the “n+1” element, in this case to (1, 3, 5, 7). In this way, the subset {n+1} gets mapped to the empty set (from the subsets with “n” elements), and, indeed, since every other subset with “n+1” has some elements from the “n” elements, there is a bijection. The conclusion, of course, is that the cardinalities of these two sets (subsets with “n” elements, and subsets including the element “n+1”) are the same: namely, they are both 2^n. Thus, there are 2^n+2^n = 2^(n+1) subsets with “n+1” elements, which completes the inductive step of the proof.
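The inductive step can be checked concretely for a small case (n = 4 here, an arbitrary choice): the subsets containing the new element are exactly as numerous as the subsets of the old set.

```python
from itertools import combinations

def subsets(elements):
    """All subsets of a collection, as frozensets."""
    s = list(elements)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

# Bijection behind the inductive step: subsets of {1..n+1} containing n+1
# correspond one-to-one with subsets of {1..n} (delete the element n+1).
n = 4
old = subsets(range(1, n + 1))          # 2^n subsets
new = subsets(range(1, n + 2))          # 2^(n+1) subsets
with_new = [s for s in new if n + 1 in s]
print(len(old), len(with_new), len(new))  # 16 16 32
```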

4. How many numbers between 0 and 10,000 have digits that sum to 9?

This is a relatively difficult problem, and the easiest way to solve it is approaching it as a multi-choose (e.g., stars and bars) problem. But even this can be hard to follow and conceptualize. In fact, what is happening in this process is a bijection. The solution actually consists of the numbers: {0009, 0090, …, 3105, …}. But how many are there? The stars and bars method in this case is actually “mapping” each of these numbers to a 9-letter object. In particular, if we let Th=Thousands, H=hundreds, T=tens, and O=ones, then each of the solution elements can be created by a 9-string using these four letters (again, we will consider these written as “alphabetical” based on digits): ThThHHTTTOO maps to 2,232. And because each of these strings has 9 letters, we can verify that the sum of the digits will be equal to 9 in every case; and since every number between 0 and 10,000 will have a certain number (between 0 and 9) of Ths, Hs, Ts, and Os, then there is a bijection between these two sets. Therefore, we can instead count the number of 9-strings, from four “letters” (where letters can repeat and order does not matter – aka, we only consider “one” of these strings – namely, the “alphabetical” based on digits one), which amounts to 12C3 ways. But the key to answering the original question lies in transforming it, modeling it, mapping it, to a collection of objects that is easier to count.
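Both sides of that bijection can be counted directly – a brute-force check against the stars-and-bars count C(12, 3):

```python
from math import comb

# Brute force: count the numbers in [0, 10000) whose digits sum to 9,
# then compare with the stars-and-bars answer C(12, 3).
brute = sum(1 for k in range(10_000)
            if sum(int(d) for d in str(k)) == 9)
print(brute, comb(12, 3))  # both 220
```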

5. Random variables in statistics

Lastly, another common application of functions comes from statistics. In fact, random variables *are* functions. In particular, they are functions that map the outcome set to the set of real numbers. For example, the outcome set for flipping a coin three times consists of the following elements – effectively the “data” from the experiment: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Yet often what is of interest is a random variable, something like, say, X = the number of heads. Accordingly, this function, X, then maps each of the elements in the set to a number: 0, 1, 2, or 3 (i.e., HHH -> 3, whereas HTT -> 1). Indeed, understanding this relationship is critical to properly conceptualizing aspects of probability. In the original outcome set each of the eight outcomes is equally likely, whereas because of the functional mapping process the random variable outcomes (0, 1, 2, 3) are not equally likely. Notably, this mapping is not a bijection, since 1, for example, is mapped to from three different elements (HTT, THT, TTH) – which is in fact the reason that getting 1 head is three times as likely as getting 0 heads.
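That mapping, and the non-uniform distribution it induces, fits in a few lines – a sketch making the function X explicit:

```python
from itertools import product

# The random variable X = "number of heads" as an explicit function
# on the outcome set of three coin flips.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
X = {o: o.count("H") for o in outcomes}

# The induced distribution of X: uniform outcomes, non-uniform values.
dist = {k: sum(1 for o in outcomes if X[o] == k) / 8 for k in range(4)}
print(dist)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```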

These examples are by no means the only ones. However, they do perhaps provide some real applications and uses for understanding sets and functions more generally (particularly bijective functions), and they can provide a relatively natural context – counting problems – for becoming increasingly familiar with these concepts. Much of the time when we solve a "related" counting problem, we end up applying a bijective function – whether we are aware of it or not – as the means to help verify that we have counted correctly.

(password: mathematicalmusings)

Technically, since the most common objects of study are *right* pyramids, the argument from this visualization, which depicts three oblique pyramids, relies on Cavalieri’s principle (which, by comparing the areas of every cross section, concludes that a right and an oblique pyramid with the same base shape and height have equal volumes). In addition, while for a cube the three pyramids will be identical, for the right rectangular prism depicted in the video the three pyramids are all different, yet all have the same volume (since each one pairs one of the three dimensions, as its height, with the other two as its base). This means that the volume of a right pyramid is one-third the volume of the corresponding prism.

While this visualization works nicely for rectangular pyramids, the exact same visual does not necessarily help with other polygonal (e.g., hexagonal) pyramids or cones, since extracting three identical pyramids from one of those prisms is more complicated. Yet for helping students understand the overarching relationship between the volumes of right prisms and pyramids, the visualization of the rectangular prism and pyramids does provide some meaningful validation of the relationship.
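The one-third relationship can also be checked numerically in the spirit of Cavalieri's principle, by summing cross-sectional areas up the height of a rectangular pyramid. A sketch (the dimensions chosen here are arbitrary):

```python
# For a rectangular pyramid with base l x w and height h, the cross
# section at height z is a rectangle scaled linearly by (1 - z/h),
# so its area is l * w * (1 - z/h)**2. Summing thin slices of these
# areas approximates the volume integral.
l, w, h = 3.0, 4.0, 5.0   # arbitrary example dimensions
n = 100_000               # number of slices

dz = h / n
# Midpoint Riemann sum of the cross-sectional areas.
volume = sum(l * w * (1 - (i + 0.5) * dz / h) ** 2 * dz for i in range(n))

print(volume, l * w * h / 3)  # the sum approaches l*w*h/3 = 20.0
```

Since the slicing argument never assumes the apex sits over the center of the base, the same sum applies to oblique pyramids – which is exactly what Cavalieri's principle formalizes.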

(password: mathematicalmusings)

Building on this idea, another similar GSP document could be used to develop a functional perspective on the measurement of circumference (in relation to its diameter) as well.

(password: mathematicalmusings)

In order to communicate why these individual properties are important collectively, one possible activity that I have used is to elaborate on solving simple, single-step equations. In class, it is common to use a “single step” to solve a simple equation, for example, x + 5 = 12; however, there are actually four assumptions being made about how the operation of addition works on the set of real numbers. These assumptions, collectively, are important for algebraic reasoning.

*x* + 5 = 12

(*x* + 5) + -5 = 12 + -5

*x* + (5 + -5) = 12 + -5 *(Associativity of addition on R)*

*x* + 0 = 12 + -5 *(Inverse elements of addition on R)*

*x* = 12 + -5 *(Identity element of addition on R)*

*x* = 7 *(Closure of addition on R)*

Without these assumptions – associativity, inverse elements, identity element, and closure – the algebraic solving process may not generalize. While in this case we make use of -5, in the more general case it would have to be true that every element has an inverse element. Similarly, while the identity is 0 under addition, without an identity element the solving process would loop infinitely: the identity element is the key to transforming an unknown sum (x + 5) into a known sum (x + 0). Also, while in this case the sum of 12 and -5 is a number, in the more general case it would have to be true that the sum of any two elements produces another element of the set. (We note that commutativity is not required to be a group, but is required for other important algebraic structures, such as a field.)
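One way to see the four properties working as a package is that they are exactly what a generic equation-solving routine needs, independent of the particular set and operation. A minimal sketch (the function name and setup here are illustrative, not from any particular source):

```python
def solve_right(a, b, op, inv):
    """Solve x op a = b for x, using only the four group properties:
    (x op a) op inv(a) = b op inv(a)   (associativity)
    x op (a op inv(a)) = b op inv(a)   (inverse element)
    x op e             = b op inv(a)   (identity element)
    x                  = b op inv(a)   (closure: the result is in the set)
    """
    return op(b, inv(a))

# The familiar case: x + 5 = 12 on the reals under addition.
x = solve_right(5, 12, op=lambda u, v: u + v, inv=lambda u: -u)
print(x)  # 7
```

Nothing in `solve_right` mentions numbers or addition; any set and operation satisfying the four properties could be passed in instead.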

Indeed, perhaps the more powerful illustration of these four properties working collectively as a foundation for algebra is to solve an equation on an unfamiliar set and operation. For example, what is the solution to the equation *X* ° *RX* = *R2*? (Based on the operation table below, e.g., *R1* ° *RY* = *RZ*.)

While searching the table for the solution is a valid approach to find the value of *X*, this more abstract case can be used to present the collective impact of these four axioms of a group. First, I note that this table is analogous to the composition of the symmetries of a triangle, which indeed is a group. (Another option is to verify each of these four properties based only on the table above.) The operation is closed on this set (as the composition of any two elements forms an element in the set); the identity element is *R0*; each element has a unique inverse element that produces *R0*; and composition is associative. Therefore, to find *X* algebraically, it is necessary only to apply each of these properties in turn, and compute the result of one composition (*R2* ° *RX*), to solve for *X*.

*X* ° *RX* = *R2*

(*X* ° *RX*) ° *RX* = *R2* ° *RX*

*X* ° (*RX* ° *RX*) = *R2* ° *RX* *(Associativity of composition on the set of triangle symmetries)*

*X* ° *R0* = *R2* ° *RX* *(Inverse elements of composition on the set of triangle symmetries)*

*X* = *R2* ° *RX* *(Identity element of composition on the set of triangle symmetries)*

*X* = *RZ* *(Closure of composition on the set of triangle symmetries)*
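The derivation can also be checked computationally by modeling the triangle symmetries as permutations of the vertices. (The labeling of the reflections and the convention that *A* ° *B* means "apply *A* first, then *B*" are assumptions chosen here to match the table.)

```python
# Symmetries of a triangle with vertices 0, 1, 2, as permutation tuples
# (position i holds the image of vertex i).
R0 = (0, 1, 2)  # identity
R1 = (1, 2, 0)  # rotation by 120 degrees
R2 = (2, 0, 1)  # rotation by 240 degrees
RX = (0, 2, 1)  # reflection fixing vertex 0 (assumed labeling)
RY = (2, 1, 0)  # reflection fixing vertex 1 (assumed labeling)
RZ = (1, 0, 2)  # reflection fixing vertex 2 (assumed labeling)

def compose(a, b):
    """A ° B: apply a first, then b."""
    return tuple(b[a[i]] for i in range(3))

assert compose(R1, RY) == RZ  # the example from the table: R1 ° RY = RZ
assert compose(RX, RX) == R0  # each reflection is its own inverse

X = compose(R2, RX)           # the algebraic solution: X = R2 ° RX
assert compose(X, RX) == R2   # check: X ° RX = R2

names = {R0: "R0", R1: "R1", R2: "R2", RX: "RX", RY: "RY", RZ: "RZ"}
print("X =", names[X])  # X = RZ
```

The same `compose` loop would let a student verify closure and associativity over the whole table exhaustively, rather than taking them on faith.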

Indeed, as is evident from the example above of solving a simple equation, it is not just the individual arithmetic properties – such as associativity, identity element, inverse elements, and closure – that are meaningful in mathematics, but rather their collective importance as they become a necessary structure for algebra and algebraic reasoning. It is this basic structure of a group that turns the process of “guess and check” (which was the natural inclination for searching the table in the abstract example, as well as students’ tendency when first introduced to solving equations) into systematic reasoning, based on the collection of arithmetic properties.
