3

I am not sure whether I can use the words binomial, binary, and boolean as synonyms to describe an attribute of a data set that has two values (yes or no). Are there any differences in meaning on a deeper level?

Moreover, if I have an attribute with three possible values (yes, no, unknown), this would be an attribute of type polynominal. What other names are available for this type of attribute? Is it also termed "symbolic"?

I am interested in the relation between the following attribute types: binary, boolean, binominal, polynominal (and alternative descriptions), and nominal.

ABCD
user3352632

2 Answers

1

@SmallChess's answer is a good start, but there are some additional parts to the question.

Binary variables or binary data consist of data with the values 0 or 1, and no other values. We usually don't talk about "binary distributions", because only data, variables, or outcomes can be binary. A distribution might produce binary data, but it is not itself binary, because its parameters typically take on real values.

A binomial distribution is a distribution that produces binary data. In particular, in its single-trial form (also called the Bernoulli distribution), it is a random process that produces the value 1 with probability $p$, and the value 0 with probability $1-p$. More generally, a binomial distribution counts the number of 1s across $n$ such independent trials. Notice that although it makes binary data, it is not itself a kind of data, and is in fact characterized by a non-binary number ($p$).
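As a rough illustration of that distinction, the sketch below draws binary outcomes from a random process governed by a real-valued parameter. The function name `draw_binary`, the seed, and the value `p = 0.3` are assumptions chosen for the example.

```python
import random

def draw_binary(p, n, seed=0):
    """Draw n values that are 1 with probability p and 0 otherwise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.3                         # the parameter is a real number, not binary
samples = draw_binary(p, 1000)  # the data it generates are binary

print(set(samples))                 # the distinct values that occur in the data
print(sum(samples) / len(samples))  # fraction of 1s, which should be close to p
```

The data contain only 0s and 1s, while the thing that describes the process is the single real number `p`.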

Boolean data takes on the values true or false. Often, but not always, these are stored as 0s and 1s. The distinction is that Boolean data may not be stored numerically. There might also be different expectations about how Boolean data should be processed (for instance, $true + true = true$, but $1 + 1 = 2$).
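Python happens to show this difference in expectations directly, because its `bool` values can be combined either logically or arithmetically. A small sketch (the variable names are only for illustration):

```python
a = True
b = True

logical = a or b   # Boolean "addition" (logical OR): the result is still a truth value
numeric = a + b    # arithmetic addition: Python treats True as the integer 1

print(logical)  # True
print(numeric)  # 2
```

Which of the two behaviours is appropriate depends on whether the attribute is meant as a truth value or as a number.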

I am not aware of the term polynomial being applied to data. However, multinomial distributions are probability distributions that produce 0 with probability $p_0$, 1 with probability $p_1$, 2 with probability $p_2$, and so on, producing $n$ with probability $p_n = 1 - \sum_{i=0}^{n-1} p_i$, for $n+1$ different numbers in total. Like binomial distributions, multinomial distributions are characterized by a set of real-valued numbers, and are distinct from the kind of data they generate.
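A minimal sketch of drawing from such a distribution with three outcomes, using only the standard library. The probabilities, the seed, and the sample size are made up for the example.

```python
import random
from collections import Counter

probs = [0.5, 0.3, 0.2]           # p_0, p_1, p_2: real numbers that sum to 1
values = list(range(len(probs)))  # the outcomes 0, 1, 2

rng = random.Random(0)            # fixed seed so the run is repeatable
draws = rng.choices(values, weights=probs, k=1000)

counts = Counter(draws)
print(counts)                     # how often each outcome was drawn
```

The distribution is described by the list `probs`, while the data in `draws` consist only of the integers 0, 1, and 2.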

Categorical data takes on values from a set of categories. The example you give (yes, no, unknown) is not strictly multinomial data, but could be generated from a multinomial distribution by mapping the values 0, 1, and 2 onto yes, no, and unknown. Note again that categorical data might be non-numeric. Operations like addition might be nonsensical.
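To make the mapping concrete, here is a small sketch that turns numeric codes into the three values from the question (yes, no, unknown). The particular coding in `labels` and the example `codes` are assumptions; any assignment would do.

```python
labels = {0: "yes", 1: "no", 2: "unknown"}  # assumed coding of the categories

codes = [0, 2, 1, 0]                  # e.g. draws from a three-outcome distribution
answers = [labels[c] for c in codes]
print(answers)                        # ['yes', 'unknown', 'no', 'yes']

# Arithmetic on the codes runs, but the result means nothing for the categories:
print(codes[0] + codes[2])            # 1, which would read as "yes + no = no"

# On the labels themselves, + is only string concatenation:
print(answers[0] + answers[2])        # 'yesno'
```

Neither result is a meaningful statement about the attribute, which is why categorical values are best treated as labels rather than numbers.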

Ordinal data isn't something you asked about, but arises when data can be nicely ordered. For example, playing cards are easily mapped to the numbers 1-13, and keep a reasonable semantic meaning when represented this way (e.g. A < 2 < 3, just as 1 < 2 < 3).

Nominal data, in the usual statistical sense, takes on named categories with no inherent order, so it is essentially another word for categorical data. Literal numbers that mean exactly what they purport to mean are a different thing. For example, if you store the number of cans of beer a customer purchased, that would be count data rather than nominal data.

John Doucette
  • Thanks for this great overview! The tool RapidMiner uses the terms binominal and polynominal as attribute types. Do you think these are the "right" terms for attribute types? For instance, the attribute type "Integer" is used to describe natural numbers as well as their additive inverses (in RapidMiner). From this point of view, I think it would also be correct to name the type of an attribute with two values binary or boolean. Do you agree? – user3352632 Sep 05 '18 at 06:52
  • @user3352632 I'm not familiar with RapidMiner, but "polynomial" is definitely a non-standard term. If someone told me they had "polynomial data", my guess would be that they had data generated from a function like y=x^3 + 2*x^2 + x, not that they had data with 3 or 4 values. If you do a Google search for the term, that's also what you find. As mentioned in the answer, the terms "binary" and "boolean" have precise meanings too, and are not completely interchangeable. If you have two values that are non-numeric, and that aren't true/false, you actually have categorical data! – John Doucette Sep 05 '18 at 11:06
0

Binomial is a distribution characterised by $p$, the probability of success for an independent trial, together with the number of trials $n$. With a single trial, each sample you get from the distribution is a binary variable, 0 or 1.

ABCD