Properties of Probability: Derived Rules and Formulas

Introduction

In the previous post, we learned about Kolmogorov's axioms — the three foundational rules of probability. But how do we actually use these axioms?

That's where properties of probability come in. These are rules and formulas derived directly from the axioms. They tell us:

How to handle opposite events
How to combine probabilities of different events
How to reason about conditional probabilities
When and how to multiply probabilities

In this post, we'll explore the major properties of probability, prove them from the axioms, and see how to apply them in practice.

Property 1: Probability of the Complement

The complement of event A (denoted A' or A^c) is the event that A does NOT occur.

Complement Rule

P(A') = 1 - P(A)

Proof: The sample space S can be split into A and A', which are mutually exclusive and together cover S, so A ∪ A' = S. By Axiom 3: P(A ∪ A') = P(A) + P(A'). By Axiom 2: P(A ∪ A') = P(S) = 1. Therefore: P(A) + P(A') = 1, so P(A') = 1 - P(A).

Example

If the probability of rain tomorrow is 0.3, then the probability of no rain is:

P(no rain) = 1 - P(rain) = 1 - 0.3 = 0.7
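
The complement rule can also be checked by direct counting. The sketch below is a minimal illustration, assuming a fair six-sided die (a setup chosen here for convenience, not taken from the rain example) and a hypothetical helper `prob` that counts equally likely outcomes.

```python
from fractions import Fraction

# Assumed setup (not from the post): one roll of a fair six-sided die,
# with every outcome equally likely.
sample_space = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {1, 2}                         # event: the roll is 1 or 2
A_complement = sample_space - A    # event: A does not occur

p_a = prob(A)
p_not_a = prob(A_complement)

# Axiom 3 on the disjoint pieces A and A', then Axiom 2 for the total.
assert p_a + p_not_a == prob(sample_space) == 1
print(p_not_a == 1 - p_a)
```

Using `Fraction` keeps the arithmetic exact, so the equality check does not depend on floating-point rounding.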

Property 2: Probability of the Impossible Event

Impossible Event

P(∅) = 0

The probability of an impossible event (the empty set ∅) is zero.

Proof: The empty set is the complement of the entire sample space S. By the complement rule: P(∅) = 1 - P(S) = 1 - 1 = 0.

Example

Rolling a standard die and getting a 7:

P(rolling a 7) = 0  (impossible on a 6-sided die)

Property 3: Bounded Probability

Probability Bounds

0 ≤ P(A) ≤ 1 for all events A

Probabilities are always between 0 and 1 (inclusive).

Proof: From Axiom 1: P(A) ≥ 0. From the complement rule: P(A) = 1 - P(A'), and P(A') ≥ 0 by Axiom 1, so P(A) ≤ 1. Thus: 0 ≤ P(A) ≤ 1.

Property 4: Monotonicity

Monotonicity

If event A is a subset of event B (all outcomes in A are in B), denoted A ⊆ B, then:

P(A) ≤ P(B)

Proof: If A ⊆ B, then B can be split into A and (B \ A). By Axiom 3: P(B) = P(A) + P(B \ A). Since P(B \ A) ≥ 0 (Axiom 1), we have P(B) ≥ P(A).

Example

Drawing from a deck of cards:

A = Drawing a heart
B = Drawing a red card

Since all hearts are red: A ⊆ B
Therefore: P(heart) ≤ P(red)
Verification: P(heart) = 13/52, P(red) = 26/52
13/52 ≤ 26/52 ✓
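
The proof's decomposition of B into A and B \ A can be traced on the card example. This is a rough sketch: the deck representation as (rank, suit) pairs and the helper `prob` are illustrative choices, not something the post prescribes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))   # 52 (rank, suit) pairs


def prob(event):
    """Probability of drawing a card from `event`, all cards equally likely."""
    return Fraction(len(event), len(deck))


hearts = {card for card in deck if card[1] == "hearts"}                # A
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}    # B
red_not_heart = reds - hearts                                          # B \ A

p_heart = prob(hearts)
p_red = prob(reds)
p_rest = prob(red_not_heart)

# A is a subset of B, and B splits into the disjoint pieces A and B \ A.
print(hearts <= reds, p_red == p_heart + p_rest, p_heart <= p_red)
```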

Property 5: Addition Rule (General Form)

General Addition Rule

For any two events A and B:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

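Proof: A ∪ B can be decomposed into three mutually exclusive parts: A \ B (A only), B \ A (B only), and A ∩ B. By Axiom 3: P(A ∪ B) = P(A \ B) + P(B \ A) + P(A ∩ B). Also by Axiom 3: P(A) = P(A \ B) + P(A ∩ B) and P(B) = P(B \ A) + P(A ∩ B). Substituting P(A \ B) = P(A) - P(A ∩ B) and P(B \ A) = P(B) - P(A ∩ B) gives the formula above.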

Why the "-P(A ∩ B)"?

When we add P(A) + P(B), we count the overlap (A ∩ B) twice. So we subtract it once to correct for this double-counting.

Special Case: Mutually Exclusive Events

When A and B cannot both occur (mutually exclusive), P(A ∩ B) = 0, so:

P(A ∪ B) = P(A) + P(B) [when A and B are mutually exclusive]

Example

Drawing a card: What's the probability of drawing a King or a Heart?

A = Drawing a King (4 cards)
B = Drawing a Heart (13 cards)
A ∩ B = Drawing a King of Hearts (1 card)

P(King ∪ Heart) = P(King) + P(Heart) - P(King of Hearts)
                = 4/52 + 13/52 - 1/52
                = 16/52
                ≈ 30.77%
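
As a sanity check on the King-or-Heart example, one can count the union directly and compare it with the formula. The sketch below assumes a standard 52-card deck modelled as (rank, suit) pairs; the names `deck`, `kings`, `hearts`, and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))


def prob(event):
    """Probability of drawing a card from `event`, all cards equally likely."""
    return Fraction(len(event), len(deck))


kings = {card for card in deck if card[0] == "K"}
hearts = {card for card in deck if card[1] == "hearts"}

# Count the union directly ...
p_union_direct = prob(kings | hearts)
# ... and via the general addition rule, subtracting the King of Hearts once.
p_union_formula = prob(kings) + prob(hearts) - prob(kings & hearts)

print(p_union_direct, p_union_formula)
```

Dropping the `- prob(kings & hearts)` term would give 17/52, which counts the King of Hearts twice.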

Property 6: Inclusion-Exclusion Principle

Three Events

For three events A, B, and C:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)

Intuition

We add all individual probabilities, subtract pairwise overlaps (to avoid double-counting), then add back the triple overlap (it was added three times and then subtracted three times, so it would otherwise be left out entirely).

Example

Survey problem: Among 100 people, 50 drink coffee, 40 drink tea, 30 drink juice. 20 drink both coffee and tea, 15 drink both coffee and juice, 10 drink both tea and juice, and 5 drink all three. How many drink at least one?

|C ∪ T ∪ J| = 50 + 40 + 30 - 20 - 15 - 10 + 5 = 80

80 out of 100 people drink at least one beverage, so P(C ∪ T ∪ J) = 80/100 = 0.8.
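
The survey arithmetic can be cross-checked by also splitting the three groups into their seven disjoint regions and summing those. This is a sketch using the survey counts above; the function name `union_size` and the variable names are hypothetical.

```python
from fractions import Fraction


def union_size(a, b, c, ab, ac, bc, abc):
    """Size of A ∪ B ∪ C by inclusion-exclusion on counts."""
    return a + b + c - ab - ac - bc + abc


total_people = 100
coffee, tea, juice = 50, 40, 30
coffee_tea, coffee_juice, tea_juice = 20, 15, 10
all_three = 5

at_least_one = union_size(coffee, tea, juice,
                          coffee_tea, coffee_juice, tea_juice, all_three)

# Cross-check: the seven disjoint regions of the three-set diagram.
only_ct = coffee_tea - all_three
only_cj = coffee_juice - all_three
only_tj = tea_juice - all_three
only_c = coffee - only_ct - only_cj - all_three
only_t = tea - only_ct - only_tj - all_three
only_j = juice - only_cj - only_tj - all_three
regions_total = (only_c + only_t + only_j
                 + only_ct + only_cj + only_tj + all_three)

p_at_least_one = Fraction(at_least_one, total_people)
p_none = 1 - p_at_least_one   # complement rule
print(at_least_one, regions_total, p_at_least_one, p_none)
```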

Property 7: Conditional Probability

Conditional Probability Definition

The probability of A given that B has occurred:

P(A|B) = P(A ∩ B) / P(B) [where P(B) > 0]

Interpretation

We're restricting the sample space to only outcomes where B occurred, then asking: what fraction of those also satisfy A?

Example

Drawing cards without replacement: Draw two cards. Given that the first is an Ace, what's the probability the second is also an Ace?

P(2nd Ace | 1st Ace) = (3 remaining aces) / (51 remaining cards)
                      = 3/51 ≈ 5.88%

With no information about the first card, the probability that the second card is an Ace is 4/52 ≈ 7.69%. The information changed the probability!
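
To connect the shortcut count (3 aces among 51 cards) with the definition P(A|B) = P(A ∩ B) / P(B), the sketch below enumerates every ordered two-card draw. The deck model and variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))

# Every ordered pair of distinct cards: 52 * 51 equally likely draws.
draws = list(permutations(deck, 2))

first_ace = [d for d in draws if d[0][0] == "A"]        # event B
both_aces = [d for d in first_ace if d[1][0] == "A"]    # event A ∩ B
second_ace = [d for d in draws if d[1][0] == "A"]       # event A, no condition

p_first_ace = Fraction(len(first_ace), len(draws))
p_both_aces = Fraction(len(both_aces), len(draws))

# Definition of conditional probability: P(A|B) = P(A ∩ B) / P(B).
p_second_given_first = p_both_aces / p_first_ace
p_second_ace = Fraction(len(second_ace), len(draws))

print(p_second_given_first, p_second_ace)
```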

Property 8: Multiplication Rule

Joint Probability

The probability of both A and B occurring:

P(A ∩ B) = P(A) × P(B|A)

Or equivalently: P(A ∩ B) = P(B) × P(A|B)

Special Case: Independent Events

When A and B are independent (one doesn't affect the other):

P(A ∩ B) = P(A) × P(B) [when A and B are independent]

Example

Coin flips: Two independent coin flips. What's the probability of two heads?

P(heads on flip 1) = 1/2
P(heads on flip 2 | heads on flip 1) = 1/2  (independent!)

P(two heads) = 1/2 × 1/2 = 1/4 = 25%
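
The multiplication rule for the two-flip example can be checked by listing the four equally likely outcomes. The sketch assumes a fair coin; all names in it are illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two flips of a fair coin: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

first_heads = [o for o in outcomes if o[0] == "H"]
both_heads = [o for o in outcomes if o == ("H", "H")]

p_first = Fraction(len(first_heads), len(outcomes))            # P(A)
# P(B|A): restrict to outcomes where flip 1 is heads, then count flip 2 heads.
p_second_given_first = Fraction(
    sum(1 for o in first_heads if o[1] == "H"), len(first_heads)
)
p_both_direct = Fraction(len(both_heads), len(outcomes))       # P(A ∩ B)

# Multiplication rule: P(A ∩ B) = P(A) × P(B|A).
p_both_rule = p_first * p_second_given_first
print(p_both_direct, p_both_rule)
```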

Property 9: Law of Total Probability

Total Probability

If B₁, B₂, ..., Bₙ partition the sample space (mutually exclusive and exhaustive), then:

P(A) = Σ P(A|Bᵢ) × P(Bᵢ)

What It Means

To find P(A), we consider all ways A can happen through each Bᵢ, weight by P(Bᵢ), and sum.

Example

Disease testing: A disease affects 1% of the population. A test comes back positive for 95% of people who have the disease and for 5% of people who don't (false positives). What's the probability a random person tests positive?

P(positive) = P(positive|disease) × P(disease) 
            + P(positive|no disease) × P(no disease)
           = 0.95 × 0.01 + 0.05 × 0.99
           = 0.0095 + 0.0495
           = 0.059 = 5.9%
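
The weighted sum above generalizes to any partition. Here is a minimal sketch with a hypothetical helper `total_probability` that takes (P(Bᵢ), P(A|Bᵢ)) pairs; it uses exact fractions, and the 95% and 5% figures are read as P(positive|disease) and P(positive|no disease), as in the calculation above.

```python
from fractions import Fraction


def total_probability(cases):
    """Sum P(A|B_i) * P(B_i) over (P(B_i), P(A|B_i)) pairs for a partition."""
    if sum(p_b for p_b, _ in cases) != 1:
        raise ValueError("the P(B_i) must sum to 1 for a partition")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in cases)


p_disease = Fraction(1, 100)
p_pos_given_disease = Fraction(95, 100)
p_pos_given_healthy = Fraction(5, 100)

p_positive = total_probability([
    (p_disease, p_pos_given_disease),        # has the disease
    (1 - p_disease, p_pos_given_healthy),    # does not have the disease
])
print(p_positive, float(p_positive))
```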

Property 10: Bayes' Theorem

Bayes' Theorem

P(A|B) = P(B|A) × P(A) / P(B) [where P(B) > 0]

Why It Matters

Bayes' Theorem lets us reverse conditional probabilities. If we know P(B|A), we can find P(A|B). This is foundational for machine learning, medical diagnosis, and Bayesian inference.

Example

Medical diagnosis: As in the Law of Total Probability example, a disease affects 1% of the population, so P(positive) = 0.059. If a person has the disease, there's a 95% chance they test positive. If they don't have it, there's a 5% chance of a false positive. Given that someone tests positive, what's the probability they actually have the disease?

P(disease|positive) = P(positive|disease) × P(disease) / P(positive)
                     = 0.95 × 0.01 / 0.059
                     ≈ 16.1%

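Only a 16.1% chance! Even with a positive test, it's unlikely the person has the disease, because the disease is rare and false positives among the healthy 99% outnumber true positives. Ignoring this base rate is known as the base rate fallacy.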
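
The diagnosis calculation can be reproduced end to end in a few lines. This is a sketch under the example's numbers (1% prevalence, 95% true-positive rate, 5% false-positive rate); the function name `bayes` is illustrative.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


p_disease = Fraction(1, 100)
p_pos_given_disease = Fraction(95, 100)
p_pos_given_healthy = Fraction(5, 100)

# Denominator from the law of total probability.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_positive)
print(p_disease_given_pos, round(float(p_disease_given_pos), 3))
```

Raising `p_disease` in this sketch shows how strongly the answer depends on the base rate: the same test gives a much higher posterior when the disease is common.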

Property 11: Independence

Independence Definition

Events A and B are independent if:

P(A|B) = P(A) [where P(B) > 0]

Equivalently: P(A ∩ B) = P(A) × P(B), which also covers the case P(B) = 0

What It Means

Knowing that B occurred doesn't change the probability of A. The events don't influence each other.

Example

Die rolls: Two independent die rolls.

P(second roll is 3 | first roll is 3) = P(second roll is 3) = 1/6

The first roll doesn't affect the second.
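
The product form P(A ∩ B) = P(A) × P(B) gives a mechanical test for independence. The sketch below applies it to two fair die rolls. For contrast it adds one dependent pair (the event "the sum is at least 10"), which is an extra illustration not taken from the example above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair die rolls.
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


def independent(a, b):
    """Check the product rule P(A ∩ B) == P(A) * P(B)."""
    return prob(lambda r: a(r) and b(r)) == prob(a) * prob(b)


def first_is_3(r):
    return r[0] == 3


def second_is_3(r):
    return r[1] == 3


def sum_at_least_10(r):
    return r[0] + r[1] >= 10


rolls_independent = independent(second_is_3, first_is_3)
# If the first roll is 3, the sum cannot reach 10, so these events are dependent.
sum_independent = independent(sum_at_least_10, first_is_3)
print(rolls_independent, sum_independent)
```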

Quick Reference: Summary of Properties

Key Properties:

Complement: P(A') = 1 - P(A)
Impossible: P(∅) = 0
Bounds: 0 ≤ P(A) ≤ 1
Monotonicity: A ⊆ B → P(A) ≤ P(B)
Addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Conditional: P(A|B) = P(A ∩ B) / P(B)
Multiplication: P(A ∩ B) = P(A) × P(B|A)
Total Probability: P(A) = Σ P(A|Bᵢ) × P(Bᵢ)
Bayes: P(A|B) = P(B|A) × P(A) / P(B)
Independence: P(A ∩ B) = P(A) × P(B)

How These All Connect

These properties form a unified system derived from the three axioms:

Properties 1-4 establish the basic structure (bounds, complements, ordering)
Properties 5-6 tell us how to compute the probability of a union of events, whether or not they overlap
Properties 7-10 handle dependence and conditional information
Property 11 identifies when events are independent

Conclusion

The properties of probability aren't arbitrary rules — they're logical consequences of Kolmogorov's axioms. Understanding where they come from gives you confidence in using them and helps you remember why they work the way they do.

These properties are the tools you'll use constantly in statistics, machine learning, and data science. Master them, and you'll understand the foundation of probabilistic reasoning.

Pro tip: When solving a probability problem, identify which property applies, then use it systematically. Most problems reduce to these 11 properties and their combinations.

Questions about probability properties? Reach out on X/Twitter or GitHub. Happy to discuss!