Analytical techniques

Approx 8 minutes

In this lesson, we introduce the basics of descriptive statistics, explaining different types of variables, key summary measures, and simple ways to organize and visualize data in order to better understand patterns and distributions.

Lesson Objective

To learn how to describe and summarize data using basic descriptive statistics tools.

Variables and data types

All analytics begins with recognizing what type of data you are dealing with.

A categorical variable classifies into groups: section, device, country, content type.
A numerical variable expresses quantities: reading time, page views, number of users, conversions.

Within numerical variables, a distinction can be made between discrete variables, which take countable values (such as page views or number of users), and continuous variables, which can take any value within a range (such as reading time).

This distinction matters because not all data is summarized or represented in the same way. A device type cannot be meaningfully averaged, whereas reading time can be summarized with a mean or a median.

Levels of measurement

It is also useful to know, at least at a basic level, the levels of measurement:

• Nominal: categories without order, such as country or device type
• Ordinal: categories with order, such as satisfaction level or priority
• Interval: scales with meaningful differences but without an absolute zero, such as temperature in degrees Celsius
• Ratio: numerical variables with a real zero, such as time, users, or revenue

There is no need to go deeper into the theory here. The key point is that the type of variable determines which operations and charts make sense.
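As a rough illustration of how the level of measurement limits the available summaries, the sketch below uses Python's standard `statistics` module. The survey answers, the device list, and the `SCALE` ordering are invented for this example and are not part of the lesson data.

```python
from statistics import median_low, mode

# Hypothetical data, invented for illustration.
SCALE = ["low", "medium", "high"]  # ordinal: the order of categories matters
answers = ["high", "low", "medium", "high", "medium", "medium", "low"]
devices = ["mobile", "desktop", "mobile", "tablet", "mobile", "desktop", "mobile"]

# Nominal variable: categories can only be counted, so the mode is the natural summary.
typical_device = mode(devices)

# Ordinal variable: categories can be ranked, so a median category is also meaningful.
ranks = [SCALE.index(answer) for answer in answers]
median_answer = SCALE[median_low(ranks)]

print(typical_device, median_answer)
```

Averaging the device names would make no sense, and averaging the satisfaction ranks would assume equal distances between levels, which an ordinal scale does not guarantee.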

Frequency tables

A frequency table summarizes how many times each value or category appears. It is one of the simplest and most useful tools for exploring a dataset.

For example, if 100 news articles are classified by section, a frequency table may show how many belong to Politics, Sports, Local, Culture, or Economy. That table already reveals the composition of the output, possible production biases, and areas of concentration.

In numerical variables, frequencies also help visualize distributions by grouping values into ranges.
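A frequency table can be sketched in a few lines with `collections.Counter`. The section labels, the reading times, and the 60-second range width below are assumptions chosen for illustration, not data from the lesson.

```python
from collections import Counter

# Hypothetical section labels for ten articles (invented data).
sections = ["Politics", "Sports", "Politics", "Local", "Culture",
            "Politics", "Economy", "Sports", "Local", "Politics"]

# Categorical variable: count how often each category appears.
section_freq = Counter(sections)
total = len(sections)
for section, count in section_freq.most_common():
    print(f"{section:<10}{count:>3}{count / total:>8.0%}")

# Numerical variable: group hypothetical reading times (seconds) into ranges.
reading_seconds = [95, 110, 120, 125, 140, 150, 165, 180, 240, 600]
bin_width = 60  # assumed range width
binned = Counter((s // bin_width) * bin_width for s in reading_seconds)
for start in sorted(binned):
    print(f"{start}-{start + bin_width - 1} s: {binned[start]}")
```

The second table is the same idea a histogram draws graphically: each range becomes one bar.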

Measures of central tendency

Measures of central tendency help summarize a dataset into a representative value.

The mean is the arithmetic average. It is useful but can be affected by extreme values.

The median is the central value of an ordered distribution. It is especially useful when there are outliers or asymmetric distributions.

The mode is the most frequent value. It is mainly of interest for categorical variables or for distributions where repetition matters.

In editorial consumption data, the median is often very valuable because many user behaviors do not follow balanced distributions. A small group of pieces or users may concentrate a disproportionate share of total volume.
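The three measures can be compared on a small example. The page-view figures below are invented, with one deliberately extreme value, to show how the mean reacts to it while the median and mode do not.

```python
from statistics import mean, median, mode

# Hypothetical page views for six articles; the last one went viral (invented data).
page_views = [120, 150, 150, 180, 200, 5000]

avg_views = mean(page_views)       # pulled upward by the extreme value
median_views = median(page_views)  # average of the two middle values (even count)
mode_views = mode(page_views)      # most frequent value

print(f"mean={avg_views:.0f} median={median_views} mode={mode_views}")
```

Here the mean lands far above five of the six articles, which is why the median is often the safer summary for skewed consumption data.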

Measures of dispersion

Knowing the central value is not enough. It is also useful to know how much the data varies.

The range shows the distance between the minimum and maximum value.

Variance measures the average squared deviation of values from the mean. The standard deviation is its square root, so it is expressed in the same units as the data.

In practical terms, dispersion helps answer questions such as:

• Are results relatively stable or highly heterogeneous?
• Are reading times similar across pieces or do they vary widely?
• Does a high average reflect general behavior or only a few extraordinary cases?
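To make these questions concrete, the sketch below compares two invented sets of reading times that share the same mean but differ greatly in spread. It uses the population formulas (`pvariance`, `pstdev`) as a simplifying assumption; `statistics.variance` and `statistics.stdev` give the sample versions. The `describe` helper is hypothetical.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical reading times in seconds for two sections (invented data).
stable = [118, 120, 122, 119, 121]
varied = [30, 60, 120, 180, 210]

def describe(values):
    """Return the mean, range, variance and standard deviation of a list."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }

stable_summary = describe(stable)
varied_summary = describe(varied)
print(stable_summary)
print(varied_summary)
```

Both sections average 120 seconds, yet only the dispersion measures reveal that the second one mixes very short and very long reads.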

Percentiles and quartiles

Percentiles allow a value to be positioned within a distribution. Saying that an article is in the 90th percentile of reading time means that its reading time is higher than that of roughly 90% of the pieces.

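This logic is very useful in editorial contexts because it allows comparisons without relying only on averages. Quartiles are the 25th, 50th, and 75th percentiles, and the 50th percentile is the median. Percentiles are also useful for defining thresholds: top 10%, upper quartile, lower half, and so on.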
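Quartiles and percentile ranks can be computed as in the sketch below. The reading times are invented, `percentile_rank` is a hypothetical helper, and the `"inclusive"` method is one of several valid conventions, so other tools may return slightly different cut points.

```python
from statistics import quantiles

# Hypothetical reading times in seconds (invented data).
reading_seconds = [95, 110, 120, 125, 140, 150, 165, 180, 240, 600]

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(reading_seconds, n=4, method="inclusive")

def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

# Threshold example: articles in the upper quartile of reading time.
upper_quartile = sorted(s for s in reading_seconds if s > q3)

rank_240 = percentile_rank(reading_seconds, 240)
print(q1, q2, q3, upper_quartile, rank_240)
```

The same pattern with `n=10` or `n=100` yields deciles or percentiles for thresholds such as the top 10%.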

Basic representations

A histogram helps visualize how a numerical variable is distributed.

A box plot summarizes distribution, median, dispersion, and possible outliers.

A bar chart is useful for comparing categories.

A line chart works well for temporal evolution.

A scatter plot allows exploration of the relationship between two variables.

A cross table (or contingency table) is very useful for comparing categories with each other, for example section by device, traffic source by content type, or user segment by conversion.
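A cross table can be built without any charting library by counting pairs of categories. The page-view records below are invented for illustration; in practice they would come from an analytics export.

```python
from collections import Counter

# Hypothetical page views as (section, device) pairs (invented data).
views = [("Politics", "mobile"), ("Politics", "desktop"), ("Sports", "mobile"),
         ("Sports", "mobile"), ("Culture", "desktop"), ("Politics", "mobile"),
         ("Culture", "mobile"), ("Sports", "desktop")]

# Count each (section, device) combination; missing pairs count as 0.
cross = Counter(views)
sections = sorted({section for section, _ in views})
devices = sorted({device for _, device in views})

# Print sections as rows and devices as columns.
print(f"{'':<10}" + "".join(f"{device:>9}" for device in devices))
for section in sections:
    row = "".join(f"{cross[(section, device)]:>9}" for device in devices)
    print(f"{section:<10}" + row)
```

Reading across a row shows how one section's traffic splits by device; reading down a column compares sections on the same device.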

Try it yourself

Below are the average reading times for 10 articles published last week:

2:10 · 1:45 · 3:20 · 2:05 · 12:45 · 1:50 · 2:40 · 2:15 · 1:55 · 2:10

(Tip: convert to seconds first to make the calculation easier — 2:10 = 130 seconds, and so on.)

Step 1: Calculate the mean. Add all values and divide by 10.
Step 2: Find the median. Sort the values from lowest to highest. Because there are 10 values (an even count), the median is the average of the 5th and 6th values.
Step 3: Compare the two results.

Consider:

Are your mean and median very different? Why?
Which number better represents the “typical” article in this dataset — and why?
Which single article is almost certainly responsible for the gap? What might explain it?
If you were reporting to your editor on “how long people are reading,” which figure would you use — and how would you explain your choice?

(Answer: Mean ≈ 3:17, that is 197.5 seconds · Median = 2:10)

Reporting the mean as the "typical" value is one of the most common misreadings in editorial analytics. A single outlier can pull the average far away from what most of your data actually looks like.
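For readers who want to check their work, the exercise can be reproduced with the sketch below. The reading times are the ones listed in the exercise; the helper names `to_seconds` and `to_mmss` are hypothetical.

```python
from statistics import mean, median

# Average reading times from the exercise, as minutes:seconds strings.
times = ["2:10", "1:45", "3:20", "2:05", "12:45",
         "1:50", "2:40", "2:15", "1:55", "2:10"]

def to_seconds(mmss):
    """Convert a 'm:ss' string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

def to_mmss(total_seconds):
    """Convert seconds back to 'm:ss', dropping any fraction of a second."""
    minutes, secs = divmod(int(total_seconds), 60)
    return f"{minutes}:{secs:02d}"

reading_seconds = [to_seconds(t) for t in times]
avg = mean(reading_seconds)
mid = median(reading_seconds)

# Recompute the mean after removing the single longest article.
longest = max(reading_seconds)
trimmed_avg = mean(s for s in reading_seconds if s != longest)

print(to_mmss(avg), to_mmss(mid), to_mmss(trimmed_avg))
```

Dropping the 12:45 article moves the mean to within a few seconds of the median, which suggests that one piece accounts for almost the entire gap.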

Lesson Conclusion

Descriptive statistics does not aim to predict the future or explain every phenomenon. It pursues something more basic and essential: accurately describing what we have in front of us. This is the indispensable foundation for any further analysis.

Check your understanding

Each question has one right answer. If you don't get it on the first try, retry as many times as you need.

Which measure is least affected by extreme values?

• Mean
• Median
• Variance
