Introduction to Time Series Forecasting in R

Posted by Paurush Gaur

Time is one of the most critical aspects of a business. It has the power to make or break any business, which makes good use of time all the more crucial, yet it is often difficult to keep up with its pace. Technological advancements are changing that. Today, we have several powerful methods to ‘predict’ or ‘see things’ ahead of time, and a majority of them rely on the large volume of data collected over time. One such method is ‘Time Series Forecasting,’ which works on time-based data (years, days, hours, and minutes) to derive hidden insights and support informed business decisions.

In this two-part series, we will learn about the time series model and its implementation in R. This is the first part, which covers the basics of the time series model, its components, and forecasting.

Let’s get started.


Before we get into the nuts and bolts of time series, let us learn about forecasting, its different types, and their importance in business.

Forecasting is a technique to estimate future values based on historical and current data. Time is the key component in forecasting and data values should be time dependent. Forecasts are a crucial part of the business, as they translate past data or experience into estimates for short-term and long-term decisions.

Forecasting techniques are broadly classified into two categories: quantitative (objective) and qualitative (subjective). Quantitative forecasting analyzes historical data to anticipate future data points, while qualitative forecasting relies on expert advice to generate projections. Time series forecasting is a quantitative forecasting method.

What is a Time Series?

A time series is a sequence of data points measured at equally spaced intervals over a period of time, for example, weekly car sales, monthly company profit, or monthly product orders. Time series are used in statistics, pattern recognition, weather forecasting, communication engineering, and other domains of applied science and engineering. Line charts are the most common way to represent a time series.
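As a minimal sketch in R, a time series can be stored in a ts object, which records the start period and the number of observations per unit of time. The sales figures below are made-up values used only for illustration, and the name sales_ts is our own.

```r
# Hypothetical monthly sales figures (illustrative values, not real data)
monthly_sales <- c(120, 132, 128, 141, 150, 158,
                   163, 170, 161, 155, 149, 172)

# frequency = 12 marks the data as monthly; start = c(2020, 1) is January 2020
sales_ts <- ts(monthly_sales, start = c(2020, 1), frequency = 12)

print(sales_ts)

# A line chart is the usual first look at a time series
plot(sales_ts, type = "l", xlab = "Time", ylab = "Sales",
     main = "Hypothetical monthly sales")
```

Because the observations are equally spaced, only the start and the frequency need to be stored; R derives the time stamp of every other point from them.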

While time series analysis includes methods to analyze the data to extract meaningful statistics and other characteristics, time series forecasting uses models that help predict future values based on historical data.

Time Series Components

Trend {T (t)}: in a time series, a trend is a long-term movement in values, free of calendar-related and irregular effects, that reflects the underlying level of the series. Population growth, price inflation, and general economic changes are some examples of trends.

Seasonality {S (t)}: seasonality in a time series is a pattern that repeats at regular intervals. Regularly spaced peaks and troughs that have a consistent direction and approximately the same magnitude, relative to the trend, indicate seasonality. Sales of festive items during the festival season and sales of umbrellas during the rainy season are prime examples of seasonality.

Random Variation/Irregular {I (t)}: the random or irregular component is what remains in the time series after the trend and seasonal components have been estimated and removed. It results from short-term, non-systematic variations in the series. In a highly irregular series, such as seismic recordings, these fluctuations dominate.

Important Characteristics to Consider

Since a time series contains a plethora of information in the patterns and lines, it is crucial to accurately read the related graphs. Highlighted below are a few aspects that need immediate attention.

  • Is there a trend? That is, do the measurements tend to increase (or decrease) over time, on average?
  • Is there seasonality? That is, is there a regularly repeating pattern of peaks and troughs related to calendar time, such as seasons, quarters, months, or days of the week?
  • Is the variance constant over time, or does it change?
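These three questions can also be checked numerically. The sketch below uses AirPassengers, a monthly dataset that ships with base R; the choice of dataset and the helper names are ours, not part of the original discussion.

```r
# AirPassengers ships with base R: monthly airline passenger totals, 1949-1960
x <- as.numeric(AirPassengers)

# Arrange the 144 monthly values as 12 rows (months) by 12 columns (years)
by_year <- matrix(x, nrow = 12)

# Trend: does the average level change from year to year?
yearly_mean <- colMeans(by_year)

# Seasonality: do some calendar months sit consistently above others?
monthly_mean <- rowMeans(by_year)
names(monthly_mean) <- month.abb

# Variance: does the spread within each year stay the same?
yearly_sd <- apply(by_year, 2, sd)

print(round(yearly_mean, 1))
print(round(monthly_mean, 1))
print(round(yearly_sd, 1))
```

For this dataset, the yearly means rise every year (a trend), mid-year months average well above late-autumn months (seasonality), and the within-year spread is much larger in 1960 than in 1949 (non-constant variance).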

Stationary Series

A stationary series is one where:

  • The mean is constant over time. In other words, the mean of the series is not a function of time. In the accompanying image, the left-hand graph satisfies this condition, while the right-hand graph has a time-dependent mean.



  • The variance is not a function of time. This characteristic is known as homoscedasticity. The accompanying graph contrasts a series with constant variance against one whose spread changes over time.
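To make the two conditions concrete, here is a small simulation sketch. The three series are synthetic, and the constants (level 10, slope 0.1, the scaling of the noise) are arbitrary choices for illustration.

```r
set.seed(1)  # fixed seed so the simulation is reproducible
n <- 200
noise <- rnorm(n)

# Stationary: fluctuates around a fixed level with a fixed spread
stationary <- 10 + noise

# Non-constant mean: the same noise on top of a linear trend
trending <- 10 + 0.1 * seq_len(n) + noise

# Non-constant variance: the spread of the noise grows with time
growing_spread <- 10 + noise * seq_len(n) / 50

# Compare the first and second halves of each series
first <- 1:(n / 2)
second <- (n / 2 + 1):n
half_stats <- function(y) {
  c(mean_1 = mean(y[first]), mean_2 = mean(y[second]),
    sd_1 = sd(y[first]), sd_2 = sd(y[second]))
}

print(round(rbind(stationary = half_stats(stationary),
                  trending = half_stats(trending),
                  growing_spread = half_stats(growing_spread)), 2))
```

Comparing halves is only a rough diagnostic. It shows the idea: for the stationary series both halves have similar means and standard deviations, the trending series shifts its mean, and the third series keeps its mean but widens its spread.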


Terminologies Related to Time Series

White Noise

A white noise process is a sequence of random variables with a constant mean value of zero, a constant variance, and no correlation between its values at different times.
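A quick way to get a feel for this definition is to simulate it. The sketch below assumes Gaussian white noise, which is only one special case, and uses the Ljung-Box test from base R as one possible check for correlation.

```r
set.seed(42)  # fixed seed so the simulation is reproducible

# Gaussian white noise: independent draws with mean 0 and constant variance 1
white_noise <- rnorm(500, mean = 0, sd = 1)

# The sample mean should be close to 0 and the sample variance close to 1
wn_mean <- mean(white_noise)
wn_var <- var(white_noise)

# Ljung-Box test: the null hypothesis is "no autocorrelation up to lag 10"
lb <- Box.test(white_noise, lag = 10, type = "Ljung-Box")

print(c(mean = wn_mean, variance = wn_var))
print(lb)
```

A large p-value from the test means the data give no evidence against the no-correlation assumption; it does not prove that the series is white noise.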


Autocorrelation

Autocorrelation refers to the correlation of a time series with its own past and future values, that is, with lagged copies of itself.
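As an illustration, the sketch below computes the sample autocorrelation by hand and compares it with R's built-in acf() function. The AirPassengers dataset and the helper name sample_acf are our own choices for the example.

```r
# AirPassengers ships with base R: monthly airline passenger totals, 1949-1960
x <- as.numeric(AirPassengers)
n <- length(x)
m <- mean(x)

# Lag-k sample autocorrelation, using the same estimator as stats::acf()
sample_acf <- function(k) {
  sum((x[1:(n - k)] - m) * (x[(k + 1):n] - m)) / sum((x - m)^2)
}

lag1_manual <- sample_acf(1)
lag12_manual <- sample_acf(12)

# acf() returns lag 0 first, so lag k sits at position k + 1
acf_values <- acf(x, lag.max = 12, plot = FALSE)$acf
lag1_builtin <- acf_values[2]
lag12_builtin <- acf_values[13]

print(c(manual = lag1_manual, builtin = lag1_builtin))
print(c(manual = lag12_manual, builtin = lag12_builtin))
```

Lag 1 compares each month with the month before it, and lag 12 compares each month with the same month one year earlier. Calling acf(x) without plot = FALSE draws the correlogram instead of returning only the numbers.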

Time Series Decomposition

Within a given time series, the underlying patterns can be decomposed into sub-patterns to identify component factors that influence every value in a series.

The mathematical representation of the decomposition approach is:

Y(t) = f(T(t), S(t), I(t))

where:

  • Y(t) is the time series value (actual data) at period t
  • S(t) is the seasonal component at period t
  • T(t) is the trend component at period t
  • I(t) is the irregular (remainder) component at period t

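Additive and multiplicative models are the two types of models used to decompose a time series. An additive model suits a series whose seasonal swings stay roughly the same size over time, while a multiplicative model suits a series whose swings grow or shrink with the level of the trend.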

Multiplicative Model: Y(t) = T(t) × S(t) × I(t)
Additive Model: Y(t) = T(t) + S(t) + I(t)
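A minimal sketch of a decomposition in R, using the decompose() function from base R on the built-in AirPassengers dataset. This is classical moving-average decomposition, one of several possible methods, and the dataset is our choice for illustration.

```r
# AirPassengers ships with base R: monthly airline passenger totals, 1949-1960
# Its seasonal swings grow with the level, so a multiplicative model is used here
parts <- decompose(AirPassengers, type = "multiplicative")

trend <- parts$trend        # T(t), estimated with a centered moving average
seasonal <- parts$seasonal  # S(t), one factor per calendar month, repeated
irregular <- parts$random   # I(t), what is left after removing T(t) and S(t)

# Multiplying the components back together recovers the original series.
# The moving average leaves NA values at both ends, so compare only where defined.
rebuilt <- trend * seasonal * irregular
ok <- !is.na(rebuilt)
max_error <- max(abs(rebuilt[ok] - AirPassengers[ok]))
print(max_error)

plot(parts)  # panels: observed, trend, seasonal, random
```

With type = "additive", the same function returns components that add up to the series instead of multiplying to it.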

Ending Notes

Time series forecasting in R is a vast subject. We hope this blog helps you get a good understanding of time series.

Did you find the article useful? Let us know your thoughts in the comments below.

The next blog in the series will cover 10 steps to build a time series model in R.




