The Logistic Map: Visualizing Chaos in R

In the 1970s, Professor Robert May became interested in the relationship between complexity and stability in animal populations. He noted that even simple equations used to model populations over time can lead to chaotic outcomes. The most famous of these equations is the logistic map:

x_{n+1} = r · x_n · (1 − x_n)

Here, x_n is a number between 0 and 1 representing the ratio of the existing population to the maximum possible population, and r is a value between 0 and 4 representing the growth rate. Multiplying x_n by r simulates growth, while the (1 − x_n) term represents deaths in the population.

Let's assume a population of animals is at 50% of the maximum population for a given area, so x_n is 0.5. Let's also assume a growth rate of 75%, so r is 0.75. After the value x_{n+1} is computed, we use that new value as the x_n in the next iteration, keeping r at 0.75. We can then visualize how x_{n+1} changes over time.
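To make the iteration concrete, here is a minimal R sketch that prints the first few values of x_{n+1} for the starting value and growth rate described above:

# Iterate x_{n+1} = r * x_n * (1 - x_n) a handful of times
r <- 0.75
x <- 0.5
for (i in 1:5) {
  x <- r * x * (1 - x)
  print(x)
}

Each printed value becomes the x_n for the next pass through the loop.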

Visualizing the population with an r value of 0.75 and a starting population of 50%.

Within 20 iterations, the population dies off. Let's rerun the simulation with an r value greater than 1.

Visualizing the population with an r value of 1.25 and a starting population of 50%.

Notice how the population stabilizes at 20% of the area's capacity. When the r value is higher than 3, the population will begin oscillating between multiple values.

Visualizing the population with an r value of 3 and a starting population of 50%.

Expanding beyond an r value of 3.54409 yields rapid changes in oscillation and reveals chaotic behavior.

Visualizing the population with an r value of 3.7 and a starting population of 50%.

Extremely minor changes in the r value yield vastly different distributions of population oscillations. Rather than experiment with r values one at a time, we can visualize the distribution of x_{n+1} values across a range of r values using the R programming language.

Let's start by building a function in R that returns the first 1000 iterations of x_{n+1} for a given r value.

logistic_sim <- function(lamda, starting_x = 0.5) {
  
  # Start the population vector at the initial value (50% of capacity by default)
  vec <- c(starting_x)
  iter <- seq(1, 1000, 1)
  
  # Iterate x_{n+1} = lamda * x_n * (1 - x_n) for 1000 steps
  for (i in iter) { 
    vec[(i + 1)] <- vec[i] * lamda * (1 - vec[i])
  }
  
  # Drop the extra trailing value so vec lines up with iter
  vec <- vec[1:(length(vec) - 1)]
  
  data.frame(vals = vec, lamda = lamda, iter = iter)

}

This function returns a dataframe with three columns: the iteration number, the r value used for the run, and the x_{n+1} value computed at that iteration.
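As a quick sanity check, we can run a single simulation and inspect the first few rows; the r value of 3.7 here matches the chaotic example above:

# One chaotic run: 1000 rows with columns vals, lamda, and iter
head(logistic_sim(3.7, 0.5))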

Now we need to apply this function over a range of r values. Using purrr::map_dfr, we can row-bind the results for each r value into a single dataframe.

build_data <- function(min, max) {
  
  # Step size yielding roughly 400 evenly spaced r values between min and max
  step <- (max - min) / 400
  
  # Run logistic_sim() for each r value and row-bind the results
  purrr::map_dfr(
    seq(min, max, step),
    logistic_sim
  )
  
}

min refers to the lower limit of r, while max refers to the upper limit. The function returns a dataframe of approximately 400,000 rows: one for each of the 1000 iterations at each of the roughly 400 r values between the bounds. The entire computation runs in less than a quarter of a second.
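For example, we can assemble the data for r values between 0 and 4 (the range used in the plot below) and check its size:

data <- build_data(0, 4)
nrow(data)  # 401 r values x 1000 iterations each = 401,000 rows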

With the dataframe of values assembled, we can visualize the distribution of values using ggplot.

library(dplyr)    # provides the pipe and filter()
library(ggplot2)

data %>%
  dplyr::filter(
    iter > 50  # drop the first iterations so the simulation can stabilize
  ) %>%
  ggplot(
    aes(x = lamda, y = vals, color = lamda)
  ) +
  geom_point(
    size = .5
  ) +
  labs(
    x = "Growth Rate",
    y = "Population Capacity",
    title = "Testing Logistic Growth in Chaos"
  ) +
  scale_x_continuous(
    labels = scales::percent
  ) +
  scale_y_continuous(
    labels = scales::percent
  ) +
  theme_minimal(
  ) +
  theme(
    legend.position = "none",
    text = element_text(size = 25)
  )
Visualizing the distribution of x_{n+1} values for 400 r values between 0 and 4 over 1000 iterations.

Notice how r values of less than 1 cause the population to die out. Between 1 and just under 3, the population remains relatively stable. At around 3, the population begins oscillating between two points. Beyond an r value of 3.54409, chaos ensues: it becomes extremely difficult to predict the value of x_{n+1} at a given iteration. So difficult, in fact, that this simple deterministic equation was used as an early random number generator.

So what are the practical applications of this? Chaotic systems, which are sensitive to starting conditions and yield seemingly unpredictable results, appear across many industries and fields of study. In finance, for example, intra-day security prices have been described as a random walk: extremely difficult to predict. While long-term outlooks may show seasonality, chaos theory can help model the chaotic and unpredictable nature of stock prices.

Python for Data Science: The 5 Things You Need to Get Started

Python is a general-purpose programming language that was originally released in the early 1990s. Over the years, it has become famous for being easy to read and learn.

Like many modern programming languages, Python is open source, which means it can be downloaded and used for free.

While Python is useful on its own, developers have created many packages that can be added to Python to extend its functionality even further.

It is also worth noting that Python has been released in 3 major versions. While Python 2 has many active users, Python 3 is the future of Python.

If you are just getting started with Python, it is best to start with Python 3 so you are learning to work with the latest and greatest Python packages.
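Once you have a Python environment set up (the tools below will get you there), a quick way to confirm you are running Python 3 is:

import sys

# Print the interpreter version; it should start with 3
print(sys.version)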

1: Anaconda

The Anaconda distribution of Python by Continuum Analytics is the first tool you need to get started with data science in Python.

Anaconda comes with many of the most popular Python packages, in addition to Jupyter, an interactive notebook environment. (All for free!)

Getting started with Anaconda is straightforward. Navigate to the Anaconda download page and install it. Once installed, you will be greeted by Anaconda's main screen.

Anaconda Navigator’s main screen.

From here you can launch applications and manage your Python packages.

2: Jupyter Notebooks

Being comfortable with Jupyter Notebooks is key for any aspiring data scientist.

Jupyter is famous for allowing Python developers to code in an interactive computing environment. Simply put, you can execute code as you write it.

Code in Jupyter notebooks is executed in cells. Open your own Jupyter notebook and type the following code in the first cell:

print("Jupyter is Great")

Once you write your code, press Shift + Enter. Your code will be executed below the cell.

Executing code from the first cell.

In the second cell, enter new code and press Shift + Enter.

Code in the second cell executed after the first cell.

Code in the second cell was executed separately from the first cell.

Jupyter is a powerful tool for data science. As you begin to use it more, its benefits will become even more apparent. 

3: Pandas

Pandas is a free Python package that allows developers to import, manipulate, and visualize data in tables called dataframes. Often, you can complete work typically done in spreadsheets much faster with Pandas.

If you installed Anaconda, launch a Jupyter notebook to get started with Pandas. To use any Python package, you need to import the package.

In the first cell of your Jupyter notebook, type the following code and press Shift + Enter.

import pandas as pd

Now you are ready to import data into Pandas. I added a .csv file to the same folder that my Jupyter notebook is stored in. Execute the following code to import the data and save it as a variable titled df.

df = pd.read_csv('stocks.csv')

Once your data is imported, you can run the following command to view the first five rows:

df.head()
Viewing the head of my data.

This just scrapes the surface of what Pandas is capable of.
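As a small taste, here are a few more illustrative operations on the same dataframe; the date and price column names are the ones used in the plotting example below, and the threshold is arbitrary:

# Summary statistics for the numeric columns
df.describe()

# The five rows with the highest prices
df.sort_values('price', ascending=False).head()

# Only the rows where the price is above 105 (an arbitrary cutoff)
df[df['price'] > 105]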

4: Matplotlib

Matplotlib, like Pandas, is another Python package that is free to use. Matplotlib is used for visualizing data – or in other words, making graphs!

Visualizing your findings is one of the most important parts of data science. If you are unable to communicate your findings to others, then your effectiveness as a data scientist is limited.

Matplotlib is already installed with Anaconda. You can import it with the following code:

import matplotlib.pyplot as plt

I am plotting my dataframe (called df) with this code:

df.plot(kind='bar', x='date', y='price', color='green', alpha=.25, ylim=(100, 110), figsize=(12, 8))

plt.show()

The result should look something like the following:

The bar graph of stock prices.

Matplotlib can customize graphs far beyond this example, which covered only basic plotting, colors, transparency (alpha), axis limits, and figure sizing.
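As one small illustration of further customization, a title and y-axis label could be added to the same plot (the label text here is just a placeholder):

# df.plot returns a matplotlib Axes object we can keep customizing
ax = df.plot(kind='bar', x='date', y='price', color='green',
             alpha=.25, ylim=(100, 110), figsize=(12, 8))
ax.set_title('Stock price by date')  # chart title
ax.set_ylabel('Price')               # y-axis label

plt.show()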

5: Data

Data analysis is only as good as the data being used in the analysis. It is important that the data you use in your own work is structured and accessible.

Data can be grouped into three levels of structure:

Unstructured: Datasets that do not conform to any unified format. Examples: audio files, free text, pictures.

Semi-structured: Datasets that do not conform to the formal structure of a database or spreadsheet (rows and columns) but are organized with tags or markers. Example: a Word document with comments.

Structured: Datasets that conform to the formal structure of databases or spreadsheets (rows and columns). Data in this format can be used for rapid calculation and other forms of computation. Example: a SQL database.

Structured data is ideal for nearly every data science application. Data in this format, however, can be difficult, costly, or time-consuming to collect on your own.

For those starting out in data science, there are many free-to-use data sources available online.

Some of my favorites include the Census, the American Community Survey, stock prices, Zillow research, and Google Trends.

Data science will continue to evolve. As our analysis tools improve, our need for such structured data may decline. Many data scientists are already using tools like Natural Language Processing and Computer Vision to analyze data in unstructured formats.