Dataset source: https://www.kaggle.com/akhilv11/border-crossing-entry-data/data#
From the source: The Bureau of Transportation Statistics Border Crossing Data provides summary statistics for inbound crossings at the U.S.-Canada and the U.S.-Mexico border at the port level. The data reflects the number of vehicles, containers, passengers or pedestrians entering the United States via ports of entry.
Port Name
-Name of the Port of Entry.
State
-US State that the port is in.
Port Code
-Port code as given by Customs and Border Protection (CBP)
Border
-Differentiates between the US-Canadian border and the US-Mexican border.
Date
-Year & Month
Measure
-Whether the inbound traffic was a personal vehicle, container, truck, etc.
Value
-Count
Location
-Longitude and Latitude
Importing important libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
First 10 rows:
data = pd.read_csv('border.csv') # Read the CSV data file into a DataFrame
data.head(10) # Show the first 10 rows
El Paso, Texas is the only port that appears when the data is sorted by Value to find the busiest port crossings into the U.S.
Almost 4.5 million people entered the US at El Paso in March of 2001 via Personal Vehicles, and the months shortly before and after aren't far off that number either.
busyPorts = data.sort_values('Value', ascending=False).head(15) # Top 15 inbound crossing rows at ports of entry
busyPorts # by Value (count)
crossPerYear = data.groupby("Date")["Date"].count() # Count the number of rows for each month (this counts rows, not crossings)
plt1 = crossPerYear.plot.bar() # Plot it
plt.title('Rows Per Month');
plt.ylabel('Number of Rows');
plt.xticks(np.linspace(0, 275, 24))
plt1;
This graph shows the number of rows that reference a Canadian border port or a Mexican border port.
borderCross = data.groupby("Border")["Border"].count()
plt2 = borderCross.plot.bar();
plt.ylabel("Rows");
plt.title("Rows referencing Canada vs. Mexico");
A vast majority of the rows reference Canadian border ports, which makes sense given that our northern border is much longer than our southern border in terms of pure distance.
Also, keep in mind that these counts are rows, which does not mean there are over 250,000 ports. Because each port has a separate row for every Measure in every month, the actual number of ports is much smaller than that for each border, but the proportions should be roughly the same.
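To make the rows-versus-ports distinction concrete, here is a minimal sketch that counts rows and distinct ports per border side by side. The frame `toy` is hypothetical and its values are made up; it only borrows the dataset's column names. On the real data, the same two groupby calls could be run on `data` instead.

```python
import pandas as pd

# Hypothetical toy frame using the dataset's column names; all values are made up.
toy = pd.DataFrame({
    "Port Code": [101, 101, 101, 102, 201, 201],
    "Border": ["US-Canada Border"] * 4 + ["US-Mexico Border"] * 2,
    "Measure": ["Trucks", "Buses", "Pedestrians", "Trucks", "Trucks", "Buses"],
    "Value": [10, 2, 5, 7, 30, 4],
})

# Rows per border (what the bar chart above counts)
rows_per_border = toy.groupby("Border").size()

# Distinct ports per border (the number of actual ports of entry)
ports_per_border = toy.groupby("Border")["Port Code"].nunique()

summary = pd.DataFrame({"rows": rows_per_border, "ports": ports_per_border})
print(summary)
```

In the toy frame the Canadian border has four rows but only two ports, which is the same effect described above at a much smaller scale.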
stateData = data.groupby("State")["State"].count() # Group the data by state, and get the total count.
plt3 = stateData.plot.barh(); # Plot it
plt.ylabel("State");
plt.xlabel("Rows");
plt.title("Number of Rows per State");
plt.xticks(np.linspace(0, stateData.max(), 10));
Over 50,000 rows are dedicated solely to North Dakota, which I found very surprising. I figured a state like Texas, which shares a huge border with Mexico, would be much higher than North Dakota or Washington.
Again with this plot, the counts are rows, not ports. This means that there are NOT 57,000+ ports of entry in North Dakota, but proportionally it is the same.
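Since row counts say nothing about how much traffic each row represents, one way to compare states by volume is to sum the Value column next to the row count. This is only a sketch: `toy` is a hypothetical frame with made-up numbers, chosen so that the state with more rows has far less traffic.

```python
import pandas as pd

# Hypothetical toy frame; the numbers are made up for illustration only.
toy = pd.DataFrame({
    "State": ["North Dakota"] * 4 + ["Texas"] * 2,
    "Value": [5, 3, 2, 1, 4000, 2500],
})

# For each state: how many rows it has, and the total of Value across them
per_state = toy.groupby("State")["Value"].agg(rows="count", total_value="sum")
print(per_state.sort_values("total_value", ascending=False))
```

Running the same aggregation on `data` would show whether the states with the most rows are also the states with the most inbound traffic.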
measure = data.groupby("Measure")["Measure"].count()
plot4 = measure.plot.bar()
plt.title("Inbound Traffic by Measure")
plt.ylabel("Rows")
plt.ylim(26500,30500)
plot4;
I separated the data by the measure and graphed it to show the proportions of what exactly is entering the country.
I narrowed the limits of the y-axis to better show the differences between the measures. Unsurprisingly, there are fewer trains entering the country than buses, passenger vehicles, trucks, and even pedestrians.
What this data tells me is that a sizable majority of the traffic entering the United States is people. If you take the number of people entering via buses, trains, cars, or on foot, and compare it to any kind of cargo, it is a much larger number.
That is surprising to me because Canada and Mexico are the US's top two trade partners in the world, so there must be a lot of people coming in to dwarf the cargo coming in.
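A rough way to check the people-versus-cargo comparison is to sum Value per group of measures. The sketch below is built on assumptions: `toy` is a hypothetical frame with made-up values, and the measure names in `PEOPLE_MEASURES` are my guess at which Measure labels count people, so they should be checked against `data["Measure"].unique()` before using this on the real data.

```python
import pandas as pd

# Assumed labels for measures that count people (verify against the real data).
PEOPLE_MEASURES = {
    "Personal Vehicle Passengers",
    "Bus Passengers",
    "Train Passengers",
    "Pedestrians",
}

# Hypothetical toy frame; the values are made up.
toy = pd.DataFrame({
    "Measure": ["Personal Vehicle Passengers", "Pedestrians", "Trucks",
                "Truck Containers Full", "Bus Passengers"],
    "Value": [900, 300, 120, 100, 50],
})

# Label each row as people or vehicles/cargo, then total Value per label
group = toy["Measure"].isin(PEOPLE_MEASURES).map(
    {True: "people", False: "vehicles/cargo"}
)
totals = toy.groupby(group)["Value"].sum()

# Share of the total Value that comes from people-counting measures
people_share = totals["people"] / totals.sum()
print(totals)
print(round(people_share, 3))
```

Note that this compares counts of people against counts of vehicles and containers, which are different units, so the share is only a loose indicator.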
Given the data from the dataset, you can see that even though El Paso, Texas has the most people coming through its port into the US, Canadian border states have more rows overall. This tells me that it is most likely the case that we do more trade with Canada, and we therefore have more freight coming in than people from up north, but have more people than freight coming in from Mexico.
It is also very clear that the number of people and cargo coming into the United States does not differ much at all from year to year, even since the mid-1990s. Personally, I found that to be the most interesting of everything.
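To look at the year-to-year trend directly, the Date column can be parsed and Value summed per year. This is a sketch under assumptions: `toy` is a hypothetical frame with made-up values, and the date format string is my assumption about how the Date strings are written, so it may need adjusting for the real file.

```python
import pandas as pd

# Hypothetical toy frame; dates and values are made up.
toy = pd.DataFrame({
    "Date": ["01/01/2018 12:00:00 AM", "02/01/2018 12:00:00 AM",
             "01/01/2019 12:00:00 AM", "03/01/2019 12:00:00 AM"],
    "Value": [100, 150, 120, 140],
})

# Assumed date format; change it if the real Date strings look different
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Total Value per calendar year
yearly = toy.groupby(toy["Date"].dt.year)["Value"].sum()

# Fractional change from one year to the next (first year is NaN)
pct_change = yearly.pct_change()
print(yearly)
print(pct_change)
```

Summing Value per year measures traffic volume, whereas counting rows per date only measures how many port and measure combinations reported in that month.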
Lastly, it is very interesting to see certain states so low in terms of volume entering through that state, and others so high. For instance, I thought California would be much higher given its sizable border with Mexico and its economy. Meanwhile, what is often thought of as a vastly uninhabited state, North Dakota, is the top state in terms of rows of incoming traffic. I suspect this also has mostly to do with cargo vs. people.