Stellamaris WN

A Research Scientist with 6 years of experience and strong skills in computational analytics, geospatial data science, big data, and cloud engineering. Broad exposure to varied data ecosystems through projects spanning energy, agricultural, and policy data. Holds a Master's in Geography from West Virginia University, followed by an MSc in Computational & Applied Mathematics. Previously Program Director, ESM YouthMappers.
AWS Certified Cloud Practitioner
View My LinkedIn Profile
Find My Publications Here

PROJECT PORTFOLIO: ENGINEERING, GIS & APPLIED SCIENCE (ML)


DATA PROCESSING

Web Scraping

In this project, I used the Scrapy web crawling framework and the requests package to build custom pipelines that extract air quality geospatial data from the EPA AirNow air quality monitoring website. The site hosts a vast amount of data, but the pipeline uses the simplest workflow to collect and store the data, including feature classes, for data science investigation tasks. All code and processing were executed in the VS Code IDE.

View requests approach on GitHub | View Scrapy approach on GitHub
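The storage step of such a pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the scraped records arrive as dictionaries with `Latitude` and `Longitude` keys (hypothetical field names), and it converts them to a GeoJSON FeatureCollection that GIS tools can load as point features. The sample values are made up for the demonstration.

```python
import json


def to_feature(record):
    """Convert one monitoring record (a dict) into a GeoJSON point feature."""
    # "Latitude" and "Longitude" are assumed field names, not confirmed ones.
    lon = float(record["Longitude"])
    lat = float(record["Latitude"])
    # Everything that is not a coordinate becomes an attribute of the feature.
    properties = {
        key: value
        for key, value in record.items()
        if key not in ("Latitude", "Longitude")
    }
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_feature_collection(records):
    """Build a FeatureCollection, skipping records without usable coordinates."""
    features = []
    for record in records:
        try:
            features.append(to_feature(record))
        except (KeyError, TypeError, ValueError):
            continue
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"Latitude": "38.9", "Longitude": "-77.0", "AQI": 42},
        {"AQI": 10},  # no coordinates, so it is skipped
    ]
    print(json.dumps(to_feature_collection(sample), indent=2))
```

Writing GeoJSON keeps the output readable by most GIS software; converting it to a feature class would be a separate step in the GIS tool of choice.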


BIG DATA SYSTEMS & ARCHITECTURE

COVID-19 Data Engineering System: How efficient were lock-down policies towards mitigating deaths?

The COVID-19 pandemic remains the pandemic of the century, claiming over 7 million lives and leaving many countries in devastating economic recessions. In this project, I designed and implemented a comprehensive data engineering system to analyze post-COVID-19 incident records and evaluate how efficiently lockdown policies mitigated deaths during the period. Built on GCP cloud resources, the system combined real-time and batch data processing to produce advanced analytics and visualizations for actionable insights. Results indicated that only China's total lockdown policy was effective in curbing death rates. Tools used: Apache Kafka, Hadoop HDFS, Apache Spark, Cloud Storage, BigQuery, Looker Studio, and Vertex AI for AutoML (boosted_trees).

US vs China

Code and processing were executed with the VS Code IDE and the GCP UI & CLI.

View code and other results on GitHub


DATA MINING

The Alternative Fuels Data Center (AFDC) collects and stores data on alternative fueling stations across the U.S., providing vital information for consumers and businesses alike. In this project, I dive into this rich dataset, using forecasting algorithms to uncover meaningful insights that can inform consumer decisions, including choosing the right car fuel model or planning driving routes. Additionally, these insights can be valuable for business strategies, such as optimizing station placement and infrastructure investments.

a) Types of EV Charging Stations & Their Distribution:

The majority of electric vehicle (ELEC) stations utilize Type J1772 connectors, with Level 2 charging available at over 2,740 stations. Figure 3 shows the distribution of these charging stations across the U.S., with Los Angeles leading in the number of Level 2 chargers. This data reveals important regional patterns in EV infrastructure development.
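A tally like the one above can be sketched with the standard library. This is an illustrative sketch rather than the project's code: the column names `City`, `EV Connector Types`, and `EV Level2 EVSE Num`, and the space-separated connector format, are assumptions about the station export.

```python
from collections import Counter


def count_connectors(stations):
    """Count how many stations list each connector type."""
    counts = Counter()
    for station in stations:
        # Assumed format: connector names separated by spaces, e.g. "J1772 CHADEMO".
        raw = station.get("EV Connector Types") or ""
        for connector in raw.split():
            counts[connector] += 1
    return counts


def level2_by_city(stations):
    """Sum Level 2 charger counts per city; blank or invalid counts are treated as 0."""
    totals = Counter()
    for station in stations:
        raw = station.get("EV Level2 EVSE Num") or 0
        try:
            ports = int(raw)
        except (TypeError, ValueError):
            ports = 0
        totals[station.get("City", "Unknown")] += ports
    return totals
```

Calling `level2_by_city(stations).most_common(1)` would then return the city with the most Level 2 chargers.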

b) Predicting the Future Growth of ELEC Stations:

Using the Prophet Model, I analyzed and forecasted the future trend of ELEC stations. Since 2010, there has been a significant rise in station openings due to the growing popularity of electric vehicles. Figure 4 illustrates this historical surge, while Figure 7 suggests that the rate of new station openings may slow in the coming years.
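Prophet expects its input as a table with a `ds` (date) column and a `y` (value) column. The sketch below shows one way the station records could be shaped into that form; it assumes each station has an ISO-formatted opening date (a hypothetical input format) and aggregates openings per year. It is not the project's actual preprocessing.

```python
from collections import Counter
from datetime import date


def yearly_openings(open_dates):
    """Return one row per year with 'ds' and 'y' keys, sorted by year."""
    counts = Counter()
    for text in open_dates:
        try:
            # Assumed input format: "YYYY-MM-DD" strings.
            year = date.fromisoformat(text).year
        except (TypeError, ValueError):
            continue  # skip missing or malformed dates
        counts[year] += 1
    return [{"ds": f"{year}-01-01", "y": counts[year]} for year in sorted(counts)]


# The rows can then be loaded into a pandas DataFrame and fitted, for example:
#   df = pandas.DataFrame(yearly_openings(dates))
#   model = prophet.Prophet().fit(df)
```

Aggregating by year is a design choice for this sketch; monthly counts would follow the same pattern with a finer date key.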

View code on GitHub

