CV
Last updated: March 4, 2024
Education
- B.S. in Applied Mathematics, University of California, Los Angeles (UCLA), June 2024 (expected)
  - Statistics Minor, Specialization in Computing
  - Cumulative GPA: 4.00
  - Relevant Coursework: Linear Algebra, Optimization, Machine Learning, Algorithms, Mathematical Modeling, Differential Equations, Numerical Analysis, Discrete Structures, Real Analysis, Complex Analysis
Work experience
- Data Science Intern, Snowflake (June-September 2023)
  - Developed an adaptive stratified survey sampling framework with multivariate testing, scheduled via Airflow, to iteratively calibrate and refine user experience models with human feedback
  - Formulated a mathematical framework for computing and updating account reputation scores by combining heuristics with behavioral anomaly detection
  - Preprocessed and analyzed third-party data classifications; investigated crowdsourced label aggregation algorithms to improve the internal data labeling system
- Reader, PIC 16B: Python with Applications II, UCLA (January-March 2024)
Organizational experience
- Data Science Union (DSU)
  - Built and trained decoder-only transformer models from scratch in parallel on TinyStories; investigated scaling laws relating dataset and model size to validation loss; created a story generation demo
  - Designed a transformer-based attention model that uses long-form patent text to classify new patents into USPC categories and produce technology forecasts, achieving a top-5 accuracy of 81.6%
- DataRes
  - Led a graph analysis project computing PageRank centrality over 1,000,000 Spotify playlists using Neo4j Graph Data Science and Cypher
  - Augmented a message-passing graph convolutional network with custom-defined socioeconomic indicators to improve traffic accident predictions
- Association for Computing Machinery (ACM) AI
  - Developed a CNN in PyTorch to classify plant diseases
  - Built a bidirectional LSTM with GloVe embeddings to identify insincere questions on Quora
Projects and competitions
- DataFest 2022 (Finalist) and 2023
  - Working in teams of five, cleaned proprietary data and derived and presented insights from challenging long-form datasets (100+ columns, 2+ million rows) within 40 hours
  - Made substantial use of seaborn and Plotly visualizations, statistical tests, time series analysis, and survival analysis
- LLMs for Question Answering
  - Fine-tuned the encoder-decoder model T5 with LoRA for extractive question answering on the SQuAD v1.1 reading comprehension dataset
- Art Generation with GANs
  - Implemented and compared DCGANs and Creative Adversarial Networks (CANs) to generate paintings, performed hyperparameter tuning and metric evaluations, and developed an interactive Streamlit demo
- Rocket League E-sports Statistical Analysis
  - Extracted 37,000+ series from a public API and performed context-informed data wrangling and cleaning in pandas to obtain clear stat sheets for each player in every match
  - Generated exploratory visualizations with seaborn, identifying correlations for further investigation
  - Applied modeling and clustering to further analyze player behavior and uncover playstyle and team strategy insights
- UCLA Hack on the Hill 9 (2022) [Education Category Winner]
  - Designed, with a team, the skeleton framework for a novel automatic UCLA degree planner in 12 hours
  - Wrote web-scraping algorithms using regular expression matching to extract nested major requirements and course prerequisite data from department and course catalogs
Skills
- Programming Languages: Python (pandas, NumPy, PyTorch, TensorFlow, Hugging Face, scikit-learn), R, SQL, C++, HTML/CSS
- Technical Skills: Git, Airflow, Snowflake, Streamlit, Tableau, Docker, NLP, A/B testing
- Languages: English (native), Chinese (native)