Transforming Data into Insights

One Visualization at a Time

I specialize in uncovering patterns, trends, and insights from complex datasets.

A Data Scientist with a passion for predictive modeling and data visualization.

Experienced in Python, R, SAS, AWS, SQL, Apache Spark, Tableau, Power BI

@Mirela Giantaru

Unstructured to Structured Data

Problem Statement: Developed a Python solution to structure unstructured log data, resolving ambiguities in sign-in/out sequences across multiple doors.
Key Techniques & Tools: Python, Pandas, Regex, NLP, data parsing & cleaning, and logical sequencing to detect anomalies in multi-entry logs.
Industry Relevance: Essential for security logs, employee tracking, and fraud detection, ensuring accurate activity logs despite inconsistencies.
Results: Reduced manual reporting time by 40%.
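A minimal sketch of this approach, assuming a hypothetical log format and field names (badge, door, IN/OUT) that stand in for the real data: it parses raw lines with a regex into a Pandas DataFrame and applies a simple logical-sequencing check that flags two consecutive IN (or OUT) events for the same badge.

```python
import re
import pandas as pd

# Hypothetical raw log lines; the real format and field names are assumptions.
raw_logs = [
    "2024-03-01 08:59:12 | badge=1042 | door=A | IN",
    "2024-03-01 09:01:47 | badge=1042 | door=B | IN",   # second IN with no OUT -> anomaly
    "2024-03-01 17:32:05 | badge=1042 | door=A | OUT",
]

LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*"
    r"badge=(?P<badge>\d+)\s*\|\s*door=(?P<door>\w+)\s*\|\s*(?P<event>IN|OUT)"
)

# Parse each line into structured fields; lines the pattern cannot read are dropped.
records = [m.groupdict() for line in raw_logs if (m := LOG_PATTERN.search(line))]
df = pd.DataFrame(records)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values(["badge", "timestamp"])

# Logical sequencing: flag repeated events (e.g. two INs in a row) per badge.
df["prev_event"] = df.groupby("badge")["event"].shift()
df["anomaly"] = df["event"] == df["prev_event"]

print(df[["timestamp", "badge", "door", "event", "anomaly"]])
```

Grouping by badge and comparing each event with the previous one keeps the anomaly check vectorized in Pandas rather than looping over rows.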

Text Cleaning Pipeline

Problem Statement: Explored advanced techniques to clean, process, and analyze unstructured text data, applying machine learning for text classification.
Key Techniques & Tools: Python, NLTK, Scikit-learn, vectorization (TF-IDF, word embeddings).
Industry Relevance: Critical for applications in chatbots, sentiment analysis, fraud detection, and any field leveraging text data for insights.
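A condensed sketch of such a pipeline, assuming NLTK for stopword removal and lemmatization, TF-IDF vectorization, and a scikit-learn classifier; the toy texts and labels are placeholders, not project data.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_text(text: str) -> str:
    """Lowercase, keep only letters, drop stopwords, and lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t not in stop_words)

# Toy labelled examples for illustration only.
texts = [
    "The delivery was late and the support was unhelpful!",
    "Great product, fast shipping, very satisfied.",
    "Terrible experience, I want a refund.",
    "Excellent service and friendly staff.",
]
labels = ["negative", "positive", "negative", "positive"]

# Cleaning is plugged in as the TF-IDF preprocessor so train and predict share it.
model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_text)),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)
print(model.predict(["Slow response and a broken item"]))
```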

Data Exploration with SQL

Data exploration of a COVID-19 dataset; Google BigQuery integration with Databricks to enable seamless querying, processing, and analysis of large datasets.

Applications:
- ETL workflows across BigQuery & Databricks
- Analyzing large datasets using Spark & ML tools
- Querying BigQuery data using SQL in Databricks
- Optimizing cloud-based data pipelines

Tools & Technologies Used:
- Google BigQuery: cloud data warehouse for scalable storage & queries
- Databricks: unified analytics platform for processing & analysis
- PySpark & Spark SQL: querying and transforming BigQuery data
- Pandas: handling data in Python for smaller-scale analytics
- Google Cloud Storage (GCS): storing authentication keys
- Databricks Notebooks: running Python & SQL-based workflows
- Service Account Authentication: secure connection to BigQuery

Key Features & Usage:
- Secure Connection: uses a GCP service account (service_key.json) for authentication
- Querying BigQuery in Databricks: using both Pandas & Spark to fetch and analyze data
- Transforming Data in Databricks: Spark SQL and PySpark
- Writing Data Back to BigQuery: storing results from Databricks analysis back into BigQuery tables

Can also be applied to:
- Real-time analytics, streaming data from BigQuery into Databricks
- Machine learning, using Databricks MLflow on BigQuery datasets
- ETL pipelines for extracting, transforming, and loading data from BigQuery
- BI dashboards, via Power BI or Tableau integration with Databricks & BigQuery
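A sketch of the connection pattern described above, assuming the Databricks spark-bigquery connector and a GCP service-account key; the project, dataset, table, and bucket names are placeholders, not the actual project resources.

```python
from google.cloud import bigquery
from google.oauth2 import service_account
from pyspark.sql import SparkSession

# Illustrative names only; replace with real project, table, and key locations.
KEY_PATH = "/dbfs/keys/service_key.json"
PROJECT = "my-gcp-project"
TABLE = f"{PROJECT}.covid.cases"

# --- Pandas path: smaller-scale analytics via the BigQuery client ---
credentials = service_account.Credentials.from_service_account_file(KEY_PATH)
client = bigquery.Client(credentials=credentials, project=PROJECT)
pdf = client.query(
    f"SELECT country, date, new_cases FROM `{TABLE}` LIMIT 1000"
).to_dataframe()

# --- Spark path: large datasets via the BigQuery connector ---
spark = SparkSession.builder.getOrCreate()
df = (spark.read.format("bigquery")
      .option("table", TABLE)
      .option("credentialsFile", KEY_PATH)
      .load())

# Transform in Databricks with Spark SQL.
df.createOrReplaceTempView("cases")
weekly = spark.sql("""
    SELECT country,
           date_trunc('week', date) AS week,
           SUM(new_cases)           AS weekly_cases
    FROM cases
    GROUP BY country, date_trunc('week', date)
""")

# Write results back to BigQuery (the connector stages data in a GCS bucket).
(weekly.write.format("bigquery")
 .option("table", f"{PROJECT}.covid.weekly_cases")
 .option("temporaryGcsBucket", "my-temp-bucket")   # assumed bucket name
 .option("credentialsFile", KEY_PATH)
 .mode("overwrite")
 .save())
```

Using the Pandas client for exploratory queries and the Spark connector for heavy transforms keeps small lookups cheap while leaving large-scale processing to the cluster.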
