Faisal Malik Widya Prasetya, Developer in Sleman, Sleman Regency, Special Region of Yogyakarta, Indonesia

Faisal Malik Widya Prasetya

Verified Expert in Engineering

Data Engineer and Developer

Location
Sleman, Sleman Regency, Special Region of Yogyakarta, Indonesia
Toptal Member Since
April 25, 2022

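Faisal is a data engineer specializing in cloud data technologies, such as Google Cloud and AWS, and in end-to-end data engineering processes. From designing architecture and building infrastructure to developing pipelines and operations, he adapts quickly to new cloud, open-source, or SaaS technologies. Faisal has extensive experience contributing to early-stage startups, either by building end-to-end data pipelines directly or by consulting in his areas of expertise.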

Portfolio

Burak Karakaya
Web Scraping, Data Scraping, Scraping, Amazon Web Services (AWS), JavaScript...
XpressLane, Inc.
Data Engineering, Python, Google Data Studio, PostgreSQL, Google BigQuery...
Toptal
Python, SQL, Pandas, Data Engineering, Object-oriented Design (OOD)...

Experience

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Conda, Linux, Docker, Docker Compose, Google Cloud Platform, Amazon Web Services (AWS), Jira

The most amazing...

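...project I've worked on was implementing a cost-optimization strategy on a client's data warehouse, reducing BI usage costs by a factor of 100.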

Work Experience

Web Scraping Expert

2023 - 2023
Burak Karakaya
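  • Developed a real-time web scraper that collected data from various sources, such as Twitter and the Binance Futures leaderboard, to feed the client's trading bot. The scraper could retrieve a tweet within 200 milliseconds of its publication.
  • Provisioned infrastructure on AWS for high-performance networking so the scraper could work in real time. I set up IP rotation so that the scraper would not be blocked by the news sources' IP rate limits.
  • Provided convenient interfaces for non-technical users to manage and operate the scraper. I used Streamlit and FastAPI to develop these interfaces.
  • Used Redis and high-performance Python extensions written in C to improve the scraper's storage and runtime performance.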
Technologies: Web Scraping, Data Scraping, Scraping, Amazon Web Services (AWS), JavaScript, Python, Streaming Data, Data Integration, Orchestration, GPT, LangChain, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices Architecture, API Design, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS)
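
The IP rotation mentioned in this role can be illustrated with a minimal sketch. This is not the client's code: the `ProxyRotator` class, its cooldown parameter, and the proxy names are hypothetical, and a round-robin policy with a cooldown is only one plausible way to rotate addresses.

```python
import itertools
import time
from typing import Dict, Iterable, Optional


class ProxyRotator:
    """Hypothetical round-robin rotator that skips proxies in cooldown."""

    def __init__(self, proxies: Iterable[str], cooldown_seconds: float = 30.0):
        self._proxies = list(proxies)
        if not self._proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(self._proxies)
        self._cooldown = cooldown_seconds
        # proxy -> monotonic time after which it may be used again
        self._blocked_until: Dict[str, float] = {}

    def mark_rate_limited(self, proxy: str, now: Optional[float] = None) -> None:
        """Put a proxy into cooldown, e.g. after an HTTP 429 response."""
        now = time.monotonic() if now is None else now
        self._blocked_until[proxy] = now + self._cooldown

    def next_proxy(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next usable proxy, or None if all are cooling down."""
        now = time.monotonic() if now is None else now
        # Look at each proxy at most once per call.
        for _ in range(len(self._proxies)):
            candidate = next(self._cycle)
            if self._blocked_until.get(candidate, 0.0) <= now:
                return candidate
        return None
```

A caller would request `next_proxy()` before each outgoing request and call `mark_rate_limited()` when a source starts rejecting that address. The optional `now` argument exists only to make the behavior easy to test.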

Data Engineer

2023 - 2023
XpressLane, Inc.
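  • Developed scrapers that collected data from various websites and pushed it to BigQuery.
  • Created development and operations documentation so that the client could maintain the solution and develop more features in the future.
  • Delivered reports and dashboards built from the scraped data to help the client make better decisions for M&A use cases.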
Technologies: Data Engineering, Python, Google Data Studio, PostgreSQL, Google BigQuery, Dataproc, Google Cloud Dataproc, Looker, Apache Airflow, Redis, Spark, PySpark, Web Scraping, Scraping, Data Wrangling, Data Modeling, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Swagger, DevOps, Artificial Intelligence (AI), Python API, Data Scraping, REST, HTML, CSS, OpenAI GPT-3 API, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, AWS Simple Notification Service (SNS), Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Extensions, Scrapy, Data, Apache Spark, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, GPT, LangChain, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS)

Senior Data Engineer

2022 - 2023
Toptal
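  • Designed and implemented a robust data pipeline that extracted data from multiple marketing tools and APIs, such as Google Ads, Facebook Ads, and Twitter Ads, and moved it to BigQuery using an in-house data pipeline tool based on Luigi.
  • Created data pipeline solutions that efficiently extracted data from various learning platforms, such as Polly, Udemy, and Lessonly, and consolidated it in BigQuery using Composer, the managed Apache Airflow service provided by GCP.
  • Participated in brainstorming sessions on splitting the data engineering team and proposed dividing it into a data platform team and an analytics engineering team. The analytics engineering team focuses on ETL logic, while the data platform team maintains the infrastructure.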
Technologies: Python, SQL, Pandas, Data Engineering, Object-oriented Design (OOD), Object-oriented Programming (OOP), Data Modeling, Scala, Luigi, Apache Airflow, BigQuery, Distributed Computing, Dimensional Modeling, ETL, Google Cloud, Google Cloud Storage, ETL Tools, Scripting Languages, Data Analytics, Data Architecture, Data Management, Data Pipelines, ELT, Big Data Architecture, Snowpark, Architecture, Big Data, Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, APIs, Dashboards, Data Manipulation, Shell Scripting, MapReduce, Google Analytics, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Asyncio, Software Architecture, Back-end, GraphQL, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, REST, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Serverless, Amazon ElastiCache, AWS Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Database Design, Amazon Simple Queue Service (SQS)

Data Engineer

2021 - 2023
QuantumBlack
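  • Developed an internal data analytics tool that simplified deployment at client sites. The features I built ingested data from various sources and stored it incrementally in Snowflake.
  • Handled client requests to build data analytics pipelines and APIs.
  • Worked closely with clients' analytics teams and leadership to gather analytics requirements and carefully planned the work from architecture design through execution and delivery.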
Technologies: Python, Kedro, Apache Airflow, Amazon Web Services (AWS), Google Cloud Platform, Alibaba Cloud, Spark, PySpark, GitHub, Terraform, ETL Tools, Scripting Languages, SQL, Data Analytics, Amazon Athena, Redshift Spectrum, AWS Glue, Data Engineering, Microsoft Power BI, Amazon Neptune, Microsoft SQL Server, Oracle Database, Database Administration (DBA), Redshift, NoSQL, Data Architecture, Data Management, Data Lakes, Azure, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Snowflake, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, IIS SQL Server, Domo, ELT, Big Data Architecture, Snowpark, Oracle, Architecture, Big Data, Azure Data Factory, Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, Azure Databricks, Data Modeling, APIs, Databricks, Django, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Spark ML, Amazon QuickSight, Elasticsearch, AWS Step Functions, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Amazon Cognito, Swagger, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, PDF Scraping, REST, AWS Lambda, Flask, OpenCV, Tesseract, QGIS, GIS, GRASS GIS, Flutter, OpenAI GPT-3 API, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Amazon DynamoDB, Database Modeling, Data-driven Design, Neural Networks, SaaS, NumPy, GeoPandas, Shapely, Scikit-learn, API Integration, Twitter API, Node.js, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, AWS Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, OpenAPI, Jupyter, Jupyter Notebook, Credit Modeling, Consumer Packaged Goods (CPG), Azure Synapse, Back-end Development, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Extensions, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse (SQL DW), T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, R, AWS Cloud Architecture, MongoDB Atlas, Celery, RabbitMQ, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS)

Senior Data Engineer

2021 - 2021
Flip
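  • Built a data analytics ecosystem using native Google Cloud Platform technologies, such as Datastream, Google Cloud Storage, Pub/Sub, Dataflow, and BigQuery.
  • Reduced the analytics wait time for one large report from a worst case of three hours to 30 seconds.
  • Maintained the legacy data analytics stack on MySQL and server cron jobs, creating scheduled jobs for a heavy but frequently used query. The heavy query could then be accessed in under 30 minutes with daily data freshness.
  • Built up the data engineering team and its members around the legacy, current, and future implementations.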
Technologies: Python, Google Cloud Platform, MySQL, BigQuery, Google BigQuery, Metabase, Data Warehousing, CI/CD Pipelines, GitHub, Data Migration, ETL Tools, Scripting Languages, SQL, Data Analytics, AWS Glue, Data Engineering, Data Analysis, NoSQL, Data Architecture, Data Management, Data Lakes, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Data Pipelines, Apache Kafka, ETL, Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Microsoft Power BI, Data Wrangling, Data Modeling, APIs, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Data Manipulation, Amazon QuickSight, AWS Step Functions, Shell Scripting, Google Analytics, MySQL Performance Tuning, Benchmarking, Databases, Performance, Performance Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Swagger, Python API, PDF Scraping, REST, AWS Lambda, Flask, HTML, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, SaaS, NumPy, API Integration, Serverless, Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse (SQL DW), T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS)

Data Engineer

2020 - 2021
Pintu
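  • Developed ELT data pipelines on Amazon EC2, which AWS Lambda turned on and off on a CloudWatch schedule, moving data from various sources (MySQL, PostgreSQL, MongoDB, Google Sheets, and crypto exchange APIs) to a BigQuery data warehouse.
  • Implemented partitioning, clustering, and materialized views on BigQuery, reducing analytics costs by a factor of 100.
  • Collaborated with finance experts to formulate the best market-making strategy. I implemented and improved on a model from a published paper, increasing the liquidity and market activity of the company's own assets by 67%.
  • Developed a fraud detection system that raised alerts on fraudulent activity in the event of a system security breach. One alert notified the executive team, and the fraudster was caught within four hours. It secured assets worth $2 million.
  • Trained business users to develop their own BI reports using Metabase and Google Data Studio. As a result, 70% of Metabase reports were created by business teams, while the remaining 30% required complex queries.
  • Led the data analytics team and established an agile culture by running sprint planning, standup, and sprint retrospective meetings. This made it possible to track business user requests, data pipeline issues, and improvements.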
Technologies: Python, Google Cloud Platform, Amazon Web Services (AWS), Amazon EC2, AWS Lambda, BigQuery, Google BigQuery, Amazon S3 (AWS S3), Metabase, Redash, Google Data Studio, Business Intelligence (BI), Data Visualization, Data Warehousing, Amazon CloudWatch, PostgreSQL, MongoDB, GitHub, ETL Tools, Scripting Languages, SQL, Data Migration, Data Analytics, Data Engineering, Tableau, NoSQL, Data Architecture, Data Management, Data Lakes, Amazon RDS, Amazon Aurora, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, Looker, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Snowflake, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, AWS Step Functions, Shell Scripting, MapReduce, Google Analytics, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, Amazon Cognito, PDF Scraping, REST, Flask, HTML, CSS, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Amazon DynamoDB, Database Modeling, Automated Trading Software, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, Twitter API, Natural Language Processing (NLP), Firebase, Serverless, SharePoint, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse (SQL DW), T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS)
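
To show why partitioning can cut analytics cost, here is a small simulation of partition pruning. It is a sketch on synthetic data, not the actual warehouse or its figures: the function names, the 100-day table, and the byte sizes are all assumptions made for illustration. The idea it models is that a date-filtered query on a day-partitioned table only reads the partitions that match the filter, and on-demand query cost tracks bytes read.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

Row = Tuple[date, int]  # (event_date, row_size_in_bytes); synthetic values


def synthetic_rows(days: int, bytes_per_day: int,
                   start: date = date(2021, 1, 1)) -> List[Row]:
    """Build one synthetic row per day; a stand-in for a real fact table."""
    return [(start + timedelta(days=i), bytes_per_day) for i in range(days)]


def bytes_scanned_unpartitioned(rows: List[Row]) -> int:
    """Without partitions, a date filter still has to read every row."""
    return sum(size for _, size in rows)


def partition_by_day(rows: List[Row]) -> Dict[date, int]:
    """Group row sizes by day, mimicking a day-partitioned table."""
    partitions: Dict[date, int] = defaultdict(int)
    for day, size in rows:
        partitions[day] += size
    return dict(partitions)


def bytes_scanned_partitioned(partitions: Dict[date, int],
                              first_day: date, last_day: date) -> int:
    """Only partitions inside the filter range are read (pruning)."""
    return sum(size for day, size in partitions.items()
               if first_day <= day <= last_day)
```

With `synthetic_rows(100, 1000)`, a one-day query reads 1,000 bytes from the partitioned layout versus 100,000 bytes from the unpartitioned one. The ratio depends entirely on how selective the date filter is, so this toy figure says nothing about the real workload.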

Data Engineer

2019 - 2020
Kulina
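  • Developed ELT processes from application databases, third-party marketing tools, and Google Sheets to BigQuery using Stitch Data, which reduced the number of query conflicts on the production database and indirectly improved application performance.
  • Developed a snowflake schema on the data warehouse, increasing data visibility across business teams.
  • Deployed, maintained, and managed several BI tools, such as Redash, Data Studio, and Metabase, to achieve data governance at the business-unit level and answer data-related questions with the appropriate tool.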
Technologies: Python, Google Cloud Platform, Business Intelligence (BI), Data Warehousing, Cryptography, Data Visualization, BigQuery, Google BigQuery, Stitch Data, ETL Tools, Scripting Languages, SQL, Data Analytics, Data Engineering, Data Analysis, Tableau, Data Architecture, Data Management, Amazon RDS, Data-driven Dashboards, Data Pipelines, ETL, Looker, Snowflake, Data Wrangling, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, Shell Scripting, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, PDF Scraping, REST, HTML, CSS, REST APIs, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Database Modeling, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, Natural Language Processing (NLP), Firebase, Serverless, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Node.js, Amazon API, Data, Apache Spark, Data Integration, Orchestration, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse (SQL DW), T-SQL (Transact-SQL), Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, R, AWS Cloud Architecture, Performance Tuning, Database Design

NASA API Python Wrapper

http://pypi.org/project/python-nasa/
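An unofficial Python wrapper for the NASA API, based on the official NASA API documentation at http://api.nasa.gov/. This is an open-source project that I built to improve my portfolio and deepen my knowledge of developing API wrappers.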
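
For readers unfamiliar with API wrappers, the general shape can be sketched as follows. This is not the interface of the python-nasa package: the `NasaClient` class and its method names are hypothetical. It assumes the public APOD endpoint at `planetary/apod`, its `date` and `api_key` query parameters, and the shared `DEMO_KEY`, and it only builds request URLs rather than calling the network.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


class NasaClient:
    """Hypothetical minimal wrapper that builds NASA API request URLs."""

    BASE_URL = "https://api.nasa.gov"

    def __init__(self, api_key: str = "DEMO_KEY"):
        self.api_key = api_key

    def build_url(self, path: str, params: Optional[Dict[str, str]] = None) -> str:
        """Join the base URL, endpoint path, and query string."""
        query = dict(params or {})
        # Every request carries the API key as a query parameter.
        query["api_key"] = self.api_key
        return f"{self.BASE_URL}/{path.lstrip('/')}?{urlencode(query)}"

    def apod_url(self, day: Optional[str] = None) -> str:
        """URL for Astronomy Picture of the Day; `day` is YYYY-MM-DD."""
        params = {"date": day} if day else {}
        return self.build_url("planetary/apod", params)
```

Keeping URL construction separate from the HTTP call makes the wrapper easy to test offline; the resulting URL can be passed to any HTTP client, such as `urllib.request.urlopen`.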

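Scalable Web Scraper

We developed and deployed a scalable web scraper on GCP. We used Airflow with the CeleryExecutor, backed by a Redis broker, as the workflow orchestrator. I set up this infrastructure so that scraping could run concurrently.

Then, for transformation, we used PySpark deployed on Dataproc. We used serverless Spark on Dataproc to make our transformation pipeline cost-effective. We used GCS as the data lake, so all data scraped from websites resided in GCS, along with the transformation output. The clean data was then stored in BigQuery using BigQuery load jobs, also orchestrated on Airflow. Once the data arrived in BigQuery, stakeholder dashboards updated automatically with the latest data. We also set up a rotating proxy to avoid being detected as a bot.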
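
The concurrent scraping step can be pictured with a small stand-in. The project used Airflow with the CeleryExecutor to spread tasks across workers; the sketch below swaps that for a local thread pool from the Python standard library purely to show the fan-out pattern. The `scrape_all` function and the `fetch` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def scrape_all(urls: Iterable[str],
               fetch: Callable[[str], str],
               max_workers: int = 4) -> Dict[str, str]:
    """Run `fetch` for every URL on a pool of worker threads.

    `fetch` is supplied by the caller (an HTTP GET in practice, a stub
    in tests), which keeps this sketch free of network access.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        pages = list(pool.map(fetch, url_list))
    return dict(zip(url_list, pages))
```

In the orchestrated version, each URL or site would be its own task, so a failed page can be retried independently instead of rerunning the whole batch.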

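Data Pipeline on GCP

Developed data pipelines from third-party APIs to BigQuery using Airflow and an in-house framework. I implemented incremental loading so the system retrieved only new data, avoiding unnecessary full loads.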
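
Incremental loading is commonly done by tracking a watermark, and a minimal sketch of that idea follows. It is an assumption about the general technique, not the in-house framework: the `select_new_records` function, the `updated_at` cursor field, and the ISO 8601 string timestamps are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def select_new_records(records: Iterable[Record],
                       watermark: str,
                       cursor_field: str = "updated_at") -> Tuple[List[Record], str]:
    """Keep records newer than the watermark and return the advanced watermark.

    Timestamps are assumed to be ISO 8601 strings in one timezone, so
    plain string comparison matches chronological order.
    """
    new_records = [r for r in records if str(r[cursor_field]) > watermark]
    # The new watermark is the latest timestamp seen; unchanged if nothing is new.
    new_watermark = max([watermark] + [str(r[cursor_field]) for r in new_records])
    return new_records, new_watermark
```

Each run would load only `new_records` into the warehouse and persist `new_watermark` for the next run, so a rerun with no upstream changes loads nothing.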

Languages

Python, SQL, Snowflake, JavaScript, HTML, Python 3, T-SQL (Transact-SQL), Stored Procedure, GraphQL, CSS, PHP, Go, R, Scala

Frameworks

Django, Swagger, Flask, Hadoop, Scrapy, Apache Spark, Spark, Flutter, CodeIgniter

Libraries/APIs

Pandas, Asyncio, Python API, REST API, NumPy, Shapely, Scikit-learn, Node.js, OpenAPI, Amazon API, PySpark, Spark ML, OpenCV, Twitter API, SciPy, TensorFlow, Interactive Brokers API, Luigi

Tools

BigQuery, Apache Airflow, GitHub, AWS Glue, Microsoft Power BI, Tableau, Amazon Elastic MapReduce (EMR), Amazon QuickSight, AWS Step Functions, MySQL Performance Tuning, Amazon ElastiCache, AWS Simple Notification Service (SNS), Git, Jupyter, Pytest, Kibana, Cloud Dataflow, Apache Beam, Celery, RabbitMQ, Amazon Simple Queue Service (SQS), Docker Compose, Redash, Amazon CloudWatch, Terraform, Amazon Athena, Redshift Spectrum, Looker, Amazon EKS, Google Analytics, Amazon Cognito, GIS, GRASS GIS, PhpStorm, Navicat, MongoDB Atlas, Stitch Data, Jira, Domo, Google Cloud Dataproc

Paradigms

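Business Intelligence (BI), ETL, MapReduce, Stress Testing, REST, Data-driven Design, Design Patterns, Microservices, Microservices Architecture, Database Design, Kanban, Agile Project Management, Data Science, DevOps, Agile, Object-oriented Design (OOD), Object-oriented Programming (OOP), Distributed Computing, Dimensional Modeling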

Platforms

Visual Studio Code (VS Code), Linux, Google Cloud Platform, Amazon Web Services (AWS), AWS Lambda, AWS Elastic Beanstalk, SharePoint, Jupyter Notebook, Docker, Amazon EC2, Oracle Database, Azure, Apache Kafka, Oracle, Databricks, Firebase, Kubernetes

Storage

MySQL, PostgreSQL, Microsoft SQL Server, NoSQL, Data Lakes, Database Migration, Amazon Aurora, Data Pipelines, Elasticsearch, Databases, Amazon DynamoDB, Database Modeling, Data Integration, PL/SQL, Amazon S3 (AWS S3), MongoDB, Database Administration (DBA), Redshift, Neo4j, Dynamic SQL, Alibaba Cloud, Google Cloud, Google Cloud Storage, IIS SQL Server, Redis

Other

Conda, Machine Learning, Google BigQuery, Data Engineering, Data Modeling, Data Migration, ETL Tools, Data Analytics, Data Analysis, Data Architecture, Data Management, Amazon RDS, CDC, Data Build Tool (dbt), Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Project Planning, Web Scraping, Scraping, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Shell Scripting, Benchmarking, Performance, Performance Testing, Caching, Data Reporting, Software Architecture, Back-end, Artificial Intelligence (AI), Data Scraping, PDF Scraping, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Automated Trading Software, SaaS, GeoPandas, API Integration, Natural Language Processing (NLP), Serverless, Lint, Consumer Packaged Goods (CPG), Back-end Development, FastAPI, Extensions, Data, Streaming Data, Data Governance, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Multithreading, Entity Relationships, Software Design, Workflow, API Design, AWS Cloud Architecture, Performance Tuning, Amazon API Gateway, Cryptography, Research, Data Warehousing, Data Visualization, Metabase, Google Data Studio, CI/CD Pipelines, GitHub Actions, Scripting Languages, Data-driven Dashboards, Azure Data Factory, Technical Project Management, Azure Data Lake, Azure Databricks, Business Analysis, Tesseract, QGIS, OpenAI GPT-3 API, Neural Networks, Azure Synapse, eCommerce APIs, GPT, LangChain, SharePoint Online, Data Auditing, Azure SQL Data Warehouse (SQL DW), Business Architecture, Enterprise Architecture, Mathematics, Kedro, Amazon Neptune, Snowpark, Dataproc, Credit Modeling

2015 - 2019

Bachelor's Degree in Computer Science

Gadjah Mada University - Yogyakarta, Indonesia

February 2022 - Present

Infrastructure Automation with Terraform Cloud

Udemy

January 2022 - Present

Google Cloud Professional Data Engineer

Udemy