Work independently with minimal supervision and collaborate with team leaders and architects.
Design and develop robust end-to-end data solutions for structured and unstructured data including, but not limited to, ingestion, parsing, integration, auditing, logging, aggregation, normalization, modeling, and error handling.
Interact directly with end users to gather requirements and consult on data integration solutions. Keep their best interests in mind and proactively recommend value-added items, even when not requested.
Maintain, support, and enhance most elements of Tokopedia’s Information Management Systems with minimal assistance from peers.
Collaborate with a cross-functional team to resolve data quality and operational issues.
Identify opportunities for team standardization in coding, deployments, documentation, and other related areas, and create those standards.
Provide mentoring and coaching to junior developers.
Educate the team on emerging related technologies and identify value-add opportunities for their implementation.
Migrate code across environments and leverage a source code management system.
Create jobs to perform auditing and error handling.
Explore new technological solutions, learn and trial them, and deploy them in our Information Management environments to solve business and operational challenges.
Adhere to project delivery deadlines.
Plan projects and estimate effort for each phase.
Create and maintain guidelines for various SDLC stages.
Skill Sets:
Proficient in traditional RDBMSs, with an emphasis on PostgreSQL and MySQL.
Deep understanding of ETL/ELT frameworks, error handling techniques, data quality techniques, and their overall operation.
Proficient in performing data transformations via scripting, stored procedures, or an ETL framework.
Deep understanding of the 4 Vs of data (volume, variety, velocity, veracity) and development strategies for accommodating them in integration.
Proficient in building robust and scalable architecture for Data Pipelines and Data Platform products.
Proficient in developing and supporting all aspects of a big data cluster: ingestion, processing, parsing, and integration (Python, Spark, Scala); data movement; workflow management (Oozie, ActiveBatch, Airflow); and querying (SQL).
Some proficiency in the following programming languages: Java, Python, Go, and Scala.
Proficient in writing Apache Spark applications, including an understanding of optimization techniques.
Proficient in Unix and Linux operating systems.
Capable of navigating and working effectively in a DevOps model, including leveraging related technologies such as Jenkins and GitLab.
Some proficiency in administering a big data cluster.
Some experience processing/parsing files using a scripting language.
Strong experience with SQL.
Education:
A Bachelor's degree in Computer Science or a related field is required.
5+ years of Data Integration experience.
5+ years of hands-on experience with one of the following technologies: Hadoop, SQL Server, Redshift, PostgreSQL.
3+ years of experience with cloud environments such as AWS or GCP.
3+ years of experience in planning, architecting, designing, developing, and delivering software for platforms at scale.
Experience with GCP products such as BigQuery, Dataflow, Pub/Sub, and Bigtable is a big plus.