Job Description:
We are seeking a highly skilled Spark & Scala Engineer to join our team. The ideal candidate is passionate about developing high-performance data processing engines and applying advanced algorithms to complex problems. Expertise in Spark, Scala, object-oriented programming (OOP), and advanced algorithms such as search trees, clustering algorithms (e.g., K-means), and graph algorithms is essential. The role focuses on designing scalable systems and applying state-of-the-art techniques to data-driven problem-solving.
Key Responsibilities:
- Engine Development:
- Design and implement efficient, scalable engines for data processing using Apache Spark and Scala.
- Build robust, reusable systems grounded in core OOP principles (encapsulation, inheritance, polymorphism, and abstraction).
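As an illustration of the engine-development work described above, here is a minimal Scala sketch of a pluggable processing pipeline built on those OOP principles; the trait and class names (ProcessingStage, FilterStage, MapStage, Pipeline) are illustrative assumptions, not an existing codebase.

```scala
// Minimal sketch: a polymorphic processing pipeline in Scala.
// All names here are illustrative assumptions.

// Abstraction: every stage exposes the same contract.
trait ProcessingStage[A] {
  def name: String
  def run(input: Seq[A]): Seq[A]
}

// Encapsulation: each stage hides its own logic behind run().
final class FilterStage[A](val name: String, predicate: A => Boolean)
    extends ProcessingStage[A] {
  override def run(input: Seq[A]): Seq[A] = input.filter(predicate)
}

final class MapStage[A](val name: String, f: A => A)
    extends ProcessingStage[A] {
  override def run(input: Seq[A]): Seq[A] = input.map(f)
}

// Polymorphism: the pipeline treats all stages uniformly.
final class Pipeline[A](stages: Seq[ProcessingStage[A]]) {
  def execute(input: Seq[A]): Seq[A] =
    stages.foldLeft(input)((data, stage) => stage.run(data))
}

object PipelineDemo extends App {
  val pipeline = new Pipeline[Int](Seq(
    new FilterStage[Int]("drop-negatives", _ >= 0),
    new MapStage[Int]("double", _ * 2)
  ))
  println(pipeline.execute(Seq(-3, 1, 4, -1, 5))) // List(2, 8, 10)
}
```

New stage types can be added without touching the pipeline itself, which is the main payoff of programming against the trait.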
- Algorithm Design & Implementation:
- Implement complex search trees (e.g., binary search trees, AVL trees, red-black trees) for optimized data querying and management.
- Develop clustering algorithms such as K-means, DBSCAN, and hierarchical clustering to group and analyze large datasets.
- Apply graph algorithms (e.g., Dijkstra's, Bellman-Ford, PageRank) for network and relationship-based data analysis.
- Leverage sorting and searching algorithms (e.g., quicksort, mergesort, binary search) to optimize data workflows.
- Implement dynamic programming algorithms for solving optimization problems like knapsack, shortest paths, and matrix chain multiplication.
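As one concrete instance of the dynamic-programming item above, here is a minimal sketch of the 0/1 knapsack problem solved with a bottom-up table in Scala; the object and parameter names are illustrative assumptions.

```scala
// Minimal sketch: bottom-up dynamic programming for the 0/1 knapsack problem.
// Names (Knapsack, weights, values, capacity) are illustrative assumptions.
object Knapsack {

  /** Maximum total value achievable within the given capacity,
    * where item i has weights(i) and values(i) and can be used at most once. */
  def knapsack(weights: Array[Int], values: Array[Int], capacity: Int): Int = {
    require(weights.length == values.length, "weights and values must align")
    // dp(c) = best value achievable with capacity c using the items seen so far
    val dp = Array.fill(capacity + 1)(0)
    for (i <- weights.indices) {
      // Iterate capacity downward so each item is counted at most once
      for (c <- capacity to weights(i) by -1) {
        dp(c) = math.max(dp(c), dp(c - weights(i)) + values(i))
      }
    }
    dp(capacity)
  }

  def main(args: Array[String]): Unit = {
    val best = knapsack(weights = Array(2, 3, 4), values = Array(3, 4, 5), capacity = 5)
    println(best) // 7: take the items with weights 2 and 3
  }
}
```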
- Big Data Processing:
- Process and analyze massive datasets using Spark SQL, DataFrames, and RDDs.
- Optimize Spark jobs for both streaming (near-real-time) and batch processing to improve performance and scalability.
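A minimal sketch of the kind of batch aggregation these responsibilities cover, using Spark DataFrames and Spark SQL; the input path, output path, and column names (user_id, event_type, amount) are assumptions for illustration.

```scala
// Minimal batch-processing sketch with Spark SQL and DataFrames.
// Paths and column names are illustrative assumptions, not a real dataset.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-batch-job")
      .getOrCreate()

    // Read a Parquet dataset into a DataFrame
    val events = spark.read.parquet("/data/events/") // hypothetical path

    // Aggregate spend per user and event type, keeping only large totals
    val totals = events
      .groupBy(col("user_id"), col("event_type"))
      .agg(sum(col("amount")).as("total_amount"), count(lit(1)).as("event_count"))
      .filter(col("total_amount") > 1000)

    // The same logic expressed through Spark SQL
    events.createOrReplaceTempView("events")
    val totalsSql = spark.sql(
      """SELECT user_id, event_type, SUM(amount) AS total_amount, COUNT(*) AS event_count
        |FROM events
        |GROUP BY user_id, event_type
        |HAVING SUM(amount) > 1000""".stripMargin)
    // totalsSql produces the same result as totals; either form can feed the write below

    totals.write.mode("overwrite").parquet("/data/event_totals/") // hypothetical path
    spark.stop()
  }
}
```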
- Machine Learning & Data Analytics:
- Use clustering techniques such as K-means and Gaussian Mixture Models (GMMs) for unsupervised learning.
- Design algorithms to detect patterns, trends, and anomalies in large-scale datasets.
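A minimal sketch of unsupervised clustering with Spark MLlib's KMeans, in the spirit of the two points above; the dataset path, feature column names, and the choice of k = 5 are assumptions for illustration.

```scala
// Minimal sketch: K-means clustering with Spark MLlib on a DataFrame.
// The input path, feature columns, and k = 5 are illustrative assumptions.
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object ClusteringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kmeans-sketch").getOrCreate()

    val df = spark.read.parquet("/data/metrics/") // hypothetical dataset

    // Assemble numeric columns into the single vector column MLlib expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("feature_a", "feature_b", "feature_c")) // assumed columns
      .setOutputCol("features")
    val featurized = assembler.transform(df)

    // Fit K-means with an assumed k; in practice k is tuned (e.g., elbow or silhouette)
    val kmeans = new KMeans().setK(5).setSeed(42L).setFeaturesCol("features")
    val model = kmeans.fit(featurized)

    // Attach a cluster id to every row; unusually small clusters can hint at anomalies
    val clustered = model.transform(featurized)
    clustered.groupBy("prediction").count().show()

    spark.stop()
  }
}
```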
- Optimization & Scalability:
- Build scalable and modular solutions for processing structured, semi-structured, and unstructured data.
- Optimize system performance and ensure high availability in distributed computing environments.
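As a small illustration of handling structured, semi-structured, and unstructured inputs in a single Spark job, the sketch below loads Parquet, JSON, and raw text; all paths are hypothetical placeholders.

```scala
// Minimal sketch: loading structured, semi-structured, and unstructured data
// with Spark. All paths are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object MultiFormatIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-format-ingest").getOrCreate()

    // Structured: columnar Parquet with a fixed schema
    val orders = spark.read.parquet("/data/orders/")

    // Semi-structured: JSON lines with the schema inferred at read time
    val clickstream = spark.read.json("/data/clickstream/")

    // Unstructured: raw text, one row per line
    val logs = spark.read.textFile("/data/app-logs/")
    val errorCount = logs.filter(_.contains("ERROR")).count()

    println(s"orders=${orders.count()}, clicks=${clickstream.count()}, errors=$errorCount")
    spark.stop()
  }
}
```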
- Collaboration:
- Work with cross-functional teams, including data scientists and engineers, to design end-to-end solutions.
- Contribute to architectural decisions and define best practices for algorithm implementation and system design.
Required Skills:
- Programming: Strong proficiency in Scala, with expertise in both functional and object-oriented programming.
- Big Data: Hands-on experience with Apache Spark, including Spark SQL, DataFrames, RDDs, and distributed data processing.
- Algorithms:
- Search Trees: Binary search trees, AVL trees, red-black trees (see the sketch after this list).
- Clustering: K-means, DBSCAN, hierarchical clustering.
- Graph Algorithms: Dijkstra's, Bellman-Ford, Kruskal's, Prim's, PageRank.
- Optimization: Dynamic programming, greedy algorithms, branch-and-bound.
- Sorting & Searching: Quicksort, mergesort, binary search.
- Data Structures: Proficiency in advanced data structures like heaps, hash tables, and trees.
- Problem Solving: Strong ability to write efficient, clean code to solve computational challenges.
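As an illustration of the search-tree and functional/OOP skills listed above, here is a minimal sketch of an immutable binary search tree in Scala; the type and method names are illustrative assumptions, and no AVL or red-black rebalancing is included.

```scala
// Minimal sketch: an immutable binary search tree in Scala, combining the
// functional and OOP styles listed above. Names are illustrative assumptions.
sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[A](left: Tree[A], value: A, right: Tree[A]) extends Tree[A]

object Tree {
  /** Insert a value, returning a new tree and leaving the original untouched. */
  def insert[A](tree: Tree[A], a: A)(implicit ord: Ordering[A]): Tree[A] = tree match {
    case Leaf => Node(Leaf, a, Leaf)
    case Node(l, v, r) if ord.lt(a, v) => Node(insert(l, a), v, r)
    case Node(l, v, r) if ord.gt(a, v) => Node(l, v, insert(r, a))
    case node => node // duplicate: keep the existing node
  }

  /** True if the value is present; O(height) comparisons. */
  def contains[A](tree: Tree[A], a: A)(implicit ord: Ordering[A]): Boolean = tree match {
    case Leaf => false
    case Node(l, v, r) =>
      if (ord.lt(a, v)) contains(l, a)
      else if (ord.gt(a, v)) contains(r, a)
      else true
  }
}

object TreeDemo extends App {
  val tree = Seq(5, 2, 8, 1, 3).foldLeft(Leaf: Tree[Int])((t, v) => Tree.insert(t, v))
  println(Tree.contains(tree, 3)) // true
  println(Tree.contains(tree, 7)) // false
}
```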