Job Description:

We are seeking a highly skilled Spark and Scala engineer to join our team. The ideal candidate is passionate about developing high-performance data processing engines and applying advanced algorithms to solve complex problems. Expertise in Spark, Scala, and object-oriented programming (OOP) is essential, as is a strong command of advanced algorithms and data structures, including search trees, clustering algorithms (e.g., K-means), and graph algorithms. This role focuses on designing scalable systems and applying state-of-the-art techniques to data-driven problem-solving.

Key Responsibilities:

  1. Engine Development:
    • Design and implement efficient, scalable engines for data processing using Apache Spark and Scala.
    • Build robust and reusable systems based on advanced OOP principles (encapsulation, inheritance, polymorphism, and abstraction).
  2. Algorithm Design & Implementation:
    • Implement complex search trees (e.g., binary search trees, AVL trees, red-black trees) for optimized data querying and management.
    • Develop clustering algorithms such as K-means, DBSCAN, and hierarchical clustering to group and analyze large datasets.
    • Apply graph algorithms (e.g., Dijkstra's, Bellman-Ford, PageRank) for network and relationship-based data analysis.
    • Leverage sorting and searching algorithms (e.g., quicksort, mergesort, binary search) to optimize data workflows.
    • Implement dynamic programming algorithms for solving optimization problems like knapsack, shortest paths, and matrix chain multiplication.
  3. Big Data Processing:
    • Process and analyze massive datasets using Spark SQL, DataFrames, and RDDs.
    • Optimize Spark jobs for real-time and batch processing to improve performance and scalability.
  4. Machine Learning & Data Analytics:
    • Use clustering techniques like K-means and GMM (Gaussian Mixture Models) for unsupervised learning.
    • Design algorithms to detect patterns, trends, and anomalies in large-scale datasets.
  5. Optimization & Scalability:
    • Build scalable and modular solutions for processing structured, semi-structured, and unstructured data.
    • Optimize system performance and ensure high availability in distributed computing environments.
  6. Collaboration:
    • Work with cross-functional teams, including data scientists and engineers, to design end-to-end solutions.
    • Contribute to architectural decisions and define best practices for algorithm implementation and system design.

Required Skills:

  • Programming: Strong proficiency in Scala, with expertise in both functional and object-oriented programming.
  • Big Data: Hands-on experience with Apache Spark, including Spark SQL, DataFrames, RDDs, and distributed data processing.
  • Algorithms:
    • Search Trees: Binary search trees, AVL trees, red-black trees.
    • Clustering: K-means, DBSCAN, hierarchical clustering.
    • Graph Algorithms: Dijkstra's, Bellman-Ford, Kruskal's, Prim's, PageRank.
    • Optimization: Dynamic programming, greedy algorithms, branch-and-bound.
    • Sorting & Searching: Quicksort, mergesort, binary search.
  • Data Structures: Proficiency in advanced data structures like heaps, hash tables, and trees.
  • Problem Solving: Strong ability to write clean, efficient code to solve computational challenges.

Education:

Any Graduate