MapR

Senior Distributed Systems Engineer

United States

MapR Technologies, Inc., is a visionary Silicon Valley enterprise software company that pioneered one platform for all data in every cloud. This approach is the industry's first modern data system. At the core is the MapR Converged Data Platform, which enables simultaneous analytics and applications as data happens, with speed, scale and reliability. Forward-leaning companies such as SAP, Cisco, United Healthcare and many more are able to create new, intelligent and modern apps to outperform the competition. Limitless possibilities happen with MapR Data Technologies. Learn more: www.mapr.com.
MapR Technologies is an equal opportunity employer.
Senior Distributed Systems Engineer
In this exciting position, you will provide the highest level of technical expertise and advice to MapR's rapidly growing customer base, assisting these enterprises throughout the pre- and post-sales process. On average, your role will consist of reproducing customer issues in-house, working with Engineering on code fixes, and releasing patches to resolve complex issues as soon as possible.
If you consider yourself a technical dynamo and you are looking for a challenge in a trailblazing company, we want to hear from you!
Responsibilities and Daily Duties:
  • Acting as a distributed systems engineer in handling and evaluating high-priority issues on MapR filesystem, security, NoSQL database and MapR messaging (30%):
    • Installing and configuring MapR filesystem for customers, including sizing the nodes and making sure there is enough storage and memory before the installation and configuration of the file system based on the customer use case and data retention policies.
    • Advising how to set up volumes and use NFS for access and data ingestion, as well as for storage of small and big files on the MapR Filesystem.
    • Troubleshooting issues with MapR filesystem that stem from data corruption in the file system, or from node or disk failures that result in resync of data blocks. Looking at specific logs in a clustered environment, determining the reasons for hardware or software failures and quickly coming up with a solution to avoid future issues.
    • Using scripting and pattern matching tools to look at specific logs on data nodes to see file system failures, and setting up a local environment to reproduce customer issues and enable debugging to find the root cause.
    • Working on table structure and partitions by looking at table columns for NoSQL databases, understanding data types, and running sample queries to see customer-reported errors. Evaluating table regions and knowing how to recover from region failures and restore file consistency.
    • Setting up either Kerberos or MapR security in the customer environment for authentication and authorization. Ensuring security tickets are issued and configured properly.
  • Evaluating customers' distributed environments and suggesting methods to upgrade or implement a revised solution (20%):
    • Interacting with various customer teams to understand the customer use case, systems, tools and monitoring applications that are used for managing the cluster. Evaluating the customer's process for data needs, as well as their tools and management of the cluster.
    • Evaluating all third-party applications used by the customer and how they are integrated with MapR software. Working with vendors and Engineers to certify newly released software and ensure that features are backward compatible.
    • Validating customer test plans and test cases before upgrading, and ensuring the new release is tested in a staged environment.
    • Developing a plan to roll out revised software for upgrades, making Engineering teams aware of any possible issues, and being available to provide support during the actual upgrade process. Ensuring that all data is backed up and that a disaster recovery plan is in place if the upgrade fails.
    • Developing scripts in Python or Bash to monitor customer environments. These scripts call generic REST APIs, check customer configurations, resource consumption and data usage, and suggest corrective action if necessary. Developing MapR documentation and/or knowledge articles that include more detailed explanations.
    • Utilizing strong analytical skills to dig deep into the problem, performing root cause analysis and assigning issues to relevant categories such as bug, configuration, and hardware for effective problem solving. Maintaining comprehensive notes, providing timely updates to the customer and analyzing log files are required.
  • Evaluating open source Hadoop components for data processing, data ingestion, and analytics (30%):
    • Maintaining a constant vigil on data ingestion and data processing software. Understanding MapReduce and scheduling mechanisms for data processing, and tuning the environment for optimal use of resources. Analyzing any job failures by looking at logs and quickly suggesting avenues for restoration of the job.
    • Communicating technical concepts clearly and effectively on query and optimization for analytics. Evaluating customer queries and suggesting splitting the queries in a distributed fashion for performance. Evaluating network and resource bandwidth and suggesting alternatives.
    • Guiding customers on data ingestion by suggesting the right technology and helping them use the right tools for ETL (Extract, Transform, and Load). Helping to configure Metastore and multiplicity for access to Metastore.
    • Analyzing slow Java programs or failures related to the JVM using jstack and jmap to see which processes or threads are either blocked or slow, and suggesting a corrective action.
    • Using tools like Valgrind to find memory leaks and fix issues reported by customers.
  • Coordinating with Engineering and other internal teams for escalations and creating knowledge articles (20%):
    • Coordinating with Engineering on product defects, testing and patch delivery to customers. Reproducing the customer issue on Amazon cloud or on-premises and providing access to Engineers.
    • Maintaining all communication and ownership of cases opened by customers, bringing in sales engineers for enhancements, and internally working with backline to shadow issues.
    • Opening Bugzilla bugs, analyzing issues and, if necessary, looking at the code to determine the exact issue.
    • Working with consulting and accounting teams to identify opportunities for MapR solutions to make sure a high level of customer satisfaction is maintained.
    • Capturing best practices as part of each engagement and writing knowledge articles for self-service.
Skills/Tools/Technologies Required:
  • Bachelor's degree in Computer Science, Computer Engineering, Electrical/Electronics Engineering or a related field.
  • Tools: tcpdump, netstat, sar, free, htop, iostat, meminfo, jstack, gdb, vmstat
  • Programming languages: Java, Python, SQL, Spark SQL, Scala and C/C++
  • Technologies: Hadoop, Sqoop, Flume, Hive, HBase, HiveServer2, HDFS, MySQL, Spark, Tez, and Oozie
  • Knowledge of Linux operating system administration and troubleshooting
  • Familiarity with MapR build tools such as fsck, gfsck and guts, and with analyzing core dumps
  • Java/C++ development experience is a plus
