Facebook Data Center Automation Engineer in Altoona, Iowa


Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities - we're just getting started.


Facebook is seeking a forward-thinking, experienced Machine Learning Automation Engineer to join the Data Center Site Operations team. Our data centers, and the tens of thousands of servers installed in them, are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Facebook is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced environment where adaptability and flexibility will be key to their success.

The candidate will be a forward-thinking IT professional with advanced hands-on technical skills in networks, server hardware, and Linux in a data center environment, along with knowledge of applied machine learning technologies. Extensive experience managing servers, programming/scripting, and carrying out complex projects in a large-scale distributed data center environment is a core competency. Solid communication with cross-functional teams is a requirement, and the role requires the ability to collaborate remotely with peers and other teams on complex projects.

Required Skills:

  1. Lead investigation of complex technical challenges on a scale beyond the individual Data Center site, and spanning multiple disciplines such as Hardware, Linux, and Networking

  2. Seek opportunities to globally improve and innovate using applied machine learning in key areas such as server integration and repairs, documentation and standardization, tooling and automation, and Data Center design and capacity planning

  3. Leverage existing metrics and define new ones to support troubleshooting of global issues of a complex nature

  4. Proactively analyze and make use of data through applied machine learning to identify risks and opportunities for efficiency gains

  5. Advance machine learning models and algorithms using languages such as (but not limited to) Python, Lumos, Clue

  6. Define and improve standards, processes, and best practices for teams globally to leverage by providing actionable feedback from automated applied machine learning determinations

  7. Serve as the subject matter expert to whom Data Center Engineers escalate issues related to machine learning, infrastructure, incident management, or capacity planning

  8. Build cross-functional relationships and have the ability to influence policies and procedures to improve global data center operations

  9. Act as a member of project teams developing new tools or enhancing existing ones, together with our engineering teams in Menlo Park, CA and Dublin, Ireland

  10. Drive tooling improvements through prioritization in tooling road maps, while partnering closely with tooling and automation program managers

  11. Develop appropriate use cases for tooling improvements and automation enhancements through applied machine learning, based on operational escalations and scaling challenges

  12. Build strong relationships with other groups within engineering and/or across the company

  13. Actively solicit feedback from related teams, and use that feedback to improve tooling efficiency as infrastructure scales

  14. Travel up to 30% of the time, as required

Minimum Qualifications:

  1. BS, BEng, or BA in a technical field, or commensurate experience

  2. 7+ years of technical experience in a data center environment

  3. Knowledge of applied machine learning and its relationship to data center environments

  4. System knowledge of Linux, server hardware and Data Center automation

  5. Experience working with at least one programming language

  6. Experience processing and analyzing large data sets

  7. Knowledge of revision control systems and code review tools such as Mercurial, Git, SVN, and Phabricator

  8. Experience interacting with SQL databases and distributed storage systems such as Hadoop

  9. Knowledge of networking principles and related technologies, protocols and standards

  10. Experience working in Data Center environments, and knowledge of infrastructure found in Data Centers such as cooling, power distribution and fiber-optic cabling

  11. Knowledge of large-scale supply chain, logistics and asset management in a Data Center environment

  12. Experience managing multiple concurrent projects, and time management experience

  13. Experience working individually as well as in groups

  14. Communication experience

Industry: Internet

Equal Opportunity: Facebook is proud to be an Equal Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. If you need assistance or an accommodation due to a disability, you may contact us at accommodations-ext@fb.com or you may call us at +1 650-308-7837.