■Engineering & Research Division / AI Data Center Architect

■About the role
We are seeking an experienced AI Data Center Architect to join our team. In this role, you will be responsible for designing, implementing, and optimizing a medium-sized AI data center that leverages the latest hardware and software technologies to support our organization's growing AI and machine learning workloads.

■Job Scope
1. AI Data Center Architecture and Design:
- Assess the organization's current and future AI and machine learning requirements, including compute, storage, and networking needs.
- Design a scalable, resilient, and efficient AI data center architecture that can accommodate a variety of AI workloads, such as model training, inference, and data processing.
- Ensure the architecture aligns with industry best practices, regulatory compliance, and the organization's IT and business strategies.

2. Hardware and Infrastructure Selection:
- Evaluate and select the appropriate server hardware, including CPUs, GPUs, and specialized AI accelerators (e.g., NVIDIA Tensor Core GPUs, Google TPUs).
- Determine the optimal storage solutions, considering factors like capacity, performance, and data redundancy (e.g., high-performance SSDs, NVMe, network-attached storage).
- Ensure the networking infrastructure can support the required bandwidth, low-latency communication, and data transfer requirements.
- Incorporate power and cooling considerations, such as efficient cooling systems and redundant power supplies, to maintain optimal operating conditions.

3. Software Stack Integration and Optimization:
- Evaluate and integrate the appropriate operating system (e.g., Linux distributions), container runtime (e.g., Docker), and orchestration tools (e.g., Kubernetes).
- Integrate and configure leading AI/ML frameworks and libraries (e.g., TensorFlow, PyTorch, Keras) to enable efficient model development and deployment.
- Implement data management and processing pipelines, leveraging tools like Apache Spark, Hadoop, or custom data ingestion and preprocessing workflows.
- Optimize the software stack for performance, scalability, and resource utilization to ensure the AI data center operates at peak efficiency.

4. Monitoring, Observability, and Automation:
- Implement comprehensive monitoring and observability tools to track the performance, resource utilization, and health of the AI data center.
- Develop data-driven insights and analytics to identify bottlenecks, optimize resource allocation, and ensure overall system reliability.
- Automate deployment, scaling, and management processes to streamline the operation and maintenance of the AI data center.

5. Security and Compliance:
- Implement robust security measures, such as access controls, network segmentation, and data encryption, to protect the AI data center from potential threats.
- Ensure compliance with relevant data privacy and regulatory requirements (e.g., GDPR, HIPAA) by implementing appropriate data governance and access policies.
- Develop and test disaster recovery and business continuity plans to ensure the resilience of the AI data center in the event of failures or disasters.

6. Continuous Optimization and Scalability:
- Continuously monitor the AI data center's performance and resource utilization to identify opportunities for optimization.
- Implement auto-scaling and dynamic resource allocation mechanisms to handle fluctuations in workload demands.
- Explore options for distributed or federated learning architectures to scale the AI capabilities across multiple edge devices or smaller data centers.

7. Collaboration and Knowledge Sharing:
- Provide technical leadership and mentor junior engineers on AI data center best practices and strategies.
- Collaborate with cross-functional teams (data science, IT operations, security) to ensure the AI data center meets the organization's evolving needs.
- Document and share knowledge, solutions, and lessons learned to promote continuous improvement within the organization.

■Internal common IT tools
- Google Workspace (Gmail, Google Calendar, Google Meet, etc.)
- Slack
- Notion
- SmartHR
- Money Forward
- Bakuraku, etc.

■About Engineering and Research Division
Our Engineering and Research Division consists mainly of three teams that handle end-to-end development of hardware and software systems for energy storage and power transfer solutions and services. Currently, approximately 50 specialists are engaged in the mission of advancing energy storage technologies and solutions. The Division is organized into the following teams:
- Series Development: Responsible for prototyping, testing and validation, requirements engineering, and series handover of new products, including product support and commissioning.
- Advanced Engineering: Responsible for developing and experimenting with emerging technologies to sustain our current and future roadmap of energy solutions, with a focus on areas such as embedded development, PCB design, model-based development, battery management, power conversion, digital twins, edge computing, cloud solutions, and AI/ML-based dispatch optimization, generation, and forecasts.
- Product Lifecycle Management: Manages product and project life cycles by tracking them across quality gates through development, sourcing, and value engineering, leading up to manufacturing and after-sales activities, through cross-functional coordination and data-intensive product life cycle assessment.

Working alongside talented engineers from around the world, you will have the opportunity to thrive in a diverse environment that values autonomy and empowers individuals to make effective contributions while gaining new skills and experience with some of the latest emerging technologies applied directly in our solutions.