CASE
SRE

SRE monitoring for greater operational efficiency:

how our customer increased squad productivity by 75%

As the world goes digital, the reliability of websites, cloud applications, and cloud infrastructure has become critical to business success. The way we manage systems and their workloads has changed as well: entry-level servers are pooled through virtualization and combined with distributed software architectures, so that outages do not turn into downtime and losses. The focus now is on digital infrastructure and efficiency.

In search of strategic improvements in its operations, one of our clients, the largest financial institution in Latin America and one of the largest in the world, sought out Inmetrics’ team of specialists. The bank had a digital ecosystem with several integrated technologies, so we presented the SRE methodology as the ideal solution: it would let the institution's squad responsible for the PIX project focus on strategic areas and meet the stipulated time to market without compromising delivery quality. A team of Inmetrics specialists was then allocated to the client to structure and implement the ideal SRE monitoring model for that squad's operations.

Implementing SRE

Site Reliability Engineering (SRE) is an approach to operations that keeps applications running continuously, efficiently, and reliably through automation and software engineering solutions. The key concept is engineering, which includes a data-driven approach to operations, an automation culture that increases efficiency and reduces risk, and a hypothesis-driven methodology for incident, performance, and capacity tasks.

Challenges and Opportunities

The SRE methodology is adaptable and can be adopted by any squad in a company, according to the demand, maturity, or needs of the team. The initial phase of our monitoring project at this financial institution was therefore carried out as follows:

 

We identified opportunities for improvement and understood the specific scenario of that technology environment alongside the squad responsible for the PIX project.


From there, we surveyed their main needs.


We structured an action plan based on brainstorming meetings, in which we assessed the possibilities for evolution and defined the strategies for that production environment.


We started the implementation phase of the site reliability engineering (SRE) disciplines according to the maturity and focus of the squad in question.

From there, we defined our implementation methodology and the main objectives that we would pursue together with our client’s team. The guidance from Inmetrics experts forms a pyramid, listed here from its top down to its base:

USER EXPERIENCE AND RELIABILITY

The final proof of user experience in relation to our customer’s products and services via intelligent monitoring

CAPACITY PLANNING

Data correlation, generation and validation of mathematical models, consumption projection, limit analysis, and improvement report with guaranteed SLA

FAILURE INJECTION

Insertion of coordinated failures, monitoring of the results, and creation of systemic resilience gates in the application solution

LAUNCH ENGINEERING

Concentrate and structure event logs and reports. Define, improve and integrate infrastructure, business, and APM dashboards

MONITORING EVOLUTION

Concentrate and structure event logs and reports. Define, improve and integrate infrastructure, business, and APM dashboards

INSTRUMENT AND AUTOMATE

Definition of SLIs and SLOs, instrumentation of critical services, creation of alerts, and automation of the fault response process

IDENTIFY AND MAP

An initial brainstorm with involved teams, process refinements and full system
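
The INSTRUMENT AND AUTOMATE layer above mentions SLIs and SLOs without showing what they look like in practice. As a minimal sketch, assuming an availability SLI computed from request counts (the service name, target, and counts are hypothetical and are not taken from the client's environment), an SLO and its error budget could be expressed like this:

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical availability objective for one service."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded in the measurement window."""
    if total_requests == 0:
        return 1.0  # no traffic in the window, so nothing failed
    return good_requests / total_requests


def error_budget_remaining(slo: SLO, good_requests: int, total_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - slo.target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


# Illustrative values only.
slo = SLO(name="payments-api", target=0.999)
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
budget = error_budget_remaining(slo, good_requests=999_500, total_requests=1_000_000)
```

With a 99.9% target, 1,000,000 requests, and 500 failures, about half of the error budget is still available; an alert could be tied to that fraction approaching zero.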

Impacts of our work

Our specialists brought SRE principles to the operations of the squad responsible for the PIX project to deal with infrastructure problems and process automation. We were responsible for developing performance, strategy, and optimization plans for these operations.

Already in the initial stages of the SRE implementation, the following gains were observed:

Mapping: Visibility of all microservices that must be instrumented in the monitoring environment
Collaboration: A team willing and ready for new ideas and processes, facilitating collaboration and partnerships
Breaking silos: Better communication and information sharing between operations and development staff
Toil reduction: Improvement after refinement of repetitive operational processes
Automation: Incident response through anomaly alarms and ticket creation with multiple severities
New thinking: A change in the way teams work, ownership of IT services, and greater quality assurance for the end user
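
The Automation gain above (anomaly alarms that open tickets at different severities) can be illustrated with a small sketch. It assumes a simple z-score test against recent samples; the thresholds, service name, and Ticket structure are hypothetical, and the case study does not state which detection technique or ticketing tool was actually used:

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Ticket:
    """Hypothetical ticket record; a real system would call a ticketing API."""
    service: str
    severity: str  # "warning" or "critical"
    summary: str


def classify(latest: float, history: Sequence[float],
             warn_z: float = 3.0, crit_z: float = 6.0) -> Optional[str]:
    """Return a severity when `latest` deviates from `history`, else None."""
    if len(history) < 2:
        return None  # not enough data to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None  # flat history: z-score is undefined, so this sketch skips it
    z = abs(latest - mu) / sigma
    if z >= crit_z:
        return "critical"
    if z >= warn_z:
        return "warning"
    return None


def maybe_open_ticket(service: str, latest: float,
                      history: Sequence[float]) -> Optional[Ticket]:
    """Create a ticket only when the latest sample is classified as anomalous."""
    severity = classify(latest, history)
    if severity is None:
        return None
    return Ticket(service, severity, f"{service}: value {latest} is anomalous")


# Illustrative latency samples in milliseconds.
recent = [100, 102, 98, 101, 99]
ticket = maybe_open_ticket("payments-api", 120, recent)
```

Separating classification from ticket creation keeps the thresholds easy to tune without touching the integration with the ticketing tool.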

In addition, by implementing SRE monitoring in the squad's operations, we made the systems more observable and considerably reduced the time spent on daily tasks such as spot troubleshooting and war rooms. The insights and accurate information we provided added real value to our client’s processes.

COMPLETE TICKET RESOLUTION
Less time and effort spent troubleshooting tickets

WAR ROOMS
Average time spent in war rooms sharply reduced

SRE monitoring increased efficiency and optimized the working time of our client’s squad by 75%
Technical negotiations improved in general, requiring less effort and fewer resources
