Outage Recap - Improvements

Good Evening,

Over the past week, we have made substantial improvements to our TITAN infrastructure to accommodate increased system usage, and we hope you have noticed the resulting gains in performance. As a cloud provider, we understand the importance of delivering a seamless service: you should never be left wondering about stability or performance. We recognize the impact that last week's outage and the subsequent instability had on your team.

Given these issues, we are planning additional infrastructure improvements this fall in anticipation of continued growth. If you are interested in learning more about these changes, please continue reading below.

Again, we extend our sincerest apologies for the outage on August 18 and the instability that followed. We look forward to continuing to earn your faith and confidence in all aspects of our TITAN service, and we appreciate your patience as we navigate a tumultuous back-to-school season.

Sincerely,
TITAN Support


The following are some of the changes we made to our infrastructure this past weekend:
Code Changes
• Rerouted a substantial number of database calls from the primary read/write database to our read-only replica, reducing load on the primary (see the sketch after this list).
• Eliminated redundant Redis calls by no longer rewriting the cache on every poll request (illustrated in the Redis sketch at the end of this note).
• Directed job/task and message-queue queries to the read-only database instance, preventing table locks on those tables.
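
For readers curious what the read-routing change looks like in practice, here is a simplified sketch in Python using SQLAlchemy. The connection URLs, database name, and function names are illustrative placeholders, not our production code:

    # Simplified sketch of the read/write split using SQLAlchemy.
    # Connection URLs and names below are illustrative placeholders.
    from sqlalchemy import create_engine, text

    # The primary keeps handling writes; the replica absorbs read traffic.
    primary = create_engine("postgresql://user:pass@db-primary/titan")
    replica = create_engine("postgresql://user:pass@db-replica/titan")

    def run_read(sql, **params):
        # Read-only statements go to the replica, off the primary.
        with replica.connect() as conn:
            return conn.execute(text(sql), params).fetchall()

    def run_write(sql, **params):
        # Writes stay on the primary; begin() commits on success.
        with primary.begin() as conn:
            conn.execute(text(sql), params)

    # A read-heavy poll query no longer touches the primary:
    rows = run_read("SELECT id, status FROM jobs WHERE district_id = :d", d=42)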
Hardware/Environment
• Brought the production and failover environments to parity in compute resources.
• Disabled auto-scaling and fixed server capacity at 30 servers for the immediate future (see the sketch after this list).
• Doubled the CPU count on the read-only database instance to 64.
• Moved the portal servers to second-generation N2 custom machine types with 8 CPUs each, increasing per-server compute power.
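
The N2 machine types above are Google Compute Engine instances, so the capacity change amounts to removing the autoscaler and pinning the managed instance group at a fixed size. The following is a simplified sketch using the google-cloud-compute client library; the project, zone, and resource names are placeholders, not our actual deployment tooling:

    # Simplified sketch; project, zone, and resource names are placeholders.
    from google.cloud import compute_v1

    PROJECT, ZONE = "titan-prod", "us-central1-a"

    # Remove the autoscaler so capacity no longer fluctuates with load.
    autoscalers = compute_v1.AutoscalersClient()
    autoscalers.delete(project=PROJECT, zone=ZONE, autoscaler="portal-autoscaler")

    # Pin the managed instance group at a fixed 30 servers.
    migs = compute_v1.InstanceGroupManagersClient()
    migs.resize(project=PROJECT, zone=ZONE,
                instance_group_manager="portal-mig", size=30)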
Redis
• Enabled Redis clustering, moving from a single node to multiple nodes (see the sketch below).
• Increased Redis storage.
• Disabled persistence, which had been causing unnecessary process restarts.
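
To make the caching changes concrete, here is a simplified sketch of the new pattern using the redis-py cluster client. It also illustrates the poll-handling change from the Code Changes list: poll requests now read the cached value and refresh it only when it expires, rather than rewriting it on every request. The host name, key, TTL, and loader function are illustrative placeholders:

    # Simplified sketch; host names, keys, and TTL are placeholders.
    import json
    from redis.cluster import RedisCluster

    # A client pointed at any node discovers the full cluster topology.
    cache = RedisCluster(host="redis-node-1.internal", port=6379)

    STATUS_KEY = "district:status"
    STATUS_TTL = 30  # seconds: one cache write per TTL window, not per poll

    def load_status_from_database():
        # Placeholder for the real query against the read-only replica.
        return {"ok": True}

    def handle_poll():
        # Fast path: serve the cached value without a cache write
        # and without hitting the database.
        cached = cache.get(STATUS_KEY)
        if cached is not None:
            return json.loads(cached)
        # Slow path: refresh the cache once, with an expiry, then serve.
        status = load_status_from_database()
        cache.set(STATUS_KEY, json.dumps(status), ex=STATUS_TTL)
        return status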