Incident Report: Intermittent Transaction Failures

Incident Report for PayChoice

Postmortem

Date of Incident: 31 July 2025
Timeframe: 06:32 – 12:24 AEST
Service Affected: PayChoice Transaction Processing via Issuer API
Impact Level: Medium
Duration: ~6 hours (5 hours 52 minutes)
Root Cause: Intermittent network connectivity issues at the issuer's data centre

📋 Summary

On the morning of 31 July 2025, PayChoice experienced degraded performance and intermittent transaction failures caused by network connectivity issues when calling the issuer’s API. The issue originated within the issuer’s data centre, where intermittent disconnections caused transaction requests to fail.

🕒 Incident Timeline

Time (AEST) Event
06:32 Monitoring alarm triggered for slower-than-average response times from the issuer gateway.
07:37 Alarm triggered for degraded performance in PayChoice processing infrastructure.
08:00 Internal investigation commenced on infrastructure performance.
08:45 Intermittent transaction failures identified; deeper infrastructure review initiated.
09:03 Second alarm for slow response from the issuer gateway raised.
09:18 PayChoice network investigation revealed no internal issues. Issue escalated to the issuer, whose initial response indicated no known issues.
09:30 Ongoing review of PayChoice infrastructure for potential causes of network disconnects.
11:35 Verbal confirmation received that the issuer was investigating a potential issue in their data centre.
12:24 The issuer resolved the issue. Network disconnections ceased. Transaction processing stabilised.
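
The intervals between these milestones can be worked out directly from the timestamps above. The following is a minimal sketch in Python; the dictionary keys and helper names are illustrative labels chosen for this example, not identifiers from any PayChoice system.

```python
from datetime import datetime

# Timestamps (AEST) copied from the incident timeline.
# The key names are illustrative labels, not PayChoice identifiers.
TIMELINE = {
    "first_alarm": "06:32",
    "investigation_started": "08:00",
    "failures_identified": "08:45",
    "escalated_to_issuer": "09:18",
    "issuer_confirmed": "11:35",
    "resolved": "12:24",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(minutes: int) -> str:
    """Format a minute count as 'Xh YYm'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


def interval(start_key: str, end_key: str) -> str:
    """Elapsed time between two named milestones."""
    return as_hours_minutes(
        minutes_between(TIMELINE[start_key], TIMELINE[end_key])
    )


total_duration = interval("first_alarm", "resolved")
alarm_to_escalation = interval("first_alarm", "escalated_to_issuer")
escalation_to_confirmation = interval("escalated_to_issuer", "issuer_confirmed")

print("Total duration:", total_duration)
print("First alarm to escalation:", alarm_to_escalation)
print("Escalation to issuer confirmation:", escalation_to_confirmation)
```

Read this way, the timeline splits into the period before the issue was escalated to the issuer and the period spent waiting for the issuer to confirm and resolve it.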

📉 Impact

  • Scope: Intermittent transaction failures for customers transacting via PayChoice through the issuer.
  • User Impact: Affected transactions either timed out or returned error messages to end users.

🧠 Root Cause

  • Network instability and disconnections between PayChoice and the issuer’s gateway.
  • Confirmed to be caused by an issue in the issuer’s data centre, leading to intermittent connectivity and timeouts during API calls.
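
To illustrate how the two failure modes described above (timeouts and dropped connections) might be told apart when reviewing upstream calls, here is a minimal Python sketch. It is an assumption-laden example, not PayChoice's implementation: the function names are hypothetical, and the simulated calls stand in for real issuer API requests.

```python
from collections import Counter
from typing import Callable, Iterable


def classify_call(call: Callable[[], object]) -> str:
    """Run one upstream call and label its outcome.

    Uses Python's built-in exception hierarchy: TimeoutError for calls
    that did not complete in time, ConnectionError (which includes
    ConnectionResetError) for dropped or refused connections.
    """
    try:
        call()
        return "ok"
    except TimeoutError:
        return "timeout"
    except ConnectionError:
        return "disconnect"


def summarise(calls: Iterable[Callable[[], object]]) -> Counter:
    """Count outcomes across a batch of upstream calls."""
    return Counter(classify_call(call) for call in calls)


# Simulated calls (hypothetical stand-ins for issuer API requests).
def approved() -> str:
    return "approved"


def timed_out() -> str:
    raise TimeoutError("issuer did not respond in time")


def dropped() -> str:
    raise ConnectionResetError("connection dropped mid-request")


counts = summarise([approved, timed_out, dropped, approved])
print(dict(counts))
```

Separating the two categories makes it easier to see whether an upstream problem presents as slow responses, dropped connections, or both.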

✅ Resolution

  • The issuer resolved the issue within their data centre at 12:24 AEST, restoring stable network communication.
  • PayChoice transaction processing returned to normal immediately afterwards.

🔍 Lessons Learned

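  • Monitoring detected response-time degradation early: the first alarm fired at 06:32, more than two hours before intermittent transaction failures were identified at 08:45.
  • Internal infrastructure checks were comprehensive and had ruled out internal causes by 09:18.
  • Identifying the cause was delayed by the lack of immediate confirmation from the issuer: the 09:18 escalation initially drew a response of no known issues, and confirmation that the issuer was investigating came only at 11:35.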
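
As an illustration of the kind of "slower than average" alarm referred to above, the sketch below compares each new response time with a rolling average of recent healthy samples. This is a hypothetical example only: the class name, window size, and threshold factor are assumptions made for illustration and do not describe PayChoice's actual monitoring configuration.

```python
from collections import deque
from statistics import mean


class LatencyAlarm:
    """Flags a response time well above the rolling average.

    `window` and `factor` are illustrative defaults, not real settings.
    """

    def __init__(self, window: int = 5, factor: float = 2.0) -> None:
        self.samples = deque(maxlen=window)
        self.factor = factor

    def observe(self, latency_ms: float) -> bool:
        """Return True if latency_ms exceeds factor x the rolling mean.

        No alarm is raised until the window is full. Samples that
        trigger the alarm are kept out of the baseline, so a sustained
        slowdown keeps alarming instead of becoming the new normal.
        """
        baseline_ready = len(self.samples) == self.samples.maxlen
        triggered = baseline_ready and latency_ms > self.factor * mean(self.samples)
        if not triggered:
            self.samples.append(latency_ms)
        return triggered


alarm = LatencyAlarm(window=3, factor=2.0)
results = [alarm.observe(ms) for ms in (100, 110, 90, 150, 400)]
print(results)
```

In this run the first three samples fill the window, 150 ms stays under twice the baseline, and 400 ms trips the alarm.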

🏁 Conclusion

This incident highlighted the importance of proactive monitoring, rapid triage, and timely escalation. While PayChoice systems responded as designed, further improvements will be made to enhance resilience and visibility in the event of upstream failures.

Posted Aug 01, 2025 - 15:11 AEST

Resolved

On the morning of 31 July 2025, PayChoice experienced degraded performance and intermittent transaction failures due to network connectivity issues when interacting with the issuer’s API. The issue originated from within the issuer’s data centre, where intermittent disconnections caused transaction requests to fail.
Posted Jul 31, 2025 - 06:00 AEST