This guide covers common troubleshooting steps for when the site cannot be reached.

Step-by-step guide

  1. Check the site to see if there is an error message
    1. If Slack shows a recent deployment, try rolling back that deployment
    2. If you see an error about SSL certificates, go to the SSL Errors section below
    3. If there is no message at all, the cause may be a firewall (security group) problem
      1. Go to AWS and check the EC2 page
      2. Select the app box
      3. In the Description tab at the bottom of the page check the Security Groups
        1. It should have CHF-Access-Web (staging) or Public-Access-Web (production)
        2. If it does not, select the box, go to Actions→Networking→Change Security Groups and add the group for a quick fix
        3. Then, on your machine with Ansible, check the security group variables so that the groups are applied automatically
          1. The file to check will be app_TIER_override, because the app box's security groups always differ from those of the other machines
          2. Make sure the box has the Management-Access group, the Temp and internal networking groups for its tier, and the Access-Web group above
          3. Save your edits, commit the changes, and push them. This adds them to the automatic updates and avoids the problem in the future
    4. If there is an Apache error, follow the rest of the guide
  2. SSH into the app box(es) you wish to check
  3. Check /opt/sufia-project/current/log/production.log for Sufia errors that might explain the problem
  4. If no problems appear there, check /var/log/apache2/error.log or /var/log/apache2/other_vhosts_access.log to find errors or access requests
  5. If the problem appears to be a Passenger error or an Apache error, try a quick sudo service apache2 restart. Restarting Apache also restarts Passenger
    1. If Apache does not restart, check the Apache error.log (/var/log/apache2/error.log) for details
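
The security group check and the log checks above can be sketched as two small shell helpers. This is a minimal sketch, not the team's actual tooling: the function names missing_groups and recent_errors are hypothetical, the "error|fatal" pattern is an assumption about what the Sufia and Apache logs contain, and the commented aws call assumes the AWS CLI is installed and configured with an INSTANCE_ID you supply.

```shell
# Print each required security group that is NOT in the attached list.
# $1: whitespace-separated names of the groups attached to the box
# remaining arguments: the group names the box is expected to have
missing_groups() {
  # Normalise tabs/newlines to spaces and pad with spaces for whole-name matching
  local attached=" $(echo "$1" | tr '\t\n' '  ') "
  shift
  local name
  for name in "$@"; do
    case "$attached" in
      *" $name "*) ;;          # group is attached, nothing to report
      *) echo "$name" ;;       # group is missing
    esac
  done
}

# Print the last N (default 20) lines of a log that mention "error" or "fatal",
# case-insensitively, prefixed with their line numbers.
recent_errors() {
  grep -niE 'error|fatal' "$1" | tail -n "${2:-20}"
}

# Example usage on a real system (assumptions, left commented out):
# attached=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
#   --query 'Reservations[].Instances[].SecurityGroups[].GroupName' --output text)
# missing_groups "$attached" Public-Access-Web Management-Access
# recent_errors /opt/sufia-project/current/log/production.log
# recent_errors /var/log/apache2/error.log 50
```

If Apache refuses to restart, running sudo apache2ctl configtest before reading the error log may point to a configuration syntax problem more quickly.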

SSL Errors

An SSL error is one of the more likely issues; you should see a notice that the site is no longer trusted. Staging uses Let's Encrypt certificates, while Production uses a GoDaddy certificate that IT manages.

In production

  1. If an SSL error occurs, check in the browser that the SSL cert is valid for the current date.
  2. Log on to the server and check that the SSL file's name and location match the name and location in the Apache config file at /etc/apache2/sites-enabled/sufia-project_ssl.conf
  3. If the certificate's validity period extends past today and the name and location match, check the permissions on the file with ls -l to make sure it can be accessed.
  4. Find the current SSL file on XXX, run md5sum on it, and compare the result with the md5sum of the file currently on the server.
  5. Check via the command line with XXX to see the expiration date of the certificate.
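
The checksum comparison and the expiry check above can be sketched as follows. This is only a sketch under stated assumptions: the guide does not say which tool to use for the expiry check, so openssl is offered here as one common option rather than the author's method, and the function names and the CERT and HOST placeholders are hypothetical.

```shell
# Succeed (exit 0) when both files have the same MD5 checksum.
same_md5() {
  [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Print the expiry (notAfter) date of a PEM certificate file.
cert_end_date() {
  openssl x509 -noout -enddate -in "$1"
}

# Succeed (exit 0) when the PEM certificate has not yet expired.
cert_not_expired() {
  openssl x509 -noout -checkend 0 -in "$1" > /dev/null
}

# Example usage (CERT and HOST are placeholders, left commented out):
# ls -l "$CERT"                                  # confirm the file is readable
# same_md5 "$CERT" ./reference-copy.pem && echo "files match"
# cert_end_date "$CERT"
# cert_not_expired "$CERT" || echo "certificate has expired"
# Inspect the certificate the live site is actually serving:
# echo | openssl s_client -connect "$HOST:443" -servername "$HOST" 2>/dev/null \
#   | openssl x509 -noout -dates
```

Comparing checksums rather than file names catches the case where the file is in the right place but still holds an older certificate.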

On staging

  1. Check the expiration date on the SSL certificate
    1. If it has expired, check the crontab for certbot
    2. Try running certbot manually
  2. Check the Apache config file /etc/apache2/sites-enabled/sufia-project_ssl.conf to make sure that the certificate path points to a symbolic link that resolves to the most up-to-date version of the certificate file.
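
The staging checks above can be sketched with the helpers below. This is a hedged sketch: the function names are hypothetical, the awk match assumes the path after SSLCertificateFile is unquoted and contains no spaces, and the commented certbot commands assume certbot is installed and run as root. Depending on how certbot was installed, renewal may be scheduled from /etc/cron.d/certbot or a systemd timer instead of a user crontab.

```shell
# Print the certificate path(s) named by SSLCertificateFile in an Apache config.
# Directive names are case-insensitive in Apache, hence tolower().
cert_paths_from_conf() {
  awk 'tolower($1) == "sslcertificatefile" { print $2 }' "$1"
}

# Print the real file that a (possibly symlinked) path resolves to.
resolve_link() {
  readlink -f "$1"
}

# Example usage (left commented out):
# conf=/etc/apache2/sites-enabled/sufia-project_ssl.conf
# for path in $(cert_paths_from_conf "$conf"); do
#   echo "$path -> $(resolve_link "$path")"
# done
# crontab -l | grep -i certbot        # is renewal scheduled for this user?
# sudo certbot certificates            # list known certificates and expiry dates
# sudo certbot renew --dry-run         # test renewal without replacing certificates
```

If the resolved file is not the newest certificate, the symbolic link is the likely culprit rather than certbot itself.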


 
