How to Set Up & Run PHPCS on Changed Files in GitHub Actions
This is a problem that has come up a couple of times in my career and now I'm taking some time to document it. If you find it useful, please reach out and let me know!
When working on legacy codebases, one thing that is often sorely missing is consistent code formatting. My experiences have been no different. The codebase at my new job is massive and has been worked on by a dozen different developers over half a decade. Never in that time did anyone get around to setting up code formatting rules. The team had a few internal discussions and tried to come to verbal agreements about code style, but unfortunately, when crunch time happens, it's easy to ignore or overlook things when the gatekeeper is a human being. God bless them, but it's time for change!
When I joined a year ago, one of my first self-improvement PRs was to add PHPCS to the project. I spent roughly 8 hours, on and off, over a couple of weeks fixing all of the PSR-12 violations in our Laravel project. I was even careful to add commits for various stages of the autofixer. For each new error I saw, if it could be fixed automatically, I would run the fixer, commit, and then rinse and repeat. By the end, I had 15 commits nicely delineating various fixes. The real problem was that there were 2,550 changed files!
If we had a comprehensive test suite, it would be pretty easy to confirm that this didn't change anything, but because I was 2 months into a new job, I can understand why others on the team were hesitant to merge a 2500+ file PR of changes. It would be a huge waste of time to view each file separately and almost impossible to test everything. I put the project on the back burner and knew one day I'd get back to it.
Using git diff to find changed files
Since I'm working on a legacy project, it became clear that the only way to really make this happen was an incremental approach. Ideally, when a contributor opens a new PR, only the changed files would be run against PHPCS. Great, seems easy enough! How does one get the changed files? We're using git so it makes sense that the same tool should be able to spit out a list of changed files, and it absolutely does. There's just one little problem...
The command to get a list of changed files is git diff. After some googling, I eventually settled on the following command.
git diff --name-only --diff-filter=d origin/$GITHUB_BASE_REF..origin/$GITHUB_HEAD_REF -- '*.php'
The variables in the command are part of the GitHub Actions environment. At face value, this will give us exactly what we want: a list of file names, without deleted files (--diff-filter=d), comparing the current branch to the target PR branch. The diff command also supports filtering by a glob pattern, which is what the trailing '*.php' does.
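To see what each flag contributes, here is a minimal sketch in a throwaway repository. The branch names (base, feature) and file names are made up for the demo and have nothing to do with our real project. Note the order of the two refs: with the base first and the feature branch second, "deleted" means deleted by the feature branch.

```shell
#!/usr/bin/env bash
# Throwaway repo showing --name-only, --diff-filter=d and the '*.php' pathspec.
# All branch and file names below are hypothetical.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo '<?php // kept'    > Kept.php
echo '<?php // removed' > Removed.php
echo 'notes'            > README.md
git add .
git commit -q -m "base"
git branch -M base

git checkout -q -b feature
echo '<?php // edited' > Kept.php     # modified
echo '<?php // new'    > Added.php    # added
echo 'more notes'      >> README.md   # modified, but not a PHP file
git rm -q Removed.php                 # deleted
git add .
git commit -q -m "feature work"

# From base to feature: names only, skip deletions, PHP files only.
changed="$(git diff --name-only --diff-filter=d base..feature -- '*.php')"
echo "$changed"
```

Only Added.php and Kept.php are printed: README.md is dropped by the pathspec and Removed.php by the lowercase d in --diff-filter.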
When working with frequent small commits against one branch, this approach generally works very well. The trouble arises when you use feature branches or release branches. In our case, it's the latter. Every sprint has its own branch and tickets are merged into it over the course of the sprint. However, during the sprint, we might release hotfixes to main and then merge main periodically back into the sprint branch.
The issue is that git diff is kinda stupid and will show you the full diff between the tips of the two branches. So let's say I start on a branch that originates from main. I tend to use this workflow because then it's always safe to release as a hotfix if needed. Then I'll open a PR and point it to the sprint branch. As the sprint progresses, more and more code gets merged into the sprint branch, and so the diff picks up more and more files as commits pile up between my branch and the target sprint branch.
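The growing diff is easy to reproduce. The following is a rough sketch in a throwaway repository, with made-up branch names (main, sprint, ticket) and file names standing in for our real ones. It counts the files reported for a one-file ticket before and after other work lands on the sprint branch.

```shell
#!/usr/bin/env bash
# Throwaway repo: a one-file ticket whose diff against the sprint branch grows
# as unrelated work lands on sprint. All names below are hypothetical.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

for f in Router.php Config.php User.php; do
  echo "<?php // $f" > "$f"
done
git add .
git commit -q -m "initial"
git branch -M main
git branch sprint

# My ticket: one changed file, branched from main, aimed at sprint.
git checkout -q -b ticket main
echo '// my change' >> Router.php
git commit -q -am "ticket: touch Router.php"

changed_files() {
  git diff --name-only --diff-filter=d sprint..ticket -- '*.php'
}

before="$(changed_files | wc -l | tr -d ' ')"

# Meanwhile, other tickets get merged into the sprint branch.
git checkout -q sprint
echo '// someone else' >> Config.php
git commit -q -am "other ticket: Config.php"
echo '// someone else again' >> User.php
git commit -q -am "other ticket: User.php"

after="$(changed_files | wc -l | tr -d ' ')"
echo "before: $before file(s), after: $after file(s)"
```

My ticket never touched Config.php or User.php, yet the count goes from 1 to 3, because the two branch tips now differ in those files as well.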
I know what you're thinking: "Daron, why isn't everyone else fixing the PHPCS issues in their tickets?" This is a great question! They're human. Shit happens and sometimes people have issues with their IDE or whatever and it doesn't make sense for our team right now to totally block a PR on code style. I also don't think it's fair that one person should have to fix all of the files from someone else's work.
This solution was actually merged into our codebase a couple of months ago. It was relatively straightforward to add and for the hotfixes, it was a good way to incrementally fix things in frequently touched files (router, configs, etc).
But the progressively expanding codestyle CI failure was still a problem and today I was determined to fix it!
Using the GitHub CLI to find changed files
Earlier this evening, I was doing some googling trying to figure this out again. I came across the same Stack Overflow posts and articles from before as well as some GitHub Actions that claimed to do what I wanted. Generally for these types of issues, I want to understand the problem and own the solution as much as possible. We've already established that this isn't a particularly hard problem to solve, but rather one with nuance that needs to be tailored to our real life workflows.
The key search term ended up being something like "how to generate git diff like github pull request changed files". GitHub seems to produce the result we want already, so it made the most sense to try and reproduce what they are doing. This led me to a command that is part of the GitHub CLI: gh pr diff, which does exactly what you would think. If a PR is open for the current branch, it will return a diff of the changed files, importantly, using the information from the PR. The nice thing about this is that it only shows changed files after all the magic of fast forwarding and merge commits. No longer must I contend with an ever increasing set of "different" files when pointing a 1 line change back to our long running sprint branch.
To make it all play nice, I needed to pipe a few things around, and I will now explain the full command.
gh pr diff ${{ github.event.number }} --name-only | grep .php | xargs find 2> /dev/null | xargs phpcs -nq
First, we run the gh pr diff command, passing the PR number as an argument along with the --name-only flag. If you are doing this in a full git repo that is connected to the remote GitHub repository, their CLI will automagically resolve the PR for you. In this case, I am deriving it from the running GitHub Action.
Next, we pass that output to grep to get files that include .php, since contributors may create PRs with a mix of different file types. While PHPCS can handle some other files besides PHP files, it's easier for us to filter everything else out.
After we have our relevant files, we pass those to find using xargs. This is to ensure that all of the files we pass to PHPCS are present. If we pass a file that has been deleted to PHPCS, it will error out. The find utility is essentially being used as a fancy filter. If you are unaware, xargs is a tiny utility that takes any piped input and passes it as arguments to another command. Since find will complain about every file it cannot find, we send those error messages to /dev/null so they don't clutter the Action's output. The missing files are simply dropped from the list.
Finally, we pass the remaining existing files to phpcs with the -nq flags. These ignore warnings and quiet output respectively. Important to note, there is a phpcs.xml file in the root of the repository that will be loaded automatically for linting rules and other configuration. If phpcs finds any errors, it prints its report and exits with a non-zero status, and the Action will fail!
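The filtering stage can be tried locally without gh or phpcs installed. Below is a minimal sketch, not the command we run in CI: pr_files is a made-up stand-in for the output of gh pr diff --name-only, the file names are invented, and a plain read loop takes the place of find. It also anchors the grep pattern, since an unescaped .php matches any character followed by "php".

```shell
#!/usr/bin/env bash
# Sketch of the "PHP files that still exist" filter, runnable without gh/phpcs.
# pr_files and every file name below are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p app docs
echo '<?php' > app/User.php
echo '<?php' > app/Router.php
echo 'docs'  > docs/php-notes.md

# Stand-in for `gh pr diff <number> --name-only`. app/Deleted.php is part of
# the PR but no longer exists on disk.
pr_files() {
  printf '%s\n' app/User.php app/Deleted.php docs/php-notes.md app/Router.php
}

# Keep names ending in ".php", then keep only the ones that are still present.
lintable="$(
  pr_files | grep '\.php$' | while IFS= read -r file; do
    if [ -f "$file" ]; then
      printf '%s\n' "$file"
    fi
  done
)"
echo "$lintable"
```

This prints app/User.php and app/Router.php. With the looser grep .php, docs/php-notes.md would have slipped through because "/php" matches. In a real workflow the list could then be handed to phpcs, for example with printf '%s\n' "$lintable" | xargs -r phpcs -nq, where -r is the GNU xargs flag that skips the run when the list is empty.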
I purposefully left out some of the other configuration from before, but the same principle could apply if you'd rather use that approach.
In the end, this is the final action file:
name: "PHPCS Code style"

on:
  pull_request:

jobs:
  build_app:
    name: "PHPCS"
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }}
    steps:
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.2'
          tools: cs2pr, phpcs
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run PHPCS
        # Get the changed files from the PR via Github CLI.
        # `grep` to only PHP files then remove deleted files via `find`.
        # Run `phpcs` on the remaining files and pass output to `cs2pr`.
        run: |
          echo "Files for this PR:"
          gh pr diff ${{ github.event.number }} --name-only | grep .php | xargs find 2> /dev/null | cat
          gh pr diff ${{ github.event.number }} --name-only | grep .php | xargs find 2> /dev/null | xargs phpcs -nq --report=checkstyle | cs2pr
As you can see, there is a little extra something something going on here to aid in debugging should something go wrong. I also found a neat tool, cs2pr, that I do find useful: it can ingest PHPCS output and create annotations directly in the PR with what needs to change. Hopefully this helps contributors pay more attention to the code style workflow results, as we still will not be blocking merges anytime soon. I'm hopeful that it will work better though, since we are only linting changed files now.
Hopefully this was helpful! If you liked it, please reach out. I'm on Twitter and Mastodon and links are in the footer of my blog. :)