I use Git to track changes to my Logseq graph and keep a remote copy of the repository on GitHub as a backup. Logseq can automatically commit changes to your files at a set time interval, which gives a great record of what you are doing all day, with data that is relatively easy to analyze (if you ever wanted to).

Tracking Logseq with Git works very well, until it doesn’t!

A few weeks ago, I added a large file (>100 MB) to my graph, which was automatically tracked by Git. GitHub rejects pushes that contain files this large, so this prevented me from pushing any changes to my remote repository. I quickly removed references to the file from my graph and deleted the file from the assets folder in Logseq, but the file was still tracked in the repository history. I had to remove it from there too, and removing it now meant altering the history of my repository.
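If you are not sure which file is blocking the push, the history itself can be searched for oversized blobs. The following is a minimal sketch, not something from my original troubleshooting: the function name list_large_blobs, the throwaway demo repository, and the 2 KB stand-in file are all assumptions made for illustration.

```shell
set -eu

# Print "<size-in-bytes> <path>" for every blob in the history of the
# current repository that is at least $1 bytes, largest first.
list_large_blobs() {
  min_bytes="$1"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $2 >= min {
      size = $2
      sub(/^blob [0-9]+ /, "")   # keep the full path, even if it has spaces
      print size, $0
    }' |
    sort -rn
}

# Demo on a throwaway repository (hypothetical file names).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
mkdir assets
head -c 2048 /dev/zero > assets/big.bin   # stand-in for the >100 MB file
echo "- a note" > page.md
git add .
git commit -q -m "Auto saved by Logseq"
git rm -q assets/big.bin                  # deleted from disk, still in history
git commit -q -m "Auto saved by Logseq"

large=$(list_large_blobs 1024)
echo "$large"
```

In a real graph you would run something like list_large_blobs $((100 * 1024 * 1024)) from the repository root to list everything at or above 100 MiB, including files that were deleted from the working directory long ago.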

I ignored this problem for a while, meaning that I racked up a few hundred automatic commits in the repository AFTER mistakenly adding the big file.

Removing a file from Git history

First, I backed up my repository by copying the Logseq files to a new directory. This was crucial: it took a few attempts to get this working correctly, and each time something went wrong, I could try again from scratch.
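A backup like this can be as simple as a recursive copy that keeps the hidden .git folder. Here is a small sketch of the idea; the backup_graph helper, the timestamped directory name, and the demo paths are assumptions, not the exact commands I ran.

```shell
set -eu

# Copy a graph directory (including its hidden .git folder) to a
# timestamped sibling directory and print the path of the copy.
backup_graph() {
  src="${1%/}"
  dest="${src}-backup-$(date +%Y%m%d-%H%M%S)"
  cp -a "$src" "$dest"
  printf '%s\n' "$dest"
}

# Demo with a fake graph in a temporary directory (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/graph/.git" "$work/graph/pages"
echo "- a note" > "$work/graph/pages/today.md"

backup=$(backup_graph "$work/graph")
echo "Backup written to $backup"
```

Because the copy includes .git, a failed history rewrite can be thrown away and retried from the backup.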

So, I needed to remove my one large file from Git and rewrite the history of the hundreds of commits that came after I added the file. To do that, I looked into using BFG Repo-Cleaner, but this seemed more focused on files already present in a remote repository. In my case, I wasn’t able to push any changes to the remote, so I needed to change the local history, then reconcile this updated history with the very out-of-date remote.

I settled on git filter-branch to strip my large file out of every commit made after I had added and removed the file. This actually rewrites the entire history of the repository, so for me (with 25k commits in my repo), it took about 10 minutes:

git filter-branch --force --index-filter "git rm --cached --ignore-unmatch assets/my-very-large-file" --prune-empty --tag-name-filter cat -- --all
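To see what this command does before running it on a real graph, it can be tried on a throwaway repository. The sketch below is an illustration under assumptions: the demo repository, the 2 KB stand-in file, and the commits_touching helper are made up, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the notice and pause that newer Git versions print before filter-branch starts.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Count commits on branches and tags that touch the given path.
# --branches --tags (rather than --all) skips the refs/original/ backup
# refs that filter-branch leaves behind.
commits_touching() {
  git log --branches --tags --format=%H -- "$1" | wc -l | tr -d ' '
}

# Build a tiny history: note, add big file, delete big file, note.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "- a note" > page.md
git add page.md
git commit -q -m "Auto saved by Logseq"
mkdir assets
head -c 2048 /dev/zero > assets/my-very-large-file   # stand-in for the video
git add assets
git commit -q -m "Auto saved by Logseq"
git rm -q assets/my-very-large-file
git commit -q -m "Auto saved by Logseq"
echo "- another note" >> page.md
git commit -q -am "Auto saved by Logseq"

before=$(commits_touching assets/my-very-large-file)

# Same command as above, run against the demo repository.
git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch assets/my-very-large-file" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

after=$(commits_touching assets/my-very-large-file)
commits_left=$(git rev-list --count main)
echo "touching before: $before, after: $after, commits left: $commits_left"
```

After the rewrite, no commit on any branch should mention the file, and the commits that only added or deleted it are dropped by --prune-empty. The old history is still kept locally under refs/original/ until those refs are deleted.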

Reconciling remote and local histories

When this was sorted and I had checked that my graph looked OK in Logseq (in a previous attempt, I aborted halfway through and my graph was riddled with git conflict annotations!), I was finally able to push my updated history to my remote repository. The internet recommended I force push to main, but the idea of pushing a messed-up history and overwriting the whole record of my Logseq graph was too much for me to stomach. I opted to create a new repository and push the updated history there.

Because my local repository was already configured to use the old remote, I had to point origin at my newly created GitHub repository:

git remote set-url origin https://github.com/my-repo2
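An alternative that keeps the old backup reachable is to rename the existing remote instead of overwriting its URL. This is a sketch of that variation, not what I actually did; the name old-origin and the old URL https://github.com/my-repo are hypothetical placeholders.

```shell
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q

# Hypothetical starting point: origin points at the old repository.
git remote add origin https://github.com/my-repo

# Keep the old remote under a new name, then add the new repository as origin.
git remote rename origin old-origin
git remote add origin https://github.com/my-repo2

new_url=$(git remote get-url origin)
old_url=$(git remote get-url old-origin)
git remote -v   # lists both remotes with their fetch and push URLs
```

Either way, git remote -v is a quick check that origin really points at the new repository before pushing anything.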

Then, I pushed the changes to my new repository:

git push origin main --force
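Looking back, a brand-new empty repository has nothing to overwrite, so a plain push should work there too. The sketch below shows this with a local bare repository standing in for the new GitHub repository; the paths and file names are assumptions for the demo only.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A local bare repository plays the role of the empty remote.
git init -q --bare "$work/remote.git"

# A small local graph with one commit on main.
git init -q "$work/graph"
cd "$work/graph"
git symbolic-ref HEAD refs/heads/main
echo "- a note" > page.md
git add page.md
git commit -q -m "Auto saved by Logseq"

git remote add origin "$work/remote.git"
git push -q origin main   # no --force: the remote has no history yet

local_head=$(git rev-parse main)
remote_head=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)
echo "local:  $local_head"
echo "remote: $remote_head"
```

When the remote does already have history, git push --force-with-lease is a more cautious option than --force, since it refuses to overwrite commits on the remote that you have not fetched yet.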

It failed (some network issue), then it worked! But some of my changes hadn’t been reconciled with the new main branch of my repository. I got the message:

Warning: you are leaving 67 commits behind, not connected to
any of your branches:

  cd09fdd86 Auto saved by Logseq
  c2626c134 Auto saved by Logseq
  bf024fa5c Auto saved by Logseq
  c93f0921b Auto saved by Logseq
 ... and 63 more.

If you want to keep them by creating a new branch, this may be a good time
to do so with:

 git branch <new-branch-name> cd09fdd86

I think I accidentally edited a few Logseq files while I was updating the history (oops!). Now, the changes conflicted with the new main branch of my repo. I needed to put these changes in their own branch, push them to the remote, and create a Pull Request:

git branch latest_branch cd09fdd86
git push --set-upstream origin latest_branch
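The rescue step can be reproduced in a throwaway repository to see how it behaves. This is a minimal sketch under assumptions: the demo repository and file names are made up, and the stray commit is created on purpose by detaching HEAD.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "- a note" > page.md
git add page.md
git commit -q -m "Auto saved by Logseq"

# Simulate committing while not on any branch.
git checkout -q --detach
echo "- stray edit" >> page.md
git commit -q -am "Auto saved by Logseq"
stray=$(git rev-parse HEAD)

# Without -q, this is where Git prints the "leaving commits behind" warning.
git checkout -q main

# Attach the orphaned commit to a branch so it cannot be garbage collected.
git branch latest_branch "$stray"

rescued=$(git rev-parse latest_branch)
git reflog -n 5   # the reflog still lists the hash if the warning was missed
```

If the warning has already scrolled away, git reflog is usually the easiest place to find the hash of the last commit made before switching branches.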

Then, I resolved all of the conflicts I had created, merged the PR, and switched back to the main branch:

git checkout main
git pull

And it worked! My big file was removed from the repo’s history, and my Logseq files were up to date.

Quite a catastrophe I caused by simply trying to save a video in my Logseq graph!