When your application’s repository contains large files, it’s important to have a proper strategy for managing them. These files might include audio, video, datasets, and graphics, among others. Managing them with Git alone causes problems because Git isn’t designed for versioning large binary files. The most common symptoms are slow clones and fetches and sluggish operations on the files themselves, such as checking out different versions. This post discusses a strategy for handling large files, focusing on Git Large File Storage (Git LFS) and git-fat.
Git Large File Storage (Git LFS)
Git LFS is an open-source Git extension that reduces the impact of large files in your repository by downloading the relevant versions of them lazily. Specifically, it replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the actual file contents on a remote server such as GitHub or Azure Repos.
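To make the idea of a text pointer concrete, here is a minimal Python sketch that builds the small stand-in Git LFS commits in place of a file: a spec version line, a SHA-256 object ID, and the size in bytes. The function name `make_lfs_pointer` is made up for illustration; the real work is done by the `git lfs` client, not by code like this.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> str:
    """Return pointer text describing `content` (illustrative helper, not part of Git LFS)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Pretend this is a one-megabyte binary asset.
fake_asset = b"\x00" * (1024 * 1024)
pointer = make_lfs_pointer(fake_asset)
print(pointer)
print(f"asset: {len(fake_asset)} bytes, pointer: {len(pointer.encode())} bytes")
```

However large the asset is, the pointer stays at roughly a hundred bytes, which is why history containing only pointers remains quick to clone.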
In Azure DevOps, Azure Repos provides unlimited, free Git LFS storage for the repositories in a project.
To get started with Git LFS, follow the steps below.
- Install Git LFS (for example, with Homebrew on macOS):
$ brew install git-lfs
- Set Git LFS up:
$ git lfs install
- To start versioning large files, select the file types you’d like Git LFS to manage:
$ git lfs track "*.psd"
Be sure to commit your .gitattributes file, which stores this tracking information.
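The track command records its pattern in .gitattributes as a line such as `*.psd filter=lfs diff=lfs merge=lfs -text`. As a rough sketch, the Python snippet below lists the patterns a .gitattributes file hands to Git LFS. The helper name `lfs_tracked_patterns` is hypothetical, and the parsing is simplified: it assumes whitespace-separated fields and does not handle quoted patterns that contain spaces.

```python
from typing import List


def lfs_tracked_patterns(gitattributes_text: str) -> List[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


sample = """\
# Written by: git lfs track "*.psd"
*.psd filter=lfs diff=lfs merge=lfs -text
*.txt text
"""
print(lfs_tracked_patterns(sample))  # ['*.psd']
```

A check like this can be handy in a pipeline step that verifies the expected file types are tracked before a build runs.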
The key advantages of Git LFS include:
- It is supported by the major Git hosting services, including GitHub and Azure Repos.
- It is straightforward to set up and configure.
The downsides include:
- It requires an additional step to set up on a new clone.
- You must manually list which file types to track.
git-fat
Like Git LFS, git-fat is an extension to Git for handling large files. However, git-fat uses rsync to transfer files, so it can be used with any server where you have an SSH login. In place of each large file, git-fat commits a small stand-in file containing a checksum of the actual file content.
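The stand-in idea can be sketched in a few lines of Python. The marker string and field layout below are assumptions based on a reading of git-fat’s source and may differ between versions, so treat them as illustrative. The function names are made up for this example.

```python
import hashlib

# Assumed marker and layout; check the git-fat source for your version.
FAT_COOKIE = "#$# git-fat "


def make_fat_placeholder(content: bytes) -> str:
    """Build a one-line stand-in holding a SHA-1 checksum and the byte count."""
    digest = hashlib.sha1(content).hexdigest()
    return f"{FAT_COOKIE}{digest} {len(content):20d}\n"


def parse_fat_placeholder(text: str):
    """Return (digest, size) for a stand-in, or None if `text` is ordinary content."""
    if not text.startswith(FAT_COOKIE):
        return None
    fields = text[len(FAT_COOKIE):].split()
    if not fields:
        return None
    size = int(fields[1]) if len(fields) > 1 else None
    return fields[0], size


placeholder = make_fat_placeholder(b"pretend this is a large PNG")
print(placeholder, end="")
print(parse_fat_placeholder(placeholder))
```

Because the checksum identifies the content, two commits that contain the same large file refer to the same object in the store.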
Git-fat can be installed and set up as follows:
- Install git-fat using pip or easy_install:
$ pip install git-fat
- Create a .gitfat file in the root of your repository. It should contain something like:
[rsync]
remote = user@yourserver.com:/path/to/git-fat-store
- Track file patterns to be managed by git-fat using a .gitattributes file. For example:
*.png filter=fat -crlf
- Run `git fat init` to complete the setup.
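The .gitfat file is INI-style, so it is easy to inspect from a script. The sketch below uses Python’s standard `configparser` to read the rsync remote from the example above and split it into the SSH login and the path of the store. The helper `read_fat_remote` is hypothetical and says nothing about how git-fat itself reads the file.

```python
import configparser


def read_fat_remote(gitfat_text: str):
    """Return (ssh_login, store_path) from the [rsync] section of a .gitfat file."""
    parser = configparser.ConfigParser()
    parser.read_string(gitfat_text)
    remote = parser.get("rsync", "remote")
    # An rsync-over-SSH remote has the form login:path.
    login, _, path = remote.partition(":")
    return login, path


sample = """\
[rsync]
remote = user@yourserver.com:/path/to/git-fat-store
"""
login, path = read_fat_remote(sample)
print(login)  # user@yourserver.com
print(path)   # /path/to/git-fat-store
```

A pipeline could use something like this to confirm that the configured store is reachable before attempting to pull large files.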
Git-fat’s advantages include:
- It is compatible with any server that supports SSH, including private ones.
- Once a pattern is listed in .gitattributes, matching files are handled automatically when you add them.
The downsides are:
- It is less known and not as well-supported as Git LFS.
- It is Python-based, so a Python interpreter is required on every client; the server needs only rsync and SSH access.
A strategy for managing large files in a DevOps pipeline should take into account the trade-offs of the chosen approach. Git LFS’s extensive support and straightforward configuration make it a good fit for many projects, while git-fat’s flexibility and simplicity can be useful when the only storage available is a server you can reach over SSH.
Developers should opt for the solution that best suits their needs and aligns with the broader DevOps strategy. Because handling large files in Git is often complex, test any migration in a non-production environment before running it in production.
Practice Test
True or False: Git LFS stands for Git Large File Storage.
- True
- False
Answer: True.
Explanation: Git LFS is an abbreviation for Git Large File Storage. It is a Git extension for versioning large files.
True or False: Git LFS replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
- a) True
- b) False
Answer: a) True.
Explanation: Git LFS does exactly that. It stores the binary file contents on a separate server and leaves a pointer in the repository.
Git-fat is:
- a) A Git extension for managing and versioning large files.
- b) A tool for compressing large files.
- c) A scripting language for Git.
- d) Another name for Git LFS.
Answer: a) A Git extension for managing and versioning large files.
Explanation: Like Git LFS, git-fat is a Git extension for handling large files.
True or False: Git-fat and Git LFS can’t be used together in a project.
- True
- False
Answer: False.
Explanation: Both can indeed be used together in a project if desired or necessary.
True or False: ‘git lfs track’ is the command used to tell Git LFS which files to track.
- a) True
- b) False
Answer: a) True.
Explanation: The ‘git lfs track’ command is used to track new files in Git LFS.
Where does git-fat store large binary files?
- a) In a separate repository.
- b) In the same repository but in a different branch.
- c) On a remote server.
- d) None of the above.
Answer: c) On a remote server.
Explanation: Git-fat replaces large files in the repository with placeholders and keeps the actual contents in a separate fat store, typically on a remote server that it reaches with rsync over SSH.
True or False: Git LFS uses smudge and clean filters to track changes to files.
- True
- False
Answer: True.
Explanation: Git LFS uses these filters to convert large files to space-efficient pointer files when committing, and back to their actual file content when checking out.
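The round trip can be sketched in Python. This is a toy model, not how the Git LFS client is implemented: the in-memory dictionary `object_store` is a stand-in for the LFS server and local cache, and `clean` and `smudge` are named after the Git filter stages they imitate.

```python
import hashlib
from typing import Dict

# Stand-in for the LFS server and local object cache (assumption for this sketch).
object_store: Dict[str, bytes] = {}


def clean(content: bytes) -> bytes:
    """On commit: store the real bytes elsewhere and return a small pointer."""
    oid = hashlib.sha256(content).hexdigest()
    object_store[oid] = content
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
    return pointer.encode()


def smudge(pointer: bytes) -> bytes:
    """On checkout: look up the object ID in the pointer and return the real bytes."""
    fields = dict(line.split(" ", 1) for line in pointer.decode().splitlines())
    oid = fields["oid"].split(":", 1)[1]
    return object_store[oid]


original = b"pretend this is a large PSD file" * 1000
pointer = clean(original)
restored = smudge(pointer)
print(len(original), len(pointer), restored == original)
```

Git only ever sees the output of `clean`, which is why the repository history stays small even when the working tree holds the full files.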
Which extension is more appropriate when working with a lot of small size binary files?
- a) Git LFS
- b) git-fat
- c) Both are equally appropriate
- d) None of the above
Answer: a) Git LFS
Explanation: Git LFS can version any type of binary file; it is not limited to large ones.
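In Git, the size of the buffer used when sending data to a remote over HTTP can be changed by:
- a) Setting the ‘http.postBuffer’ value
- b) Setting the ‘fileLimit’ value
- c) Setting the ‘LFS’ value
- d) Setting the ‘fatLimit’ value
Answer: a) Setting the ‘http.postBuffer’ value
Explanation: The ‘http.postBuffer’ setting is the maximum size, in bytes, of the buffer that Git’s smart HTTP transport uses when POSTing data to the remote. It is a client-side transport setting rather than a limit on file size; file size limits are imposed by the hosting provider.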
True or False: Managing large files in Git repositories can drastically slow down processes.
- True
- False
Answer: True.
Explanation: Managing large files directly in Git repositories can significantly slow down cloning and fetching times and use a lot of storage. That’s why extensions like Git LFS and git-fat were created.
Azure Repos supports Git Large File Storage (LFS)
- a) True
- b) False
Answer: a) True
Explanation: Microsoft’s Azure Repos provides unlimited free storage for your Git LFS files.
Git LFS and git-fat do not keep versioned history of the large files.
- a) True
- b) False
Answer: b) False.
Explanation: Both Git LFS and git-fat keep a versioned history of large files, just as Git does for ordinary files.
Git LFS is compatible only with Linux.
- a) True
- b) False
Answer: b) False.
Explanation: Git LFS works on various platforms, including Linux, macOS, and Windows.
git-fat requires rsync to operate.
- a) True
- b) False
Answer: a) True
Explanation: git-fat requires rsync, a utility for efficiently transferring and synchronizing files across computer systems.
Git LFS allows push and pull operations to work with large files just as they do with smaller files.
- a) True
- b) False
Answer: a) True
Explanation: Git LFS allows users to work with large files as if they were normal files. The ‘push’ and ‘pull’ actions behave just as they do with smaller files.
Interview Questions
1. What is Git Large File Storage (LFS)?
Git LFS is a Git extension that reduces the impact of large files in your repository by downloading the relevant versions of them lazily. Specifically, Git LFS replaces large files, such as audio samples, videos, datasets, and graphics, with text pointers inside Git, while storing the file contents on a remote server.
2. What problems does Git LFS solve?
Git LFS solves problems related to storing, updating, and retrieving large binary files in Git repositories. It helps to keep your repository’s clone times and disk usage low, allowing you to work with large files more effectively.
3. What is git-fat?
Git-fat is a simple, lightweight solution for handling large files with Git. It takes files that match the patterns you configure and places them in a separate location known as a fat store, replacing them in the repository with placeholders.
4. How is Git LFS different from git-fat?
Git LFS is a Git extension that adds an ‘lfs’ subcommand to Git. It stores large files on a separate server that you can configure as needed. Git-fat, on the other hand, uses rsync to store and retrieve large files from a designated fat store.
5. Can Git LFS handle file versioning?
Yes, Git LFS supports file versioning. It tracks changes to large files over time and allows you to retrieve previous versions when necessary.
6. Is it possible to use Git LFS with a private repository?
Yes, Git LFS supports both public and private repositories. You’ll need to ensure that the LFS server you use also supports private repositories.
7. What are the benefits of managing large files outside the Git repository?
Managing large files outside the Git repository keeps the repository light and easy to manage. It also helps to ensure efficient use of system resources and quickens tasks like cloning or pulling the repository.
8. How does Git track changes in large files with Git LFS?
Git LFS tracks changes to large files by storing a pointer to the file in the repository and storing the actual file contents on a separate LFS server. The pointer is a small text file, which can be version controlled like any other.
9. What considerations need to be made when choosing between Git LFS and git-fat?
Some considerations might include how much control you need over where your files are stored, what kind of file versioning support you require, the level of community support and development activity of the project, and the complexity of setup and use you’re willing to manage.
10. Can you use Git LFS without a dedicated LFS server?
While it’s technically possible, it’s not recommended. The LFS server stores the actual file content and if it’s not available, you’ll only have the text pointer files in your repository and won’t be able to access the large files themselves.
11. What is the communication protocol between Git LFS and the LFS server?
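Git LFS communicates with the LFS server over HTTPS. The client first calls the server’s JSON-based Batch API to learn where each object should be uploaded to or downloaded from, and then transfers the file contents with ordinary HTTP requests. Custom transfer adapters can be configured when another mechanism is needed.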
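For a sense of what the client sends, the sketch below builds the JSON body of a Batch API request, which is POSTed to the `objects/batch` path under the LFS server URL. It only constructs the payload and headers and makes no network call. The helper name `build_batch_request` and the sample object ID are made up for illustration.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(operation, objects):
    """Return (headers, json_body) asking the server about a list of (oid, size) pairs."""
    if operation not in ("download", "upload"):
        raise ValueError("operation must be 'download' or 'upload'")
    body = {
        "operation": operation,
        "transfers": ["basic"],  # the default HTTP transfer adapter
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    return headers, json.dumps(body, indent=2)


# Placeholder object ID; a real one is the SHA-256 of the file content.
example_oid = "ab" * 32
headers, body = build_batch_request("download", [(example_oid, 123456)])
print(headers)
print(body)
```

The server’s reply lists, per object, the URLs and headers to use for the actual upload or download.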
12. What commands are used to track large files in Git LFS?
The ‘git lfs track’ command is used to track large files in Git LFS. It must be followed by the file or file type you wish to track, for example ‘git lfs track "*.iso"’.
13. How does git-fat handle large files when pulling a new committed version?
Git-fat relies on rsync to handle changes to large files. When you pull new changes, the placeholders for updated files are replaced with the large files from the fat store, effectively ‘fattening’ them.
14. Can you convert a regular Git repository to use Git LFS?
Yes, this is possible. The ‘git lfs migrate import’ command converts an existing Git repository to use Git LFS by rewriting the matching files in its history as LFS pointers. Because this rewrites history, coordinate with collaborators before pushing the result.
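Before migrating, it helps to know which file types account for most of the space. `git lfs migrate info` reports this across history; the Python sketch below is only a rough stand-in that scans the current working tree. The function name and the threshold value are arbitrary choices for illustration.

```python
import os
from collections import defaultdict


def size_by_extension(root, min_bytes):
    """Total the bytes of files at least `min_bytes` large, grouped by extension."""
    totals = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own object database.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # for example, a broken symlink
            if size >= min_bytes:
                ext = os.path.splitext(name)[1].lower() or "(no extension)"
                totals[ext] += size
    return dict(totals)


# Report file types with files of 1 MiB or more under the current directory.
report = size_by_extension(".", 1024 * 1024)
for ext, total in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{ext}\t{total} bytes")
```

The extensions at the top of the list are natural candidates for `git lfs track` patterns.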
15. Is there a file size limit for Git LFS?
There isn’t a set limit on the size of individual files in Git LFS, but there might be limits imposed by your LFS server or hosting provider. It’s also worth noting that extremely large files might require significant system resources to handle.