HTTP uses the Content-Type header to tell web browsers what kind of content they’re getting back, so a URL does not need a file extension to identify an HTML page.
If we can control the header, then it would be nicer to use a clean URL instead of a URL
that has an extension. In other words, it would be nice if HTML pages did not
have .html in the URL.
However, by default Hakyll will always include the .html extension
with the HTML pages it generates. It is actually not that hard to remove it, though,
and here I describe how I did it for my personal web site.
Setup in Hakyll
Some approaches rely on web servers typically serving the index.html file in a directory
to serve an HTML page corresponding to a URL for the directory. However, I wanted to use
a more direct approach which avoided including .html in the file names in the first place.
With a vanilla installation of a Hakyll site, you will see code such as the following,
which switches the file name extension to .html for the file which will contain the
HTML output translated from the original file.
match "about.markdown" $ do
    route $ setExtension "html"
    ...
It is really easy to switch things so that it removes the extension instead.
Simply set the extension to the empty string instead of .html.
match "about.markdown" $ do
    route $ setExtension ""
    ...
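To see what an empty extension does to an output path, here is a small sketch that calls replaceExtension from the filepath package directly. As far as I can tell, Hakyll's setExtension applies this same function to the identifier's file path, but treat that as an assumption. The helper names stripExt and htmlExt are made up for this illustration.

```haskell
import System.FilePath (replaceExtension)

-- Hypothetical helper: mimics what a route built with an empty
-- extension would do to a source path.
stripExt :: FilePath -> FilePath
stripExt path = replaceExtension path ""

-- Hypothetical helper: the usual ".html" behavior, for comparison.
htmlExt :: FilePath -> FilePath
htmlExt path = replaceExtension path "html"
```

With these definitions, stripExt "about.markdown" evaluates to "about", while htmlExt "about.markdown" evaluates to "about.html".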
Directory URLs
If you would like to use directory URLs, i.e., URLs which end with a slash
and whose content is actually contained in an index.html file,
then some more work is needed if any of your links are generated automatically
by Hakyll, which will likely be the case. By default, Hakyll
includes the file name in the links it generates for index.html files,
so we would like to remove the index.html from these links.
A common way is to generate pages as usual and to clean up the URLs afterwards.
This has the disadvantage that it can be easy to forget to clean up URLs
in every case they should be. You may also have to clean up URLs differently
for different cases. For example, I had to clean up URLs in sitemap.xml
differently from HTML pages when I previously used this approach.
I also did not realize that the usual way of cleaning up URLs
does not work with Hakyll’s built-in method of generating feeds.
An alternative approach, which I now use, is to not include the index.html part
in the generated links in the first place.
The default context provided by Hakyll generates URLs by
translating them from the route using the toUrl function.
So what I can do is use another context, which I call siteContext,
that generates the URL the same way, cleans it up, and overrides the "url" metadata field.
I then use siteContext everywhere that I would usually use defaultContext.
siteContext :: Context String
siteContext = field "url" clean <> defaultContext
  where
    -- Clean up "index.html" from URLs.
    clean item = do
      path <- getRoute (itemIdentifier item)
      case path of
        Nothing -> noResult "no route for identifier"
        Just s -> pure . cleanupIndexUrl . toUrl $ s
The actual cleaning of index.html from URLs is done with cleanupIndexUrl,
which strips index.html from local URLs.
cleanupIndexUrl :: String -> String
cleanupIndexUrl url@('/' : _) -- only clean up local URLs
  | Nothing <- prefix = url -- does not end with index.html
  | Just s <- prefix = s -- clean up index.html from URL
  where
    prefix = needlePrefix "index.html" url
cleanupIndexUrl url = url
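For experimenting with the cleanup outside of Hakyll, here is a minimal sketch that needs only base. It swaps needlePrefix for a strict suffix check built from stripPrefix on reversed strings, so it is a variation on the function above rather than the same code. The names stripSuffix and cleanupIndexUrlSketch are hypothetical.

```haskell
import Data.List (stripPrefix)

-- Hypothetical stand-in for a suffix stripper: returns the part of the
-- string before the given suffix, or Nothing if the suffix is absent.
stripSuffix :: String -> String -> Maybe String
stripSuffix suffix s = reverse <$> stripPrefix (reverse suffix) (reverse s)

-- Sketch of the cleanup: only local URLs (starting with '/') are touched.
cleanupIndexUrlSketch :: String -> String
cleanupIndexUrlSketch url@('/' : _) =
  case stripSuffix "index.html" url of
    Just s  -> s    -- trailing index.html removed, the slash is kept
    Nothing -> url  -- no trailing index.html, leave as is
cleanupIndexUrlSketch url = url  -- external or relative URL, leave as is
```

Here cleanupIndexUrlSketch "/blog/index.html" evaluates to "/blog/", while "/about" and "https://example.com/index.html" come back unchanged. Note that this sketch would also shorten a path such as "/myindex.html"; matching on "/index.html" and re-appending the slash would avoid that if it matters for your site.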
To prevent using defaultContext by mistake instead of siteContext,
I use a custom HLint hint.
- warning: {lhs: defaultContext, rhs: siteContext}
Because the "url" metadata field is overridden this way,
Hakyll uses the clean version of a directory URL in the first place,
and I do not have to worry about forgetting to clean up URLs
in site maps or feeds.
Setup in Apache
Using file names with no extension is all well and good, but it would be counterproductive
if web browsers treated the content as plain text or a blob of binary bytes.
In other words, we need the HTTP server to actually set the Content-Type to text/html
for the HTML pages.
My web site is served using the Apache HTTP server on a shared host.
Since I cannot change the main configuration for the server, I put the following in .htaccess:
<FilesMatch "^[^.]+$">
    ForceType text/html
</FilesMatch>
This will force the HTTP server to set the Content-Type to text/html
if the file name has no extension.
Obviously, this would not work as intended if I had dots in the names of files containing HTML,
but this is fine for me because I have no such files, and my file naming convention avoids them.
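The pattern ^[^.]+$ matches a file name made up of one or more characters, none of which is a dot. As a rough illustration of which names it selects, here is a small predicate with the same logic; the name matchesNoDot is hypothetical, and the sketch only mirrors the pattern rather than Apache's own matching code.

```haskell
-- Hypothetical predicate mirroring the pattern ^[^.]+$ :
-- the name must be non-empty and contain no dot at all.
matchesNoDot :: String -> Bool
matchesNoDot name = not (null name) && all (/= '.') name
```

So matchesNoDot "about" is True, while "index.html", "archive.2020" and ".htaccess" all give False and keep whatever type the server would normally assign.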
In fact, I have Hakyll generate my .htaccess file as well,
so I don’t have to worry about copying or editing it separately.
See site/server/htaccess.
Custom server
There is nothing more to do if all one wants is to serve HTML pages without including the extension in the URL. However, I would like to preview my site without standing up my own Apache HTTP server.
Hakyll uses the warp HTTP server for previewing a site locally.
It does not know to serve files without an extension as HTML,
so I made my own customizations to warp so that it would set Content-Type to text/html
for files without an extension.
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as Text
import Hakyll
import qualified Network.Wai.Application.Static as Static
import WaiAppStatic.Types (File (fileName), StaticSettings (ssGetMimeType), fromPiece)

main :: IO ()
main = hakyllWith config rules
  where
    config = defaultConfiguration { previewSettings = serverSettings }

serverSettings :: FilePath -> Static.StaticSettings
serverSettings path = baseSettings {ssGetMimeType = getMimeType}
  where
    baseSettings = Static.defaultFileServerSettings path
    defaultGetMimeType = ssGetMimeType baseSettings

    -- Overrides MIME type for files with no extension
    -- so that HTML pages need no extension.
    getMimeType file =
      if Text.elem '.' (fromPiece $ fileName file)
        then defaultGetMimeType file
        else return "text/html"
Caveats
A caveat with the way clean URLs are implemented here is that HTML files should not have a dot in their file names. This is not a problem for me because my file naming convention avoids this.
See also
The source code for this site.
The approach described on this page is not the only way to use clean URLs with Hakyll. Others have described alternative approaches.
Clean URLs with Hakyll by Rohan Jain
Jekyll Style URLs with Hakyll by Andreas Herrmann