#! code: Drupal 11: Building A Link Directory: Part 1

$node->set('field_link_image', [
'target_id' => $file->id(),
'alt' => $node->getTitle(),
]);

Here’s a preview of the link directory in action, with a small sample of 8 pages. You can see that even the JavaScript heavy pages still rendered correctly.With the content type in place I also created a view so that the links could be shown in a directory. I used a simple grid setup to display the links with screenshots as a grid.I’m using a separate field for the last checked time because I wanted it to be different from the node created/updated times. As the last checked is only updated if the link was active we can use this to automatically scan the links and use this field to check to see what links aren’t active any more. I’ll address automatic link checking in a later article in this series, but for now it isn’t used much. After losing yet another bookmark again recently I decided to do something about it. I realised that #! code was the best place for it as I’m always logged into the site, so I set about creating a link directory on the site. I didn’t just want a big list of links though. In my mind a good link directory takes a screenshot of the site when the link is created so that it is easy to see what links are there from the screenshot of the original site.

Creating The Link Content Type

Now we have a place to store our links we need to look at creating the screenshot of the site, which was relatively straightforward to do with PHP. It does, however, require a bit of work to get everything set up correctly.

  • URL – This uses the Link field type to store the URL in use. Using the field settings I disabled the title field and required external links. The link field widget means that we get some basic validation for the form, although we won’t be using that too much.
  • Tags – Some tags that we can use for the links. This helps to categorise them.
  • Image – An image field that stores the screenshot for the link. This is used in favour a media field as we want the image to be connected directly to the Link item and don’t need it to be shared elsewhere.
  • Image updates off? – This is a boolean field that can be used to prevent updating the screenshot. This allows us to generate the screenshot in a number of different ways, but prevent it from being updated if it isn’t updated automatically.
  • Description – In order to provide context for searching the system a short description field is added. This will also help when looking through the link directory in the future, especially if the site has since gone offline.
  • Date last checked – Thinking of sites going offline, a timestamp field is used to store when the link was last checked. This will be used in the future to check links in the site to ensure that they are still online.

$file = $this->crawlerService->writeResultsFileDataToDisk($result);

The problem is that web browsers don’t allow you to categorise or search bookmarks so I can never find them again. Also when I swap browsers (which I have done twice this year) I end up having to migrate them over and set up synchronising between computers. This always removes the favicons of the sites so I have even more trouble finding the right link.Here is the full class that contains the presave hook that updates the image and last checked timestamp of the Link. We also use the setting in the “field_link_image_update” field to make sure we want to automatically update the Link image before doing so.After a couple of weeks of creating links in the directory I now have a few pages of results. I’ve been adding tags to things slowly so that I can find things, but the search system is working really well. All of the screenshots captured are exactly the same size, which makes creating a grid display of them quite easy.$fileData = $result->getScreenshotFileData();
if (is_string($fileData) && $fileData !== '') {
$fileData = base64_decode($fileData);
$directory = self::LINK_IMAGE_DIRECTORY;
if ($this->fileSystem->prepareDirectory($directory, FileSystemInterface::CREATE_DIRECTORY)) {
$fileName = $directory . '/' . $result->getUrl()->generateHash() . '.png';
$file = $this->fileRepository->writeData($fileData, $fileName, FileExists::Replace);
}
}

Join me in part 2 where I look at solving some of the problems of taking a screenshot of a site using a different mechanism. public function submitForm(array &$form, FormStateInterface $form_state) {
$ignore404 = (bool) $form_state->getValue('ignore_404');
$url = $form_state->getValue('url');

$result = $this->crawler->crawlUrl($url);

if ($result === NULL) {
$this->messenger()->addError($this->t('Unable to start the Chrome instance.'));
return;
}

if ($result->getResponseCode() === 403) {
$headers = $result->getHeaders();
if (isset($headers['cf-mitigated'][0]) && $headers['cf-mitigated'][0] === 'challenge') {
$this->messenger()->addWarning($this->t('Entered URL %url returned an Cloudflare challenge, you will need to do this by hand.', ['%url' => $url]));
return;
}
}

if ($result->getResponseCode() !== 200) {
$this->messenger()->addError($this->t('Entered URL %url returned an incorrect status and so was not entered.', ['%url' => $url]));
return;
}

$title = $result->getTitle();
if ($title === NULL || $title === '') {
// Set a default title as the URL for the occasional site that doesn't
// have a title.
$title = $url;
}

// Create the node.
$node = Node::create([
'type' => 'link',
'field_link_url' => $url,
'title' => $title,
'uid' => $this->currentUser->id(),
'status' => 1,
]);

$file = $this->crawler->writeResultsFileDataToDisk($result);

if ($file instanceof FileInterface) {
$node->set('field_link_image', [
'target_id' => $file->id(),
'alt' => $title,
]);
}

try {
$node->save();
}
catch (Exception $e) {
$this->messenger()->addError($this->t('Failed to create page.'));
return;
}

$params = ['%label' => $node->toLink()->toString(), '%url' => $url];
$this->messenger()->addStatus($this->t('Link %label created for URL %url.', $params));

$cacheTags = ['node_list', 'node_view'];
Cache::invalidateTags($cacheTags);
}

A problem I’ve been struggling with for a while now is managing my bookmarks. Every time I come across an interesting article I want to read, a good resource I want to keep, or a neat tool I want to try I create a bookmark.$result = $this->crawlerService->crawlUrl($url);

The best hook to use here is the hook_node_presave() hook, which will allow us to update the data in the Link entity before it is saved to the database. Since I’m using Drupal 11 I can use the new OOP hook system to add this hook.In this article I will go through how I set up the link directory, how links are added, and how the site is able to take screenshots of the links as they are added to the directory. Here is the code that runs all of the above steps. public function buildForm(array $form, FormStateInterface $form_state) {
$form['url'] = [
'#type' => 'textfield',
'#title' => $this->t('URL'),
'#required' => TRUE,
'#default_value' => $form_state->getValue('url'),
'#description' => $this->t('This form will strip out any query parameters and fragments from the URL.'),
];

$form['submit'] = [
'#type' => 'submit',
'#value' => $this->t('Submit'),
];

return $form;
}

<?php

namespace Drupallink_directoryHook;

use DrupalCoreHookAttributeHook;
use DrupalCoreMessengerMessengerInterface;
use DrupalCoreStringTranslationStringTranslationTrait;
use DrupalfileFileInterface;
use Drupallink_directoryEntityLink;
use Drupallink_directoryEntityLinkInterface;
use Drupallink_directoryServiceCrawlerServiceInterface;
use DrupalnodeNodeInterface;

/**
* Hooks for the link node.
*/
class LinkHooks {

use StringTranslationTrait;

public function __construct(
protected CrawlerServiceInterface $crawlerService,
protected MessengerInterface $messenger,
) {}

/**
* Implements hook_node_presave().
*
* Perform actions on the link as it is saved.
*/
#[Hook('node_presave')]
public function linkUpdate(NodeInterface $node):void {
if (!($node instanceof LinkInterface)) {
// Only act upon Link nodes.
return;
}

if ($node->isNew()) {
// Don't act upon new nodes.
return;
}

if ($node->get('field_link_url')->isEmpty()) {
// Don't run this process for empty links.
return;
}

$url = $node->get('field_link_url')->getValue();
$url = $url[0]['uri'];

$result = $this->crawlerService->crawlUrl($url);

if ($result === NULL) {
// Crawler failed, don't update the Link.
return;
}

if ($result->getResponseCode() === 403) {
// We can't update this as it's protected, so just ignore it.
return;
}

if ($result->getResponseCode() !== 200) {
// The site was down or something.
$this->messenger->addError($this->t('The URL %url returned an incorrect status and so was not updated.', ['%url' => $url]));
return;
}

if ($node->hasField('field_link_last_checked')) {
$node->set('field_link_last_checked', time());
}

$imageUpdateOff = (bool) $node->get('field_link_image_update')->getValue()[0]['value'];

if ($imageUpdateOff === TRUE) {
// Don't update the image.
return;
}

$file = $this->crawlerService->writeResultsFileDataToDisk($result);

if ($file instanceof FileInterface) {
// If we have a file then inject this into the page.
$node->set('field_link_image', [
'target_id' => $file->id(),
'alt' => $node->getTitle(),
]);
}
}

}

Here’s an example of a screenshot from the site.A screenshot of the Mechboards website, a popular keyboard webiste. It is showing the new Corne Min keyboard on the homepage. A black, two part, keyboard on a yellow background.Even if the browser is able to access the site and take a screenshot I have a secondary issue of cookie consent checks, newsletter popups, intrusive advertising and other annoying things that destroy the site screenshot. I therefore got to thinking how I could take screenshots of a web page in a different way and still get the effect I wanted.

Link Hook Event

Whilst I could easily go into the content creation dialog and create new content I wanted a way to simply create Links in the link directory by just throwing a URL at a form. This would then create the Link content item, take a screenshot of the site, and save everything to the database.The success rate of taking a screenshot of the site is on the low side. Although the Chromium browser does a good job of rendering the page and producing an image we can use, actually getting to that point has proven to be quite difficult. Many websites have multiple levels of protection to prevent automated requests, especially in this world of abusive AI crawlers and copyright theft, which is completely understandable. Attempting to take a screenshot of any page on Drupal.org will result in a error code from Fastly and the screenshot would just contain a “are you human” check if I allowed it to continue. GitHub will refuse to load stylesheets in this mode so any shots of GitHub repositories look quite broken.The process of scanning a page uses the following steps:I won’t post all the entire form class here, but the form built step here is pretty simple, we just need a field for the URL and a submit button.To store the links I created a content type called “Link” and added a few fields to it.$urlObj = new Url($url);
$crawler = new ChromeCrawler();
$browserFactory = new BrowserFactory('/path/to/the/chrome/binary');

$options = [
'windowSize' => [1280, 960],
'noSandbox' => TRUE,
'userDataDir' => sys_get_temp_dir(),
'startupTimeout' => '120',
'envVariables' => [
'XDG_CONFIG_HOME' => sys_get_temp_dir(),
'XDG_CACHE_HOME' => sys_get_temp_dir(),
],
'customFlags' => [
'--disable-crash-reporter',
'--no-crashpad',
],
];
$browserFactory->setOptions($options);

try {
$crawler->setEngine($browserFactory->createBrowser());
}
catch (RuntimeException $e) {
return NULL;
}

$crawler->setTakeScreenshot(TRUE);
$crawler->setScreenshotType(ChromeCrawler::SCREENSHOT_TYPE_DATA);

$result = $crawler->processUrl($urlObj);

Once we have ensured that the result is actually a valid page by checking the response code then we can generate the file entity. This is done using the file_system service to ensure that the correct directories exist for our file and the file.repository service to create the file entity from the image data we have.

Creating A Link Submit Form

If you’re wondering where the link directory is, then it is currently hidden from view for anonymous users. It’s currently a work in progress but I wanted to show some of the steps involved in creating it.Let’s look at tying this screenshot generation into the Link entity.Over time I have collected a large collection of bookmarks so when I add a new one to the list it gets lots in the pile. I’ve tried to create directories to keep “new” bookmarks or organise them into sections, but I always end up scrabbling to find them.The first avenue to injecting this code into the Link entity is to use a hook.Note that we have set a entity bundle class for the Link elsewhere so all we need to do is make sure that the incoming entity is of the correct type before continuing.

Conclusion

Over the past couple of years I have been plugging away at a PHP sitemap checker package that uses a couple of different engines to grab links from a sitemap.xml file and check them for validity. I use this quite a bit for checking for the sitemap.xml files and scanning sites for broken links.With this in place the data and screenshot of the Link are updated when the page is saved (assuming that the Link is configured correctly). I wanted to make it easy for me to add a link to the site, so I decided to build a form to do this.A preview of the link directory created on hashbangcode. This shows 8 links in a 4x2 grid.I’ve written before about converting base64 encoded data into file entities in Drupal in a previous article. This approach follows some of the same steps, but I get the data from a different location.When the form is submitted we need to use the crawler service to fetch data from the URL entered and then create a Link object. If we have screenshot data then we update the Link with that information and save it to the database. The final step is to invalidate any caches for node lists so that the Link will appear as an item in our link directory.This package has the ability to use a headless Chrome browser (via the Chrome PHP package) to scan and (optionally) take a screenshot of the page in question. I adapted this system to scan and take a screenshot of a single page, using Chromium, rather than Chrome. Chromium is the package that is the basis of Chrome, and so has less tracking and other features than Chrome does. This makes Chromium ideal for this purpose.  Here is the code that creates the file from base64 encoded data.

Similar Posts