Configure Indexed Search and Crawler easily in TYPO3

The system extension Indexed Search is the engine that indexes content and provides a frontend plugin that lets visitors search the content and view the results. Indexed Search provides two major elements to TYPO3:

1. Indexing: an engine that indexes TYPO3 pages on the fly as they are rendered by the TYPO3 frontend. Indexing a page means that all words on the page (or in specifically defined areas of it) are registered, counted, weighted and finally inserted into a database table of words. A second table is then filled with relation records linking the words in the word table to the page.
2. Searching: a plugin you can insert on your website that lets visitors search for information on it. When a search is submitted, the plugin first checks whether the word exists in the word table; if it does, every page related to that word is considered for the result list. The results are ordered by factors such as where on the page the word was found and how often it occurs on the page.
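The indexing and searching steps above can be sketched as a minimal Python model, assuming plain in-memory dictionaries in place of Indexed Search's real database tables. The names word_table, relations, index_page and search are hypothetical, and ranking here uses word frequency only, whereas the real engine also weighs where on the page a word appears.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins for the word table and the word/page relation table.
word_table: dict[str, int] = {}                            # word -> word id
relations: dict[int, dict[int, int]] = defaultdict(dict)   # word id -> {page uid: frequency}


def index_page(page_uid: int, text: str) -> None:
    """Register and count every word of a rendered page, then store word/page relations."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    for word, freq in counts.items():
        # Insert the word into the word table if it is new.
        word_id = word_table.setdefault(word, len(word_table) + 1)
        relations[word_id][page_uid] = freq


def search(word: str) -> list[int]:
    """Return the uids of pages related to the word, most frequent first."""
    word_id = word_table.get(word.lower())
    if word_id is None:  # the word does not exist in the word table
        return []
    pages = relations[word_id]
    return sorted(pages, key=lambda uid: pages[uid], reverse=True)
```

For example, after indexing a page that mentions "crawler" three times and one that mentions it once, search("crawler") lists the first page before the second.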

This article gives step-by-step instructions on how to install and configure Indexed Search and the Crawler extension to index your TYPO3 content efficiently.

Configuring the Server
If you want to index external documents linked from your web pages in addition to standard text elements, make sure the following third-party binaries are properly installed:

- catdoc for Microsoft Word documents (does not support .docx files)
- xlhtml for Microsoft Excel spreadsheets (does not support .xlsx files)
- ppthtml for Microsoft PowerPoint presentations (does not support .pptx files)
- pdftotext and pdfinfo for PDF files
- unzip for OpenOffice documents
- unrtf for RTF files
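As a quick reference, the mapping from file types to the binaries listed above can be written as a small Python lookup. This is a sketch, not part of TYPO3: the PARSERS table and the parsers_for helper are hypothetical, and the OpenDocument extensions (.odt, .ods, .odp) are an assumption about which OpenOffice files unzip is used for.

```python
from pathlib import Path

# Binaries from the list above, keyed by file extension.
# The OpenDocument keys are an illustrative assumption.
PARSERS = {
    ".doc": ["catdoc"],
    ".xls": ["xlhtml"],
    ".ppt": ["ppthtml"],
    ".pdf": ["pdftotext", "pdfinfo"],
    ".odt": ["unzip"],
    ".ods": ["unzip"],
    ".odp": ["unzip"],
    ".rtf": ["unrtf"],
}


def parsers_for(filename: str) -> list[str]:
    """Return the binaries needed for a file, or [] if none of the listed tools handles it."""
    return PARSERS.get(Path(filename).suffix.lower(), [])
```

A call such as parsers_for("letter.docx") returns an empty list, matching the note that the Office Open XML formats are not supported by these tools.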

Configuring Indexed Search and Crawler
Log in to the TYPO3 backend, go to Admin Tools > Extension Manager and find the extension Indexed Search Engine. Open the extension's configuration section.
Make sure the paths to the PDF parsers, unzip, WORD parser, EXCEL parser, POWERPOINT parser and RTF parser all point to /usr/bin/.
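Before saving the extension configuration, it can help to confirm that the binaries really exist in /usr/bin/. This is a minimal Python sketch, assuming the binary names from the list above; missing_binaries is a hypothetical helper, not something TYPO3 provides.

```python
import os

# Binary names from the list above; /usr/bin/ is the path entered in the extension configuration.
BINARIES = ["catdoc", "xlhtml", "ppthtml", "pdftotext", "pdfinfo", "unzip", "unrtf"]


def missing_binaries(directory: str = "/usr/bin/", names=BINARIES) -> list[str]:
    """Return the names of binaries that are absent or not executable in directory."""
    missing = []
    for name in names:
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_binaries() or "none")
```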

(Screenshot: Indexed Search configuration in TYPO3)

Make sure content is not indexed automatically when a page is shown in the frontend, and tick the option “Use crawler extension to index external files” so that the crawler indexes external files.

(Screenshots: Indexed Search extension settings)

The Crawler requires a backend user named _cli_crawler. Go to SYSTEM > Backend Users and create this user with a random password. The user must not be an administrator and should not belong to any backend user group.

(Screenshot: Backend user)

TypoScript Setup

Open your TypoScript template and add the following lines:

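# Enable indexing of pages by Indexed Search
config.index_enable = 1
# Also index external files (PDF, DOC, ...) linked from the pages
config.index_externals = 1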

How the Crawler Works

The crawler performs two main jobs:
1. Generate the URLs of the pages to be processed (with any required GET parameters, e.g. “L” for the language or “tx_ttnews[tt_news]” to show the details of a tt_news record) and add them to a queue for processing by the second job.
2. Process the queue of URLs and take the appropriate action (in our case, invoke Indexed Search to index the page or the document).
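The two jobs can be sketched as a producer and a consumer sharing one queue. This is a simplified Python model, assuming an in-memory deque instead of the crawler's database-backed queue; build_queue, process_queue and the URL format /index.php?id=<uid> are illustrative assumptions.

```python
from collections import deque

# Hypothetical in-memory stand-in for the crawler queue.
queue = deque()


def build_queue(page_uids: list[int]) -> None:
    """Job 1: generate one URL per page and enqueue it."""
    for uid in page_uids:
        queue.append(f"/index.php?id={uid}")


def process_queue(index) -> list[str]:
    """Job 2: take URLs off the queue in order and hand each one to the indexer."""
    processed = []
    while queue:
        url = queue.popleft()
        index(url)  # in our case: Indexed Search indexes the page or document
        processed.append(url)
    return processed
```

Keeping the two jobs separate means URL generation can run quickly, while the slower indexing work is spread over later runs.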
When generating URLs, the crawler automatically crawls your website and enqueues the different pages (as /index.php?id=…). If your site is multilingual, however, you have to tell it to generate variants of every page (for instance /index.php?id=…&L=0 and /index.php?id=…&L=1).
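For a multilingual site, generating the variants amounts to combining every page with every language ID. This is a minimal Python sketch, assuming numeric language IDs passed in the L parameter; language_variants is a hypothetical helper.

```python
from itertools import product


def language_variants(page_uids: list[int], languages: list[int]) -> list[str]:
    """Generate one URL per page and language, e.g. ...&L=0 and ...&L=1."""
    return [f"/index.php?id={uid}&L={lang}" for uid, lang in product(page_uids, languages)]
```

In the Crawler extension itself this is declared in the crawler configuration rather than written as code; consult the extension's documentation for the exact parameter syntax.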
When a link to a document is encountered while the content of a page is being indexed, Indexed Search does not index the document right away but adds it to the queue of pages and documents to be indexed (because the option “Use crawler extension to index external files” was ticked in the Indexed Search configuration).
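The deferred handling of document links can be modelled as follows. This Python sketch assumes a regex-based link scan and an illustrative list of document suffixes; index_page_html and DOCUMENT_SUFFIXES are hypothetical names, and the use_crawler flag stands in for the “Use crawler extension to index external files” option.

```python
import re
from collections import deque

# Illustrative assumption: file suffixes treated as external documents.
DOCUMENT_SUFFIXES = (".pdf", ".doc", ".xls", ".ppt", ".rtf")


def index_page_html(url: str, html: str, queue: deque, use_crawler: bool = True) -> list[str]:
    """Index a page; enqueue linked documents instead of indexing them immediately."""
    indexed_now = [url]
    for href in re.findall(r'href="([^"]+)"', html):
        if href.lower().endswith(DOCUMENT_SUFFIXES):
            if use_crawler:
                queue.append(href)        # deferred: processed later from the crawler queue
            else:
                indexed_now.append(href)  # indexed right away while the page is rendered
    return indexed_now
```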

Crawler Configuration
Let us start with a basic crawler configuration that allows the whole page tree to be indexed.
Step 1: Go to Web > List
Step 2: Select the root page of your site
Step 3: Create a new record of type “Crawler Configuration”, which is listed under the section “Site Crawler”

(Screenshot: Crawler Configuration)

We can now use this configuration to index our website. Configure the TYPO3 scheduler to run the crawler tasks, for example one task that builds the queue from this configuration and one that processes it.

(Screenshot: TYPO3 scheduler)

Adding the Search Plugin to a Page

Select the page on which you want to integrate the search. Create a new content element and, under the ‘Plugin’ tab, select ‘Indexed Search’.

Indexing News Articles

Suppose we have a latest-news list section that contains news teasers with links to a detail page. The detail page contains a tt_news plugin whose output mode is SINGLE, so this plugin expects a GET parameter in the URL:
&tx_ttnews[tt_news]=(id)
Our test configuration:

- Sysfolder [uid #19] is our tt_news storage folder
- Page [uid #35] contains a tt_news plugin for the SINGLE view

We want the crawler to dynamically generate a list of URLs with the additional tx_ttnews[tt_news] parameter when it crawls page #35.
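What the crawler should produce for page #35 can be sketched in Python, assuming a hypothetical list of tt_news records in which only uid and pid matter. The record values and the news_urls helper are illustrative, not taken from a real installation.

```python
# Hypothetical tt_news records: only uid and pid (storage folder) matter here.
NEWS_RECORDS = [
    {"uid": 7, "pid": 19},
    {"uid": 8, "pid": 19},
    {"uid": 9, "pid": 42},  # stored in another folder, so not crawled via page #35
]


def news_urls(detail_page: int, storage_pid: int, records: list[dict]) -> list[str]:
    """One URL per news record in the storage folder, pointing at the SINGLE-view page."""
    return [
        f"/index.php?id={detail_page}&tx_ttnews[tt_news]={rec['uid']}"
        for rec in records
        if rec["pid"] == storage_pid
    ]
```

A real URL would normally percent-encode the square brackets; they are left unencoded here for readability.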

Crawler Configuration
We are creating this configuration for the subtree of page #35.
Step 1: In the TYPO3 backend, go to Web > List
Step 2: Click on page #35
Step 3: Create a new record of type “Crawler Configuration”, which is listed under the section “Site Crawler”

(Screenshot: Crawler configuration)

The _TABLE field in the configuration defines the lookup table (tt_news here), and _PID defines the ID of the news storage folder (#19 here).
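Conceptually, _TABLE and _PID describe a database lookup: fetch the uid of every record in table _TABLE whose pid equals _PID. The following Python sketch assumes a configuration value written as [_TABLE:tt_news;_PID:19] and uses an in-memory SQLite table as a stand-in for the TYPO3 database; parse_lookup and lookup_uids are hypothetical helpers, so check the crawler documentation for the exact notation it accepts.

```python
import sqlite3


def parse_lookup(value: str) -> dict[str, str]:
    """Split a value such as '[_TABLE:tt_news;_PID:19]' into its keys (assumed notation)."""
    parts = value.strip().strip("[]").split(";")
    return dict(part.strip().split(":", 1) for part in parts if part.strip())


def lookup_uids(conn: sqlite3.Connection, value: str) -> list[int]:
    """Return the uids of all records of _TABLE stored in the folder _PID."""
    opts = parse_lookup(value)
    table = opts["_TABLE"]
    if not table.isidentifier():  # table names cannot be bound as SQL parameters
        raise ValueError(f"invalid table name: {table!r}")
    rows = conn.execute(f"SELECT uid FROM {table} WHERE pid = ? ORDER BY uid", (int(opts["_PID"]),))
    return [uid for (uid,) in rows]
```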
When creating the crawler configuration, tick “Append cHash”; otherwise TYPO3's caching mechanism causes the first news article to be indexed N times instead of each article being indexed once.
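The effect of a missing cHash can be reproduced with a toy page cache. In this Python sketch the cache key only includes the news parameter when a hash is appended, so without it every request after the first is served the cached first article. The hash is a simplified stand-in (SHA-256 of the parameter string), not TYPO3's real cHash calculation, and render_news and crawl are hypothetical.

```python
import hashlib


def render_news(news_uid: int) -> str:
    """Hypothetical SINGLE-view output for one news record."""
    return f"<h1>News {news_uid}</h1>"


def crawl(news_uids: list[int], use_chash: bool) -> list[str]:
    """Simulate a page cache that only varies by GET parameters when a hash is appended."""
    cache: dict[str, str] = {}
    indexed = []
    for uid in news_uids:
        params = f"tx_ttnews[tt_news]={uid}"
        # Simplified stand-in for cHash, not TYPO3's actual algorithm.
        key = ("page35:" + hashlib.sha256(params.encode()).hexdigest()) if use_chash else "page35"
        if key not in cache:
            cache[key] = render_news(uid)
        indexed.append(cache[key])  # what the indexer receives for this URL
    return indexed
```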

(Screenshot: Crawler Configuration)
