# ERROR CONTROL <br> WITH BINARY CYCLIC CODES 

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

By<br>Martin-Thomas Grymel<br>School of Computer Science

## Contents

Abstract ..... 13
Declaration ..... 15
Copyright ..... 17
Acknowledgements ..... 19
I Introduction ..... 21
1 Introduction ..... 23
1.1 Research Objectives ..... 24
1.2 Contribution ..... 25
1.3 Thesis Organisation ..... 26
II Background ..... 29
2 Error Control Coding ..... 31
2.1 Linear Block Codes ..... 33
2.1.1 Generator Matrix ..... 33
2.1.2 Parity-check Matrix ..... 35
2.1.3 Error Syndrome ..... 35
2.1.4 Hamming Distance ..... 37
2.1.5 Error Detection and Correction Capabilities ..... 37
2.2 Hamming Codes ..... 38
2.3 Cyclic Codes ..... 38
2.3.1 Encoding ..... 39
2.3.2 Decoding ..... 40
2.3.3 Error Detection and Correction Capabilities ..... 43
2.3.4 Generator Selection Considerations ..... 43
2.4 BCH Codes ..... 44
2.4.1 Decoding ..... 45
2.5 Conclusion ..... 46
3 Discrete Logarithms ..... 49
3.1 Shanks' Algorithm ..... 51
3.2 Pollard's Rho Algorithm ..... 52
3.3 Silver-Pohlig-Hellman Algorithm ..... 54
3.4 Index-Calculus Algorithm ..... 56
3.5 Conclusion ..... 60
4 SpiNNaker ..... 63
4.1 Architecture ..... 64
4.2 System ..... 67
4.3 Memory ..... 68
4.4 CRC Unit ..... 69
4.5 Conclusion ..... 71
III Contributions ..... 73
5 Programmable CRC Hardware ..... 75
5.1 From Serial to Parallel ..... 76
5.2 From Static to Programmable ..... 79
5.2.1 Proposed Method ..... 81
5.3 From Theory to Silicon ..... 83
5.4 Comparison ..... 85
5.5 Conclusion ..... 87
6 Logarithms: A Generic Algorithm ..... 89
6.1 Computing Discrete Logarithms ..... 90
6.2 Improvement ..... 95
6.3 Properties ..... 96
6.4 Comparison ..... 99
6.5 Conclusion ..... 99
7 Logarithms: Reduce-Map Algorithm ..... 101
7.1 Preliminaries ..... 101
7.2 Shift Register Sequences ..... 102
7.2.1 Sequence Numbering ..... 102
7.2.2 $T$ Transformation ..... 103
7.2.3 Parity Vector Spaces ..... 103
7.2.4 Blocks ..... 106
7.2.5 $M$ Transformation ..... 108
7.2.6 Base Transformations ..... 113
7.3 Additional Properties ..... 116
7.3.1 Blocks ..... 116
7.3.2 $L$ Transformation ..... 117
7.3.3 $Q$ Transformation ..... 122
7.3.4 Parity Subspaces ..... 128
7.4 Computing Discrete Logarithms ..... 131
7.4.1 Initialisation ..... 132
7.4.2 Main Computation ..... 133
7.4.3 Implementation Details ..... 136
7.4.4 Example ..... 137
7.5 Evaluation ..... 138
7.6 Conclusion ..... 141
IV Conclusion ..... 143
8 Conclusion ..... 145
8.1 Programmable CRC ..... 145
8.1.1 Future Work in Programmable CRC ..... 146
8.2 Error Correction ..... 146
8.2.1 Future Work in Error Correction ..... 147
8.3 Summary ..... 149
A Mapping Configurations ..... 151Bibliography171

Word Count: 39638

## List of Tables

2.1 A $(7,4)$ systematic linear block code. ..... 34
2.2 Single-bit error syndromes. ..... 36
5.1 Programmable CRC circuit implementation comparison ..... 85
6.1 Sequences of the form $k 2^{j}$ modulo $\left(M_{4} / \operatorname{gcd}\left(M_{4}, k\right)\right)$. ..... 93
6.2 Sequences of the form $\left(X^{k}\right)^{2^{j}}$ modulo $X^{4}+X^{3}+1$. ..... 98
7.1 Example of a parity set decomposition ..... 105
7.2 $L$ transformation. ..... 120
7.3 Reduction of the nonzero elements of a finite field. ..... 122
7.4 $Q$ transformation. ..... 124
7.5 $E$ transformation. ..... 125
7.6 Number of clusters obtained using the $Q$ and $E$ transformations. ..... 126
7.7 Reduced nonzero field elements of $\mathbb{Z}_{2}[X] /\left\langle X^{6}+X+1\right\rangle$. ..... 138
7.8 Worst-case number of mappings steps. ..... 140

## List of Figures

2.1 Forward error correction in computer, transmission or storage systems ..... 32
2.2 Binary symmetric channel with transition error probability $p$ ..... 33
2.3 Code word in systematic form. ..... 33
2.4 Minimum code distance. ..... 37
2.5 LFSR for $p(X)=X^{3}+X+1$. ..... 39
2.6 LFSR for $p(X)=X^{3}+X+1$ with automatic premultiplication by $X^{3}$. ..... 40
4.1 SpiNNaker MPSoC with stacked SDRAM ..... 65
4.2 SpiNNaker Die. ..... 65
4.3 SpiNNaker chip schematic. ..... 66
4.4 Processor subsystem schematic. ..... 67
4.5 SpiNNaker system. ..... 68
5.1 Programmable parallel CRC circuit ..... 79
5.2 Proposed programmable parallel CRC circuit. ..... 82
5.3 Fully routed layout of the proposed programmable parallel CRC circuit. ..... 84
7.1 Calculation of all level-two sub-parities for a vector. ..... 104
7.2 $M$ transformation ..... 110
7.3 Pascal's triangle modulo two ..... 110
7.4 $M^{-1}$ transformation. ..... 112
7.5 $Q$ transformation ..... 124
7.6 Clustering using the $Q$ transformation. ..... 126
7.7 Clustering using the $Q$ and $E$ transformations. ..... 127
7.8 Sample mapping configuration. ..... 139

## List of Algorithms

3.1 Shanks' precomputation. ..... 51
3.2 Shanks' algorithm. ..... 52
3.3 Pollard's rho algorithm for discrete logarithms. ..... 54
3.4 Silver-Pohlig-Hellman algorithm. ..... 56
6.1 Discrete logarithm computation based on cyclotomic cosets. ..... 94
7.1 Initialisation pseudocode. ..... 132
7.2 Main function pseudocode. ..... 133
7.3 Reduction function variant 0 pseudocode. ..... 134
7.4 Reduction function variant 1 pseudocode. ..... 135
7.5 Mapping function pseudocode. ..... 136

## Abstract

Error-control codes provide a mechanism to increase the reliability of digital data being processed, transmitted, or stored under noisy conditions. Cyclic codes constitute an important class of error-control code, offering powerful error detection and correction capabilities. They can easily be generated and verified in hardware, which makes them particularly well suited to the practical use as error detecting codes.

A cyclic code is based on a generator polynomial which determines its properties including the specific error detection strength. The optimal choice of polynomial depends on many factors that may be influenced by the underlying application. It is therefore advantageous to employ programmable cyclic code hardware that allows a flexible choice of polynomial to be applied to different requirements. A novel method is presented in this thesis to realise programmable cyclic code circuits that are fast, energy-efficient and minimise implementation resources.

It can be shown that the correction of a single-bit error on the basis of a cyclic code is equivalent to the solution of an instance of the discrete logarithm problem. A new approach is proposed for computing discrete logarithms; this leads to a generic deterministic algorithm for analysed group orders that equal Mersenne numbers with an exponent of a power of two. The algorithm exhibits a worst-case runtime in the order of the square root of the group order and constant space requirements.

This thesis establishes new relationships for finite fields that are represented as the polynomial ring over the binary field modulo a primitive polynomial. With a subset of these properties, a novel approach is developed for the solution of the discrete logarithm in the multiplicative groups of these fields. This leads to a deterministic algorithm for small group orders that has linear space and linearithmic time requirements in the degree of defining polynomial, enabling an efficient correction of single-bit errors based on the corresponding cyclic codes.

## Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

## Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.
ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.
iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the thesis, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.
iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/ DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library's regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University's policy on presentation of Theses.

## Acknowledgements

First of all, I would like to express my sincere gratitude to Prof. Stephen B. Furber for offering me the opportunity to join his research group in Manchester and for supervising my work. His confidence led me to explore daunting territory, which at the beginning seemed to be a hopeless endeavour. I'm thankful for the inspiration and guidance that I have received throughout my research project.

I would like to acknowledge Dr. Jim D. Garside for initially suggesting I investigate a CRC circuit for SpiNNaker and for his valuable advice on my hardware queries.

Special thanks go to Dr. Chiara Del Vescovo for sharing her invaluable insights on number theory and abstract algebra, and for being a very good friend.

Throughout the years, I have received great support from many members of the APT group and I am in particular very grateful to Dr. John V. Woods, Dr. Luis A. Plana, Dr. Steve Temple, Dr. David R. Lester, Dr. Simon Davidson, Dr. Barry Cheetham, Dr. Mikel Luján, and Jeff Pepper.

I would like to thank my brave proof-readers Dr. John V. Woods, Dr. Jim D. Garside, Dr. David R. Lester, Dr. Cameron Patterson, and Francesco Galluppi for their valuable input.

I greatly appreciate the help from the High Throughput Computing facility team in the Faculty of Engineering and Physical Sciences, in particular for permitting me to consume huge quantities of processor cycles for my computations.

I acknowledge the support of an Engineering and Physical Sciences Research Council (EPSRC) studentship.

Many thanks to the friends I've made in Manchester; all the fantastic pub and pizza nights will remain unforgotten.

I am very thankful to my family for their constant encouragement and support in all my endeavours.

Finally, I would like to thank Jelo for sweetening and spicing up my life, whilst keeping me sane(ish) during my work towards completing this thesis.

## Part I

## Introduction

## Chapter 1

## Introduction

The preservation of digital data integrity is of major concern for computer, communication, and storage systems. In all these applications digital data is susceptible to unintentional modification which may arise from electrical or magnetic disturbance, component failure, or the result of system design error. Depending on the specific application, data failures may result in severe consequences and thus their potential occurrence needs to be considered carefully in the underlying design of the system.

Data reliability can be enhanced through the employment of error-control codes [LC83; PW72; RF89], which provide mechanisms for the detection and correction of errors. These codes enrich the data with redundancy by forming code words, which initially can be used to detect inconsistencies in the received data. If the affected data can be retransmitted or recalculated, one simple error correction scheme is the Automatic Repeat Query (ARQ), where the receiver simply requests a retransmission once it detects data inconsistencies. However, where retransmission or recalculation is not feasible, being too slow or uneconomic, Forward Error Correction (FEC) techniques can correct errors on the basis of the corrupted received data and its inherent redundancy.

Cyclic codes form an important class of error-control code offering powerful error detection and correction capabilities. At the same time, their algebraic properties permit the use of simplified processing procedures when compared to non-cyclic codes. For instance, the encoding of data into cyclic code words can easily be achieved in hardware using a simple linear feedback shift register. Likewise, the same circuit can be used to validate code words, and thus detect errors. For these reasons, cyclic codes have been widely adopted as error-detecting codes and commonly deployed in combination with ARQ schemes. This thesis concentrates on cyclic codes with
symbols from the binary field.
If error correction is desired, the most basic case concerns the localisation of a single erroneous bit. It can be shown that for cyclic codes this problem is equivalent to the computation of the discrete logarithm in finite cyclic groups, for which it is widely believed that for the general case no efficient classical algorithm is devisable. This is one reason why the discrete logarithm forms the basis for many cryptographic applications such as the Diffie-Hellman-Merkle key exchange [DH76]. However, it has not been proven that the computation of the discrete logarithm is hard for all groups of practical interest [Odl85; Odl00; McC90; MVO96; Mos96; Buc01; Sti02; Gal12].

Cyclic codes are characterised by an underlying generator polynomial. Different generator polynomials exhibit different capabilities in regard to the detection and correction of errors [TW11; Koo02; KC04]. Certain polynomials, for example, may be particularly well suited to the practical realisation of the error correction process. The selection of polynomial is also dependent on the length of the data that is to be protected and the anticipated error patterns. In systems where diverse applications may favour different cyclic codes, and where the requirements may change over time or be unknown at the design stage, it may be advantageous to provide full flexibility in regard to the usable cyclic code generator polynomials. Efficient cyclic code processing relies on dedicated cyclic code hardware circuits, which may be programmable if the underlying generator polynomial is adaptable. Such a programmable circuit has been created for the SpiNNaker project [FB09; Fur12], a massively parallel spiking neural network simulator.

### 1.1 Research Objectives

Cyclic codes comprise a powerful class of error-control code; they have gained wide popularity in the field of error detection owing to their efficient hardware implementations and simultaneous effective error detection. Programmable cyclic code circuits have the benefit of flexible adaption of the underlying cyclic code to meet the requirements of a specific application. One goal of this research is to provide a method for the efficient realisation of parallel programmable cyclic code circuits, in hardware, to make them appealing for a wider range of applications.

The current predominating drawback of cyclic codes is the lack of efficient error correction techniques for them. To perform the correction of a single-bit error an instance of the generalised discrete logarithm problem needs to be solved. It is an
additional goal of this research to explore new methods in the quest for an efficient mechanism for computing discrete logarithms in relevant finite cyclic groups and, therefore, facilitate the efficient recovery from single-bit errors through cyclic codes.

### 1.2 Contribution

The contributions of this thesis are summarised as follows:

- A new scheme is proposed enabling the efficient calculation of the state transition and control matrix for the parallel operation of a cyclic code circuit in hardware. This circuit is used both for the data encoding step and the decoder error detection phase. With the incorporation of a programmable transition and control matrix, this adaptable circuit can be configured to use different cyclic codes. This added flexibility is a valuable enhancement, as the error detection and error correction performance of a particular cyclic code depends on parameters including the length of the data that is to be protected and the anticipated error patterns of an application. A simulation of the novel programmable cyclic code circuit produced shows significant improvements in terms of speed, area and energy efficiency when compared with previously published designs. The design of the new circuit has been published [GF11].
- A new approach is proposed for the computation of discrete logarithms; this leads to a generic deterministic algorithm for analysed group orders that equal a Mersenne number where the exponent is a power of two. It is shown that, for these groups, the worst-case running time is proportional to the square root of the group order, while the space requirements are constant. The scheme is further improved for particular cases where the discrete logarithm values occur with different probabilities, leading to reduced average and worst-case execution times. Furthermore, properties are derived that apply to the sequences that are used by the algorithm.
- A set of new relationships is developed for the field elements of the ring of polynomials over the binary field modulo a primitive polynomial. Based on a subset of these properties, a novel approach is proposed for the computation of discrete logarithms in the cyclic multiplicative group of the finite field. For at least all primitive polynomials up to degree 12 and the first evaluated primitive polynomials of degree 13 and 14, a deterministic algorithm with linear space and linearithmic time requirements in the degree of the polynomial results.


### 1.3 Thesis Organisation

The remainder of this thesis is structured as follows:

## Part II: Background

## Chapter 2

This chapter reviews the fundamental principles behind error-control codes and establishes important related terminology. Particular focus is directed towards linear block codes and their subclass of cyclic codes. The error detection and correction capabilities of cyclic codes are described, and it is shown how simple but inefficient error correction mechanisms are realised. Established error correction techniques such as the Meggitt decoder, error-trapping decoding, or the BCH code decoder are briefly reviewed, and their inefficiency concerning the correction of a single-bit error is highlighted. Moreover, it is shown which factors influence the selection of the optimal generator polynomial for a cyclic code.

## Chapter 3

In this chapter, the discrete logarithm problem is defined. Information concerning the difficulty of the problem and properties of the underlying finite cyclic groups are presented. Important cryptographic applications that base their operating principle on the presumed difficulty of the discrete logarithm problem in certain groups are specified. Today's most important algorithms for computing discrete logarithms are presented detailing their execution overheads.

## Chapter 4

This chapter presents an overview of the SpiNNaker project which targets the largescale simulation of spiking neural networks. The SpiNNaker architecture facilitates a massively-parallel supercomputer with a million processors, which supports these neural simulations. The issue of anticipated memory faults in a SpiNNaker system of this scale is highlighted, and the usage of cyclic codes as a layer of protection against many of these errors is explained.

## Part III: Contributions

## Chapter 5

In this chapter, the equations for a parallel realisation of a cyclic code circuit are derived. It is then shown how a programmable version of such a circuit can be created which can be configured to use different generator polynomials. Furthermore, a new scheme is presented that allows the efficient computation of the state transition and control matrix necessary for the circuit. With this scheme a novel programmable parallel cyclic code circuit is proposed that is then compared to previous work. This chapter is based on a journal publication [GF11].

## Chapter 6

This chapter describes a new generic approach for the computation of the discrete logarithm. Based on this approach an algorithm is presented for group sizes that equal a Mersenne number with an exponent of a power of two. It is shown how the scheme can be improved if the discrete logarithm values occur with unequal probabilities. Properties are derived for the sequences that are used by the algorithm and finally, the proposed scheme is compared with an established method for the evaluation of discrete logarithms.

## Chapter 7

In this chapter, a new set of properties is derived for the elements of a finite field with binary characteristic, where the field is represented as a polynomial ring over the binary field modulo a primitive polynomial. For the multiplicative group of the finite field, a novel approach is presented to compute discrete logarithms based on a subset of the newly established field properties. Performance results are reported for the resulting algorithm for evaluated small group sizes.

## Part IV: Conclusion

## Chapter 8

This chapter draws conclusions and suggests future directions of research.

## Part II

## Background

## Chapter 2

## Error Control Coding

Digital data in computer, communication and storage systems is inevitably subjected to noise and may thus undergo undesired alterations, which may cause entire systems to fail. To lower the probability of such scenarios, it is possible to employ error-control codes that can increase the reliability of data [LC83; PW72; RF89]. For this purpose, the data is sent, in the first place, through an encoding process that will add redundancy to it, before it is processed, transported or stored later on, and thereby exposed to noise. Once the encoded and possibly corrupted data is received by a consumer, the decoding process will attempt to recover the original data. There are two different, but combinable approaches to the decoding process.

On the one hand, the decoder can perform a simple error detection and request a retransmission of the data if inconsistencies are detected; this is known as Automatic Repeat Query (ARQ). This is, however, infeasible if a feedback channel for a retransmission request is not available, or if the data is retrieved from a memory, where it is already stored erroneously. If an ARQ scheme is implementable, it may still be inefficient from a speed or energy point of view, and therefore impractical.

On the other hand, it is possible to employ Forward Error Correction (FEC) techniques that can not only detect, but also correct, errors in the decoder, on the basis of the corrupted data and redundancy only. Error correction proves to be typically more complex in its realisation than pure error detection, but has the significant advantage of managing to recover from certain faults without the need for retransmission.

The setup of an FEC system is shown in Figure 2.1. It is assumed that the source outputs a data message $u$, which is, in the next step, translated by the encoder into a code word $\bar{u}$, which carries the same information as $u$ but in a redundant form


Figure 2.1: Forward error correction in computer, transmission or storage systems.
according to the employed error correction code. The code word $\bar{u}$ is then passed on to a processor, channel or memory, where noise may cause undesired alterations, modelled as $e$. In this scenario, the processor constitutes a special case as it is assumed to produce an output possibly different from $\bar{u}$ as part of its functionality, which may then, however, differ from the expected output in consequence of the noise exposure. However, the processor case is not further considered here. Instead, it is assumed that $\bar{u}$ is passed on to a communication channel or a storage medium. In both of these cases, it is expected that after a successful transmission or retrieval of the data, without any noise interference, the decoder will receive $\bar{v}$, which will equal $\bar{u}$. If errors occur, it follows that $e \neq 0$ and $\bar{v}=\bar{u}+e$. It is the task of the decoder to translate the received and possibly corrupted code word $\bar{v}$ into the most likely message $v$. In the ideal case no decoding error is made and it follows $v=u$.

From an information theoretic point of view, the only aspect that can be influenced in the described forward error correction system concerns the encoding and decoding procedure. It was Shannon who published groundbreaking work in this field in 1948 [Sha48a; Sha48b]. He postulated that by choosing an appropriate encoding and decoding strategy, it is possible to reduce the decoding error probability to an arbitrarily small value, as long as the information rate, i.e. the ratio of non-redundant bits to the total number of bits per code word, stays below a certain threshold-the capacity-that is specific to each channel or memory.

The model that is assumed for the transmission or memory channel is called the Binary Symmetric Channel (BSC) and is depicted in Figure 2.2. It has an error probability parameter $p \leq 0.5$, which indicates the probability of an erroneously received symbol.

In what follows, the considered codes are assumed to have symbols from the binary field $G F(2)$, although generalisations to non-binary alphabets can be made. The symbols are represented as 0 and 1 , where the addition and multiplication is performed in modulo-2 arithmetic. There are two different types of codes: block and


Figure 2.2: Binary symmetric channel with transition error probability $p$.


Figure 2.3: Code word in systematic form.
convolutional codes; both encode $k$-bit input messages into $n$-bit code words. The difference is that, for convolutional codes, the code word is not only made dependent on the current input message, but also on previous messages, by employing memory in the encoder. Subsequently, however, only block codes are considered.

A code in systematic form has the property that the message bits appear as one section of the code word and the bits for the redundancy as a second section as visualised in Figure 2.3. The number of redundancy bits is denoted by $m$ such that $n=k+m$.

### 2.1 Linear Block Codes

An ( $n, k$ ) linear block code is a block code that transforms $k$-bit message vectors into $n$-bit code word vectors with the additional property that all the $2^{k}$ different code vectors form a $k$-dimensional subspace of the $n$-dimensional vector space of all vectors of size $n$ over the binary field. It follows from this definition that the linear combination of code vectors results in a code vector of the same code. An example of a $(7,4)$ linear block code in systematic form is shown in Table 2.1.

### 2.1.1 Generator Matrix

Due to the fact that an $(n, k)$ linear block code forms a $k$-dimensional vector space, it is possible to generate every code vector through a linear combination of $k$ code basis

Table 2.1: A (7, 4) systematic linear block code.

| Message | Code word |
| :---: | :---: |
| $[0,0,0,0]$ | $[0,0,0,0,0,0,0]$ |
| $[0,0,0,1]$ | $[0,0,0,1,0,1,1]$ |
| $[0,0,1,0]$ | $[0,0,1,0,1,1,0]$ |
| $[0,0,1,1]$ | $[0,0,1,1,1,0,1]$ |
| $[0,1,0,0]$ | $[0,1,0,0,1,1,1]$ |
| $[0,1,0,1]$ | $[0,1,0,1,1,0,0]$ |
| $[0,1,1,0]$ | $[0,1,1,0,0,0,1]$ |
| $[0,1,1,1]$ | $[0,1,1,1,0,1,0]$ |
| $[1,0,0,0]$ | $[1,0,0,0,1,0,1]$ |
| $[1,0,0,1]$ | $[1,0,0,1,1,1,0]$ |
| $[1,0,1,0]$ | $[1,0,1,0,0,1,1]$ |
| $[1,0,1,1]$ | $[1,0,1,1,0,0,0]$ |
| $[1,1,0,0]$ | $[1,1,0,0,0,1,0]$ |
| $[1,1,0,1]$ | $[1,1,0,1,0,0,1]$ |
| $[1,1,1,0]$ | $[1,1,1,0,1,0,0]$ |
| $[1,1,1,1]$ | $[1,1,1,1,1,1,1]$ |

vectors $g_{0}$ to $g_{k-1}$, which are aggregated as row vectors in the generator matrix $G$. A $k$-bit message vector $u=\left[u_{k-1}, \ldots, u_{0}\right]$ can now be encoded through multiplication with $G$ as

$$
\bar{u}=u G=u\left[\begin{array}{c}
g_{k-1} \\
\vdots \\
g_{0}
\end{array}\right] .
$$

For the $(7,4)$ linear block code in Table 2.1, the generator matrix constitutes

$$
G=\left[\begin{array}{lllllll}
1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{array}\right] .
$$

The identity matrix in the left part of the generator matrix indicates the systematic encoding property of this particular code. A message $u=[1,0,1,0]$ can now be
encoded as

$$
\bar{u}=u G=[1,0,1,0]\left[\begin{array}{lllllll}
1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{array}\right]=[\underbrace{1,0,1,0}_{u}, \underbrace{0,1,1}_{\text {redundancy }}] .
$$

### 2.1.2 Parity-check Matrix

For an $(n, k)$ linear block code $C$ with generator matrix $G$, it is possible to construct the dual code $C^{\perp}$. This code is an $(n, n-k)$ linear block code and it is the null space of $G$. The corresponding generator matrix $H$ of $C^{\perp}$ is the parity-check matrix of $C$. It follows that

$$
G H^{T}=0 .
$$

Furthermore, for every code vector $\bar{u}$ of $C$,

$$
\begin{equation*}
\bar{u} H^{T}=0 . \tag{2.1}
\end{equation*}
$$

The parity-check matrix $H$ is thus also a different way of describing the code $C$. For the $(7,4)$ linear block code in Table 2.1, the parity-check matrix takes thereby the following shape

$$
H=\left[\begin{array}{lllllll}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{array}\right] .
$$

It can easily be verified that $\bar{u}=[1,0,1,0,0,1,1]$ is a code vector of $C$ since

$$
\bar{u} H^{T}=[1,0,1,0,0,1,1]\left[\begin{array}{lllllll}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{array}\right]^{T}=0
$$

### 2.1.3 Error Syndrome

The parity-check matrix $H$ of a linear block code not only provides a method to validate code vectors, but also helps in the correction of errors that may have occurred. It is assumed that the code vector $\bar{u}$ gets corrupted with the error vector $e$ such that

$$
\begin{equation*}
\bar{v}=\bar{u}+e \tag{2.2}
\end{equation*}
$$

Table 2.2: Single-bit error syndromes.

| Error pattern | Error syndrome |
| :---: | :---: |
| $[1,0,0,0,0,0,0]$ | $[1,0,1]$ |
| $[0,1,0,0,0,0,0]$ | $[1,1,1]$ |
| $[0,0,1,0,0,0,0]$ | $[0,1,1]$ |
| $[0,0,0,1,0,0,0]$ | $[1,1,0]$ |
| $[0,0,0,0,1,0,0]$ | $[0,0,1]$ |
| $[0,0,0,0,0,1,0]$ | $[0,1,0]$ |
| $[0,0,0,0,0,0,1]$ | $[1,0,0]$ |

is received. The error syndrome of the vector $\bar{v}$ is now defined as

$$
s=\bar{v} H^{T} .
$$

With (2.1) and (2.2) it follows further that

$$
s=\bar{v} H^{T}=(\bar{u}+e) H^{T}=e H^{T} .
$$

The error syndrome $s$ is thus entirely determined by the error vector $e$. If no error has occurred, $e=0$, and therefore also the error syndrome $s=0$. In the case that the error vector coincides with a code vector, the syndrome will also indicate no error. However, in any other case, the syndrome will be different from zero, indicating the occurrence of one or several errors.

Once a nonzero syndrome has been computed, an error correction can be performed if the syndrome can be associated with a most likely error pattern. This can be achieved with a simple but inefficient table-lookup mechanism. For the code in Table 2.1, the error syndromes for all the different single-bit error patterns are given in Table 2.2. It can be seen that the syndromes are all different, which means that every single-bit error can be corrected. Error patterns with more than one bit error are either falsely classified as single-bit errors or go undetected if they correspond to code vectors. The considered code is thus only able to correct all possible single-bit errors.


Figure 2.4: Minimum code distance.

### 2.1.4 Hamming Distance

For a block code $C$ and code vectors $\bar{u}_{0}, \bar{u}_{1} \in C$, the Hamming distance of $\bar{u}_{0}$ and $\bar{u}_{1}$, denoted by $d\left(\bar{u}_{0}, \bar{u}_{1}\right)$, corresponds to the number of components in which $\bar{u}_{0}$ and $\bar{u}_{1}$ differ. The Hamming weight of $\bar{u}$, indicated by $w(\bar{u})$, equals the Hamming distance between $\bar{u}$ and the zero code vector, such that $w(\bar{u})=d(\bar{u}, 0)$. Another important measure is the minimum distance of $C$, which is defined as the smallest Hamming distance that is achievable between any two distinct code vectors of $C$ and it is denoted by

$$
d_{\min }=\min _{\bar{u}_{0}, \bar{u}_{1} \in C}\left\{d\left(\bar{u}_{0}, \bar{u}_{1}\right) \mid \bar{u}_{0} \neq \bar{u}_{1}\right\} .
$$

A special case arises if $C$ is a linear block code, since the minimum distance of $C$ can then be indicated as the minimum weight over all nonzero code vectors of the code. This can be, for instance, attributed to the fact that for a linear block code the linear combination of code vectors results again in a code vector.

### 2.1.5 Error Detection and Correction Capabilities

The minimum distance $d_{\text {min }}$ of a block code gives an indication of its error detection and correction properties as stated in the following theorem [RF89]:

Theorem 2.1. For a block code with minimum distance $d_{\text {min }}$, it is guaranteed that up to $t$ errors can be corrected, while at the same time up to $d$ errors can be detected, as long as $t+d+1 \leq d_{\text {min }}$ and $t \leq d$.

As an example, it is assumed that the minimum code distance accounts for $d_{\min }=4$, such that the Hamming distance between any two distinct code vectors $\bar{u}_{0}$ and $\bar{u}_{1}$ is at least $d_{\min }$ as illustrated in Figure 2.4. With Theorem 2.1, two different decoder choices can be made. Either a pure error detection is realised, which would be capable of
detecting all error patterns with up to three errors, or a combination of a single error correction with the detection of up to two errors can be targeted.

The error detection and correction capabilities of an $(n, k)$ block code are in general better than stated by Theorem 2.1 on the basis of the minimum distance. If pure error detection is considered, then $2^{n}-2^{k}$ different error patterns are detectable, which is the number of error patterns of length $n$ that do not correspond to any code vectors. For the pure error correction case of a $t$-error correcting linear block code, the standard array decoding mechanism allows the correction of $2^{n-k}=2^{m}$ error patterns, which include all combinations of $t$ or fewer errors [LC83].

### 2.2 Hamming Codes

A special class of linear block code is formed by Hamming codes. These codes exhibit, in basic form, a minimum distance of three, which qualifies them to be used as either single-error correcting or double-error detecting codes. For every number of redundancy bits $m$ with $m \geq 2$, it is possible to construct a Hamming code with a code word length of $n=2^{m}-1$ and $k=2^{m}-m-1$ information bits. An example of a $(7,4)$ Hamming code is given in Table 2.1 as the minimal weight over all nonzero code word accounts for three.

Hamming codes are considered as perfect codes as they reach the Hamming bound, which means that for a $t$-error correcting $(n, k)$ block code, the number of all error patterns of $t$ or fewer errors equals the ratio of all possible words to valid code words such that

$$
2^{m}=\sum_{i=0}^{t}\binom{n}{i}
$$

### 2.3 Cyclic Codes

An $(n, k)$ linear block code $C$ is considered to be cyclic if every cyclically shifted code vector results again in a code vector of the same code $C$ [Pra57; PB61; PW72; LC83]. In other words, for every $\bar{u} \in C$ with $\bar{u}=\left[\bar{u}_{n-1}, \bar{u}_{n-2}, \ldots, \bar{u}_{0}\right]$, it must follow that $\left[\bar{u}_{0}, \bar{u}_{n-1}, \ldots, \bar{u}_{1}\right] \in C$. The $(7,4)$ linear block code in Table 2.1 serves as an example of a cyclic code.

For the following considerations, it will be convenient to regard word vectors in an equivalent polynomial representation in the indeterminate $X$ in such a way that a


Figure 2.5: LFSR for $p(X)=X^{3}+X+1$.
vector $v=\left[v_{n-1}, v_{n-2}, \ldots, v_{0}\right]$ will correspond to the polynomial

$$
v(X)=v_{n-1} X^{n-1}+\cdots+v_{1} X+v_{0}
$$

An $(n, k)$ cyclic code $C$ can be characterised through a generator polynomial $p(X)$ of degree $m=n-k$, which is a factor of $X^{n}+1$ and takes the following shape

$$
\begin{equation*}
p(X)=X^{m}+p_{m-1} X^{m-1}+\cdots+p_{1} X+1 . \tag{2.3}
\end{equation*}
$$

The code polynomials of $C$ are comprised of all binary polynomials of degree less than $n$ that are divisible by the generator polynomial $p(X)$. For the cyclic code in Table 2.1, the generator polynomial accounts for $p(X)=X^{3}+X+1$.

### 2.3.1 Encoding

For an $(n, k)$ cyclic code with generator polynomial $p(X)$, a message polynomial $u(X)$ of degree $k-1$ can be translated into a code polynomial $\bar{u}(X)$, by simply multiplying it with the generator polynomial, such that $\bar{u}(X)=u(X) p(X)$. This guarantees that the code polynomial is divisible by the generator polynomial. The drawback of this method is, however, that the code is not in systematic form. A simple remedy can be provided as follows. If the message polynomial $u(X)$ is premultiplied by $X^{m}$, and subsequently divided by $p(X)$ to obtain the remainder $r(X)$, code polynomials in systematic form are produced by adding $r(X)$ to $X^{m} u(X)$, such that

$$
\bar{u}(X)=X^{m} u(X)+r(X) .
$$

The described systematic encoding process can easily be translated into hardware by employing a linear feedback shift register (LFSR) in Galois configuration as shown in Figure 2.5. It consists only of storage elements arranged into a circular shift register and XOR gates, which are used to couple the feedback path with different bits from the


Figure 2.6: LFSR for $p(X)=X^{3}+X+1$ with automatic premultiplication by $X^{3}$.
register according to the underlying generator polynomial. To compute the remainder, the LFSR is reset to the zero state and the message is then fed in serially starting from the most significant bit $u_{k-1}$. After all the message bits have entered the circuit, $m$ additional shifts are performed with the input set to zero to simulate the multiplication by $X^{m}$. As a result, the state of the LFSR corresponds to the remainder, which can then simply be appended to the message to form the code word.

The encoding process can be simplified by combining the input with the most significant register bit to form the feedback as shown in Figure 2.6. This modification allows the omission of the last $m$ register shifts with zero input.

A different modification concerns the length of the generated code words. Based on an ( $n, k$ ) cyclic code, it is possible to derive a shortened cyclic code with a length reduced by $l$. This can be achieved by considering the most significant message bits $u_{k-l}$ to $u_{k-1}$ to be always set to zero. The resulting code is, however, not cyclic anymore, but is an $(n-l, k-l)$ linear block code.

### 2.3.2 Decoding

The first step in the decoding of a cyclic code comprises the detection of errors. For this purpose the error syndrome is computed, which corresponds, in the case of a cyclic code, to the remainder of the division of the received polynomial and the generator polynomial. Since code polynomials are multiples of the generator polynomial $p(X)$, a syndrome, and therefore remainder, of zero indicates the detection of no errors. For all other syndrome cases, it is certain that errors have occurred.

The computation of the syndrome can easily be achieved by reusing the encoding circuit shown in Figure 2.5. Once the state of the circuit is reset to zero, the entire received polynomial is shifted into the circuit. The register will subsequently hold the syndrome. Alternatively, the circuit shown in Figure 2.6 can be used, with the difference that the obtained value will correspond to the syndrome after $m$ further register shifts with zero input, due to the automatic premultiplication by $X^{m}$. This
does not present any complication as the modification to the syndrome is easily reversible if necessary. For the case that no error is detected, the modified syndrome will also be zero. The efficient generation and validation of cyclic codes, employing the same circuit, contributed, to a great extent, to the popularity of cyclic codes in the field of error detection. In this context, a cyclic code is often referred to as a Cyclic Redundancy Checksum (CRC).

If error correction is considered, the situation is slightly different. The most basic case concerns the localisation and correction of a single-bit error and is considered in what follows. Under the assumption of a single-bit error occurrence at bit position $j$, where $0 \leq j \leq n-1$, a sent code polynomial $\bar{u}(X)$ is received as

$$
\bar{v}(X)=\bar{u}(X)+X^{j} .
$$

Since $\bar{u}(X)$ is divisible by $p(X)$, the syndrome $s(X)$ of $\bar{v}(X)$ will equal the remainder of the division of $X^{j}$ by $p(X)$, such that

$$
\begin{equation*}
X^{j} \equiv s(X) \quad(\bmod p(X)) . \tag{2.4}
\end{equation*}
$$

From (2.4) it is apparent that a maximum number of $2^{m}-1$ single-bit errors for a cyclic code are correctable, if the sequence of powers of $X$ modulo the generator polynomial $p(X)$ has a length of $2^{m}-1$. This is, at the same time, the maximum length that such a sequence can achieve with the available $m$ bits. Zero cannot be part of the sequence if $m>1$ as the least significant coefficient of the generator polynomial is one according to (2.3). The sequence is referred to as a maximum-length sequence and the generator polynomials that induce such sequences are called primitive polynomials. Primitive polynomials of degree $m$ can be used to generate $\left(2^{m}-1,2^{m}-m-1\right)$ cyclic codes, as every irreducible polynomial of degree $m$ is a factor of $X^{2^{m}-1}+1$ [Gol82; LN86]. These codes form a subclass of Hamming codes, which are characterised through their cyclic property.

A single-bit-error-correction mechanism for a cyclic or shortened cyclic code needs to deduce the bit error location $j$ from $s(X)$ in (2.4) to correct the erroneous bit. This computation is an instance of the generalised discrete logarithm problem, which is discussed in more detail with state-of-the-art algorithms in Chapter 3. There exist two very straightforward, but inefficient, solutions to this problem which form the basis of many error correction techniques. The first approach employs a table-lookup mechanism, which associates each single-bit error syndrome with the corresponding
bit error location $j$. This method is very fast, once it is set up, but impractical with regard to the memory requirements. At the other extreme, it is possible to search through all the powers of $X$ in (2.4); this requires a minimum amount of memory but needs, in the worst case, $n$ steps until completion. Predominantly qualified for this task is the LFSR, which traverses the power sequence through its shift operations.

Established error correction mechanisms for cyclic codes are based on the Meggitt decoder [Meg61]. It computes the error syndrome and uses it subsequently to decode the message bits in a serial manner. An error-pattern detection circuit is required that needs to sense all error syndromes that correspond to correctable error patterns that have been shifted to the high order bit positions. The complexity of the decoder is thus, on the one hand, determined by the number and characteristic of error patterns that are to be corrected. On the other hand, if a single-bit error is to be corrected, the Meggitt decoder requires, in the worst case, another $n$ steps after the syndrome has been calculated, as it can easily be verified that its operation is equivalent to the exhaustive search method described earlier. It starts its search from the syndrome, which is shifted inside an LFSR until a predefined value in the sequence is reachedsignaled by the error detection circuit-, so that the error position can be deduced and the error corrected.

An improvement to the Meggitt decoder is constituted by the error-trapping decoder [RM64; Kas64; LC83], which drastically reduces the complexity of the error detection circuit for some cases. Its operation is based on the fact that if the error bits can be cyclically shifted into the $m$ least significant bit positions of the received vector, the corresponding syndrome will coincide with those $m$ bits of the error pattern. It can be further shown that if the code is capable of correcting $t$ errors, the weight of the syndrome will be $t$ or less only if the error bits are located in the $m$ least significant bits of the received vector. It follows that the error detection circuit can be simplified since it needs to look out only for syndromes with a weight of $t$ or less. The drawback of the error-trapping method in comparison to the Meggitt decoder is, however, that the error bits of the correctable error patterns need to be confined to $m$ consecutive bit positions, including the case where the bits wrap from the most significant to the least significant end of the error vector. Also, as is the case with the Meggitt decoder, in the worst case $n$ steps are required to locate and correct a single-bit error.

### 2.3.3 Error Detection and Correction Capabilities

Since cyclic codes form a subclass of linear block codes, they inherit the error detection and correction capabilities from their superclass. These capabilities are described in Subsection 2.1.5. In addition, an $(n, k)$ cyclic code generated by $p(X)$ with degree $m=n-k$ is capable of detecting any error pattern where the erroneous bits are confined to $m$ or fewer consecutive bit positions, which includes the case of the erroneous bits wrapping from the most to the least significant bit position of the error vector [LC83; PW72]. These error patterns are referred to as cyclic burst errors of length $m$ or less, which cover also the single-bit error case. The probability that a cyclic burst error of length $m+1$ goes undetected is $2^{-(m-1)}$. For cyclic burst errors of length $l$ with $l>m+2$, the probability of undetectability is $2^{-m}$.

If restrictions are put on the generator polynomial $p(X)$, further properties can be derived. The code generated by a generator polynomial $p(X)$ that does not divide $x^{i}+1$ for any integer $i$ with $1 \leq i \leq l$, detects all double bit errors that are not more than $l$ bit positions apart [TW11]. In particular, all double bit errors are detected for $l=n-1$. If $(x+1)$ is a factor of $p(X)$, all error patterns with an odd number of bit errors are detected, at the expense of error patterns with an even number of bit errors [TW11; Koo02].

Cyclic codes possess, in general, further error detection capabilities that go beyond the ones stated above. These and their error correction capabilities need to be assessed in detail for each generator polynomial individually, as there are great variations among them [Koo02; KC04].

### 2.3.4 Generator Selection Considerations

There are many factors upon which the selection of a cyclic code generator polynomial for an application depends. First of all, there may be constraints on the degree of the generator polynomial, for instance, due to block length, information rate or system design specifications.

Secondly, only certain generator polynomials may be qualified for use in some cases, due to differences in the efficiency in which the corresponding encoder and decoder can be implemented. This concerns, particularly, the differences in speed or resource demand in hardware [CW94; Bra+96; Der01; CP06; KRM08; KRM09] or software [Ngu09].

Thirdly, different generator polynomials exhibit different error detection and correction capabilities as outlined in Subsection 2.1.5 and Subsection 2.3.3. In particular, if a generator polynomial is to be used for different sized codes, i.e. different shortened cyclic codes derived from a single cyclic code, it is important to analyse and weigh up the properties of different generator polynomials to select the most advantageous one [Koo02; KC04]. The capabilities of the code need also to be considered in conjunction with the error patterns that are anticipated. Errors that are very likely to occur should be manageable by the selected code. The choice of polynomial can also be influenced by the data that is to be protected, as it may exhibit redundancy that could assist in the error detection and correction process.

Finally, there exist cyclic code classes for which simplified error correction procedures have been devised as, for instance, is the case with BCH codes which are briefly outlined in Section 2.4, or codes that take advantage of the special factorisations of the sequence length in (2.4) [CW94]. Where decoding efficiency is of interest, it may be necessary to restrict the cyclic code search to such classes.

### 2.4 BCH Codes

An important class of cyclic code, the Bose-Chaudhuri-Hocquenghem (BCH) codes [Hoc59; BRC60], offer a simplified decoding mechanism for random bit errors. A $t$-error-correcting binary primitive BCH code for a positive integer $m$, has a code word length of $n=2^{m}-1$ and a number of redundancy bits that does not exceed $m t$. The generator polynomial $p(X)$ is defined as the polynomial of lowest degree that has $2 t$ consecutive powers of a primitive element $\alpha$ of the Galois field $G F\left(2^{m}\right)$ as its roots, such that $g\left(\alpha^{i}\right)=0$ for $1 \leq i \leq 2 t$ [LC83; PW72]. With $\phi_{i}(X)$ being the minimal polynomial of $\alpha^{i}$, the generator polynomial can be computed as the least common multiple of the minimum polynomials of the roots $\alpha^{2 j+1}$, where $0 \leq j \leq t-1$, so that

$$
p(X)=\operatorname{lcm}\left(\phi_{1}(X), \phi_{3}(X), \ldots, \phi_{2 t-1}(X)\right)
$$

The encoding of a BCH code with generator polynomial $p(X)$ can be accomplished in the same way as for general cyclic codes described in Subsection 2.3.1.

### 2.4.1 Decoding

The decoding process for BCH codes takes advantage of the roots of the generator polynomial $p(X)$. It is assumed that a code polynomial $\bar{u}(X)$ is corrupted with an error polynomial $e(X)$, such that $\bar{v}(X)=\bar{u}(X)+e(X)$ is received. The errors are assumed to be located at bit positions $j_{0}$ to $j_{v-1}$ with $0 \leq j_{0}<j_{1}<\cdots<j_{v-1} \leq n-1$, so that

$$
e(X)=X^{j_{v-1}}+X^{j_{v-2}}+\cdots+X^{j_{0}} .
$$

If $\bar{v}(X)$ is evaluated at the roots $\alpha^{i}$ of $p(X)$ for $1 \leq i \leq 2 t$, it is essentially the error polynomial $e(X)$, which is evaluated, since the $\alpha^{i}$ are also roots of code polynomials passed down from the generator. Alternatively, the syndrome $s(X)$ of $\bar{v}(X)$ can be evaluated at the roots $\alpha^{i}$ leading to the same result [CS09]. With $S_{i}=\bar{v}\left(\alpha^{i}\right)=e\left(\alpha^{i}\right)=$ $s\left(\alpha^{i}\right)$, a set of $2 t$ equations can be obtained:

$$
\begin{aligned}
S_{1} & =\left(\alpha^{j_{v-1}}\right)^{1}+\left(\alpha^{j_{v-2}}\right)^{1}+\cdots+\left(\alpha^{j_{0}}\right)^{1} \\
S_{2} & =\left(\alpha^{j_{v-1}}\right)^{2}+\left(\alpha^{j_{v-2}}\right)^{2}+\cdots+\left(\alpha^{j_{0}}\right)^{2} \\
S_{3} & =\left(\alpha^{j_{v-1}}\right)^{3}+\left(\alpha^{j_{v-2}}\right)^{3}+\cdots+\left(\alpha^{j_{0}}\right)^{3} \\
& \vdots \\
S_{2 t} & =\left(\alpha^{j_{v-1}}\right)^{2 t}+\left(\alpha^{j_{v-2}}\right)^{2 t}+\cdots+\left(\alpha^{j_{0}}\right)^{2 t} .
\end{aligned}
$$

Under the assumption that $v \leq t$, i.e. $t$ or fewer errors occurred, a unique solution for the error-location numbers $\beta_{i}=\alpha^{j_{i}}$ with $0 \leq i \leq v-1$, for the above set of nonlinear equations is computable. From an error-location number $\beta_{i}$, the actual bit error location $j_{i}$ can be derived by computing its discrete logarithm, which is touched on in Subsection 2.3.2, and for which more details are provided in Chapter 3.

In the standard decoding procedure for BCH codes, an error-location polynomial $\sigma(X)$ is constructed, which is defined as

$$
\begin{aligned}
\sigma(X) & =\left(\beta_{v-1} X+1\right)\left(\beta_{v-2} X+1\right) \cdots\left(\beta_{0} X+1\right) \\
& =\sigma_{v} X^{v}+\sigma_{v-1} X^{v-1}+\cdots+\sigma_{0} X^{0} .
\end{aligned}
$$

The polynomial coefficients $\sigma_{i}$ for $0 \leq i \leq v$ can be computed from the values of $S_{j}$, where $1 \leq j \leq 2 t$, with the help of the Berlekamp-Massey algorithm [Ber65; Mas69]. Once the error-location polynomial $\sigma(X)$ is determined, its roots will point to the error-location numbers, since a root is just the inverse of an error-location number.

An established procedure for determining the roots of $\sigma(X)$ is Chien's search [Pet60; Chi64]. Its operating principle consists in the successive evaluation of $\sigma(X)$ at all potential error-location numbers $\alpha^{1}$ to $\alpha^{n}$. For a discovered root $\alpha^{i}$ in step $i$ with $1 \leq i \leq n$, the error-location number corresponds to $\alpha^{n-i}$, and it follows that the erroneous bit is located at bit position $n-i$.

If the correction of a single-bit error at bit position $j$ on the basis of the described decoding procedures for BCH codes is considered, the computationally expensive step is performing Chien's search. The error-location number for a single-bit error is readily given by $S_{1}=s(\alpha)=\alpha^{j}$, from which the error-location polynomial $\sigma(X)$ can directly be assembled as $\sigma(X)=\left(\alpha^{j} X+1\right)$. Chien's search algorithm is then used to determine $j$ essentially by searching through all possible values, which is one of the inefficient methods of determining the discrete logarithm of $\alpha^{j}$ as described in Subsection 2.3.2. More details on the computation of discrete logarithms are given in Chapter 3.

### 2.5 Conclusion

Error-control codes provide a means to enhance the reliability of digital data that is exposed to noise during its processing, transmission, or storage. To employ a code, before transmission the sender of the data enriches it with redundancy to form a code word according to the code specification. Once the possibly corrupted code word is received by a consumer, it can use the redundancy to detect and possibly also correct errors.

Cyclic codes from an important class of error-control code, since their algebraic properties allow the use of simple LFSRs to generate and verify code words. Additionally, cyclic codes offer powerful error detection and correction capabilities. The specific capabilities depend not only on the cyclic code generator polynomial, but also on the length of the data that is to be protected. To exploit the full potential of cyclic codes, it may be necessary to provide the flexibility of an adaptable generator polynomial, as in the case of SpiNNaker which is described in more detail in Chapter 4. A novel solution to the creation of efficient programmable cyclic code circuits is proposed in Chapter 5.

The most basic case of error correction concerns the localisation of a single-bit error in the data. If cyclic codes serve as the basis for error control, the correction of a single-bit error requires the computation of the discrete logarithm in relevant groups.

This can be achieved with, for instance, a table-lookup mechanism or the exhaustive search method. However, neither of these methods is efficient. More details on the discrete logarithm problem together with more efficient state-of-the-art algorithms are found in Chapter 3, and two new solutions to this problem for particular groups are proposed in Chapter 6 and Chapter 7.

## Chapter 3

## Discrete Logarithms

For a generator element $\alpha$ of a finite cyclic group ( $G, \cdot$ ) of order $q$, and an element $\beta \in G$, the discrete logarithm is defined as the integer $k$ with $0 \leq k \leq q-1$, such that

$$
\begin{equation*}
\alpha^{k}=\beta, \tag{3.1}
\end{equation*}
$$

and denoted by $k=\log _{a} \beta$. Finding the unique integer $k$ in the above setting is described by the generalised discrete logarithm problem [MVO96], and is referred to as the discrete logarithm problem in what follows. For general finite cyclic groups $G$, no efficient classical method for solving the discrete logarithm problem has been reported yet [Odl85; Odl00; McC90; MVO96; Mos96; Buc01; Sti02; Gal12].

It is, however, the case that groups exist for which the discrete logarithm is easily obtainable. If the finite cyclic group $\left(\mathbb{Z}_{n},+\right)$ under addition modulo $n$ is considered for instance, exponentiation on the basis of the group operation as in (3.1) translates into multiplication in $\mathbb{Z}_{n}$, such that for a generator $\alpha \in \mathbb{Z}_{n}, \beta \in \mathbb{Z}_{n}$, and $0 \leq k \leq n-1$, the problem takes the following shape

$$
\alpha k \equiv \beta \quad(\bmod n) .
$$

Since $\alpha$ is a generator, it has a multiplicative inverse element $\alpha^{-1}$, and it follows

$$
k \equiv \alpha^{-1} \beta \quad(\bmod n)
$$

The computation of the inverse element $\alpha^{-1}$ can easily be accomplished with the help of the extended Euclidean algorithm, since the relation $\operatorname{gcd}(\alpha, n)=1$ holds for every generator element $\alpha$ of the group $\mathbb{Z}_{n}$.

As a matter of fact, cyclic groups of the same order are isomorphic to each other, so that every finite cyclic group of order $n$ can be transformed into $\mathbb{Z}_{n}$ [Gal02; BM77]. This means that, if the isomorphism between a finite cyclic group $G$ of order $n$ and $\mathbb{Z}_{n}$ can be computed efficiently, the discrete logarithm can also be computed efficiently for $G$. However, no method is known for determining this isomorphism efficiently for arbitrary groups [Sti02].

The difficulty of the discrete logarithm problem is independent of the choice of the generator element. For a cyclic group $G$, with generator elements $\alpha$ and $\bar{\alpha}$, it is assumed that logarithms can easily be computed to the base $\bar{\alpha}$. In this case, the logarithm of a $\beta \in G$ to the base $\alpha$ can simply be obtained as

$$
\log _{\alpha} \beta=\left(\log _{\bar{\alpha}} \beta\right)\left(\log _{\bar{\alpha}} \alpha\right)^{-1} \quad(\bmod q),
$$

where $q$ denotes the order of $G$ [MVO96].
A popular choice for a finite cyclic group in computer system applications is the multiplicative group of a finite field of binary characteristic, as its arithmetic can be implemented rather easily in hardware and software [Odl85]. The group can be represented as the polynomial ring over $G F(2)$ modulo an irreducible polynomial $f(X)$ over $G F(2)$, and indicated as $\mathbb{Z}_{2}[X] /\langle f(X)\rangle$. Different irreducible polynomials $f(X)$ of the same degree induce different representations of one and the same group. However, similarly to the choice of the primitive element, the choice of the irreducible polynomial $f(X)$ does not have any effect on the difficulty of the discrete logarithm problem, as the group representation can be changed in polynomial time [Zie74; Len91].

The inverse operation of the discrete logarithm, discrete exponentiation, is always computable in polynomial time with the square-and-multiply algorithm [Knu97]. This fact, and the wide belief that the computation of the discrete logarithm is impractical for groups of certain representation and order, has led to a number of cryptographic applications that base their operating principle on the discrete logarithm, such as the Diffie-Hellman-Merkle key exchange [DH76], the ElGamal public-key cryptosystem and signature scheme [Elg85], and its variant the Schnorr signature and identification scheme [Sch91].

Algorithms for computing discrete logarithms are subdivided into generic algorithms that do not rely on any special representation of the group elements, and special algorithms that work best only for certain group representations. Furthermore, some algorithms are dependent on the factorisation of the group order $q$. It has been
shown that the fastest constructable generic algorithm that assumes that each group element has a unique encoding, cannot improve on the time complexity $\Omega(\sqrt{p})$, where $p$ is the largest prime factor of the group order [Nec94; Sho97].

In what follows, the main ideas for the fastest algorithms for the discrete logarithm that have been published are briefly outlined. The algorithms are all of the generic type apart from the index-calculus method.

### 3.1 Shanks' Algorithm

This algorithm is also known as the baby-step giant-step algorithm [Sha71] and is essentially a time-space trade-off. A group ( $G, \cdot$ ) of order $q$ with generator element $\alpha$ and an element $\beta \in G$ is considered. The algorithm is based on the fact that a $\alpha^{k}$, $0 \leq k \leq q-1$, can be expressed for an $m, 1 \leq m \leq q$, as

$$
\alpha^{k}=\alpha^{i m+j}
$$

with $0 \leq i \leq\left\lceil\frac{q}{m}\right\rceil-1$ and $0 \leq j \leq m-1$. The algorithm generates, initially, a list of 'giant steps' $\alpha^{i m}$ for $0 \leq i \leq\left\lceil\frac{q}{m}\right\rceil-1$, which are stored sorted in memory and associated with their corresponding exponent factor $i$. If the logarithm of $\beta$ to the base $\alpha$ is to be computed, the baby steps $\beta \alpha^{-j}$ are evaluated and searched for in the stored list, for each $j$ that satisfies $0 \leq j \leq m-1$. Once a match is found so that $\beta \alpha^{-j}=\alpha^{i m}$, the discrete logarithm can be computed as

$$
\log _{\alpha} \beta \equiv i m+j \quad(\bmod q) .
$$

The pseudocode for the initialisation of Shanks' algorithm is shown in Algorithm 3.1, whereas the pseudocode for the evaluation of a specific discrete logarithm can be found in Algorithm 3.2.

```
Algorithm 3.1 Shanks' precomputation.
    procedure Shanks_Precomputation \((\alpha, q, m)\)
        for \(i=0\) to \(\left[\frac{q}{m}\right]-1\) do
            insert \(\left(\alpha^{i m}, i\right)\) into the memory, sorting pairs by their first component
        end for
    end procedure
```

```
Algorithm 3.2 Shanks' algorithm.
    function \(\operatorname{Shanks}(\alpha, \beta, q, m)\)
        for \(j=0\) to \(m-1\) do
            if \((x, i) \in\) memory with \(x=\beta \alpha^{-j}\) then
                return \(i m+j(\bmod q)\)
            end if
        end for
    end function
```

Considering that a search in the sorted list requires in the worst case $\left\lceil\log _{2} \frac{q}{m}\right\rceil$ steps, and that in the worst case a maximum number of $m$ baby steps need to be evaluated until a match is found, the worst-case running time can be indicated as $m\left\lceil\log _{2} \frac{q}{m}\right\rceil$. In terms of memory requirements, $\left\lceil\frac{q}{m}\right\rceil$ list entries need to be stored. If $m$ is chosen to be set to $m=1$, all possible powers of $\alpha$ are stored in memory, so that essentially a table-lookup mechanism is realised. The other extreme is achieved if $m$ is set to its upper bound $q$, which basically results in an exhaustive search algorithm with minimal memory requirements. For the case that $m=\lceil\sqrt{q}\rceil$, the time complexity accounts for $O\left(\sqrt{q} \log _{2} \sqrt{q}\right)$, while the memory complexity accounts for $O(\sqrt{q})$.

### 3.2 Pollard's Rho Algorithm

Pollard's rho algorithm for discrete logarithms [Pol78] is considered for a group ( $G, \cdot$ ) of order $q$ with generator element $\alpha$. For an element $\beta \in G$, the discrete logarithm to the base $\alpha$ is to be determined. The algorithm partitions $G$ into three sets $S_{0}$ to $S_{2}$ of comparable size, and defines a recursive sequence of elements in $G$ with starting value $x_{0}=1$ and

$$
x_{i+1}= \begin{cases}x_{i} \alpha & \text { if } x_{i} \in S_{0} \\ x_{i}^{2} & \text { if } x_{i} \in S_{1} \\ x_{i} \beta & \text { if } x_{i} \in S_{2}\end{cases}
$$

It needs to be ensured that $1 \notin S_{1}$, as otherwise all sequence elements would equal 1 . To keep track of the manipulations that are applied to the starting value $x_{0}$ during the traversal of the sequence, it is possible to record the exponents of $\alpha$ and $\beta$ in $a_{i}$ and $b_{i}$, respectively, so that $x_{i}$ can be expressed as

$$
x_{i}=\alpha^{a_{i}} \beta^{b_{i}} .
$$

The $a_{i}$ and $b_{i}$ can be defined recursively as a function of $x_{i}$ with $a_{0}=0$ and $b_{0}=0$ as follows

$$
a_{i+1}= \begin{cases}a_{i}+1 \bmod q & \text { if } x_{i} \in S_{0} \\ 2 a_{i} \bmod q & \text { if } x_{i} \in S_{1} \\ a_{i} & \text { if } x_{i} \in S_{2}\end{cases}
$$

and

$$
b_{i+1}= \begin{cases}b_{i} & \text { if } x_{i} \in S_{0} \\ 2 b_{i} \bmod q & \text { if } x_{i} \in S_{1} \\ b_{i}+1 \bmod q & \text { if } x_{i} \in S_{2}\end{cases}
$$

The algorithm searches now the sequence for a collision of the form $x_{i}=x_{j}$, for $i \neq j$, which is simplified for practical reasons to cases where

$$
x_{i}=x_{2 i},
$$

for $i>0$. Once such a collision is found, it follows that

$$
\alpha^{a_{i}} \beta^{b_{i}}=\alpha^{a_{2 i}} \beta^{b_{2 i}}
$$

which can be rewritten as

$$
\beta^{b_{i}-b_{2 i}}=\alpha^{a_{2 i}-a_{i}} .
$$

Taking the logarithm to the base $\alpha$ on both sides leads to

$$
\left(b_{i}-b_{2 i}\right) \log _{\alpha} \beta \equiv a_{2 i}-a_{i} \quad(\bmod q) .
$$

If $\operatorname{gcd}\left(b_{i}-b_{2 i}, q\right)=1$, the logarithm can be solved as

$$
\log _{\alpha} \beta \equiv\left(b_{i}-b_{2 i}\right)^{-1}\left(a_{2 i}-a_{i}\right) \quad(\bmod q) .
$$

Otherwise, the linear congruence exhibits $\operatorname{gcd}\left(b_{i}-b_{2 i}, q\right)$ solutions. If the greatest common divisor is not too large, all the different solutions can be tested until the unique solution is found that corresponds to the discrete logarithm. In the case that the number of solution to the linear congruence is too large, Pollard's rho algorithm can be repeated with a different starting value $x_{0}=\alpha^{a_{0}} \beta^{b_{0}}$ with $a_{0}, b_{0} \in \mathbb{Z}_{q}$ [MVO96]. The pseudocode for the algorithm is shown in Algorithm 3.3.

Under the assumption that the sequence behaves like a random mapping on $G$,

```
Algorithm 3.3 Pollard's rho algorithm for discrete logarithms.
    function \(\operatorname{Pollard}\left(\alpha, \beta, q, S_{0}, S_{1}, S_{2}\right)\)
        globalise \(\alpha, \beta, q, S_{0}, S_{1}, S_{2}\)
        \(\{x, a, b\}=\{1,0,0\}\)
        \(\{\bar{x}, \bar{a}, \bar{b}\}=\operatorname{sEQ}(x, a, b)\)
        while \(x \neq \bar{x}\) do
            \(\{x, a, b\}=\operatorname{seq}(x, a, b)\)
            \(\{\bar{x}, \bar{a}, \bar{b}\}=\operatorname{seq}(\bar{x}, \bar{a}, \bar{b})\)
            \(\{\bar{x}, \bar{a}, \bar{b}\}=\operatorname{seq}(\bar{x}, \bar{a}, \bar{b})\)
        end while
        if \(\operatorname{GCD}(b-\bar{b}, q)\) is small then
            determine \(k\) satisfying \((b-\bar{b}) k \equiv(\bar{a}-a)(\bmod q)\) and \(\alpha^{k}=\beta\)
            return \(k\)
        else
            restart with different starting values \(\{x, a, b\}\)
        end if
    end function
    function \(\operatorname{SEQ}(x, a, b)\)
        switch \(x \in\)
            case \(S_{0}:\) return \(\{x \alpha, a+1 \bmod q, b\}\)
            case \(S_{1}:\) return \(\left\{x^{2}, 2 a \bmod q, 2 b \bmod q\right\}\)
            case \(S_{2}:\) return \(\{x \beta, a, b+1 \bmod q\}\)
        end switch
    end function
```

it has been shown that the number of operations that the algorithm requires until a collision is found, has an expectation value close to

$$
\sqrt{\frac{\pi^{5} q}{288}} \approx 1.0308 \sqrt{q} .
$$

Alternative sequences that improve on the described sequence above have been suggested [Tes00].

### 3.3 Silver-Pohlig-Hellman Algorithm

The group ( $G, \cdot)$ of order $q$ with generator element $\alpha$ is considered. For an element $\beta \in G$, the discrete logarithm $k=\log _{\alpha} \beta$ is to be determined. The Silver-PohligHellman algorithm [PH78] simplifies the task by taking into account the factorisation of the group order $q$. It is assumed that $q$ decomposes into distinct primes $p_{i}$, such
that

$$
q=\prod_{i=1}^{N} p_{i}^{e_{i}}
$$

With this factorisation, the algorithm computes the discrete logarithm $k$ modulo each of the prime powers $p_{i}^{e_{i}}$ for $1 \leq i \leq N$. All those values can then be easily combined with the Chinese remainder theorem to the overall solution $k$.

To compute $k$ modulo $p_{i}^{e_{i}}$, the result is considered in the radix $p_{i}$ expansion as

$$
k \equiv \sum_{j=0}^{e_{i}-1} k_{j} p_{i}^{j} \quad\left(\bmod p_{i}^{e_{i}}\right)
$$

where $0 \leq k_{j} \leq p_{i}-1$. The values $k_{j}$ are determined one by one in increasing significance starting from $k_{0}$. It is now assumed that $k_{0}$ to $k_{j-1}$ with $j<e_{i}$, have already been computed, so that $k_{j}$ will be determined in the next step. For this reason $\beta_{j}$ is introduced as follows

$$
\begin{aligned}
\beta_{j} & =\left(\beta \alpha^{-\left(k_{j-1} p_{i}^{j-1}+k_{j-2} p_{i}^{j-2}+\cdots+k_{0}\right)}\right)^{q / p_{i}^{j+1}} \\
& =\left(\alpha^{k} \alpha^{-\left(k_{j-1} p_{i}^{j-1}+k_{j-2} p_{i}^{j-2}+\cdots+k_{0}\right)}\right)^{q / p_{i}^{j+1}} \\
& =\left(\alpha^{c p_{i}^{e_{i}}+k_{e_{i}-1} p_{i}^{e_{i}-1}+k_{e_{i}-2} p_{i}^{e_{i}-2}+\cdots+k_{j} p_{i}^{j}}\right)^{q / p_{i}^{j+1}} \\
& =\alpha^{\bar{c} q} \alpha^{\left(q / p_{i}\right) k_{j}} \\
& =\alpha^{\left(q / p_{i}\right) k_{j}} \\
& =\alpha_{j}^{k_{j}}
\end{aligned}
$$

where $c$ and $\bar{c}$ are integers. The coefficient $k_{j}$ can thus be computed as $k_{j}=\log _{\alpha_{j}} \beta_{j}$, whereby it is apparent from the transformation that $\alpha_{j}$ has an order of $p_{i}$.

It follows that with the Silver-Pohlig-Hellman algorithm, finding the discrete logarithm $k$ in a group $G$ of order $q$, is reduced to finding discrete logarithms in subgroups of $G$, whose orders correspond to the different prime factors $p_{i}$ of $q$. More precisely, for every prime factor $p_{i}$ with multiplicity $e_{i}$ of the group order $q, e_{i}$ instances of the discrete logarithm need to be solved in the subgroup of order $p_{i}$. If the group order $q$ decomposes into different prime powers $p_{i}^{e_{i}}$, the Chinese remainder theorem needs to be employed in a last step to combine the intermediate results to the overall solution $k$.

The operation of the algorithm is summarised in the pseudocode given in Algorithm 3.4. Obvious optimisations to the code have been omitted for enhanced

```
Algorithm 3.4 Silver-Pohlig-Hellman algorithm.
    function \(\operatorname{Silver-Pohlig-Hellman}\left(\alpha, \beta, q=\prod_{i=1}^{N} p_{i}^{e_{i}}\right)\)
        for \(i=1\) to \(N\) do
            for \(j=0\) to \(e_{i}-1\) do
            \(\beta_{j}=\left(\beta \alpha^{-\left(k_{j-1} p_{i}^{j-1}+k_{j-2} p_{i}^{j-2}+\cdots+k_{0}\right)}\right)^{q / p_{i}^{j+1}}\)
            \(\alpha_{j}=\alpha^{q / p_{i}}\)
            \(k_{j}=\log _{\alpha_{j}} \beta_{j}\)
            end for
            \(\bar{k}_{i}=k_{e_{i}-1} p_{i}^{e_{i}-1}+k_{e_{i}-2} p_{i}^{e_{i}-2}+\cdots+k_{0}\)
        end for
        compute \(k\) such that \(k \equiv \bar{k}_{i}\left(\bmod p_{i}^{e_{i}}\right)\) for \(1 \leq i \leq N\)
        return \(k\)
    end function
```

readability. The runtime of the algorithm is on the one hand influenced by the time $T_{p_{i}}$ that is necessary to evaluate the discrete logarithm in the subgroup of order $p_{i}$, for which for instance, Shanks' or Pollard's rho algorithm can be used. On the other hand it is necessary to dedicate time in the order of $\log _{2} q$ to compute $\beta_{j}$. Certain calculations for the Chinese remainder theorem can be precomputed for the factorisation of $q$, so that its execution time can be neglected. In summary, the overall time complexity of the algorithm is given by $O\left(\sum_{i=1}^{N} e_{i}\left(\log _{2} q+T_{p_{i}}\right)\right)$.

The Silver-Pohlig-Hellman is particularly efficient if the group order $q$ is smooth, that is to say its prime factors are all below a certain threshold, such that the discrete logarithms can easily be evaluated in the corresponding subgroups. However, if the group order $q$ is prime, the method is without effect.

### 3.4 Index-Calculus Algorithm

The index-calculus method [Od185; Odl00; Sch+96; MVO96; Sti02; Sch08; Gal12] is a special algorithm for solving discrete logarithms, as its effectiveness depends on properties of the underlying group representation. It is probabilistic and requires a substantial amount of memory but it is, at the same time, the most powerful algorithm that has been devised for computing discrete logarithms. Its main idea is sketched in what follows.

A cyclic group $G$ of order $q$ with the primitive element $\alpha$ is considered. Furthermore, a factor base $\mathcal{B}=\left\{b_{1}, b_{2}, \ldots, b_{|\mathcal{B}|}\right\}$ is selected, which is a rather small subset of elements in $G$, such that a large portion of elements in $G$ factorise over $\mathcal{B}$. If an
element factorises over $\mathcal{B}$, it is considered to be smooth with respect to $\mathcal{B}$. During the precomputation phase of the index-calculus method, which is discussed in more detail below, the discrete logarithms to the base $\alpha$ of the elements in the factor base $\mathcal{B}$ are determined.

Under the assumption that the precomputation phase has been completed, a particular discrete logarithm to the base $\alpha$ for an element $\beta \in G$ can be computed as follows. An integer $r$ is chosen at random with $0 \leq r \leq q-1$, until $\beta \alpha^{r}$ is found to factorise over $\mathcal{B}$, such that

$$
\beta \alpha^{r}=\prod_{i=1}^{|\mathcal{B}|} b_{i}^{e_{i}} .
$$

If the logarithm to the base $\alpha$ is taken on both sides of the equation, it follows that

$$
\log _{\alpha} \beta \equiv \sum_{i=1}^{|\mathcal{B}|} e_{i} \log _{\alpha} b_{i}-r \quad(\bmod q) .
$$

Thus, $\log _{\alpha} \beta$ can be computed since the logarithms of the elements in the factor base are known.

The precomputation phase is subdivided into a sieving and a linear algebra stage. During the sieving stage, random integers $r$ with $0 \leq r \leq q-1$ are chosen, and used to test if $\alpha^{r}$ can be composed of elements in $\mathcal{B}$, such that

$$
\alpha^{r}=\prod_{i=1}^{|\mathcal{B}|} b_{i}^{e_{i}},
$$

which leads to the linear congruence

$$
r \equiv \sum_{i=1}^{|\mathcal{B}|} e_{i} \log _{\alpha} b_{i} \quad(\bmod q) .
$$

Once enough of these linear relations modulo $q$ have been collected such that the resulting linear system has a unique solution, the linear algebra stage is initiated to compute this solution and thus the logarithms of the elements in the factor base.

If the underlying group is the multiplicative group $\mathbb{F}_{q}^{*}$ of a finite field $\mathbb{F}_{q}$, a more general variant of the index-calculus method is usually considered [Sch+96; Sch08]. Let $\alpha$ be a primitive element of $\mathbb{F}_{q}^{*}$. The logarithm to the base $\alpha$ for an element $\beta \in \mathbb{F}_{q}^{*}$ is to be determined. Two Dedekind domains $R_{0}$ and $R_{1}$ need to be selected with corresponding surjective ring homomorphisms $\phi_{0}: R_{0} \rightarrow \mathbb{F}_{q}^{*}$ and $\phi_{1}: R_{1} \rightarrow \mathbb{F}_{q}^{*}$.

Furthermore, two factor bases $\mathcal{A}=\left\{\mathfrak{a}_{1}, \mathfrak{a}_{2}, \ldots, \mathfrak{a}_{|\mathcal{A}|}\right\}$ and $\mathcal{B}=\left\{\mathfrak{b}_{1}, \mathfrak{b}_{2}, \ldots, \mathfrak{b}_{|\mathcal{B}|}\right\}$ are defined that consist of prime ideals of $R_{0}$ and $R_{1}$, respectively. For $\alpha$ and $\beta$, the preimages $a \in R_{0}$ and $b \in R_{1}$ under the corresponding ring homomorphisms need to be obtained, such that $\phi_{0}(a)=\alpha, \phi_{1}(b)=\beta$ and the ideals generated by $a$ and $b$ are smooth in respect to their corresponding factor bases $\mathcal{A}$ and $\mathcal{B}$. It follows that the ideals can be expressed as

$$
(a)=\prod_{j=1}^{|\mathcal{A}|} \mathfrak{a}_{j}^{v_{j}} \quad \text { and } \quad(b)=\prod_{j=1}^{|\mathcal{B}|} \mathfrak{b}_{j}{ }^{w_{j}} .
$$

The algorithm searches now for pairs $\left(a_{i}, b_{i}\right) \in R_{0} \times R_{1}$, such that

$$
\begin{gathered}
\phi_{0}\left(a_{i}\right)=\phi_{1}\left(b_{i}\right), \\
\left(a_{i}\right)=\prod_{j=1}^{|\mathcal{F}|} \mathfrak{a}_{j}^{v_{i, j}} \quad \text { and } \quad\left(b_{i}\right)=\prod_{j=1}^{|\mathcal{B}|} \mathfrak{b}_{j}^{w_{i, j}} .
\end{gathered}
$$

With a sufficiently large collection of $N$ pairs $\left(a_{i}, b_{i}\right)$ and the corresponding factorisation of their ideals, the following linear system modulo $q-1$ is constructed

$$
\left[\begin{array}{ccccc}
v_{1} & v_{1,1} & v_{2,1} & \cdots & v_{N, 1} \\
v_{2} & v_{1,2} & v_{2,2} & \cdots & v_{N, 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_{|\mathcal{A}|} & v_{1,|\mathcal{A}|} & v_{2,|\mathcal{F |}|} & \cdots & v_{N,|\mathcal{F}|} \\
0 & w_{1,1} & w_{2,1} & \cdots & w_{N, 1} \\
0 & w_{1,2} & w_{2,2} & \cdots & w_{N, 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & w_{1,|\mathcal{B}|} & w_{2,|\mathcal{B}|} & \cdots & w_{N,|\mathcal{B}|}
\end{array}\right] X \equiv-\left[\begin{array}{c}
0 \\
0 \\
\vdots \\
0 \\
w_{1} \\
w_{2} \\
\vdots \\
w_{|\mathcal{B}|}
\end{array}\right] \quad(\bmod q-1) .
$$

With a solution $X=\left[x_{0}, x_{1}, x_{2}, \ldots, x_{N}\right]^{T}$ to the system of linear congruences, the elements

$$
\bar{a}=a^{x_{0}} \prod_{i=1}^{N} a_{i}^{x_{i}} \quad \text { and } \quad \bar{b}=b \prod_{i=1}^{N} b_{i}^{x_{i}}
$$

are defined. Furthermore,

$$
\begin{equation*}
\beta \phi_{0}(\bar{a})=\beta \alpha^{x_{0}} \phi_{0}\left(\prod_{i=1}^{N} a_{i}^{x_{i}}\right)=\alpha^{x_{0}} \beta \phi_{1}\left(\prod_{i=1}^{N} b_{i}^{x_{i}}\right)=\alpha^{x_{0}} \phi_{1}(\bar{b}) \tag{3.2}
\end{equation*}
$$

If additionally, both $\bar{a}$ and $\bar{b}$ are assumed to be ( $q-1$ )st powers, the corresponding ring homomorphisms would map them to the multiplicative identity element in $\mathbb{F}_{q}$, such that (3.2) would yield

$$
\alpha^{x_{0}}=\beta,
$$

where $x_{0}$ is the discrete logarithm that is being searched for.
Instead of using $R_{0}$ and $R_{1}$ for the preimages of $\alpha$ and $\beta$, respectively, it is possible to modify the algorithm slightly to locate the preimages in only one of the two Dedekind domains [Sch08]. Also, in contrast to the former described index-calculus variant, the precomputation phase has been integrated with the main computation, however, they can be split if more than one discrete logarithm needs to be computed for the same setup [JL02; Sch05].

It will be convenient to introduce the following notation for the characterisation of the time complexity of the index-calculus method

$$
L_{q}(a, c)=\exp \left((c+o(1))(\ln q)^{a}(\ln \ln q)^{1-a}\right),
$$

which interpolates between polynomial and exponential behaviour as $a$ varies from zero to one. Different variants of the index-calculus method have been devised to deal with different sizes of the multiplicative group of a finite field $\mathbb{F}_{q}$ with $q=p^{m}$ and $p$ being a prime number. The time complexity for some of the variants has been proven rigorously, in other cases plausible assumptions are made that lead to algorithms with only conjectured, but better, time complexities.

If rigorously proven algorithms are considered, for $m=1$ and $p \rightarrow \infty$, an expected subexponential running time of $L_{q}(1 / 2, \sqrt{2})$ is achievable [Pom87]. The same time complexity can be achieved if $p \leq m^{o(m)}$ for $q \rightarrow \infty$ [LP98]. For the special case of $m=2$ and $p \rightarrow \infty$, an index-calculus variant with expected running time of $L_{q}(1 / 2,3 / 2)$ is constructable [Lov92].

If unproven assumptions are allowed for the runtime analysis, versions of the index-calculus method have been devised that yield a better performance than in the case of the rigorously proven ones. Adleman and DeMarrais have proposed for $p>m$ as $q \rightarrow \infty$ an algorithm with conjectured expected running time of $L_{q}(1 / 2,2)$ [Adl79; AD93; AD94]. Thus, if this result is combined with the rigorously proven algorithm for $m=1$ and $p \rightarrow \infty$, a subexponential algorithm for all finite fields with a time complexity of the form $L_{q}(1 / 2, c)$ for $q \rightarrow \infty$ with $\sqrt{2} \leq c \leq 2$ is obtained.

The most prominent variants of the index-calculus method, which lead to even
better conjectured time complexities, are the number field sieve and the function field sieve. Initially, the number field sieve was introduced for the factorisation of integers [Len+93; Rie94], but has been adapted to deal with the discrete logarithm problem by Gordon [Gor93]. It has been subsequently further studied and improved [Sch93; Sch+96; JL03; Sch05; Jou +06 ; Sch08]. As long as $m$ does not grow too fast, such that $m \leq o(\log q / \log \log q)^{1 / 3}$, it has been shown that the following conjectured expected time complexity is achievable

$$
L_{q}\left(1 / 3,(64 / 9)^{1 / 3}\right),
$$

for $q \rightarrow \infty$. For special cases of prime fields, where $p=2^{n} \pm 1$, a conjectured expected time complexity of $L_{p}\left(1 / 3,(32 / 9)^{1 / 3}\right)$ is obtainable [Sch10].

The function field sieve has been proposed by Adleman [Adl94] and further improved since then [AH99; Sch02; JL02; Gra+04; JL06; Sch08]. For the case that $p \leq m^{o(\sqrt{m})}$ and $q \rightarrow \infty$, it has been conjectured that the running time has an expectancy value of

$$
L_{q}\left(1 / 3,(32 / 9)^{1 / 3}\right) .
$$

As a special case of the function field sieve the Coppersmith algorithm [Cop84; Odl85] is considered. It has been devised for finite fields with binary characteristic so that $p=2$. The conjectured expected value for the running time accounts for $L_{2^{m}}(1 / 3, c)$, where $(32 / 9)^{1 / 3} \leq c \leq(4)^{1 / 3}$.

More recently, it has been shown that the gap that existed between the number field sieve and the function field sieve in terms of field sizes, has been closed, so that it is now possible to construct, for all finite fields, an algorithm that runs in conjectured expected running time of $L_{q}(1 / 3, O(1))$ for $q \rightarrow \infty$ [JL06; Jou +06 ; Sch08].

The size of the factor base for all of the variants of the index calculus method, is approximately in the order of the corresponding running times, and thus is an important factor that cannot be neglected [Odl85; Sch08; Gal12].

### 3.5 Conclusion

The interest in the discrete logarithm is two-sided. On one hand, it is assumed that the computation of the discrete logarithm is hard in certain groups and this is exploited in many cryptographic applications. On the other hand, the efficient correction of single-bit errors based on cyclic codes, as described in Chapter 2, requires the discrete
logarithm to be easily computable in relevant groups. These two applications have conflicting interests and research progress in favour of either may have negative implications for the other.

The discrete logarithm problem can efficiently be solved for the finite cyclic group $\left(\mathbb{Z}_{n},+\right)$ under addition modulo $n$. However, it is currently unclear as to whether this may also apply to certain other groups. It has been shown that if no special properties of the group element encodings are assumed, the fastest constructable algorithm requires $\Omega(\sqrt{p})$ group operations to compute the discrete logarithm, where $p$ is the largest prime factor of the group order. However, this assumption may not hold for all groups of practical interest, leaving open the possibility of algorithms that improve on the lower bound.

Shank's algorithm is a time-space trade-off and thus is only practically applicable to groups of small order. Pollard's rho algorithm has constant space requirements and an expected runtime in the order of the square root of the group order, if the simulated walk in the underlying group is assumed to be random. The Silver-Pohlig-Hellman algorithm reduces the initial discrete logarithm problem effectively into subgroups, whose orders correspond to the prime factors of the initial group order. However, the most powerful algorithm for the computation of discrete logarithms is the indexcalculus method which can be applied to multiplicative groups of finite fields. It is a probabilistic algorithm with an expected subexponential running time and space requirements in the same order.

Two new solutions are proposed to the discrete logarithm problem for certain groups in Chapter 6 and Chapter 7.

## Chapter 4

## SpiNNaker

The functioning of the human brain still remains a mystery and is under constant investigation [FT07; Fur12]. Within the brain the main cell type that processes information is believed to be the neuron and an average human brain consists of about $10^{11}$ of these neurons interconnected to form a large neural network [Kan+12; DA01]. Neurons receive input signals from other neurons in the form of electrical impulses; these are often modelled as spike events in artificial neural networks. Within a neuron received spikes may trigger the emission of a new outgoing spike, or a sequence of spikes, which are transmitted to a set of downstream neurons to contribute to their input. It is believed that the timing with which neurons emit spikes (fire) is one of the key principles that underpins how information is represented and processed in the brain [TFM96; Mao+01].

A connection between two neurons is established through a synapse, permitting the transmission of signals from one neuron to the other. Each synapse has characteristics which determine how signals are relayed, including their specific timing. The strength of a connection is a further important synaptic property, describing the magnitude of a relayed signal on the receiving neuron, and is often modelled as a single numerical value: the synaptic weight. It is estimated that about $10^{15}$ synapses can be found in the human brain [Kan+12]; these are dynamic, strengthening or weakening neural connections over time. This dynamism extends to synaptic connections which may be completely removed, or new connections which may arise between neuron pairs. This whole dynamic process that governs the neural network is known as synaptic plasticity and it is believed that it is central to biological processes of a higher level, such as learning or memory.

The understanding of the operating mechanisms of the brain is presently far from
complete. For this reason, the SpiNNaker project [FT07; FB09; Fur12; Fur+12] has been initiated to support the quest to understand how the brain works, and in the hope of finding new and more efficient ways of computing-inspired by biology. It aims to deliver a research platform for the large-scale simulation of arbitrary spiking neural networks and operate in biological real-time. SpiNNaker is a spiking neural supercomputer architecture that is tailored to support the efficient simulation of real-time networks of a billion neurons; the scale and complexity of the simulation depends on the neuron and synapse models used. In the billion neuron case each neuron receives a biologically plausible average input from 1,000 other neurons with a mean neuron firing rate of 10 Hz . Although simulation of spiking neural networks is a fairly specific task, SpiNNaker is not limited to this computational scope, as the following architectural description will make clear.

### 4.1 Architecture

The SpiNNaker architecture has been devised to facilitate a massively-parallel computing platform consisting of a million processors, primarily for the real-time simulation of large-scale spiking neural networks. With respect to the target machine, particular emphasis has been laid upon a power-efficient and fault-tolerant design. The central element of the architecture is the SpiNNaker chip, which is a custom-designed MultiProcessor System-on-Chip (MPSoC), fabricated on a 130 nm CMOS process [Fur+12]. A 128 MiB off-die mobile Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM) is stacked and connected on top of the MPSoC, forming a System-in-Package (SiP), photographed in Figure 4.1. A SpiNNaker die is depicted in Figure 4.2 and a schematic of the SpiNNaker MPSoC in Figure 4.3.

The MPSoC incorporates 18 processing subsystems that are built around lowpower ARM968 processing cores with tightly-coupled local memories of 32 KiB for instructions, and 64 KiB for data storage as illustrated in Figure 4.4. Each processing subsystem is also equipped with a Direct Memory Access (DMA) controller that manages block transfers between the local subsystem data and the SDRAMs. In addition to using the SDRAM as the source or target for the DMA data transfers, other shared resources on the chip can transparently be selected such as the System RAM or the System ROM. Each processing subsystem is complemented by an interrupt controller, a communication interface, and two timers/counters.


Figure 4.1: SpiNNaker MPSoC with stacked SDRAM on top. 3D packaging by UNISEM (Europe) Ltd.


Figure 4.2: SpiNNaker Die.


Figure 4.3: SpiNNaker chip schematic.

An important and innovative feature of the SpiNNaker chip is its custom communication system. At its heart is the multicast packet router, which relays packets between the processors on the chip, and to the routers of six neighbouring chips through external links. The interconnect fabric that ties together both on-chip processors and external links to the router is a self-timed Network-on-Chip (NoC), referred to as the Communications NoC. Similarly, the System NoC, a second, independent self-timed interconnect fabric, is used to allow the processor subsystems access to shared resources on the chip including the SDRAM. By employing asynchronous interconnection systems, the SpiNNaker chip follows the Globally Asynchronous, Locally Synchronous (GALS) design paradigm which eliminates the requirement to distribute a global synchronous clock signal across all the cores in a system [BF02; $\mathrm{Pla}+07 ; \mathrm{Pla}+11]$. Benefits also arise within the design of chip, as the processor subsystems are decoupled from one other, and from the rest of the on-chip system. The implementation process of the chip is therefore eased, as timing issues are limited to smaller areas of the chip. A further communication interface integrated into the SpiNNaker chip is an optional Ethernet link. It is primarily intended for the connection of a host system for configuration and monitoring purposes, and is deployed to a limited


Figure 4.4: Processor subsystem schematic.
number of nodes as depicted in Figure 4.5.

### 4.2 System

SpiNNaker chips will be used to compose large programmable computing systems, in the first instance for the simulation of spiking neural networks at biological real-time. Since each chip can be interfaced to six others, it is possible to form a triangular grid of chips as one configuration. It is then intended to connect this grid into a toroid as illustrated in Figure 4.5. The advantage of such a constellation is that packets can be rerouted with only one additional hop around links that are congested or that have been detected as broken. This fault-tolerance feature of the SpiNNaker architecture is known as emergency routing.

It is intended to build a SpiNNaker system consisting of 57,600 chips that will include more than a million processing cores [Fur12]. Depending on the neuron and the synapse models selected, this should be sufficient to model approximately a billion biologically plausible neurons. Neural simulations on a SpiNNaker system operate with processing cores executing neuron simulations according to the required neuron


Figure 4.5: SpiNNaker system.
model. A neuron may, if certain conditions are met, emit a spike which is represented in the system as a short packet of 40 bits. 32 bits of the packet are reserved for the identification of the neuron emitting the spike, and 8 bits are used to carry further routing control information. Such spike packets are released from the processing cores into the routing fabric that handles the distribution and bifurcation to all target neurons connected to the emitting neuron, according to information stored in the routing tables of the routers. If a spike has been transmitted to a processing core modelling a destination neuron, then a DMA transfer is initiated to retrieve the corresponding synaptic parameters for that connection including its synaptic weight and the associated connection delay. This data is transferred by DMA from SDRAM to the local memory of the processing core. It is then possible to calculate the effect of the input spike on the target neuron via its synaptic connection completing the cycle of a basic neural simulation.

### 4.3 Memory

The large SpiNNaker system in its envisaged configuration of 57,600 nodes will incorporate in the order of 7 TiB of SDRAM. With such an immense amount of memory, the effect of data bit errors is significant during the operation of the machine.

Memory bit errors are subdivided into two different classes: hard and soft errors
[ZL79; Zie96; Zie+96]. A hard error is characterised by a permanent hardware fault in a memory cell that will result in a consistent reliability issue. For instance, it may be the case that a memory cell will always provide one particular bit value during readout, no matter what value has been written to it. Soft errors are transient faults that occur randomly and may, for example, be induced through cosmic rays or the decay of radioactive atoms in the memory packaging materials. Also, a soft error may arise either directly in the memory, or along the data path during the memory read or write phase.

Recently, a large-scale study has been conducted to investigate statistics for error rates in Dynamic Random Access Memory (DRAM) in production systems [SPW09]. It suggests that the average error rate ranges from 25,000 to 75,000 FIT (failures in time per billion hours of operation) per Mibit, however a distinction between hard and soft errors is not made. If these numbers are applied to the SpiNNaker system of 57,600 nodes, 25 to 74 bit errors on average can be expected to occur within the SDRAM per minute, roughly approximated as one bit error per second.

It may be the case that the number of expected bit errors in the SpiNNaker system will not have a significant impact on particular applications such as neural network simulations. However, it is not known to what extent neural network simulations can compensate for memory faults, and other potential applications may not tolerate bit errors at all, so appropriate measures need to be taken to deal with them in the SpiNNaker system. For this reason error-control codes are employed within SpiNNaker to provide a layer of protection against memory faults.

### 4.4 CRC Unit

The DMA controller of each processing subsystem has been equipped with a CRC unit that allows the generation and verification of error-control codes. The circuit primarily supports cyclic codes as they offer powerful error detection and correction capabilities and as they are, at the same time, easily implementable in hardware [LC83]. If, for instance, a processor initiates a DMA transfer to copy a data block from the local memory to the SDRAM, the CRC unit can be instructed to calculate (transparently and in parallel) the redundancy part for a cyclic code and, automatically, append this to the SDRAM data block. The CRC unit can be used to calculate the error syndrome for a data block retrieved from memory and signal the corresponding processing core if an integrity issue arose. The program that is executed on the
processing core has to decide what action is to be taken in the event of a detected data inconsistency. A simple retransmission of the data block could correct the error if it occurred along the data path during the readout phase, however even this may not be fast enough for the 'real-time' operation of a SpiNNaker neural simulation. Therefore it is necessary to consider appropriate error correction procedures in software, to recover from memory faults based on the obtained error syndrome, including when they are uncorrectable. These can range from a simple disregard of the error, through a localisation and correction of the error, to a shutdown of the relevant SpiNNaker system components for replacement if hard errors are involved.

In the choice of employed cyclic code, many factors need to be taken into account for the selection of the generator polynomial as outlined in Subsection 2.3.4. For instance, certain undiscovered subclasses of cyclic codes may allow the realisation of very efficient error correction procedures in software, or data blocks of different lengths may be stored in the SDRAM so that a polynomial offering best combined error protection for all of the block lengths should be selected. To offer maximal flexibility within SpiNNaker, a programmable CRC circuit has been incorporated that permits switching the generator polynomial to any of degree 32 or lower whenever required. A direct advantage is that the polynomial is adaptable to the length of the data block that is to be protected, which means that the best choice of offered error protection can be made. Another feature of the CRC circuit is that several cyclic codes of a smaller degree can be generated based on different bits of the data stream. For example, for each half-word of the data stream, a cyclic code based on a generator polynomial of degree 16 can be computed.

The width of the data bus that traverses the DMA controller in SpiNNaker is 32 bits, and the CRC unit has been designed to process this number of bits in parallel to avoid being a bottleneck to DMA data transfers. To configure the unit for the usage of a cyclic code or any other supported error-control code, one Kibit of configuration data needs to be supplied by the corresponding processing core to the appropriate registers inside the unit. Since the data bus is used to provide this configuration data, the transfer takes place as a series of 32 bit words. The registers are realised as latches to reduce the hardware demand, as each SpiNNaker chip accommodates one CRC unit for each of the 18 DMA controllers (one per processor subsystem). A detailed description of the SpiNNaker CRC circuit together with its derivation and capabilities can be found in the next chapter.

### 4.5 Conclusion

The SpiNNaker architecture has been created to support large-scale simulations of spiking neural networks in biological real-time. It has been dimensioned to scale up to machines consisting of a million processors with SDRAM totalling about 7 TiB. With such a vast amount of memory, it is estimated that, on average, about 1 bit error per second will occur.

To improve the reliability of memory transfers, SpiNNaker employs cyclic codes for error control due to the efficient realisation of the code generation and verification circuit. Once inconsistencies are detected for a block of data, software procedures may be triggered to attempt an error recovery. The correction of a single-bit error on the basis of a cyclic code, essentially requires the computation of the discrete logarithm in relevant groups as described in Chapter 2. However, no efficient algorithm is known for the computation of this type of discrete logarithm as discussed in Chapter 3. Two new solutions for the computation of the discrete logarithm in certain groups are proposed in Chapter 6 and Chapter 7.

The optimal choice of cyclic code to employ is influenced by many factors including the length of the data that is to be protected and the desired error control capabilities. Therefore, programmable cyclic code circuits are employed within SpiNNaker to maximise flexibility in the choice of cyclic code for different scenarios. A novel method for the generation of efficient programmable cyclic code circuits is proposed in Chapter 5.

## Part III

## Contributions

## Chapter 5

## Programmable CRC Hardware

Cyclic codes constitute a powerful class of error-control code as set out in Section 2.3. They offer effective error detection and can easily be realised in hardware, which makes them a popular choice for many applications that require the detection of errors, including Ethernet [TW11]. In the context of pure error detection, a cyclic code is often referred to as a Cyclic Redundancy Checksum (CRC) and the popularity of cyclic codes has led to a number of different software and hardware implementations [LC83; RG88]. Speed requirements usually make software schemes impractical and dedicated hardware is needed. The generic hardware approach uses an inexpensive Linear Feedback Shift Register (LFSR), which assumes serial data input. In the presence of wide data buses, the serial computation has been extended to parallel versions that process whole data words based on derived equations [AS90; PZ92; CPR03; Shi +01 ] and on cascading the LFSR [Spr01]. Various optimisation techniques have been developed that target resource reduction [Bra+96] and speed increase [Der01; CP06; KRM08; KRM09].

A wide range of factors influence the selection of an appropriate CRC generator polynomial for a particular application as outlined in Subsection 2.3.4. This range includes the error detection and correction capabilities of a generator polynomial, which depend on the length of the data that is to be protected. For instance, in scenarios where data blocks of different length are used, or where the final requirements of the generator polynomial are not known at the time of the hardware implementation, it is beneficial to employ programmable CRC circuits that can be configured to different generator polynomials.

This applies precisely to the SpiNNaker project, whose target is to provide a research platform for the simulation of arbitrary spiking neural networks as described
in Chapter 4. The planned large SpiNNaker system will incorporate a substantial amount of SDRAM to hold relevant data for the neural network simulations; in this context, CRCs are used to reduce the effect of data errors arising in the memory. Since the length of the data blocks may vary between different neural network simulations, for instance, it has been decided to incorporate programmable CRC circuits into the SpiNNaker system to offer maximal flexibility.

With the design of a circuit, there is usually a tradeoff between speed and area. In this particular scenario, there are two dimensions to the speed of a programmable CRC circuit: the time necessary to process a data word, and the time required to reconfigure to a new polynomial. This chapter directly extends the parallel CRC circuit by Campobello et al. [CPR03] based on state space representation in several ways. On the one hand, restrictions between the width of the data processed in parallel and the order of the polynomial are lifted. On the other hand, a novel scheme is presented allowing the inexpensive computation of the CRC transition and control matrix in hardware. This leads to a programmable parallel CRC implementation that offers an improved balance between area and both dimensions of speed.

### 5.1 From Serial to Parallel

Where systems use wide data buses, it is advantageous for CRC circuits to operate on data words and many approaches have been made to address this issue. Albertengo and Sisto [AS90] derived equations, in 1990, for a parallel CRC circuit with automatic premultiplication based on the Z-transform. A simpler method utilising state space transformation leading to basically the same circuit was published two years later by Pei and Zukowski [PZ92]. In 2003, Campobello et al. [CPR03] developed a similar proof for the parallel CRC circuit without automatic premultiplication, under the assumption that the order of the polynomial and the length of the message are both multiples of the number of bits to be processed in parallel, and reported a recursive formula for calculating powers of the state transition matrix.

In this section, the equations for the circuit with automatic premultiplication are derived; the principle of the derivation is very similar to the variant without premultiplication. Furthermore, the proof is extended in such a way that there will be no restriction on the order $m$ of the polynomial or the number of bits $w$ that are to be processed in parallel. The parameters are unrelated; it is only assumed that the $k$-bit message that is to be encoded can be split into data words of $w$ bits, which will
usually be the case in computer systems.
The starting point for the derivation of the parallel CRC circuit for a generator polynomial $p(X)$ is the LFSR with automatic premultiplication by $X^{m}$ for a serial data input as shown in Figure 2.6. The $m$ least significant coefficients of $p(X)$ are aggregated into the coefficient vector $p=\left[p_{m-1}, \ldots, p_{1}, 1\right]^{T}$. Since the LFSR is a discrete time-invariant linear system, it can be expressed as:

$$
\begin{equation*}
s[i+1]=T s[i]+p \bar{d}[i] \tag{5.1}
\end{equation*}
$$

where

$$
T=\left[p \left\lvert\, \frac{I_{m-1}}{0}\right.\right]=\left[\begin{array}{ccccc}
p_{m-1} & 1 & 0 & \cdots & 0  \tag{5.2}\\
p_{m-2} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{1} & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{array}\right]
$$

$I_{m-1}$ denotes the identity matrix of size $m-1$ and $s[i]$ is the state of the system at time step $i$, which is equivalent to the corresponding $m$-bit LFSR value. The scalar input to the system is denoted by $\bar{d}[i]$. In each time step $i$, one bit of the $k$-bit message $u$ is shifted into the system, starting from the most significant bit, and thus $\bar{d}[0]=u_{k-1}$, $\bar{d}[1]=u_{k-2}$ and so forth. It can be verified that the solution for system (5.1) takes the following shape:

$$
\begin{equation*}
s[i]=T^{i} s[0]+\left[T^{i-1} p, \ldots, T p, p\right][\bar{d}[0], \ldots, \bar{d}[i-1]]^{T}, \tag{5.3}
\end{equation*}
$$

with $s[0]$ being the initial state of the LFSR.
A simplified way exists to obtain $T^{i}$ from $T^{i-1}$ :

$$
\begin{align*}
T^{i} & =T^{i-1} T \\
& =T^{i-1}\left[p \left\lvert\, \frac{I_{m-1}}{0}\right.\right] \\
& =\left[T^{i-1} p \left\lvert\, T^{i-1} \frac{I}{I_{m-1}}\right.\right] \tag{5.4}
\end{align*}
$$

where $i$ starts from 2. Expanding $T$ to the power of $w$ with the help of (5.4) leads to:

$$
T^{w}= \begin{cases}{\left[\left[T^{w-1} p, \ldots, T p, p\right] \left\lvert\, \frac{I_{m-w}}{0}\right.\right]} & \text { if } w \leq m  \tag{5.5}\\ {\left[T^{w-1} p, \ldots, T^{w-m} p\right]} & \text { otherwise. }\end{cases}
$$

The columns of $T^{w}$ that drop out in the case where $w$ exceeds $m$ are combined in the auxiliary rectangular matrix

$$
T_{w}:=\left[T^{w-m-1} p, \ldots, T p, p\right] .
$$

In order to obtain the LFSR value after $w$ bits have been processed, $s[w]$ simply needs to be evaluated. For this purpose the $w$-dimensional data input vector $d[t]=$ $\left[u_{k-1-t}, \ldots, u_{k-w-t}\right]^{T}$ is introduced. Then (5.3) becomes

$$
\begin{equation*}
s[w]=T^{w} s[0]+\left[T^{w-1} p, \ldots, T p, p\right] d[0] . \tag{5.6}
\end{equation*}
$$

Two basic cases can be differentiated:

## Case $w \leq m:$

$$
\begin{aligned}
s[w] & =T^{w} s[0]+\left[\left[T^{w-1} p, \ldots, T p, p\right] \left\lvert\, \frac{I_{m-w}}{0}\right.\right]\left[\frac{d[0]}{0}\right] \\
& =T^{w} s[0]+T^{w}\left[\frac{d[0]}{0}\right] .
\end{aligned}
$$

Considering additionally that the system is time-invariant, the behaviour of the circuit can be described as:

$$
\begin{equation*}
s[i+w]=T^{w}\left(s[i]+\left[\frac{d[i]}{0}\right]\right) . \tag{5.7}
\end{equation*}
$$

The special case of $w=m$ leads to the compact form:

$$
\begin{equation*}
s[i+w]=T^{w}(s[i]+d[i]) . \tag{5.8}
\end{equation*}
$$

## Case $w>m$ :

$$
\begin{equation*}
s[i+w]=T^{w} s[i]+\left[T^{w} \mid T_{w}\right] d[i] . \tag{5.9}
\end{equation*}
$$

The result of (5.7) and (5.9) can be condensed into a single equation:

$$
\begin{equation*}
s[i+w]=\left[T^{w} \mid T_{w}\right]\left(\left[\frac{s[i]}{0}\right]+\left[\frac{d[i]}{0}\right]\right) . \tag{5.10}
\end{equation*}
$$

As an example, generator polynomial $p(X)=X^{4}+X^{3}+X+1$ is selected. It is


Figure 5.1: Programmable parallel CRC circuit with $w=m$ for CRC bit $s_{i}$ utilising control latches.
intended to process four bits in parallel $(w=4)$. Consequently

$$
T=\left[\begin{array}{llll}
1 & 1 & 0 & 0  \tag{5.11}\\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{array}\right], \quad T^{4}=\left[\begin{array}{llll}
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 1
\end{array}\right]
$$

According to (5.8), the necessary logic can be directly assembled with the help of (5.11). The time step indices will be dropped in the following where not needed for simplification. Matrix entries of $T^{w}$ are numbered from $m-1$ to 0 , where the top left most element is denoted by ( $m-1, m-1$ ). Thus, an entry $t_{i, j}$ in matrix $T^{w}$ indicates that $s_{j} \mathrm{XOR} d_{j}$ is an input to the XOR forming the new value of $s_{i}$ one clock cycle later.

### 5.2 From Static to Programmable

The parallel CRC architecture from the previous section can be transformed into a programmable entity that is no longer bound to a specific CRC generator polynomial $p(X)$. A polynomial directly affects the transition and control matrix $\left[T^{w} \mid T_{w}\right]$ of the linear system (5.10). Programmability can be achieved by introducing an AND gate with a controlling latch for each signal that may be a potential input to an XOR function as illustrated in Figure 5.1. Flipflops can be utilised as well, but will have 'in general' higher demands in terms of area, which may become crucial as $m \max (m, w)$ bits need to be stored.

The derivation of the matrix necessary to set up all the latches can be performed
in software. As the data bus width imposes a limitation on the transferable data, the matrix may need to be communicated line by line, which requires $m$ clock cycles. Additionally, the software function itself relies on a processor. Many scenarios may even imply a dedicated core for this task, if the polynomial needs to be changed frequently and faster than the matrix can be communicated over the data bus.

In the case of the SpiNNaker architecture which is described in more detail in Chapter 4, it has been decided to employ this variant of the programmable CRC circuit, as it offers wide flexibility in terms of codes to which it can be configured. The number of cyclic codes that this circuit supports accounts for $2^{m-1}$. This can easily be seen from the fact that a cyclic code generator polynomial $p(X)$ needs to divide $X^{n}+1$ for an integer $n$. Such an $n$ exists, as long as $p(0) \neq 0$ [GG05], which is fulfilled for every $p(X)$ due to its constant term, so that the degree of freedom for the configurable cyclic codes equals ( $m-1$ ). However, the matrix that is supplied to the circuit exhibits $m \max (m, w)$ degrees of freedom, which is clearly more than for the case of the cyclic codes that the circuit supports.

To see how the redundancy part $s[k]$ for a $k$-bit data message is calculated by the circuit for an arbitrary binary matrix $\left[T^{w} \mid T_{w}\right]$, the effect of the first $w$ message bits is considered at first, so that $s[w]$ can be computed according to (5.10) as

$$
s[w]=\left[T^{w} \mid T_{w}\right]\left(\left[\frac{s[0]}{0}\right]+\left[\frac{d[0]}{0}\right]\right)=T^{w}{ }^{s}[0]+\left[T^{w} \mid T_{w}\right]\left[\frac{d[0]}{0}\right],
$$

with $s[0]$ being the initial state of the LFSR. If this process is continued with the subsequent input data words, $s[k]$ can be obtained as follows

$$
s[k]=T^{k} s[0]+\sum_{i=0}^{k / w-1} T^{k-(i+1) w}\left[T^{w} \mid T_{w}\right]\left[\frac{d[i w]}{0}\right] .
$$

For the case that $w \leq m$, the formula can be simplified to

$$
s[k]=T^{k} s[0]+\sum_{i=0}^{k / w-1} T^{k-i w}\left[\frac{d[i w]}{0}\right] .
$$

One alternative sensible configuration of the circuit is obtained if several cyclic codes are calculated for independent bits of the data stream. For example, it is possible to employ two cyclic codes, each with a generator polynomial of degree $m / 2$ and each designated for a half-word of the data input. If the general case is considered, the
configuration matrix $\left[T^{w} \mid T_{w}\right]$ takes the following shape

$$
\left[T^{w} \mid T_{w}\right]=\left[\begin{array}{ccccc}
{\left[\bar{T}^{\bar{w}} \mid \bar{T}_{\bar{w}}\right]} & 0 & 0 & \cdots & 0 \\
0 & {\left[\tilde{T}^{\tilde{w}} \mid \tilde{T}_{\tilde{w}}\right]} & 0 & \cdots & 0 \\
0 & 0 & {\left[\breve{T}^{\breve{w}} \mid \breve{T}_{\check{w}}\right]} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & {\left[\hat{T}^{\hat{w}} \mid \hat{T}_{\hat{w}}\right]}
\end{array}\right]
$$

The sum of the degrees of the individual cyclic codes cannot exceed $m$. Similarly, the sum of the numbers of the bits that are to be processed in parallel for the cyclic codes is bounded by $w$. Furthermore, it needs to be ensured that the top left element of each small configuration matrix is aligned with the diagonal of $\left[T^{w} \mid T_{w}\right]$, as the relevant state bits need to be fed back to the leftmost columns of the matrix.

### 5.2.1 Proposed Method

The chosen approach to construct efficient programmable cyclic code circuits is to outsource the matrix derivation into hardware if the circuit is supposed to be used exclusively for cyclic codes. As there is no computational effort involved to obtain the identity matrix in (5.5) for the case $w<m$, it is assumed that $w \geq m$, and thus

$$
\left[T^{w} \mid T_{w}\right]=\left[T^{w-1} p, \ldots, T p, p\right] .
$$

A new recursive formula for a column $T^{i} p$ with $i \geq 1$ is established as follows

$$
\begin{align*}
T^{i} p & =T T^{i-1} p \\
& =\left[p \left\lvert\, \frac{I_{m-1}}{0}\right.\right] T^{i-1} p \\
& =p e_{1}^{T} T^{i-1} p+\left[0 \left\lvert\, \frac{I_{m-1}}{0}\right.\right] T^{i-1} p, \tag{5.12}
\end{align*}
$$

where $e_{1}=[1,0 \ldots, 0]^{T}$ is the unit vector. Hence, each column element $\left(T^{i} p\right)_{j}$ can easily be computed with the help of the previous column $T^{i-1} p$ and $p$ :

$$
\left(T^{i} p\right)_{j}= \begin{cases}p_{j}\left(T^{i-1} p\right)_{m-1}+\left(T^{i-1} p\right)_{j-1} & \text { if } j \neq 0  \tag{5.13}\\ p_{j}\left(T^{i-1} p\right)_{m-1} & \text { otherwise }\end{cases}
$$

For an implementation in hardware, an ( $m-1$ )-bit register needs to be provided


Figure 5.2: Programmable parallel CRC circuit for the case $w>m$. Corresponding elements of [ $T^{w} \mid T_{w}$ ] are indicated with black pixels in the rectangular representing the matrix.
to hold the coefficients of the polynomial; this register already forms the rightmost column of $\left[T^{w} \mid T_{w}\right]$. Each other column can then be obtained with $m$ AND gates and $m-1$ XOR gates. The column element $\left(T^{i} p\right)_{0}$ of a column $i$ can be obtained through an AND gate that takes as inputs the polynomial coefficient $p_{0}$, and the column element $\left(T^{i-1} p\right)_{m-1}$ of the previous column. For every other element $\left(T^{i} p\right)_{j}$ of column $i$, polynomial coefficient $p_{j}$ and $\left(T^{i-1} p\right)_{m-1}$ need to be fed into an AND gate, before combining its result with column element $\left(T^{i-1} p\right)_{j-1}$ in an XOR gate.

It is possible to reduce the area with equivalent logic as illustrated in Figure 5.2, which additionally shows the attached CRC circuitry. Apart from the column storing the polynomial, each other column requires $m-1$ NAND gates, $m-1$ XNOR gates and 1 AND gate. The controlling AND gates have been replaced with NAND gates under the assumption that $w$ is even. Inverting an even number of inputs to an XOR function does not affect the result of the function.

A circuit dimensioned for a certain polynomial degree $m$ can be used to calculate CRCs for a polynomial of smaller degree $r$. This can be achieved by providing the polynomial premultiplied by $X^{m-r}$. Additional multiplexing circuitry is required to switch between different data input widths, as the most significant bits of the data $d$ and the state of the system $s$ need to be aligned, when being combined by the bitwise

XOR function according to (5.10).
The proposed circuit allows a further improvement if only generator polynomials $p(X)$ of degree $m$, for which the circuit is dimensioned, are used. In this case $p_{0}=1$, since every cyclic code generator polynomial exhibits a constant term. This implies that the $(w-1)$ AND gates can be omitted in the circuit by setting the value for the matrix element $\left(T^{i} p\right)_{0}$ to $\left(T^{i-1} p\right)_{m-1}$, for $i>0$.

### 5.3 From Theory to Silicon

With the scheme from the previous section it is possible to replace each latch of the programmable CRC circuit (except for the first column that holds the generator polynomial) with a NAND and XNOR gate, which can compute the necessary value of the latch. The $w-1$ latches corresponding to the least significant LFSR bit can each be replaced with only an AND gate. If the circuit is only intended for the use of generator polynomials of degree $m$, not even the AND gates will be required. Several 130-nm standard cell libraries indicate a saving of about $6-7 \%$ in logic gate area for a NAND plus XNOR gate in comparison to a latch. Further savings arise from the much smaller AND gate area, if required at all, and irrelevant latch select logic.

To assess the performance of the new circuit in Figure 5.2, it is necessary to consider two different paths. Assuming that the generator polynomial $p(X)$ is already set up, the logic gate delay for the data (from $d$ to $s$ ) adds up to

$$
\begin{equation*}
T_{D S}=\left(\left\lceil\log _{2} w\right\rceil+1\right) T_{X O R}+T_{N A N D}, \tag{5.14}
\end{equation*}
$$

where $T$ is a logic gate or path delay. Changing the polynomial, on the other hand, affects a longer path. The difference between $T_{P}$ and $T_{D S}$ accounts for

$$
\begin{equation*}
\Delta T_{P}=T_{L A T C H}+(w-1)\left(T_{X N O R}+T_{N A N D}\right)-T_{X O R} \tag{5.15}
\end{equation*}
$$

Appending a CRC value to a message will typically require a data stream to stall for at least one clock cycle. Hence, this clock cycle can be used to provide a new generator polynomial for the subsequent message. This means that the polynomial has two clock cycles to propagate through the entire circuit.

The same behaviour can be achieved on the message receiving side. Instead of inputting the received CRC as the last data word into the circuit, and checking the result for 0 , the received and calculated CRC value can be compared directly. Again,


Figure 5.3: Fully routed layout of the proposed programmable parallel CRC circuit.
this would free up an extra clock cycle that can be used to set up a new polynomial.

Without any further clock cycles, the frequency of the circuit would be limited to $f_{\text {worst }}=1 / \max \left(T_{D S}, \Delta T_{P}\right)$. To achieve best case frequency $f_{\text {best }}=1 / T_{D S}$, additional $\left\lceil\Delta T_{P} f_{\text {best }}\right\rceil-1$ clock cycles would be necessary between messages to propagate $p(X)$. Alternatively, the number of clock cycles can be reduced by providing $c$ columns of matrix [ $T^{w} \mid T_{w}$ ] instead of only one to split up $\Delta T_{P}$. This requires more area as these columns need to be stored in latches again. In the extreme case the individual paths have a length of $\Delta T_{P i}=i T_{D S}$, for, $i=1, \ldots, c$.

The design has been simulated for $m=w=32$ targeting 130-nm high-speed standard cell technology using Synopsys, Inc., synthesis tools with the resulting fully routed layout shown in Figure 5.3. The simulation results in Table 5.1 were obtained assuming a typical-typical process corner and operating conditions of 1.2 V and $25^{\circ} \mathrm{C}$. These are compared to a previous design [Toa+09], which will be discussed in the following section. Better performance is anticipated with a full-custom design that will further exploit the regular structure of the circuit and the omission of the AND gates as described earlier.

Table 5.1: Programmable CRC circuit implementation comparison

|  | Cell Array [Toa+09] | Novel Circuit | Result |
| :--- | :--- | :--- | :---: |
| Clock Frequency | 154 MHz | 481 MHz |  |
| Data Throughput | 4.92 Gbps | 15.38 Gbps | $+212.70 \%$ |
| Reconfiguration | 33 clock cycles | 4 clock cycles |  |
|  | 214.29 ns | 8.32 ns | $-96.12 \%$ |
| Core Area | $0.150 \mathrm{~mm}^{2}$ | $0.033 \mathrm{~mm}^{2}$ | $-78.00 \%$ |
| Core Utilization | Not specified | $96.13 \%$ |  |
| Total Power | 5.70 mW | 6.37 mW |  |
| Internal Power | 3.42 mW | 3.69 mW |  |
| Switching Power | 2.19 mW | 2.67 mW |  |
| Leakage Power | 0.0896 mW | 0.0077 mW |  |
| Energy ${ }^{\mathrm{a}}$ | $63 \mathrm{pJ} /$ word | $14 \mathrm{pJ} /$ word | $-77.78 \%$ |

${ }^{\text {a }}$ Based on the setup of a polynomial with a subsequent CRC calculation for 47 data words.

### 5.4 Comparison

A programmable parallel CRC architecture was recently proposed [Toa +09 ], which is referred to as the cell array architecture. It incorporates additional circuitry to switch between two different data input widths, which is considered in the following critical path and area analysis.

The main component of the cell array is a configurable array of $m \max (m, w)$ cells, each consisting of an XOR, two multiplexers, and a configuration register. A preliminary stage of XOR gates combines data with the current state of the system, which is then fed into the array. Furthermore, a configuration processor is integrated, which performs matrix multiplications to obtain the state transition matrix for a provided generator polynomial. The matrix is transferred row-wise into the configuration registers of the array.

For the basic CRC calculation, both architectures require the same number of two-input XOR gates. The present work however, also allows the utilisation of wider and proportionally smaller XOR gates that will assemble $m$ trees each with $w$ inputs.

Considering the logic that is necessary for programmability, each cell in the array constitutes two multiplexers and one register. In the new design, this corresponds in the general case to two NAND gates and one XNOR gate, in $m$ cases to one NAND gate and one latch, and depending on the implementation, in $w-1$ cases to only one

NAND gate and one AND gate or just one NAND gate. In all cases this is typically less than for a cell in the cell array design. More area is saved as there is no need for a processor.

The worst-case data path in the cell array can be specified as follows:

$$
T_{D S 2}=w\left(T_{X O R}+T_{M U X}\right) .
$$

This linear growth is inferior to the logarithmic growth of $T_{D S}$ (5.14), which has also a reduced scaling factor by $T_{M U X}$.

The reconfiguration time of the new circuit accounts for $\Delta T_{P}$ (5.15). For the cell array, a reconfiguration time of $w+1$ clock cycles is indicated. The operating frequency of the circuit is limited to $f_{\text {best2 }}=1 / T_{D S 2}$. This means that

$$
\Delta T_{P 2} \approx w(w+1)\left(T_{X O R}+T_{M U X}\right) .
$$

Consequently

$$
\frac{\Delta T_{P 2}}{\Delta T_{P}} \in O(w) .
$$

This suggests that the new design reconfigures in the order of approximately $w$ times faster than the cell array with the configuration processor.

Both designs have been implemented targeting 130-nm standard cell technology, and are compared in Table 5.1. The new design can be operated at a frequency more than three times higher than the cell array, and has a correspondingly increased data throughput. It can reconfigure to a new generator polynomial 25 times faster than the cell array, while occupying only $22 \%$ of its area. Similarly, the energy consumption dropped by about $78 \%$.

An alternative approach in realising, at least partial, programmability is to multiplex between several CRC modules dedicated to fixed generator polynomials. This method is beneficial if only a few polynomials come into consideration, for which each module can be specifically optimised in terms of speed. Beyond a certain number of different polynomials however, which depends on the polynomials and their realisation, the area requirements will exceed those for the proposed architecture. Furthermore, the multiplexing overhead will offset the speed advantage if too many polynomials are involved.

### 5.5 Conclusion

An existing proof [CPR03] for the derivation of parallel CRC circuits has been extended to any generator polynomial size $m$ and data width $w$. The proof has been conducted for the LFSR realisation with automatic premultiplication, which avoids inserting a final zero data word.

A simple method has been presented to incorporate programmability into the circuit through latches thus allowing the generator polynomial to be changed during runtime.

Furthermore, a novel scheme has been proposed to compute the state transition and control matrix of the CRC circuit easily in hardware. The scheme is based on a new recursive formula and offers a range of advantages over existing techniques.

Firstly, it is necessary to provide only the desired generator polynomial, instead of a complete matrix for the CRC core; a preliminary matrix calculation in software is no longer required. Secondly, the logic area requirements are lower than those for a realisation that stores the matrix in latches. A recently proposed architecture [Toa+09] has significantly higher demands in terms of area as it incorporates a configuration processor, and more core logic in comparison to the latch variant. Thirdly, the data path grows only logarithmically with $w$ in contrast to the existing architecture where it grows linearly with $w$ with a higher scaling factor; this implies a faster CRC calculation. Most importantly however, the new circuit reconfigures approximately $w$ times faster than the previous circuit.

Implementation figures support the theoretical results showing a significant improvement in speed, area and energy efficiency.

## Chapter 6

## Logarithms: A Generic Algorithm

The question as to whether it is possible to compute discrete logarithms efficiently in certain cyclic groups is of major interest for many applications, notably in the cryptographic field as outlined in Chapter 3.

For some cryptographic algorithms that are in wide use, the intractability of the discrete logarithm problem in certain groups is a key requirement, as the presumed security could otherwise be compromised easily. Other applications such as event counters based on the LFSR [CW94] or the use of cyclic codes for error control would benefit greatly from a method that allows the evaluation of discrete logarithms in polynomial time.

In the case of cyclic codes, an easy computation of discrete logarithms would enable the efficient correction of single-bit errors as presented in Chapter 2. This would be extremely useful for the operation of SpiNNaker machines, as the hardware provides support for cyclic codes to protect stored data in the SDRAM. The codes can transparently be generated and verified in hardware, but in the event of a detected error occurrence, software procedures will need to step in to attempt a data recovery as described in Chapter 4.

This chapter presents a new approach for determining discrete logarithms. For analysed groups where the order equals a Mersenne number with an exponent of a power of two, a generic algorithm is obtained that can be used with any group representation, requiring execution time in the order of the square root of the size of the group and negligible space. The operating principle of the algorithm is based on size differences of cyclotomic cosets which is explained below.

### 6.1 Computing Discrete Logarithms

It is assumed that $G$ is a finite cyclic group of order $q$ with a primitive element $\alpha$. The discrete logarithm $k$ of an element $\beta \in G$ is to be determined. The sequence of the form

$$
\begin{equation*}
(\beta)^{c^{i}}, \tag{6.1}
\end{equation*}
$$

for an integer $c$ with $2 \leq c \leq q-1$ and index $i$ starting from zero, is considered. Since $G$ is a finite group, the elements in the sequence will start recurring at some point according to the pigeonhole principle, so that two indices $i, j$ of smallest possible value with $0 \leq i<j$ can be identified, where

$$
(\beta)^{c^{i}}=(\beta)^{c^{j}},
$$

holds. From the fact that $\beta=\alpha^{k}$, it follows that

$$
k c^{i} \equiv k c^{j} \quad(\bmod q)
$$

which is equivalent to $q \mid k\left(c^{j}-c^{i}\right)$. It may very well be the case that $q$ and $k$ share common factors, which leads to

$$
\frac{q}{\operatorname{gcd}(q, k)} \left\lvert\, \frac{k\left(c^{j}-c^{i}\right)}{\operatorname{gcd}(q, k)}\right.
$$

Since $q / \operatorname{gcd}(q, k)$ is not a factor of $k / \operatorname{gcd}(q, k)$, the following simplification is obtained

$$
\begin{equation*}
c^{i} \equiv c^{j} \quad(\bmod q / \operatorname{gcd}(q, k)) \tag{6.2}
\end{equation*}
$$

It can be seen that the indices $i$ and $j$ are influenced through $k$, and more precisely through the greatest common divisor of $k$ and $q$. Thus, if all possible values of $k$ are taken into account, the factorisation of the group order $q$ has a vital impact on the solutions of the congruences. The idea is now to draw interferences from the values of $i$ and $j$ about $k$, which will be described in what follows.

Group orders $q$ that equal a Mersenne number, where the exponent is restricted to a power of two, are now considered. This type of number is denoted by

$$
M_{2^{t}}=2^{2^{t}}-1
$$

Another similar type of number that will be used hereafter is the Fermat number,
which has the form

$$
F_{t}=2^{2^{t}}+1 .
$$

Attention is drawn to the special groups of order $q=M_{2^{t}}$ as properties can be derived that lead to a new algorithm for the solution of the discrete logarithm based on the relations obtained from (6.2). A second reason for the focus on these groups is that they are generated by primitive polynomials, where the degree is a power of two, which is a typical setting in computer systems. They also include the largest group that needs to be dealt with if the potential cyclic codes within SpiNNaker are considered. Since SpiNNaker permits cyclic code generator polynomials up to degree 32, the largest group in which the discrete logarithm needs to be solved has an order of $M_{32}$. The following theorem gives useful information about the structure of the considered $M_{2^{t}}$ numbers.

Theorem 6.1. $M_{2^{t}}$ can be factored into Fermat numbers $F_{0}$ to $F_{t-1}$ for $t>0$. Furthermore, $F_{i}$ and $F_{j}$ are coprime for $i \neq j$.

Proof. The first statement is true for $t=1$, since $M_{2}=3=F_{0}$. Now it is assumed that the statement is true for $t$. It follows that $M_{2^{t+1}}=2^{2^{t+1}}-1=\left(2^{2^{t}}-1\right)\left(2^{2^{t}}+1\right)=M_{2^{t}} F_{t}$. Therefore, by induction the statement is true for all $t>0$. The second statement is Goldbach's theorem [KLS02; Ros93].

It is the case that two and $M_{2^{t}}$ are coprime, so that two has finite multiplicative order modulo $M_{2^{t}}$. With $c=2$ in (6.1), it follows from (6.2) with $i=0$ that

$$
2^{i} \equiv 2^{0} \equiv 1 \equiv 2^{j} \quad\left(\bmod M_{2^{t}} / \operatorname{gcd}\left(M_{2^{t}}, k\right)\right),
$$

for a $j>0$. This means that in the sequence given by (6.1), the starting value $\beta$ will reoccur after $j$ steps. Before information is provided about the period of the sequence, a few supporting theorems are introduced at first.

Theorem 6.2. The smallest integer $x>0$ that satisfies $2^{x} \equiv 1\left(\bmod F_{t}\right)$ is $x=2^{t+1}$.
Proof. A solution for the congruence is provided by $\bar{x}=2^{t+1}$, for

$$
\frac{2^{\bar{x}}-1}{F_{t}}=\frac{2^{2^{t+1}}-1}{2^{2^{t}}+1}=2^{2^{t}}-1=M_{2^{t}} .
$$

Furthermore, $x$ can be restricted to $2^{t}<x \leq 2^{t+1}$, as $2^{x} \not \equiv 1\left(\bmod F_{t}\right)$ for $1<2^{x}<F_{t}$. It is assumed that $\tilde{x}$ is the smallest solution and different from $\bar{x}$, such that $2^{t}<\tilde{x}<\bar{x}$.

Then

$$
2^{\bar{x}} \equiv 2^{\tilde{x}} 2^{\bar{x}-\tilde{x}} \equiv 2^{\bar{x}-\tilde{x}} \equiv 1 \quad \bmod F_{t},
$$

which implies that $\bar{x}-\tilde{x} \geq \tilde{x}$, as $\tilde{x}$ is the smallest solution. This means that $\tilde{x} \leq$ $\bar{x} / 2=2^{t}$, which contradicts that $2^{t}<\tilde{x}$, and therefore $\bar{x}=2^{t+1}$ must be the smallest solution.

The proof of the following theorem can be found in the literature [BS96].
Theorem 6.3. For every factor $f$ of $F_{t}$, the smallest integer $x>0$ that satisfies $2^{x} \equiv 1$ $(\bmod f)$ is $x=2^{t+1}$.

It is now possible to specify the period of the considered sequence as follows.
Theorem 6.4. The smallest positive integer $j$ for which $2^{j} \equiv 1\left(\bmod M_{2^{t}} / \operatorname{gcd}\left(M_{2^{t}}, k\right)\right)$ is satisfied, accounts for $j=2^{u+1}$ if $F_{u}$ is the largest Fermat number that does not divide $k$, where $u<t$. If such a Fermat number does not exist, $j=1$.

Proof. With Theorem 6.1 and the premise $F_{u}$ being the largest Fermat number of $M_{2^{t}}$ with $\operatorname{gcd}\left(F_{u}, k\right) \neq F_{u}$, the congruence can be transformed into

$$
2^{j} \equiv 1 \quad \bmod \left(\frac{F_{0}}{\operatorname{gcd}\left(F_{0}, k\right)} \frac{F_{1}}{\operatorname{gcd}\left(F_{1}, k\right)} \cdots \frac{F_{u}}{\operatorname{gcd}\left(F_{u}, k\right)}\right) .
$$

According to Theorem 6.1, the congruence holds for $j=2^{u+1}$. Since at least one Fermat factor of $F_{0}$ to $F_{u}$ remains in the modulus, Theorem 6.3 guarantees that $j=2^{u+1}$ is the smallest positive solution fulfilling the congruence.

If $k$ is divisible by all the Fermat numbers $F_{0}$ to $F_{u}$, which means it is divisible by $M_{2^{t}}$, it follows that $k=0$. The congruence simplifies to $2^{j} \equiv 1(\bmod 1)$, so that the smallest positive $j$ becomes $j=1$.

With the result of Theorem 6.4 it is clear that the period of the sequence is influenced by the largest Fermat number $F_{0}$ to $F_{t-1}$ that is not a factor of $k$. This means in particular that if $F_{u}$ is the largest Fermat number with $u<t$ that is not a factor of $k$, the period of the corresponding sequence can be indicated as $j=2^{u+1}$. The period for the special case, where $k=0$, accounts to $j=1$. In other words, the period length equals $j=2^{t}$, unless $k$ is a multiple of Fermat numbers $F_{u+1}$ to $F_{t-1}$, but not of $F_{u}$, in which case the period shrinks to $2^{u+1}$.

The group order $q=M_{4}$ is considered as an example. For the sequence of group elements $\beta^{2^{j}}=\alpha^{k 2^{j}}$ with starting index of $j=0$, the resulting sequence of exponents

Table 6.1: Sequences of the form $k 2^{j} \operatorname{modulo}\left(M_{4} / \operatorname{gcd}\left(M_{4}, k\right)\right)$.

| k | $j$ |  |  |  |
| :---: | :---: | :---: | :---: | :---: |
|  | 0 | 1 | 2 | 3 |
| 0 | 0 |  |  |  |
| 1 | 1 | 2 | 4 | 8 |
| 2 | 2 | 4 | 8 | 1 |
| 3 | 3 | 6 | 12 | 9 |
| 4 | 4 | 8 | 1 | 2 |
| 5 | 5 | 10 |  |  |
| 6 | 6 | 12 | 9 | 3 |
| 7 | 7 | 14 | 13 | 11 |
| 8 | 8 | 1 | 2 | 4 |
| 9 | 9 | 3 | 6 | 12 |
| 10 | 10 | 5 |  |  |
| 11 | 11 | 7 | 14 | 13 |
| 12 | 12 | 9 | 3 | 6 |
| 13 | 13 | 11 | 7 | 14 |
| 14 | 14 | 13 | 11 | 7 |

of the generator element, $k 2^{j}$ modulo $\left(M_{4} / \operatorname{gcd}\left(M_{4}, k\right)\right)$, is listed in Table 6.1 for an entire period as a function of $k$. The group order factors into $q=M_{4}=F_{0} F_{1}$, which explains why the period of the sequence is $j=2$ if $k$ is a multiple of $F_{1}$, but not of $F_{0}$. If $k$ is a multiple of both, $F_{0}$ and $F_{1}$, the period equals $j=1$. In every other case, the period exhibits a maximum length of $j=4$.

A different way of looking at the sequences is to consider cyclotomic cosets $C_{k}$ modulo $q$ with respect to $c$, where $c$ and $q$ are coprime [GG05]. These are defined as

$$
C_{k}=\left\{k, k c, \ldots, k c^{j}\right\}
$$

with $j$ being the smallest positive integer so that $k \equiv k c^{j}(\bmod q)$. The coset leaders $k$ are chosen such that they are smallest nonnegative integers in their corresponding sets $C_{k}$. For the example in Table 6.1 , where $q=15$ and $c=2$, the cyclotomic cosets are

$$
C_{0}=\{0\},
$$

$$
\begin{aligned}
& C_{1}=\{1,2,4,8\}, \\
& C_{3}=\{3,6,12,9\}, \\
& C_{5}=\{5,10\}, \\
& C_{7}=\{7,14,13,11\} .
\end{aligned}
$$

It is possible to exploit the regular structure governing the differences in sequence period lengths or size of cyclotomic cosets, to deduce the discrete logarithm $k=\log _{\alpha} \beta$. First of all, $\beta$ can be examined to determine if it corresponds to a certain period $j \leq 2^{u}$. Since all the periods are powers of two, smaller periods divide larger periods, and so a simple verification of $\beta=\beta^{2^{u+1}}$ would imply $j \leq 2^{u}$. Secondly, periods of the length $j \leq 2^{u}$ occur if $k$ is a multiple of $F_{u} F_{u+1} \cdots F_{t-1}$, which implies that smaller periods $\bar{j}$ with $\bar{j}<j$ occur only if $k$ is additionally a multiple of $F_{u-1}$.

Under the assumption that $\beta$ belongs to a period of length $j \leq 2^{u}$, it is possible to test for the tighter period length bound $i \leq 2^{u-1}$, and modify $\beta$ through multiplication with $\alpha^{F_{u} \cdots F_{t-1}}$ if the test fails. This process is repeated until the period length of $\beta$ adheres to the bound $i$, which will require a maximum number of $F_{u-1}$ tests. With the new $\beta$, the bound $i$ can now be reduced to half of its value and the search restarted. The process stops once $\beta$ corresponds to a period of length one, where it is known that the corresponding $k$ equals $k=0$. If track has been kept of the modifications that were applied to $\beta$, it is possible to work out the original value of $k$. The method is summarised through the pseudocode shown in Algorithm 6.1.

```
Algorithm 6.1 Discrete logarithm computation based on cyclotomic cosets.
    function \(\operatorname{LOG}\left(\alpha, \beta, q=M_{2^{t}}\right)\)
        \(k=0\)
        for \(i=t-1\) downto 0 do
            \(\bar{\beta}=\beta^{2^{i+1}}\)
            while \(\beta \neq \bar{\beta}\) do
                \(\beta=\beta \alpha^{M_{2^{t}} / M_{2^{i+1}}}\)
                \(\bar{\beta}=\bar{\beta} \alpha^{M_{2^{t}} / M_{2^{i+1}}}\)
                \(k=k-M_{2^{t}} / M_{2^{i+1}} \bmod q\)
            end while
        end for
        return \(k\)
    end function
```

The worst-case running time of the algorithm for $q=M_{2^{t}}$ can be expressed as

$$
O\left(t \log _{2} q+\sum_{i=0}^{t-1} F_{i}\right)
$$

where the first term takes into account the time necessary to compute $\bar{\beta}$ and $\alpha^{M_{2} t / M_{2^{i+1}}}$, and the second term to traverse the while loop. Since $F_{t-1} \approx \sqrt{q}$, the time complexity can be simplified to $O(\sqrt{q})$. The space requirements account for $O(1)$.

### 6.2 Improvement

The running time of the algorithm can be improved if the probabilities of occurrence of the discrete logarithm values $k$ with $0 \leq k \leq q-1$ are unequal. This is, for instance, the case if the discrete logarithm is restricted to lie in a certain interval.

Instead of multiplying both $\beta$ and $\bar{\beta}$ in Algorithm 6.1 by $\alpha^{M_{2^{t}} / M_{2^{i+1}}}$ for a specific iteration $i$, it is possible to multiply with the inverse element likewise. This would require a corresponding advancement of $k$ by $M_{2^{t}} / M_{2^{i+1}}$ rather than by $\left(-M_{2^{t}} / M_{2^{i+1}}\right)$. If the first iteration of the algorithm for $i=t-1$ is considered, the following two sets of discrete logarithms $k$ can be defined

$$
\begin{aligned}
& S_{0}=\left\{k \mid 1 \leq k+r F_{t-1} \leq\left(F_{t-1}-1\right) / 2\right\} \\
& S_{1}=\left\{k \mid\left(F_{t-1}-1\right) / 2+1 \leq k+r F_{t-1} \leq F_{t-1}\right\},
\end{aligned}
$$

where $r$ is an integer. Let $p(k)$ be the probability of occurrence for the discrete logarithm $k$. If

$$
\sum_{k \in S_{0}} p(k)>\sum_{k \in S_{1}} p(k),
$$

then the alternative method as suggested in this section could have an improved average or even improved worst-case running time, as a sequence length of $j \leq 2^{t-1}$ could be reached on average and, possibly, also in the worst case faster than with the standard method. The same principle can then be applied to each subsequent iteration step with $i \leq t-2$, by determining the corresponding probabilities and deciding on the appropriate method.

### 6.3 Properties

An interesting property can be derived for the considered sequences, which is summarised in the following theorem.

Theorem 6.5. Let $G$ be a cyclic group of order $q=M_{2^{t}}$ with primitive element $\alpha$. If $k \equiv 1\left(\bmod F_{t-1}\right)$, it follows that

$$
\alpha^{k}=\left(\alpha^{k+M_{2} t-1}\right)^{F_{t-1}-1} .
$$

Proof. Starting from the left hand side of the equation, the logarithm to the base $\alpha$ is taken and $k$ is replaced by $\left(r F_{t-1}+1\right)$, where $r$ is an integer and which leads to the corresponding logarithm of the right hand side as follows

$$
\begin{aligned}
k & \equiv r F_{t-1}+1 \\
& \equiv r F_{t-1}+1+M_{2^{t}}(r+1) \\
& \equiv r\left(F_{t-1}+M_{2^{t}}\right)+\left(M_{2^{t}}+1\right) \\
& \equiv r F_{t-1}\left(F_{t-1}-1\right)+\left(M_{2^{t-1}}+1\right)\left(F_{t-1}-1\right) \\
& \equiv\left(r F_{t-1}+1+M_{2^{t-1}}\right)\left(F_{t-1}-1\right) \\
& \equiv\left(k+M_{2^{t-1}}\right)\left(F_{t-1}-1\right) \quad\left(\bmod M_{2^{t}}\right) .
\end{aligned}
$$

Theorem 6.5 can be applied directly to the example sequences shown in Table 6.1, which belong to the group order $q=M_{2^{t}}$ with $t=2$. It can be seen that for every integer $r$, the sequence for $k \equiv F_{1} r+1 \equiv 5 r+1(\bmod 15)$ repeats at $\bar{k} \equiv F_{1} r+M_{2^{t-1}}+1 \equiv$ $5 r+4(\bmod 15)$ shifted by $t-1=2$ elements. The involved sequences for $k$ and $\bar{k}$ are just the sequences that are adjacent to sequences with a period length of $j \leq 2^{t-1}$, since $F_{t-1} \mid k-1$ and $F_{t-1} \mid \bar{k}+1$.

For the finite field $G F\left(2^{m}\right)$, the trace function is defined as

$$
\operatorname{Tr}(x)=\sum_{i=0}^{m-1} x^{2^{i}},
$$

where $x \in G F\left(2^{m}\right)$. $\operatorname{Tr}(x)$ defines a mapping from $G F\left(2^{m}\right)$ to $G F(2)$ [GG05]. This result can be refined for the considered sequences in the following theorem.

Theorem 6.6. Let $x$ be an element of the finite field $G F\left(2^{m}\right)$ and $j$ the smallest positive integer such that $x=x^{2^{j}}$. It follows that

$$
\overline{\operatorname{Tr}}(x)=\sum_{i=0}^{j-1} x^{2^{i}} \in G F(2)
$$

Proof. The same principle is applied as for the mentioned result of the trace function [GG05]. Let $\beta \in G F\left(2^{m}\right)$ and $j$ be the smallest positive integer such that $\beta=\beta^{2^{j}}$. Therefore,

$$
\left(\sum_{i=0}^{j-1} \beta^{2^{i}}\right)^{2}=\sum_{i=0}^{j-1} \beta^{2^{i+1}}=\sum_{i=0}^{j-1} \beta^{2^{i}} .
$$

Since $\beta=\beta^{2}$ is equivalent to $\beta \in G F(2)$, it follows that

$$
\sum_{i=0}^{j-1} \beta^{2^{i}} \in G F(2)
$$

The theorem can be applied to the considered sequences if the underlying cyclic group is the multiplicative group of a finite field. To construct an example corresponding to the one given in Table 6.1, the finite field $G F\left(2^{4}\right)$ is considered, which will be represented as the polynomial ring over $G F(2)$ modulo the primitive polynomial $p(X)=X^{4}+X^{3}+1$. Furthermore, $\alpha=X$ is a root of $p(X)$. The sequences of the form $\left(\alpha^{k}\right)^{2^{j}}$ to which the trace function $\operatorname{Tr}(x)$ and its variant $\overline{\operatorname{Tr}}(x)$ from Theorem 6.6 are applied, are shown in Table 6.2.

For the finite field $G F\left(M_{2^{t}}+1\right)$, where the order of the multiplicative group accounts for $q=M_{2^{t}}$, it has been shown that for a group element $\beta$, the sequence $\beta^{2^{i}}$ with starting index $i=0$ has a period length of a power of two. This means that the trace function $\operatorname{Tr}(x)$ can be synthesised from the trace function variant $\overline{\operatorname{Tr}}(x)$. If a group element $\beta$ has a period length $j$, it follows

$$
\operatorname{Tr}(\beta)=\sum_{i=1}^{2^{t} / j} \overline{\operatorname{Tr}}(\beta) .
$$

Therefore, if the group element $\beta$ has a period length $j<2^{t}$, it can be stated that $\operatorname{Tr}(\beta)=0$. In other words, whenever the period length of $\beta$ is below the maximum value of $2^{t}$, the trace function will result in zero. This applies to all $\beta=\alpha^{k}$, where $\alpha$ is

Table 6.2: Sequences of the form $\left(X^{k}\right)^{2^{j}}$ modulo $X^{4}+X^{3}+1$ are shown for an entire period. The trace function $\operatorname{Tr}(x)$ and its variant $\overline{\operatorname{Tr}}(x)$ are applied to the sequences. Furthermore, the sum of the sequence elements $X^{k}$ and $X^{4 k}$ is specified. Sequence elements are indicated in 4 -tuple and 1 -tuple representation, where the leftmost bit is the most significant bit.

| k | $j$ |  |  |  | $\overline{T r}\left(X^{k}\right)$ | $\operatorname{Tr}\left(X^{k}\right)$ | $X^{k}+X^{4 k}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 0 | 1 | 2 | 3 |  |  |  |
| 0 | 0001 |  |  |  | 1 | 0 | 0000 |
| 1 | 0010 | 0100 | 1001 | 1110 | 0 | 1 | 1011 |
| 2 | 0100 | 1001 | 1110 | 0010 | 0 | 1 | 1010 |
| 3 | 1000 | 1111 | 0011 | 0101 | 1 | 1 | 1011 |
| 4 | 1001 | 1110 | 0010 | 0100 | 0 | 1 | 1011 |
| 5 | 1011 | 1010 |  |  | 1 | 0 | 0000 |
| 6 | 1111 | 0011 | 0101 | 1000 | 1 | 1 | 1010 |
| 7 | 0111 | 1100 | 0110 | 1101 | 1 | 0 | 0001 |
| 8 | 1110 | 0010 | 0100 | 1001 | 0 | 1 | 1010 |
| 9 | 0101 | 1000 | 1111 | 0011 | 1 | 1 | 1010 |
| 10 | 1010 | 1011 |  |  | 1 | 0 | 0000 |
| 11 | 1101 | 0111 | 1100 | 0110 | 1 | 0 | 0001 |
| 12 | 0011 | 0101 | 1000 | 1111 | 1 | 1 | 1011 |
| 13 | 0110 | 1101 | 0111 | 1100 | 1 | 0 | 0001 |
| 14 | 1100 | 0110 | 1101 | 0111 | 1 | 0 | 0001 |

a primitive element of the group and $F_{t-1} \mid k$ as can be observed in Table 6.2.
An irreducible polynomial $p(X)=X^{m}+p_{m-1} X^{m-1}+p_{m-2} X^{m-2}+\cdots+p_{1} X+p_{0}$ over $G F(2)$ of degree $m$ defines a linear recursive sequence over $G F(2)$ with

$$
s_{k+m}=\sum_{i=0}^{m-1} p_{i} s_{k+i},
$$

where the index $k$ starts from $k=0$ and where $s_{0}$ to $s_{m-1}$ form the initial state [GG05]. It can be shown that if $\alpha$ is a root of $p(X)$, then an element $\beta \in G F\left(2^{m}\right)$ exists, such that the linear recursive sequence can equivalently be described as

$$
s_{k}=\operatorname{Tr}\left(\beta \alpha^{k}\right),
$$

for $k \geq 0$. The initial state is encoded through $\beta$ in this case. Thus, the sequence that is obtained through the trace function in the example in Table 6.2 corresponds to the
linear recursive sequence defined by $p(X)$.
An additional property that can be derived concerns the summation of field elements of $\alpha \in G F\left(2^{m}\right)$. If two sets of indices $K$ and $\bar{K}$ can be identified such that

$$
\sum_{k \in K} \alpha^{k}=\sum_{k \in \bar{K}} \alpha^{k},
$$

then the equation will also hold if each index within $K$ and $\bar{K}$ is multiplied by a power of two, since the characteristic of the field is two. The property can also be observed in the example that is given in Table 6.2, where $\alpha=X$ is a primitive element of $G F\left(2^{4}\right)$. It can be seen that $\alpha^{k}+\alpha^{4 k}$ is equal for $k \in\{1,3,4,12\}$. This implies that $\alpha^{k}+\alpha^{4 k}$ has the same value for $k \in\{2,6,8,9\}$ as can be verified in the example.

### 6.4 Comparison

As long as the involved Fermat numbers $F_{0}$ to $F_{t-1}$ are all prime, the proposed algorithm competes with the Silver-Pohlig-Hellman algorithm, due to the similar running time. Composite Fermat numbers are split by the Silver-Pohlig-Hellman algorithm into its prime factors, and for the corresponding group sizes the discrete logarithms are solved individually to be combined to the overall solution as outlined in Section 3.3. Therefore, the Silver-Pohlig-Hellman algorithm has a time-complexity advantage if composite Fermat numbers are present in the group order factorisation. Currently, the only known Fermat primes range from $F_{0}$ to $F_{4}$ [CMP03; Kel]. The subsequent Fermat numbers $F_{5}$ to $F_{32}$ have been proven to be composite. At present, the status of $F_{33}$ is unknown.

Another aspect concerns the combinability of the Silver-Pohlig-Hellman algorithm with other methods. The algorithm breaks down the group order $q$ into its prime factors and computes the discrete logarithm in the corresponding subgroups for which more efficient methods can be used than the exhaustive search. It is currently not obvious how this approach could be carried over to the proposed method.

### 6.5 Conclusion

A new approach for the computation of discrete logarithms has been proposed. For analysed cyclic groups with orders of the form $q=M_{2^{t}}$, a novel deterministic generic algorithm based on size differences of cyclotomic cosets has been obtained. The
algorithm requires $O(\sqrt{q})$ time and $O(1)$ space. It may very well be the case that a similar or better running time can be achieved for other classes of group orders. There is no speed advantage if the algorithm is applied to groups of prime order, since the factorisation of the group order plays a key role in the operation of the algorithm.

It has been shown how the average and worst-case running time of the algorithm can be improved if not all discrete logarithm values are equally likely to occur.

Furthermore, a set of properties has been derived that applies to the sequences that form the basis of the algorithm. Some of these properties assume a finite field within which the discrete logarithm is to be solved. These and possibly also other properties may allow for an improvement of the algorithm.

## Chapter 7

## Logarithms: Reduce-Map Algorithm

This chapter considers finite fields $G F\left(2^{m}\right)$, represented as the polynomial ring over $G F(2)$ modulo a primitive polynomial $p(X)$ of degree $m$ over $G F(2)$. For each of these fields, new properties are presented that establish relationships between its elements.

On the basis of some of these properties, a novel approach is proposed for computing discrete logarithms in the multiplicative groups of the fields that can be used for the correction of single-bit errors based on cyclic codes as described in Chapter 2. This approach has been evaluated for all primitive polynomials up to degree 12 and the first primitive polynomials of degree 13 and 14, for which a deterministic algorithm is obtained that is efficient in time and space. The algorithm requires a number of parameters, where currently the optimal set can only be determined in exponential time in the degree of the defining polynomial. This is the reason why only partial results are provided for the time requirements of the algorithm for polynomials of higher degree up to 32 .

### 7.1 Preliminaries

The finite field $G F\left(2^{m}\right)$ is represented as the polynomial ring over $G F(2)$ modulo a primitive polynomial $p(X)=X^{m}+p_{m-1} X^{m-1}+\cdots+p_{1} X+p_{0}$ of degree $m$, whose coefficients belong to $G F(2)$. The $m$ least significant coefficients of $p(X)$ will be combined in the coefficient vector $p=\left[p_{m-1}, p_{m-2}, \ldots, p_{0}\right]^{T}$.

It will be convenient to treat the field elements as the states of a Linear Feedback Shift Register (LFSR) that is configured to the generator polynomial $p(X)$ as described in Section 2.3. An example of such an LFSR is shown in Figure 2.5 for the primitive polynomial $p(X)=X^{3}+X+1$. The register content is regarded as the state polynomial
$s(X)=s_{m-1} X^{m-1}+s_{m-2} X^{m-2}+\cdots+s_{0}$ with the corresponding vector notation $s=\left[s_{m-1}, s_{m-2}, \ldots, s_{0}\right]^{T}$. A single shift operation of the register corresponds to a multiplication of $s(X)$ by $X$ modulo the generator polynomial $p(X)$. If the LFSR is initialised to a nonzero state $s$, it will traverse, by continuous shift operations, all of its $q=2^{m}-1$ possible nonzero states as illustrated in Table 7.1 for the primitive polynomial $p(X)=X^{5}+X^{2}+1$. The sequence of states produced by such an LFSR is referred to as a maximum-length sequence, for which new relationships are developed in the next section.

In this context the following discrete logarithm problem is considered. The polynomial $X$ is a root of $p(X)$ and thus a primitive element. Then, for a nonzero $s(X)$, the discrete logarithm $k$ with $0 \leq k \leq q-1$ to the base $X$ is to be determined, such that

$$
\begin{equation*}
X^{k} \equiv s(X) \quad(\bmod p(X)) \tag{7.1}
\end{equation*}
$$

From the perspective of the LFSR, the discrete logarithm $k$ corresponds to the number of shifts that the register, initialised to the state vector $[0, \ldots, 0,1]^{T}$, needs to perform to reach the corresponding coefficient vector $s$ of $s(X)$.

### 7.2 Shift Register Sequences

This section establishes new facts about maximum-length shift register sequences. In what follows, all possible state vectors of the LFSR are aggregated into the set $S^{*}$; the set $S$ includes additionally the zero vector. The $2^{m}$ elements of $S$ form an $m$-dimensional vector space over $G F(2)$.

### 7.2.1 Sequence Numbering

The states of the maximum-length LFSR sequence will be numbered for convenience, and without loss of generality, in a certain way. A state at the sequence position $i$ will be denoted by $s[i]$. The first state $s[0]$ is defined to equal $s[0]=[0, \ldots, 0,1]^{T}$. Consequently, the polynomial vector $p$ appears $m$ states later, such that $s[m]=p$, as can be seen in the example in Table 7.1. With this definition, the position of a state in the sequence is equivalent to the discrete logarithm of that state as described by (7.1).

### 7.2.2 Transformation

Within the sequence of LFSR states, it is possible to advance a state $s[i]$ by any number of $j$ positions with a simple invertible linear transformation $T$, where $s[i+j]=T^{j} s[i]$. The transformation $T$, which simulates a single LFSR shift, takes, thereby, the following shape

$$
T=\left[\begin{array}{ccccc}
p_{m-1} & 1 & 0 & \cdots & 0  \tag{7.2}\\
p_{m-2} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{1} & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{array}\right]
$$

With this information the transformation $T^{j}$ that advances a state by $j$ steps, can be expressed as

$$
\begin{align*}
T^{j} & =T^{j-1} T \\
& =T^{j-1}[s[m], s[m-1], \ldots, s[1]] \\
& =[s[j+m-1], s[j+m-2], \ldots, s[j]] . \tag{7.3}
\end{align*}
$$

The transformation $T^{j}$ is thus composed of the $m$ consecutive state vectors $s[j+m-1]$ to $s[j]$.

### 7.2.3 Parity Vector Spaces

The elements of $S$ can be grouped into different parity sets $P_{l, j}$, which are characterised by the parity level $l$, where $0 \leq l \leq m-1$, and the parity $j \in\{0,1\}$. For a vector $x \in S$ and a parity level $l$, the following sub-parity function is introduced

$$
b_{l, i}(x)=\sum_{j=0}^{\left\lceil\frac{m-i}{+1}\right\rceil-1} x_{m-1-i-j(l+1)},
$$

where $0 \leq i \leq l$. As an example, the calculation of the sub-parities $b_{2, i}$, where $0 \leq i \leq 2$, for a vector $s$ of length $m=9$ is shown in Figure 7.1. A parity set $P_{l, j}$ is now defined as

$$
\begin{equation*}
P_{l, j}=\left\{s \mid b_{l, i}(s)=j, 0 \leq i \leq l, s \in S\right\} . \tag{7.4}
\end{equation*}
$$



Figure 7.1: Calculation of all sub-parities $b_{2, i}$, where $0 \leq i \leq 2$, for parity level 2 for a vector $s$ of length $m=9$.
$P_{l, 0}$ is considered to be of even parity, whereas $P_{l, 1}$ will be of odd parity for each parity level $l$. It will be convenient to define $P_{l}$ as

$$
P_{l}=P_{l, 0} \cup P_{l, 1}
$$

A sample parity set categorisation is shown in the left part of Table 7.1.
In what follows a few properties are established that are inherent to the defined parity sets.

Theorem 7.1. For $0 \leq l \leq m-1$ and $j \in\{0,1\}$, the number of elements in each parity set $P_{l, j}$ is

$$
\left|P_{l, j}\right|=2^{m-l-1}
$$

Proof. The base set $S$ from which the parity sets are constructed, consists of all possible vectors of length $m$ over $G F(2)$. A single sub-parity $b_{l, i}(s)$ computes a simple parity over a set of bits of a vector $s \in S$. The elements of $S$ are thus divided into two equal sized sets of even and odd sub-parity $b_{l, i}$. The $l+1$ different sub-parities $b_{l, i}$ for a specific parity level $l$ are computed from disjoint bit subsets of $s$, and are thus independent of each other. For $P_{l, j}$ requires all $l+1$ sub-parities $b_{l, i}$ to equal $j$, the number of elements satisfying this condition equals $|S| 2^{-l-1}=2^{m-l-1}$.

Further information about the nature of the parity sets is given by the following theorem.

Theorem 7.2. For every parity level $l$, where $0 \leq l \leq m-1, P_{l, 0}$ and $P_{l}$ are subspaces of $S$. The dimension of $P_{l, 0}$ and $P_{l}$ is $m-l-1$ and $m-l$, respectively.

Proof. The zero vector is, for every level $l$, of even parity and therefore an element of every $P_{l, 0}$. Scalar multiplication is closed in $S$ since $c s \in S$ for all $c \in\{0,1\}, s \in S$. Let $s \in P_{l, j}$ and $\bar{s} \in P_{l, \bar{j}}$ where $j, \bar{j} \in\{0,1\}$. The values of the $l+1$ sub-parities $b_{l, i}$

Table 7.1: Parity set decomposition for the nonzero elements of $\mathbb{Z}_{2}[X] /\left\langle X^{5}+X^{2}+1\right\rangle$ in 5-tuple representation. Parity sets on level $l$ with $0 \leq l \leq 4$ are indicated with $P_{l}$, where even and odd parity is denoted with $o$ and $x$, respectively. The parity set elements are aligned with lower level parity set elements in the aligned version which is shown in the right-hand side of the table.

| Position | State | Unaligned | Aligned |
| :---: | :---: | :---: | :---: |
|  |  | $P_{0} P_{1} P_{2} P_{3} P_{4}$ | $P_{0} P_{1} P_{2} P_{3} P_{4}$ |
| 0 | 00001 | x | x |
| 1 | 00010 | x | x |
| 2 | 00100 | X | x |
| 3 | 01000 | x | x |
| 4 | 10000 | x | x |
| 5 | 00101 | O o | 0 ox |
| 6 | 01010 | o o | o o x |
| 7 | 10100 | - 0 | o o x |
| 8 | 01101 | x | x |
| 9 | 11010 | x | x |
| 10 | 10001 | o o o | o o o ox |
| 11 | 00111 | x x | x |
| 12 | 01110 | x x | x |
| 13 | 11100 | x x | x |
| 14 | 11101 | 0 x | 0 x |
| 15 | 11111 | x x | x |
| 16 | 11011 | ○ o o | o o x |
| 17 | 10011 | x | x |
| 18 | 00011 | 0 x | 0 x |
| 19 | 00110 | O x | 0 x |
| 20 | 01100 | 0 x | 0 x |
| 21 | 11000 | o x | 0 x |
| 22 | 10101 | x x | x |
| 23 | 01111 | o o x | 0 om x |
| 24 | 11110 | o o x | o o o x |
| 25 | 11001 | x | x |
| 26 | 10111 | o x | 0 x |
| 27 | 01011 | x | x |
| 28 | 10110 | x | x |
| 29 | 01001 | O x 0 | O x |
| 30 | 10010 | O x 0 | o X |

for $0 \leq i \leq l$ will be $j$ for $s$, and $\bar{j}$ for $\bar{s}$. Therefore, for the sum of $s$ and $\bar{s}$, all the sub-parities will account for $j+\bar{j}$, and it follows $s+\bar{s} \in P_{l, j+j}$. If $s$ and $\bar{s}$ are drawn from the same parity set, the sum will be of even parity, otherwise it will be odd. $P_{l, 0}$ and $P_{l}$ are thus closed under addition and they both form a subspace of $S$.

Furthermore, $P_{l, 0}$ and $P_{l}$ are both vector spaces over $G F(2)$ and have according to Theorem 7.1, $2^{m-l-1}$ and $2^{m-l}$ elements, respectively. For this reason, the dimension of $P_{l, 0}$ is $m-l-1$ and the dimension of $P_{l}$ is $m-l$.

### 7.2.4 Blocks

The kernel of an element $s \in S^{*}$ is defined in this work as the largest sub-vector of $s$, which has a starting and ending coefficient of 1 . For example, the kernel of the vector $s=[0,0,1,0,1,0]^{T}$ is $[1,0,1]^{T}$. If an LFSR is initialised to a state vector $s$ with its kernel of length $c$ located in its least significant part, a total number of $m-c$ shifts will slide the kernel across the register to its most significant end. Provided that $m>1$ and that the generated sequence is of maximum length, the next LFSR shift will push the most significant kernel bit into the feedback path of the register, which will modify the kernel. Should the kernel remain unaffected, the sequence would have a length of $m-c+1$, which is impossible as the length would be smaller than the presumed maximum sequence length of $2^{m}-1$.

The consecutive LFSR states that are generated by pure LFSR shifts, and which therefore share the same kernel of length $c$, are considered to form a block of size

$$
\begin{equation*}
b=m-c+1 . \tag{7.5}
\end{equation*}
$$

As an example, the first five states in the sequence of Table 7.1 consolidate to a block of size $b=5$. The zero vector forms an exception to the other vectors. It is considered to generate an even parity block of unspecified length that will be referred to as the zero-block.

The next theorem provides a useful relationship between blocks and parities.
Theorem 7.3. Even or odd parity set membership is invariant within a block.
Proof. Let $s \in S$ belong to a block of size $b$ with $b>1$, and let $l$ denote a parity level with $0 \leq l \leq m-1$. It is further assumed that $s \in P_{l, j}$ where $j \in\{0,1\}$. According to (7.4), all the sub-parities of $s$ for parity level $l$ will have the same value $j$, such that $b_{l, i}(s)=j$ for $0 \leq i \leq l-1$. Without loss of generality, it will be assumed that a
single kernel-conserving forward LFSR shift on $s$ will lead to the successor $\bar{s}$ in the same block. The sub-parities of $\bar{s}$ will then be the cyclic shifted sub-parities of $s$, i.e. $b_{l, i}(\bar{s})=b_{l,((i+1) \bmod (l+1))}(s)$, for $0 \leq i \leq l$. This means that the sub-parities of $\bar{s}$ are identical to those of $s$, and therefore $\bar{s} \in P_{l, j}$.

The states of a sequence can thus be classified into blocks of even and odd parity for different levels of parity. In the following theorem, information is provided on the block structure for every level of parity.

Theorem 7.4. For a maximum-length sequence and parity level $l$, where $0 \leq l \leq m-1$, the largest block is of odd parity and exhibits a size of $m-l$ with kernel $[1, \ldots, 1]^{T}$ of length $l+1$. The next smaller block is of even parity and has size $m-l-1$ with kernel $[1,0, \ldots, 0,1]^{T}$ of length $l+2$, where $l<m-1$. For every smaller block size $b$, where $1 \leq b \leq m-l-2$, there are $2^{m-l-b-2}$ blocks of even and $2^{m-l-b-2}$ blocks of odd parity. The total number of blocks on parity level laccounts for $2^{m-l-1}$, of which $\left\lceil 2^{m-l-2}\right\rceil$ are of odd parity and $\left\lfloor 2^{m-l-2}\right\rfloor$ are of even parity.

Proof. A block of even or odd parity can be characterised by its specific kernel according to Theorem 7.3. It is, therefore, sufficient to analyse the different kernels that occur in a maximum-length sequence. The longer a block is, the shorter is its kernel. The shortest kernel that satisfies the odd parity criterion is of length $c=l+1$ and consists entirely of ones $[1, \ldots, 1]^{T}$. This is the shortest kernel that can set all the $l+1$ sub-parities $b_{l, i}$ to one; it determines the block of size $b=m-l$. With a kernel of size $c=l+1$ or shorter, every sub-parity is computed from at most a single kernel bit. To achieve even parity, all kernel bits would have to be set to zero. As the zero state is excluded from a maximum-length sequence, an even block of size $b=m-l$ or smaller does not exist.

There is only a single parity configuration for a kernel of size $c=l+2$, where $l<m-1$. The two outer bits of the kernel are ones and are the only two kernel bits that belong to a single sub-parity. This sets that particular sub-parity to even parity. The only overall parity class this kernel can qualify for is the even parity class. In this case, each of the remaining kernel bits has to equal zero, leading to the kernel $[1,0, \ldots, 0,1]^{T}$ of length $l+2$, which characterises the block of size $m-l-1$.

For larger kernel sizes $c$, where $l+3 \leq c \leq m$, an outer kernel bit is always combined with at least one inner kernel bit to form a certain sub-parity. The outer kernel bits are fixed to ones and can thus be left out of the consideration as they do not contribute any degree of freedom to the parity, which can be controlled by
the inner kernel bits. For $c=l+3$, each inner kernel bit contributes to one of the $l+1$ sub-parities. There are two options, they all can be set to either zero or to one, depending on which of the two parity classes is targeted. With each increase of the kernel length, a new inner kernel bit contributes a degree of freedom to a sub-parity, doubling the choices for even and odd parity configurations. Therefore, both even and odd kernels number $2^{c-l-3}$. Substituting $c$ with the help of (7.5) leads to $2^{m-l-b-2}$.

The summation of the block frequencies for all block sizes for parity level $l$ with $0 \leq l \leq m-3$, leads to the total number of $2+2 \sum_{b=1}^{m-l-2} 2^{m-l-b-2}=2+2\left(2^{m-l-2}-1\right)=$ $2^{m-l-1}$, of which $2^{m-l-2}$ blocks are of even parity and $2^{m-l-2}$ blocks are of odd parity. On parity level $l=m-2$, there is only one block of even parity and one block of odd parity. Parity level $l=m-1$ has only one block of odd parity.

### 7.2.5 $M$ Transformation

There exists a useful relationship between the nonzero blocks of a certain parity level $l$, which is described by the next theorem.

Theorem 7.5. Within the blocks of a parity level $l$ of a maximum-length sequence, where $0 \leq l \leq m-1$, the $b-1$ bottom vectors of a block of size $b$ with $b>1$ of either even or odd parity, can be transformed into an even parity block of size $b-1$, with a linear transformation $M$ of the form $M=T^{x}$ that is specific to the defining primitive polynomial $p(X)$. The transformation takes thereby the following shape

$$
M=\left[\begin{array}{ccccc}
1 & \cdots & 0 & 0 & 1 \\
1 & \cdots & 0 & 0 & p_{m-1} \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 1 & 0 & p_{3} \\
0 & \cdots & 1 & 1 & p_{2} \\
0 & \cdots & 0 & 1 & \bar{p}_{1}
\end{array}\right]
$$

Proof. At first, the existence of the linear transformation $M$ is shown. The left-hand side of $M$ is composed of the block with the kernel $[1,1]^{T}$. Due to the fact that the generator polynomial $p(X)$ is primitive, it is guaranteed that this block will appear at some point in the sequence of the shift register states. The rightmost vector of $M$ is simply the state vector that precedes the block. Since $M$ is composed of consecutive state vectors, it describes a linear transformation of the form $T^{x}$ as shown in Subsection 7.2.2.

To show the effect of the transformation $M$ on the $b-1$ bottom vectors of a block of size $b$, an arbitrary bottom vector $s=\left[s_{m-1}, \ldots, s_{0}\right]^{T}$ of the block is considered. Since $s$ is not a topmost block vector, it follows that $s_{0}=0$. With this information, the mapping of $s$ can be expressed as

$$
M s=\left[s_{m-1}+s_{0}, s_{m-2}+s_{m-1}, \ldots, s_{0}+s_{1}\right]^{T} .
$$

$M s$ is thus the addition of $s$ and its cyclically right-shifted version. The sub-parities for parity level $l$ can therefore be computed as

$$
b_{l, i}(M s)=b_{l, i}(s)+b_{l,(i+1)} \bmod (l+1)(s),
$$

for $0 \leq i \leq l \leq m-1$. Vector $s$ belongs to either an even or odd parity block, which means that all the $b_{l, i}(s)$ have the same value and, therefore, $b_{l, i}(M s)=0$ for $0 \leq i \leq l \leq m-1$. It follows that $M s \in P_{l, 0}$. The kernel size of $M s$ increases by one in comparison to $s$, for its kernel is composed of the kernel of $s$ and its right-shifted version. As a result, the bottommost $b-1$ vectors of a block of size $b$ are mapped by $M$ onto an even parity block of size $b-1$ within the same parity level.

A block, whose lower vectors are mapped by the transformation $M$ onto the vectors of another block, is considered to be linked to that block. Within a maximumlength sequence, the blocks of even and odd parity of a specific parity level are all linked up to chains by $M$. Each chain starts with a block of odd parity that links to blocks of even parity. The size of the chain blocks decreases with every link by one until a block of size one is reached. Odd parity blocks of size one do not link to any further blocks and form thus chains of length one.

The chain property is explained in more detail in what follows. The number of odd parity blocks of size $b$ with $b>1$ within parity level $l$, where $0 \leq l \leq m-2$, accounts for $\left\lceil 2^{m-l-3}\right\rceil$ as can be derived from Theorem 7.4. This number equals the number of blocks of even parity with size one on the same parity level. Furthermore, every block, whose size exceeds one, can be linked to a block of even parity and a size decreased by one in the same parity level according to Theorem 7.5. In this way all the even parity blocks become part of a chain. The only remaining blocks that have not been included in any chain yet, are odd parity blocks of size one, which are considered to form their own chain with only one element. This implies that all the chains start with an odd parity block and end with a block of size one. Chain elements other than the first one are all of even parity.


Figure 7.2: $M$ transformation. It links up the blocks of every parity level to chains. A vector of odd parity is indicated with an $x$, whereas an even parity vector is indicated with an $o$. Subsequent block vectors can also be generated through the addition of consecutive block vectors.


Figure 7.3: Pascal's triangle modulo two.

The chain property is an important result that will play a central role in subsequent sections. For the example sequence given in Table 7.1, $M=T^{17}$, since the state vector $\left[1, p_{4}, p_{3}, p_{2}, \bar{p}_{1}\right]^{T}$ is located at position 17 in the sequence. The bottommost vectors of the linked chain blocks have, in this case, a displacement of 17 positions in their occurrence in the sequence.

If the $M$ transformation is applied to a non-topmost block vector $s$, the resulting vector is the sum of $s$ and its right-shifted version,

$$
M s=s+T^{-1} s,
$$

as can directly be derived from the definition of $M$. This relationship is illustrated in Figure 7.2, which shows a complete chain, starting form an odd parity block.

The effect of $M$ on a non-topmost vector $s$ of a block of size $b$ can be described by Pascal's triangle modulo two which is shown in Figure 7.3. A row in the triangle can be obtained by the sum of the previous row with itself right shifted by one. This is the
reason why row $r$ in the triangle with $r<b$ describes the effect of $M^{r}$ on $s$ as

$$
M^{r} s=\sum_{i=0}^{r} T^{-i}\binom{r}{i} s
$$

If the longest chain on parity level zero is considered, the kernel of the starting block consists of a single one, as is the case with the first row of the triangle. The chain block kernels of the longest chain correspond therefore directly to the rows of the triangle.

Additionally, the following corollary can be established for the inverse transformation of $M$.

Corollary 7.6. Within a parity vector space of a maximum-length sequence, a block of even parity of size $b-1$ is transformed by $M^{-1}$ into the bottom $b-1$ vectors of a block of size $b$, which will be either of even or odd parity.

Proof. According to Theorem 7.4, the number of even parity blocks of size $b-1$ equals the sum of the number of even and odd parity blocks of size $b$ within the same parity level. Theorem 7.5 shows that the bottom vectors of every even or odd parity block of size $b$ are mapped by $M$ onto the vectors of an even parity block of size $b-1$. Therefore, the converse that the inverse transformation of $M$ maps every even parity block of size $b-1$ to the bottom vectors of either an even or odd parity block of size $b$ must hold.

An even nonzero vector $s$ is thus mapped by $M^{-1}$ onto a vector $t$ of the previous block. From the fact that the sum of $t$ and its right-shifted version equals $s$, it is easily possible to calculate $t$ bitwise from $s$, starting from the most or least significant bit, as illustrated in Figure 7.4.

On the basis of the chain relations between blocks, it is possible to formulate a reduction function that maps every vector to a specific vector of its chain on parity level zero, by shift operations and applications of $M$. The designated chain vector to which it is reduced could be, for instance, a vector of the starting block or the terminating vector of the chain. Since every chain ends with a block of size one, the number of size-one-blocks equals the number of chains. This means that by omitting the zero vector, the set of elements in $S$ can easily be reduced to a subset of a quarter of its size.

A further useful property of the transformation $M$ is given by the following theorem.


Figure 7.4: $M^{-1}$ transformation. A nonzero even vector $s$ is mapped by $M^{-1}$ onto the corresponding block vector $t$ of the previous block. The calculation of $t$ can easily be accomplished in a serial way, starting from the most or least significant bit.

Theorem 7.7. The transformation matrix $M$ for a primitive polynomial of degree $m$ fulfils the following condition

$$
\begin{aligned}
& M^{2^{t}}[\underbrace{0, \ldots, 0}_{r}, s_{2^{t}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T} \\
= & {[\underbrace{0, \ldots, 0}_{r}, s_{2^{t}-1}, \ldots, s_{0}, s_{2^{t}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T} }
\end{aligned}
$$

wherer $+2^{t+1} \leq m$.

Proof. Induction is used to prove the statement. Since the multiplication of $M$ with a non-topmost block vector $s$ leads to the sum of $s$ and its right-shifted version, the case for $t=0$ holds. It is now assumed that the statement is true for $t$, leading to

$$
\begin{aligned}
& M^{2^{t+1}}[\underbrace{0, \ldots, 0}_{r}, s_{2^{t+1}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T} \\
= & M^{2^{t}} M^{2^{t}}([\underbrace{0, \ldots, 0}_{r}, s_{2^{t+1}-1}^{0, \ldots, s_{2^{t}}, 0, \ldots, 0}]^{T}+[\underbrace{0, \ldots, 0}_{r+2^{t}}, s_{2^{t}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T}) \\
= & M^{2^{t}}(\underbrace{0, \ldots, 0}_{r}, s_{2^{t+1}-1}^{0, \ldots, s_{2^{t}}, 0, \ldots, 0}]^{T}+[\underbrace{\left.0, \ldots, 0, s_{2^{t+1}-1}, \ldots, s_{2^{t}}, 0, \ldots, 0\right]^{T}}_{r+2^{t}} \\
& +[\underbrace{0, \ldots, 0}_{r+2^{t}}, s_{2^{t}-1}^{0, \ldots, s_{0}, 0, \ldots, 0}]^{T}+[\underbrace{0, \ldots,}_{r+2^{t+1}}, s_{2^{t}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T}) \\
= & ([\underbrace{0, \ldots, 0}_{r}, s_{2^{t+1}-1}, \ldots, s_{2^{t}}^{0, \ldots, 0}, \underbrace{0, \ldots 2^{t+1}-1}_{2^{t}}, \ldots, s_{2^{t}}, 0, \ldots, 0]^{T}
\end{aligned}
$$

$$
\begin{aligned}
& +[\underbrace{0, \ldots, 0}_{r+2^{t}}, s_{2^{t}-1}, \ldots, s_{0}, \underbrace{0, \ldots, 0}_{2^{t}}, s_{2^{t}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T}) \\
= & {[\underbrace{0, \ldots, 0}_{r}, s_{2^{t+1}-1}^{0, \ldots, s_{0}}, s_{2^{t+1}-1}, \ldots, s_{0}, 0, \ldots, 0]^{T} . }
\end{aligned}
$$

A direct consequence of the theorem is the following corollary.
Corollary 7.8. For the transformation matrix $M$ of degree $m$ and $0 \leq t \leq\left\lfloor\log _{2} m\right\rfloor$ it follows

$$
M^{2^{t}-1}[1,0, \ldots, 0]^{T}=[\underbrace{1, \ldots, 1}_{2^{t}}, 0, \ldots, 0]^{T} .
$$

Proof. With Theorem 7.7 it follows

### 7.2.6 Base Transformations

The parity subspaces $P_{l, 0}$ and $P_{l}$ of the vector space $S$ for $0 \leq l \leq m-1$, can each be characterised by a distinct basis. Their dimensions coincide with the sizes of their largest blocks according to Theorem 7.2 and Theorem 7.4. Each of these blocks consists, furthermore, of a set of linearly independent vectors, for the vectors are generated by a single kernel shifted from one vector end to the other. It follows that the vectors of the unique even parity block of size $m-l-1$ form a basis for the vector space $P_{l, 0}$. Similarly, the distinct basis for $P_{l}$ is formed by the block of odd parity of size $m-l$. The distinct basis will be referred to as base blocks in what follows.

The blocks of a parity set have a fixed location within the sequence of LFSR states. It is possible to align the blocks of $P_{l, 0}$ or $P_{l}$ on parity level $l$ with blocks of a parity set of a lower level by advancing the blocks of one parity set by a certain number of steps forward or backward in their occurrence in the sequence; in other words, $P_{l, 0}$ or $P_{l}$ can be mapped onto $P_{\bar{l}, 0}$ or $P_{\bar{l}}$ with a linear transformation $T^{x}$, where $0 \leq \bar{l}<l \leq m-1$. The vector space $P_{l}$ will be of particular interest and is thus considered solely in what follows; the principle can, however, be applied in the same way to $P_{l, 0}$. To compute the transformation $T^{x}$ for an alignment of all the blocks of $P_{l}$ with equally-sized blocks
of $P_{\bar{l}}$, where $0 \leq \bar{l}<l \leq m-1$, it is only necessary to determine the displacement $x$ from the base block of $P_{l}$ to an equally-sized block of $P_{\bar{l}}$ as will be explained in what follows.

The base block of $P_{l}$ will be denoted by $B_{l}$. It is now assumed that the transformation $T^{x}$ transforms this block into an equally-sized block on parity level $\bar{l}$; this new block describes an isomorphic representation of $P_{l}$ and will be denoted as the base block $B_{\bar{l}}$. If the topmost vector of a base block is included in a linear combination of the base vectors, then the resulting vector will be a topmost vector of a block, as only a kernel in a rightmost position can set the rightmost bit to one. Similarly, the bottommost base block vector needs to contribute to the linear combination to obtain a vector that constitutes a lower block boundary. This is one explanation why the aligned blocks will be equally-sized in each case.

Furthermore, there are two options for the base block $B_{\bar{l}}$. It can be either of even or odd parity. If it is of even parity, then any linear combination of the base vectors will result in an even parity vector. The blocks of even and odd parity of $P_{l}$ would thus be purely aligned with even parity blocks on parity level $\bar{l}$.

If the base block $B_{\bar{l}}$ is of odd parity as is $B_{l}$, then each two aligned blocks will have the same parity. This is due to the fact that the parity of a vector obtained through a linear combination of odd vectors is odd if the number of vectors is odd, and even otherwise. Since $T^{x}$ is an isomorphism, the number of base vectors involved in the linear combination of aligned vectors is the same and, therefore, also their parities.

The base block $B_{l}$ is of size $m-l$. It can only be aligned with a single block of the same size on the previous parity level $\bar{l}=l-1$. This block is of even parity according to Theorem 7.4. The vector spaces of smaller parity levels $\bar{l}$ with $0 \leq \bar{l} \leq l-2$ contain $2^{l-\bar{l}-2}$ even and $2^{l-\bar{l}-2}$ odd parity blocks of size $m-l$. Hence, the number of possible alignments increases exponentially with the difference between the two parity levels $l$ and $\bar{l}$.

An interesting alignment is achieved if each vector space $P_{l}$ for $1 \leq l \leq m-1$, is aligned with its previous vector space $P_{\bar{l}}$, where $\bar{l}=l-1$. Table 7.1 shows on its right-hand side this kind of alignment of the parity vector spaces of the sample sequence. The number of blocks in a parity vector space is halved with every step into a higher parity level according to Theorem 7.4. Furthermore, two parity vector spaces on adjacent parity levels can only be aligned in a single way, where the higher level odd parity base block is aligned with the lower parity level even parity block of the same size, as described earlier. The lower level even parity block, is at the same
time, a base for $P_{l-1,0}$, which means that $P_{l-1,0}$ is aligned with $P_{l}$. In this constellation, an even parity block is aligned, in half of the cases, with an even parity block of the next higher level, and in the other cases with an odd parity block. Odd parity blocks, in contrast, do not have any corresponding block on the higher parity level. They are in this sense terminating blocks that lie on top of even parity blocks in the stack of parity levels.

Further insights into the nature of the block alignments is given by the following theorem.

Theorem 7.9. An odd parity block of size $b$ with $b>1$ on parity level l, links through $M$ to an odd parity block of size $b-1$ on parity level $l+1$, if the two parity vector spaces are aligned.

Proof. Theorem 7.5 shows that an odd parity block of size $b$ with $b>1$ is linked to an even parity block of size $b-1$ within the same parity level $l$. This smaller block can only be aligned with either an even or odd parity block of the next higher parity level $l+1$ as described in this section. It is first assumed that the smaller block on parity level $l+1$ is of even parity. If this is the case, the block would link back to a block of size $b$ of either even or odd parity on the same level $l+1$, according to Corollary 7.6. This block would be aligned with the initial odd parity block of size $b$ on level $l$. Due to the fact that odd parity blocks are terminating blocks that do not align with any further even or odd parity blocks on the next higher level, a contradiction is established. This shows that the supposition is false, which means that the smaller block on parity level $l+1$ must be of odd parity.

An additional result is given by the following theorem.
Theorem 7.10. If a complete alignment of all the parity vector spaces $P_{l}$ for $0 \leq l \leq$ $m-1$ is considered, then all the parity vector spaces on level $l=2^{t}-1$ for $0 \leq t \leq\left\lfloor\log _{2} m\right\rfloor$ are already intrinsically aligned with each other.

Proof. To align two adjacent parity vector spaces $P_{\bar{l}}$ and $P_{l}$ with $0 \leq \bar{l}<l \leq m-1$, it is necessary to determine the displacement between the base block of $P_{l}$ and the biggest even parity block of $P_{\bar{l}}$. Within a single parity vector space, the base block is linked to the biggest even parity block through the transformation $M$, as can be derived from Theorem 7.4 and Theorem 7.5. This implies that in a complete alignment of all vector spaces, the base blocks of vector spaces on adjacent parity levels are also linked through $M$. In other words, the base block of every vector space needs
to be aligned with the equally-sized block of the longest chain on parity level zero, to achieve a complete alignment. The base block for parity level $l$ can be identified through the kernel $[1, \ldots, 1]^{T}$ of size $l+1$. For parity level zero, the base block is equivalent to the starting block of the chain. The bottommost vector of this base block equals $[1,0, \ldots, 0]^{T}$. It follows, with the help of Corollary 7.8 , that the base blocks for parity levels $l=2^{t}-1$ for $1 \leq t \leq\left\lfloor\log _{2} m\right\rfloor$ appear at their corresponding positions in the chain, which means that their vector spaces are intrinsically aligned with the vector space on parity level zero.

The sample sequence with its parity vector spaces in Table 7.1 serves as an illustration for Theorem 7.10. It can be seen that vector spaces on level one and three are already intrinsically aligned with the vector space on parity level zero.

The relations that have been developed in this section form the basis for Section 7.4 that will present the new approach for the computation of the discrete logarithm. The next section advances some of the results established in this section.

### 7.3 Additional Properties

In this section further properties for a maximum-length shift register sequence, generated by a primitive polynomial $p(X)$ of degree $m$, are derived complementing the ones established in Section 7.2.

### 7.3.1 Blocks

The next theorem gives useful information on how smaller blocks on higher parity levels can be generated from a block on parity level zero, where all the blocks share the same parity.

Theorem 7.11. A block of size $b$ on parity level zero is considered. If the bottom $j$ vectors of this block are added together, where $j \leq b$, the bottommost vector of a block of the same parity class on parity level $j-1$ of size $b-j+1$ is obtained.

Proof. Let $s=\left[s_{m-1}, s_{m-2}, \ldots, s_{0}\right]^{T}$ be the bottommost vector of the block on parity level zero. Due to the fact that $s_{0}$ to $s_{b-2}$ are zero, it follows that the addition of the $j$
bottommost vectors of the block can be expressed as the vector

$$
\bar{s}=\left[\begin{array}{c}
s_{m-1}+s_{0}+\cdots+s_{(j-2 \bmod (m))} \\
s_{m-2}+s_{m-1}+\cdots+s_{(j-3 \bmod (m))} \\
s_{m-3}+s_{m-2}+\cdots+s_{(j-4 \bmod (m))} \\
\vdots \\
s_{0}+s_{1}+\cdots+s_{(j-1 \bmod (m))}
\end{array}\right]
$$

It follows that the sub-parities $b_{j-1, i}(\bar{s})=b_{0,0}(s)$ for $0 \leq i \leq j-1$. This means that the parity class of $s$ on level zero is equivalent to the one of $\bar{s}$ on level $j-1$. Since $s$ is a bottommost vector of a block of size $b$, it follows that $s_{m-1}=s_{b-1}=1$; this implies that $\bar{s}_{m-1}=\bar{s}_{b-j}=1$ and $\bar{s}_{b-j-1}$ to $\bar{s}_{0}$ are zero, so that $\bar{s}$ is the bottommost vector of a block of size $b-j+1$.

### 7.3.2 L Transformation

It has been shown in Subsection 7.2.5 that the elements of the maximum-length sequence can be combined to blocks, which can be further connected to $2^{m-2}$ chains on parity level zero. Thus, if every chain is reduced to one of its elements, for instance the last block of size one, the $2^{m}-1$ sequence elements are reduced to about a quarter of their number. Each of the obtained elements can be characterised through the length of the chain on parity level zero to which it belongs. If a complete alignment of parity vector spaces is considered, as described in Subsection 7.2.6, the last block of a chain of length $h$ reaches up to parity level $h-1$, where the odd parity terminating block of size one can be found. Therefore, the $2^{m-2}$ elements will be regarded in what follows either as the blocks of size one on parity level zero, characterised by the length of the chain to which they belong, or equivalently as the odd parity blocks of size one that are spread across the different parity levels.

The following conjecture gives information on how the $2^{m-2}$ elements can be transformed into each other.

Conjecture 7.12. For a defining polynomial of degree $m$ and parity level $l$ with $1 \leq$ $l \leq m-3$, there exist $\min \left(2^{l-1}, 2^{m-3-l}\right)$ linear transformations of the form $T^{x}$, such that every odd parity block of size one on level l is mapped by one of the transformations onto an element of an even parity block on the same level.

It is possible to transform all the $2^{m-2}$ odd parity blocks of size one into the single block on the highest parity level $m-1$ with the help of this conjecture. Starting from
an arbitrary odd parity block of size one on a certain parity level, the idea is to apply a transformation of the form $T^{x}$ that will result in an odd parity block of size one on a higher parity level. This process will be repeated until the highest level has been reached. In regard to the transformations, several cases need to be distinguished.

First of all, the considered odd parity block of size one may be located on parity level zero. In this case, it is possible to transform the block to a block of the same parity and size but on a higher parity level in a simple way. First of all, an element of an even parity block is obtained by simply going one step forward or backward in the sequence of states. Primitive polynomials exhibit an odd number of terms, such that an addition of the polynomial to a vector changes its parity class on level zero. Furthermore, since the corresponding state vector of a size one block has the leastand most-significant bit set, moving one step forward or backward in the sequence implies an addition of the polynomial and, therefore, a change in parity class. The obtained even parity block element is either the first or the last vector of a block, depending on the direction of the step. This vector can now be reduced to the last element of the its chain, which will be an even parity block of size one. If the complete parity vector space alignment is taken into account, there will be a parity level, where the terminating block of size one is located. Since the last element of the considered chain is of even parity, the terminating block is on parity level one or higher; a transformation to this block completes the algorithm step, since the odd parity block of size one of a higher parity level has been reached.

If the odd parity block of size one that is to be transformed to a higher level is located on level $l$ with $1 \leq l \leq\left\lceil\frac{m-2}{2}\right\rceil-1$, a maximum number of $2^{l-1}$ transformations would need to be tested until an even parity block on the same level has been obtained according to Conjecture 7.12. This block can then be reduced to an odd parity block of size one on a higher level in the same way as has been described in the previous case.

The last case concerns parity levels $l$, where $\left\lceil\frac{m-2}{2}\right\rceil \leq l \leq m-3$. Since the number of odd parity blocks of size one accounts for $2^{m-l-3}$, which is less or equal to $2^{l-1}$, it is possible to select $2^{m-l-3}$ transformations that each map a block directly onto the block of the highest level. Therefore, a maximum number of $2^{m-l-3}$ transformations would need to be tested, until the highest level has been reached.

Conjecture 7.12 can be further refined for parity level two as follows.
Conjecture 7.13. For a defining polynomial $p(X)$ of degree $m$, where $m \geq 6$, there exist two linear transformations of the form $T^{x}$ that map each, one half of the odd parity blocks of size one on parity level two, onto elements of even parity blocks on the same parity
level. The first transformation has an invariant first row of the form $[0,0,1, \ldots, 1,1,0]$ for all primitive polynomials, so that the entire matrix can be specified as

$$
L_{2 a}=\left[\begin{array}{ccccccc}
0 & 0 & 1 & \cdots & 1 & 1 & 0 \\
\overline{p_{m-1}+p_{m-2}} & 0 & p_{m-1} & \cdots & \bar{p}_{m-1} & \bar{p}_{m-1} & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
1 & \bar{p}_{1} & \bar{p}_{1} & \cdots & p_{1} & \overline{p_{m-1}+p_{1}} & \frac{\overline{p_{2}+p_{1}}}{0} \\
1 & 1 & \cdots & 1 & 0 & \overline{p_{m-1}+p_{1}}
\end{array}\right] .
$$

With the knowledge of this first transformation, the second one can be obtained as $L_{2 b}=L_{2 a}+\operatorname{diag}(1, \ldots, 1)$.

The following theorem introduces the $L$ transformation.
Theorem 7.14. The linear transformation

$$
L=\left[\begin{array}{cccccc}
\bar{p}_{m-1} & 1 & 0 & \cdots & 0 & 1 \\
\bar{p}_{m-2} & 1 & 1 & \cdots & 0 & p_{m-1} \\
p_{m-3} & 1 & 1 & \cdots & 0 & p_{m-2} \\
p_{m-4} & 0 & 1 & \cdots & 0 & p_{m-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
p_{2} & 0 & 0 & \cdots & 1 & p_{3} \\
p_{1} & 0 & 0 & \cdots & 1 & \bar{p}_{2} \\
1 & 0 & 0 & \cdots & 1 & \bar{p}_{1}
\end{array}\right]
$$

maps odd parity blocks of size one on parity level one onto elements of even parity blocks on the same parity level for primitive polynomials $p(X)$ of degree $m \geq 4$.

Proof. Let $s$ be a vector of length $m$ that forms an odd parity block of size one on parity level one, i.e. $s \in P_{1,1}$ and $s_{0}=s_{m-1}=1$. Since $p(X)$ is a primitive polynomial of degree $m$, it exhibits an odd number of terms. Furthermore, if $m$ is assumed to be even, it follows that

$$
\begin{aligned}
& b_{1,0}(L s)=\sum_{i=0}^{m-1} p_{i}+\sum_{i=0}^{\frac{m}{2}-2} s_{2 i+1}=\sum_{i=0}^{m-1} p_{i}+b_{1,0}(s)+s_{m-1}=0+1+1=0 \quad \text { and } \\
& b_{1,1}(L s)=\sum_{i=0}^{m-1} p_{i}+\sum_{i=1}^{\frac{m}{2}-1} s_{2 i}=\sum_{i=0}^{m-1} p_{i}+b_{1,1}(s)+s_{0}=0+1+1=0 .
\end{aligned}
$$

Table 7.2: $L$ transformation. It is indicated to how many different chains on parity level zero of a certain length, the blocks of odd parity on parity level one are mapped to by $L$ according to Conjecture 7.15 for the polynomial degree $m$ with $5 \leq m \leq 12$.

| m | Chain Length |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | 2 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | 4 | 6 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 0 |
| 10 | 8 | 12 | 6 | 3 | 2 | 0 | 0 | 1 | 0 | 0 |
| 11 | 16 | 24 | 12 | 6 | 3 | 2 | 0 | 0 | 1 | 0 |
| 12 | 32 | 48 | 24 | 12 | 6 | 3 | 2 | 0 | 0 | 1 |

For the case that $m$ is odd, it can be shown that

$$
\begin{aligned}
& b_{1,0}(L s)=p_{0}+\sum_{i=0}^{m-1} p_{i}+\sum_{i=1}^{\frac{m-1}{2}-1} s_{2 i}=p_{0}+\sum_{i=0}^{m-1} p_{i}+b_{1,0}(s)=1+0+1=0 \quad \text { and } \\
& b_{1,1}(L s)=\sum_{i=1}^{m-1} p_{i}+\sum_{i=0}^{\frac{m-1}{2}-1} s_{2 i+1}=\sum_{i=1}^{m-1} p_{i}+b_{1,1}(s)=1+1=0 .
\end{aligned}
$$

If both cases for $m$ are considered together, it follows that $L s \in P_{1,0}$.
A further important property of the $L$ transformation is given by the following conjecture.

Conjecture 7.15. The linear transformation $L$ as defined in Theorem 7.14, maps all $2^{m-4}$ odd parity blocks of size one on parity level one onto half the number of chains on parity level zero, i.e. $2^{m-5}$ different chains, for $m \geq 5$. These $2^{m-5}$ chains include the longest chain, $\left\lfloor\frac{1+2^{m-6}}{2}\right\rfloor$ chains of length 3 for $m \geq 6$, and $2^{m-h-2}-\left\lfloor 2^{m-h-4}\right\rfloor$ chains of length $h$, where $4 \leq h \leq m-3$, as illustrated by Table 7.2.

The sequence given in Table 7.1 is used as an example with $m=5$ to illustrate Theorem 7.14 and Conjecture 7.15. There are two blocks of size one on parity level one of the sequence, which are located at positions 14 and 26. The $L$ transformation can be indicated for this example as $L=T^{10}$. It can be verified that an application of $L$ leads to the vectors at positions 24 and 5 , which have even parity on level one.

Furthermore, the vectors are part of one and the same chain on level zero as predicted by Conjecture 7.15, which means that the number of starting vectors has been reduced to half the number of chains.

The reduction of odd parity blocks of size one on level one that is obtained through the $L$ transformation as outlined in Conjecture 7.15, can be further increased if the degree $m$ of the determining polynomial is such that $m \geq 10$ with the following conjecture.

Conjecture 7.16. The elements that are obtained by mapping the odd parity blocks of size one on level one through L for a primitive polynomial of degree $m$ are considered. It is assumed that the resulting elements are each reduced to an element of their corresponding chain on level zero. The chain element to which the elements are reduced, is assumed to be equal for chains of the same length. According to Conjecture 7.15, there are $2^{m-5}$ different elements that are considered, grouped by the length of the chain to which they belong.

For $h$ with $7 \leq h \leq m-3$ and $\bar{h}$ with $4 \leq \bar{h} \leq h-3$, there exist exactly $\left\lceil 3 \cdot 2^{h-\bar{h}-4}\right\rceil$ transformations of the form $T^{x}$ that map each the elements that belong to chains of length $h$ onto elements that belong to chains of length $\bar{h}$.

At this point a small example using a primitive polynomial of degree $m=12$ is provided in Table 7.3 to illustrate the element reductions that can be achieved efficiently with the presented results. The starting point are the $2^{m-2}$ odd parity blocks of size one to which all the nonzero field elements can easily be reduced using the chain properties as described in Subsection 7.2.5. Each of those blocks corresponds to a certain parity level and the number of blocks per level, according to Theorem 7.4, is indicated in Table 7.3. Blocks on level zero can be transformed to blocks on higher levels by going one step forward or backward in the sequence of states as described earlier in the subsection, which can be modelled by $T$ or $T^{-1}$. For blocks on level two, $L_{2 a}$ and $L_{2 b}$ as introduced in Conjecture 7.13 can be used to map the blocks to blocks of higher levels. The odd parity blocks of a certain level $l$ are contained in $P_{l, 1}$ as introduced in Subsection 7.2.3. Furthermore, $P_{l}$ can be aligned with $P_{\bar{l}}$, for $\bar{l} \leq l-2$, in such a way that blocks of the same parity align as described in Subsection 7.2.6. This means that for every level $l$, where $3 \leq l \leq m-1$, a transformation can be determined that will map the blocks on level $l$ onto the blocks on level one. The initial number of $2^{m}-1$ nonzero field elements has thus been reduced to the $2^{m-4}$ odd parity blocks of size one on level one that belong to $P_{1,1}$. A further reduction to half of the number of blocks is achieved if the $L$-transformation is applied as described in Conjecture 7.15.

Table 7.3: Illustration of the achievable reduction of the nonzero elements of a finite field using the example of a determining polynomial of degree 12 . For every parity level, the number of odd parity blocks of size one is indicated to which the $2^{12}-1$ elements can be initially reduced using the chain property. The blocks on level zero can be transformed to higher level blocks with the help of $T$ or alternatively $T^{-1}$. In a next step, blocks on level two can be mapped to higher level blocks with $L_{2 a}$ and $L_{2 b}$. Blocks of level three or higher can then be mapped onto the blocks of level one by considering the alignment of $P_{1,1}$ with $P_{3,1}$ to $P_{11,1}$. The blocks on level one can then be mapped by $L$ onto half the number of blocks on higher levels. According to Conjecture 7.16, the resulting blocks on level 5 to 11 can be mapped, for instance, to the blocks on level two with a single transformation for each level.

| Level | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Blocks | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 0 | 1 |
| $T / T^{-1}$ | 0 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 0 | 1 |
| $L_{2 a} / L_{2 b}$ | 0 | 256 | 0 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 0 | 1 |
| $P_{1,1}$ | 0 | 256 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $L$ | 0 | 0 | 32 | 48 | 24 | 12 | 6 | 3 | 2 | 0 | 0 | 1 |
| $P_{2,1}$ | 0 | 0 | 32 | 48 | 24 | 12 | 0 | 0 | 0 | 0 | 0 | 0 |

The resulting blocks that are located on level two or higher, can be mapped to the blocks on level two to five according to Conjecture 7.16. If the number of blocks that are obtained through the $L$-transformation on levels two to five are summed up, $29 \cdot 2^{m-10}$ blocks are obtained for $m \geq 10$. Thus, the number of nonzero field elements has been reduced, with a linear number of transformations in the size of the polynomial degree, from $2^{m}-1$ to $29 \cdot 2^{m-10}$ for $m \geq 10$.

### 7.3.3 $Q$ Transformation

The alignments of parity vector spaces $P_{0}$ and $P_{2}$ are considered. For these two vector spaces, there exist two alignments as described in Subsection 7.2.6. One alignment maps blocks of the same parity onto each other as can be seen in the left-hand side of the example given in Table 7.1, where the blocks of $P_{0}$ and $P_{2}$ that align have the same parity. The second possible alignment maps all the blocks of $P_{2}$ onto even parity blocks of $P_{0}$, which can be seen in the right-hand side of Table 7.1. The two alignments are determined by the locations of the odd and even parity blocks of size $m-2$ in the sequence of states. The $Q$ transformation is defined as the transformation that maps the odd parity block of size $m-2$ onto the even parity block of size $m-2$. Since $P_{2}$ has a quarter of the odd parity elements of $P_{0}$, it follows that $Q$ maps at least a
quarter of the odd parity sequence elements onto even parity sequence elements. For the example in Table 7.1, $Q=T^{25}$.

Further information regarding the $Q$ transformation is established in what follows.
Theorem 7.17. For every parity level $l$, where $0 \leq l \leq m-3$, $Q$ maps the only odd parity block of size $m-l-2$ onto the only even parity block of the same size on the same parity level.

Proof. Let $s$ be the bottommost vector of the only odd parity block of size $m-2$ on parity level zero, and $\bar{s}$ the corresponding vector of the even parity block. By definition of $Q$, it follows that $Q s=\bar{s}$. The bottommost vector of the only odd parity block of size $m-2-l$ on parity level $l$ with $0 \leq l \leq m-3$, can be expressed using Theorem 7.11 as $\sum_{i=0}^{l} T^{-i} s$. If this vector is mapped by $Q$, it follows that

$$
Q \sum_{i=0}^{l} T^{-i} s=\sum_{i=0}^{l} T^{-i} \bar{s}
$$

which describes the bottommost vector of the only even parity block of size $m-2-l$ on parity level $l$.

This result implies that the $Q$ transformation allows to map a quarter of the odd parity blocks of parity level $l$ with $0 \leq l \leq m-3$ onto odd parity blocks of parity level $l+2$ as explained in Figure 7.5. A quarter of the odd parity blocks on level $l$ is mapped by $Q$ onto equally-sized even parity blocks on the same level. These even parity blocks have their terminating block two levels higher up, if a complete block alignment is considered as outlined in Subsection 7.2.6. Thus, a quarter of the odd parity blocks can easily be upgraded to equally-sized blocks two levels higher up.

The only odd parity blocks of size one on the different parity levels, to which all the other vectors can easily be reduced to, are considered. Furthermore, it is assumed that all the vector spaces have been completely aligned. With the arguments provided in Figure 7.5, it follows that up to level $m-5$, a quarter of the blocks are mapped by $Q$ onto blocks $\Delta=2$ levels higher up. Furthermore, the only block on level $m-3$ is mapped by $Q$ onto the only block on level $m-1$. In other words, for every level $l$, where $l \geq 2, Q^{-1}$ maps all the corresponding blocks onto blocks $\Delta=2$ levels further down as illustrated for $m=10$ in Table 7.4.

Similarly to the $Q$ transformation, it is possible to specify transformations that connect blocks that are further than $\Delta=2$ levels apart. If a level difference of $\Delta=3$ is considered, for instance, two transformations exist, $E_{0}$ and $E_{1}$, where the

Figure 7.5: $Q$ transformation. It maps the only odd parity block of size $m-l-2$ on parity level $l$, where $0 \leq l \leq m-3$, onto the only even parity block of the same size on the same parity level. A vector of odd parity for a specific level is indicated with an $x$, whereas an even parity vector is indicated with an $o$. If it is assumed that the base block of $P_{l+2}$ is aligned with the only odd parity block of size $m-l-2$ of $P_{l}$, it follows that every odd parity block of $P_{l+2}$, is aligned with an equally-sized odd parity block of $P_{l}$. If all the blocks of $P_{l+2}$ are advanced by $Q$, they will be in the appropriate alignment with the blocks of $P_{l}$ if a complete block alignment of all parity vector spaces is considered, since the base block of $P_{l+2}$ would need to be aligned with the only even parity block of size $m-l-2$ of $P_{l}$. This implies that $Q$ maps a quarter of the odd parity blocks of $P_{l}$ onto even parity blocks, which have their terminating block in a complete block alignment two levels further up in $P_{l+2}$.

Table 7.4: $Q$ transformation. A defining polynomial of degree $m=10$ is considered. For every parity level the number of odd parity blocks of size one is shown. Under the assumption that the parity vector spaces are completely aligned, $Q^{-1}$ maps the blocks of a parity level $l, l \geq 2$, onto blocks two levels further down.

| Level | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 128 | $Q^{-1}$ | 32 | $\stackrel{Q^{-1}}{\longleftarrow}$ | 8 | $\stackrel{Q^{-1}}{\longleftarrow}$ | 2 |  | 0 |  |
| Blocks |  | 64 | $\stackrel{Q^{-1}}{\longleftarrow}$ | 16 | $\stackrel{Q^{-1}}{\longleftarrow}$ | 4 | $\stackrel{Q^{-1}}{\longleftarrow}$ | 1 | $\underset{Q^{-1}}{\longleftarrow}$ | 1 |

Table 7.5: $E$ transformation. A defining polynomial of degree $m=10$ is considered. For every parity level the number of odd parity blocks of size one is shown. Under the assumption that the parity vector spaces are completely aligned, $E^{-1}$ indicates how $E_{0}^{-1}$ or $E_{1}^{-1}$ map the blocks of a parity level $l, l \geq 3$, onto blocks three levels further down.

inverse transformations maps the odd parity blocks of size one of each level, onto corresponding blocks three levels further down, as depicted in Table 7.5. In general, the number of possible transformations increases exponentially with the difference in levels $\Delta$, since the number of corresponding alignments increases in the same way as explained in Subsection 7.2.6.

If a transformation is applied which maps the blocks of each level $l, l \geq \Delta$, onto blocks $\Delta$ levels further down, a block clustering is obtained. The following theorem specifies the number of clusters that is obtained in this way.

Theorem 7.18. For a defining primitive polynomial of degree $m$, a transformation is considered that maps odd parity blocks of size one of each level l onto corresponding blocks $\Delta$ levels further down, where $\Delta \leq l \leq m-1$ and $2 \leq \Delta \leq m-1$. The resulting number of clusters equals

$$
\begin{equation*}
\sum_{i=0}^{\Delta-3}\left\lfloor 2^{m-i-3}\right\rfloor+\left\lceil 2^{m-\Delta-1}\right\rceil+\left\lfloor 2^{m-\Delta-2}\right\rfloor \tag{7.7}
\end{equation*}
$$

Proof. Higher parity blocks are mapped by the transformation onto blocks of lower parity, which means that it is sufficient to determine the number of blocks on the lowest levels. Since the difference in levels between mapped blocks accounts for $\Delta$, the sum of the number of the blocks on the lowest $\Delta$ levels equals the number of resulting clusters. From level 0 to $m-3$, the number of odd parity blocks of size one on level $l$ equals $2^{m-l-3}$. There is no block of this type on level $m-2$, but one on level $m-1$.

By using transformation $Q$ to map higher parity blocks onto lower parity blocks, the number of clusters that is obtained equals $\left\lceil 2^{m-3}\right\rceil+\left\lfloor 2^{m-4}\right\rfloor$, since $\Delta=2$. As an example the 192 clusters for $m=10$ are considered which are shown in Figure 7.6.


Figure 7.6: Clustering using the $Q$ transformation for $m=10$. Each coloured box represents an odd parity block of size one, where the colour signifies the corresponding parity level. A connection between two boxes indicates that the higher parity box can be transformed with $Q$ into the lower parity box. For each cluster prototype the number of instances is shown. The total number of clusters is 192.

Table 7.6: Number of clusters obtained using the $Q$ and $E$ transformations for a defining polynomial of degree up to $m=10$.

| Degree | Number of Clusters |
| :---: | :---: |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 4 |
| 6 | 10 |
| 7 | 19 |
| 8 | 37 |
| 9 | 73 |
| 10 | 147 |
| 11 | 297 |

If $Q$ and the two $E$ transformations are considered together, for instance, the number of clusters reduces from 192 to 147 , for the case of $m=10$; the resulting clustering is shown in Figure 7.7. The number of clusters obtained by considering all three transformations for polynomial degrees up to $m=10$ can be found in Table 7.6.


Figure 7.7: Clustering using the $Q$ and $E$ transformations for $m=10$. Each coloured box represents an odd parity block of size one, where the colour signifies the corresponding parity level. A horizontal connection between two boxes indicates that the higher parity box can be transformed with $Q$ into the lower parity box. The other two connection types indicate the relationship for the two $E$ transformations. For each cluster prototype the number of instances is shown. The total number of clusters is 147.

### 7.3.4 Parity Subspaces

This subsection establishes additional relationships between the different parity vector spaces. The following theorem gives further information on the intrinsic alignment of the vector spaces.

Theorem 7.19. A primitive polynomial of degree $m$ is considered. The base block of parity level $l, 0 \leq l \leq m-1$, is intrinsically aligned with a block on level $\bar{l}$, where $0 \leq \bar{l} \leq l$, only if $\frac{l+1}{l+1}$ is an integer. If the integer is even, the parity of the block on level $\bar{l}$ is even, otherwise it is odd.

Proof. Let $s$ be the bottommost vector of the base block on level $l$, so that

$$
s=[\underbrace{1, \ldots, 1}_{l+1}, 0, \ldots, 0]^{T} .
$$

Base blocks are of odd parity and for the sub-parities for level $l$ it follows that $b_{l, i}(s)=1$ for all $i$ with $0 \leq i \leq l$. Now it is assumed that $0 \leq \bar{l} \leq l$ and

$$
\frac{l+1}{\bar{l}+1}=r .
$$

If $r$ is an integer,

$$
b_{\bar{l}, i}(s)=\sum_{j=0}^{r-1} b_{l, i+(\bar{l}+1) j}(s) \equiv r \quad(\bmod 2),
$$

for all $i$ with $0 \leq i \leq \bar{l}$. Therefore, if $r$ is additionally even, all sub-parities on level $\bar{l}$ are also even and $s \in P_{\bar{l}, 0}$, otherwise all sub-parities are odd and $s \in P_{\bar{l}, 1}$.

For the case, where $r$ is not an integer, the following two sub-parities for level $\bar{l}$

$$
\begin{aligned}
& b_{\bar{l}, 0}(s)=\sum_{j=0}^{\lceil r\rceil-1} s_{m-1-j(\bar{l}+1)} \equiv\lceil r\rceil \quad(\bmod 2) \quad \text { and } \\
& b_{\bar{l}, \bar{l}}(s)=\sum_{j=0}^{\lfloor r\rfloor-1} s_{m-1-j(\bar{l}+1)-\bar{l}} \equiv\lfloor r\rfloor \quad(\bmod 2)
\end{aligned}
$$

are different, which implies that $s \notin P_{\bar{l}}$.
This result implies, for instance, that the base block of a parity vector space $P_{l}$ for a primitive polynomial of degree $m$, where $0 \leq l \leq m-1$, is intrinsically aligned with
an even parity block in $P_{0}$ if $l$ is odd. Otherwise, the base block is aligned with an odd parity block in $P_{0}$.

Before the next theorem is proven, the following lemma is introduced.
Lemma 7.20. A defining primitive polynomial of degree $m$ is considered. For $0 \leq r \leq \frac{m}{2}$, it follows

$$
M^{r}[\underbrace{1, \ldots, 1}_{r}, 0, \ldots, 0]^{T}=\left(T^{0}+T^{-r}\right) M^{r-1}[1,0, \ldots, 0]^{T} .
$$

Proof. The relationship follows from

$$
\begin{aligned}
M^{r} \underbrace{1, \ldots, 1}_{r}, 0, \ldots, 0]^{T} & =M^{r}\left(T^{0}+T^{-1}+\cdots+T^{-(r-1)}\right)[1,0, \ldots, 0]^{T} \\
& =\left(T^{0}+T^{-1}+\cdots+T^{-(r-1)}\right) M M^{r-1}[1,0, \ldots, 0]^{T} \\
& =\left(T^{0}+T^{-1}+\cdots+T^{-(r-1)}\right)\left(T^{0}+T^{-1}\right) M^{r-1}[1,0, \ldots, 0]^{T} \\
& =\left(T^{0}+T^{-r}\right) M^{r-1}[1,0, \ldots, 0]^{T} .
\end{aligned}
$$

Theorem 7.21. For a primitive polynomial of degree $m$, all parity vector spaces $P_{l}$ for $0 \leq l \leq m-1$ are considered. Let $r$ be the negative displacement between the base block of $P_{(2 i-1)-1}, 1 \leq i \leq\left\lfloor\frac{m-1}{2}\right\rfloor+1$, and the block in $P_{0}$ that would align with the base block in a complete alignment. It follows that the displacement between the base block of $P_{(2 i-1) 2^{j}-1}$ and the corresponding block in $P_{0}$ in a complete alignment equals $r 2^{j}$ for $0 \leq j \leq\left\lfloor\log _{2} \frac{m}{2 i-1}\right\rfloor$.

Proof. Induction is used to prove the statement. The bottommost vector of the base block of parity vector space $P_{l}$ is

$$
[\underbrace{1, \ldots, 1}_{l+1}, 0, \ldots, 0]^{T} .
$$

In a complete alignment of the parity vector spaces, this vector of $P_{l}$ needs to be aligned with vector $M^{l}[1,0, \ldots, 0]^{T}$ of $P_{0}$ according to Subsection 7.2.5 and Subsection 7.2.6. The following statement is therefore to be proven

$$
T^{r 2^{j}} M^{(2 i-1) 2^{j}-1}[1,0, \ldots, 0]^{T}=[\underbrace{1, \ldots, 1}_{(2 i-1) 2^{j}}, 0, \ldots, 0]^{T}
$$

The case for $j=0$ is fulfilled, since according to the premise

$$
T^{r} M^{(2 i-1)-1}[1,0, \ldots, 0]^{T}=[\underbrace{1, \ldots, 1}_{2 i-1}, 0, \ldots, 0]^{T}
$$

It is now assumed that the statement is true for $j$. With Lemma 7.20 it follows

$$
\begin{aligned}
& T^{r 2^{j+1}} M^{(2 i-1) 2^{j+1}-1}[1,0, \ldots, 0]^{T} \\
= & T^{r 2^{j}} M^{(2 i-1) 2^{j}} T^{r 2^{j}} M^{(2 i-1) 2^{j}-1}[1,0, \ldots, 0]^{T} \\
= & T^{r 2^{j}} M^{(2 i-1) 2^{j}} \underbrace{1, \ldots, 1}_{(2 i-1) 2^{j}}, 0, \ldots, 0]^{T} \\
= & T^{r 2^{j}}\left(T^{0}+T^{-(2 i-1) 2^{j}}\right) M^{(2 i-1) 2^{j}-1}[1,0, \ldots, 0]^{T} \\
= & \left(T^{0}+T^{-(2 i-1) 2^{j}}\right) T^{r 2^{j}} M^{(2 i-1) 2^{j}-1}[1,0, \ldots, 0]^{T} \\
= & \left(T^{0}+T^{-(2 i-1) 2^{j}}\right)[\underbrace{1, \ldots, 1}_{(2 i-1) 2^{j}}, 0, \ldots, 0]^{T} \\
= & \underbrace{1, \ldots, 1}_{(2 i-1) 2^{j+1}}, 0, \ldots, 0]^{T} .
\end{aligned}
$$

It is now possible to specify the transformations between the parity vector spaces in a complete alignment for a defining polynomial of degree $m$ as follows. Let $R_{l}$ denote the transformation that establishes the alignment between the vectors of parity level $l$ and the corresponding vectors on parity level zero, i.e. $R_{l}$ maps the block of size $m-l$ of the longest chain on level zero onto the base block on level $l$; and even more precisely

$$
R_{l} M^{l}[1,0, \ldots, 0]^{T}=[\underbrace{1, \ldots, 1}_{l+1}, 0, \ldots, 0]^{T} .
$$

Due to the fact that every nonnegative integer $l$ can be expressed as $l=(2 i+1) 2^{j}-1$ with $1 \leq i$ and $0 \leq j$, Theorem 7.21 permits every $R_{l}$ with odd $l$ from a particular $R_{\bar{l}}$ with even $\bar{l}$ to be derived:

$$
\begin{array}{cccc}
R_{1}=R_{0}^{2^{1}} & R_{3}=R_{0}^{2^{2}} & \cdots & R_{1 \cdot 2^{j}-1}=R_{0}^{2^{j}} \\
R_{5}=R_{2}^{2^{1}} & R_{11}=R_{2}^{2^{2}} & \cdots & R_{3 \cdot 2^{j}-1}=R_{2}^{2^{j}} \\
R_{9}=R_{4}^{2^{1}} & R_{19}=R_{4}^{2^{2}} & \cdots & R_{5 \cdot 2^{j}-1}=R_{4}^{2^{j}} \\
R_{13}=R_{6}^{2^{1}} & R_{27}=R_{6}^{2^{2}} & \cdots & R_{7 \cdot 2^{j}-1}=R_{6}^{2^{j}} \\
\vdots & \vdots & \ddots & \vdots \\
R_{(2 i-1) 2^{1}-1}=R_{(2 i-2)}^{2^{1}} & R_{(2 i-1) 2^{2}-1}=R_{(2 i-2)}^{2^{2}} & \cdots & R_{(2 i-1) 2^{j}-1}=R_{(2 i-2)}^{2^{j}} .
\end{array}
$$

Every parity vector space is trivially aligned with itself, so that $R_{0}=T^{0}$; this implies that $R_{2^{j}-1}=0$ for all $j \geq 0$, and confirms the result of Theorem 7.10. From the definition of the $Q$ transformation in Subsection 7.3.3 it follows that $R_{2}=Q^{-1}$.

### 7.4 Computing Discrete Logarithms

In this section a new approach is presented to compute the discrete logarithm $k$ of a polynomial $s(X)$ to the base $X$, such that $X^{k} \equiv s(X)(\bmod p(X))$, where $p(X)$ is a primitive polynomial of degree $m$ as described in Section 7.1. This approach is based on the newly established properties from Section 7.2. The discrete logarithm $k$ corresponds in this scenario to the position of the corresponding state vector $s$ of $s(X)$ in the sequence of shift register states, since it has been defined in Subsection 7.2.1 that the state at sequence position zero equals $s[0]=[0, \ldots, 0,1]^{T}$.

The main operating principle of the proposed algorithm comprises in the application of repeated linear transformations of the form $T^{x}$ on the initial vector $s$, until a certain designated vector $\bar{s}$ is reached. It is known by how many steps each transformation in the algorithm advances a vector in the sequence of states. Moreover, as the position of the designated vector $\bar{s}$ in the sequence is also known, it is possible to work out the starting position of $s$, once $s$ has been transformed into $\bar{s}$. In what follows, $\bar{s}$ is chosen to be $\bar{s}=[0, \ldots, 0,1]^{T}$, which means that if $k$ steps are necessary to reach the vector from the starting position of $s$, the discrete logarithm will simply correspond to the value $(-k)$.

The transformations that are applied to $s$ to reach $\bar{s}$, revolve around two key functions that are executed repeatedly: reduction and mapping. In the reduction step, $s$ is reduced to a predefined element of its chain on parity level zero as described in Subsection 7.2.5. This element is chosen to be the first element of the chain in what follows, which ensures that $\bar{s}$ can be reached as it is the topmost vector of the biggest odd parity block and, therefore, the first element of the longest chain.

In the mapping step, a linear transformation $T^{x}$ is applied to the reduced vector. The set of reduced vectors has been divided into partitions according to the length of the chain to which they belong. Each partition features its own specific linear transformation that is used for the mapping. To minimise the runtime of the algorithm, it is necessary to find an optimal set of linear transformations.

The algorithm has been split into an initialisation and a computation phase which are explained in more detail below. Vectors all have a length of $m$ if not specified
otherwise. To determine the leading and trailing zeros of a vector, functions $l z()$ and $t z()$ have been introduced.

### 7.4.1 Initialisation

During the initialisation phase, a set of algorithm parameters is computed that is specific to the underlying defining polynomial $p(X)$. The pseudocode for this phase is shown in Algorithm 7.1.

```
Algorithm 7.1 Initialisation pseudocode.
    procedure \(\operatorname{INIT}(p, m, f)\)
        \(T=\left[p \left\lvert\, \frac{I_{m-1}}{0}\right.\right]\)
        for \(i=0\) to \(m-1\) do
            \(p l[i]=0\)
        end for
        \(k=0\)
        \(s=[1,0, \ldots, 0]^{T}\)
        repeat
            \(s=T s\)
            \(k=k+1\)
            if \(\left(s=[0, \ldots, 0,1,1]^{T}\right) p m=k-1\)
            if \(\left(s=[0, \ldots, 0,1, \ldots, 1]^{T}\right)\) then
                \(p l[m-1-l z(s)]=p l[m-1-l z(s)]+k\)
            end if
            if \(\left(s=[0, \ldots, 0,1,0, \ldots, 0,1]^{T}\right)\) then
                \(p l[m-1-l z(s)]=p l[m-1-l z(s)]-k\)
            end if
        until \(s=[1,0, \ldots, 0]^{T}\)
        globalise \(T, m, f, p l, p m\)
    end procedure
```

In a first step, the matrix $T$ is set up according to (7.2), to model the multiplication of a polynomial by $X$ or equivalently a single shift operation of the LFSR.

Subsequently, the sequence of LFSR states is traversed to locate the positions of particular states within the sequence that will define corresponding transformations on the basis of $T$. These include the matrix $M$ as defined by Theorem 7.5, which is modelled as $M=T^{p m}$. Furthermore, one variant of the reduction function of the main code requires the set of base transformations that aligns all the adjacent parity vector spaces. To translate a vector on parity level $l$ into its corresponding counterpart on the aligned parity level $l+1$, the transformation $T^{p l[l+1]}$ is used. These transformations
can be derived from the relative positioning of the base blocks of the two vector spaces as described in Subsection 7.2.6. The base blocks are identified through their kernels according to Theorem 7.4.

Parameter $f$ that is supplied to the initialisation procedure is a vector of size $m-2$ with components ranging from 1 to $2^{m}-2$ to describe the transformations that are used for the mapping function.

### 7.4.2 Main Computation

The algorithm consists essentially of the reduction and mapping steps that are executed in a loop as shown in the pseudocode in Algorithm 7.2 and explained in what follows in more detail. The variable $k$ keeps track of the number of shifts that are

```
Algorithm 7.2 Main function pseudocode.
    function \(\operatorname{LOG}(s)\)
        \(k=0\)
        loop
            \(\{s, k\}=\operatorname{REDUCE}(s, k)\)
            if \((l z(s)=m-1)\) return \(-k\)
            \(\{s, k\}=\operatorname{MAP}(s, k)\)
        end loop
    end function
```

applied to the starting vector $s$ during the reduction and mapping steps. Once the terminating vector $\bar{s}$ has been reached, which is identified by a chain length of $m-1$, the algorithm returns the discrete logarithm. Since the reduction function reduces $s$ to the first vector of the first block of its chain, the length of the chain is simply determined by counting the number of leading zeros of the reduced vector and adding one to the result. Alternatively, $s$ can directly be compared to $\bar{s}$, instead of checking for the appropriate chain length.

## Reduction

The reduction function is based on the chain property of maximum-length shift register sequences as introduced in Subsection 7.2.5. In what follows it will by tailored to reduce an input vector $s \in S^{*}$ to the first vector of the starting block of the chain on parity level zero to which $s$ belongs. Two different implementation variants are presented for this purpose.

The first implementation relies on the fact that each chain element can be regarded as the bottommost element of a stack, on top of which blocks from aligned parity vector spaces of higher parity levels are located as derived in Subsection 7.2.6. It can be further deduced from Theorem 7.9 that the stack grows with the distance from the starting block of the chain. The first block of a chain is always of odd parity and therefore at the same time the terminating block for the first stack. This means that the first stack ends already on parity level zero. If the chain has a second element, for instance, its stack would reach up to parity level one. In this way, the height of a stack is an indicator for the position within the chain. Therefore, a method to locate the starting block of the chain, is to determine the height of the stack to which $s$ belongs by identifying the parity level of the terminating block of the stack. For this purpose it is necessary to know the displacements of the different parity vector spaces against each other, to climb up the stack. Once the height of the stack is determined, the offset to the first vector of the starting block can be computed and $s$ transformed accordingly as demonstrated with the pseudocode in Algorithm 7.3.

```
Algorithm 7.3 Reduction function variant 0 pseudocode.
    function REDUCE_ \(0(s, k)\)
        \(l=0\)
        \(t=s\)
        while \(t \in P_{l, 0}\) do
            \(l=l+1\)
            \(t=T^{p l[l]} t\)
        end while
        \(k=k-l *(p m+1)-t z(s)\)
        \(s=T^{-l *(p m+1)} T^{-t z(s)} s\)
        return \(\{s, k\}\)
    end function
```

The second variant of the reduction function is based on the fact that only the first block of a chain is of odd parity. Through repeated applications of $M^{-1}=T^{-p m}$, it is possible to traverse the chain backwards and test each chain element for odd parity until the starting block has been reached. In a last step, it is necessary to shift the vector to the topmost position within the starting block. The pseudocode for this approach is shown in Algorithm 7.4. This second approach is more efficient than the first one, since it dispenses with the need for climbing up the stack of elements. Furthermore, it only needs to check for even parity on parity level zero, instead of on different parity levels.

```
Algorithm 7.4 Reduction function variant 1 pseudocode.
    function REDUCE_1 \((s, k)\)
        while \(s \in P_{0,0}\) do
            \(s=T^{-p m} s\)
            \(k=k-p m\)
        end while
        \(k=k-t z(s)\)
        \(s=T^{-t z(s)} s\)
        return \(\{s, k\}\)
    end function
```

In both variants of the reduction function, a linear search is performed. The first variant searches for the terminating block of the stack on successive parity levels, whereas the second variant traverses the chain element by element until the starting block is found. It is possible to speed up the process in both cases by performing a binary search instead, for example.

For the first reduction variant, it is possible to use the result of Theorem 7.10 for further simplification. Since all the vector spaces on levels $l=2^{t}-1$ for $1 \leq t \leq$ $\left\lfloor\log _{2} m\right\rfloor$ are intrinsically aligned with the vector space on level zero, the starting vector $s$ can be directly used to test the parity for those specific parity levels, without the need of applying alignment transformations. This is particularly advantageous in a hardware implementation, where the parity of $s$ can be tested for different levels in parallel.

In comparison to the presented two reduction functions, a reduction to the last vector of the chain, for instance, can easily be realised by determining the size of the block and the position within the block to which $s$ belongs. With this information the offset to the last element can directly be computed, since the block sizes within the chain are always decreasing by one as outlined in Subsection 7.2.5. However, this information alone would not permit a deduction of the total chain length, which is required by the mapping function in the next step. For this purpose one of the first two functions would need to be employed in combination as they provide a mechanism for determining the chain length.

It is, for the algorithm, essentially irrelevant to which vector of the chain is reduced, since different chain vectors can be transformed into each other through a linear transformation of the form $T^{x}$, which can be taken care of by the mapping step without any extra cost. Therefore, the second reduction function variant seems to be, at present, the most favourable.

## Mapping

The mapping function applies a simple linear transformation of the form $T^{x}$ to the input vector. This transformation has been made dependent on the length of the chain to which the vector belongs. In the presented mapping function, each input vector is assumed to be reduced to the first vector of its chain. It is therefore possible to determine, easily, the length of the underlying chain, by counting the number of leading zeros of the input vector and adding one to the result. The pseudocode for the mapping function is shown in Algorithm 7.5, where $f[i]$ denotes the number of shifts that is applied to a vector that belongs to a chain of length $i+1$.

```
Algorithm 7.5 Mapping function pseudocode.
    function \(\operatorname{MAP}(s, k)\)
        \(l=l z(s)\)
        \(s=T^{f[l]} s\)
        \(k=k+f[l]\)
        return \(\{s, k\}\)
    end function
```

In the shift register sequence, there exists exactly one chain of length $h=m$ on parity level zero which triggers the algorithm to terminate. It is therefore desirable for the mapping function to map the reduced elements of every chain, in as few steps as possible, onto the longest chain. Interestingly, a chain of length $h=m-1$ does not exist, for there is no odd parity block of size $m-1$. For every shorter chain length $h$ with $1 \leq h \leq m-2$, there are $2^{m-h-2}$ chain instances present, as this is the number of odd parity blocks of size $h$ according to Theorem 7.4. Since there is only one chain of length $h=m-2$, it is possible to map its reduced element, in a single step, directly onto the longest chain, ideally onto its reduced element. This transformation can be computed from the displacement of the two elements. For all other chain lengths, there is more than one chain present, whose reduced elements cannot be mapped in general in a single step onto the longest chain. The mappings depend on each other and it is necessary to find the best set of transformations that minimises the runtime for a given optimisation goal, such as the worst-case runtime.

### 7.4.3 Implementation Details

In an implementation of the algorithm, two obvious and significant improvements can be made to the pseudocode presented in the previous subsections. Firstly, the
transformation $T^{-t z(s)}$ is used to slide the kernel of $s$ to the least significant bit positions of the vector. This operation can easily be realised as a shift operation in software or hardware and does not require a matrix-vector multiplication.

Secondly, the transformation $T$ is used with a finite number of other exponents in the order of the polynomial degree. To be precise, if the second reduction function variant is considered, $m-2$ transformations of the form $T^{f l[i]}$ for $0 \leq i \leq m-3$, together with the transformation $T^{-p m}$, are required. These specific transformations can be precomputed during the initialisation phase then to be used for the main computation.

### 7.4.4 Example

To illustrate the operation of the algorithm with an example, the primitive polynomial $p(X)=X^{6}+X+1$ is considered as the defining polynomial. The underlying reduction function is assumed to reduce each nonzero state vector of the generated field to the first element of its corresponding chain. These chain elements are associated with nodes as listed in Table 7.7, where they are divided according to their chain length.

The algorithm takes an arbitrary nonzero vector as input and aims to transform it in as few steps as possible into the terminating vector, i.e. node 15 , that corresponds to the longest chain, to deduce the position of the initial state vector within the maximum-length sequence of states. For this purpose the starting vector is reduced at first to one of the vectors listed in Table 7.7 with the help of the reduction function. If the terminating vector has not been reached, the mapping function is used to transform the reduced vector to another nonzero vector, which can then again be reduced to one of the vectors in Table 7.7. The application of the mapping and reduction function is repeated until the terminating vector is reached.

The mapping function defines essentially how the vectors in Table 7.7, and thus the nodes, are mapped onto each other. It is desired to use a mapping function that minimises the number of steps that need to be taken to reach the terminating node from any starting node, if the worst-case runtime is considered for instance. An analysis of all possible mapping function configurations has revealed that for the underlying generator polynomial, a maximum number of two steps is required in the best case to reach the terminating node from any other node. There are several mapping function configurations that are optimal from this point of view. One such configuration is $f=[2,13,16,49]$, where each array element indicates by how many steps a reduced vector, whose chain length corresponds to the element index increased

Table 7.7: Reduced nonzero field elements of $\mathbb{Z}_{2}[X] /\left\langle X^{6}+X+1\right\rangle$ in 5-tuple representation with their corresponding chain lengths. Each element is associated with a node.

| Node | State | Chain Length |
| :---: | :---: | :---: |
| 0 | 100011 | 1 |
| 1 | 111011 | 1 |
| 2 | 101001 | 1 |
| 3 | 100101 | 1 |
| 4 | 101111 | 1 |
| 5 | 110111 | 1 |
| 6 | 111101 | 1 |
| 7 | 110001 | 1 |
| 8 | 010011 | 2 |
| 9 | 011001 | 2 |
| 10 | 010101 | 2 |
| 11 | 011111 | 2 |
| 12 | 001011 | 3 |
| 13 | 001101 | 3 |
| 14 | 000111 | 4 |
| 15 | 000001 | 6 |

by one, needs to be advanced in the maximum-length sequence. The graph for this specific configuration is shown in Figure 7.8. It can be seen that every node reaches the terminating node in no more than two steps.

In the considered sample configuration, each node is mapped to a node corresponding to either the same or a longer chain length. This is in general, however, not always the case. Some nodes may be mapped onto nodes with lower chain length.

### 7.5 Evaluation

The question investigated in this section concerns the runtime behaviour of the presented algorithm. Different performance measures can be applied, such as the worst-case or average-case execution time. The worst-case runtime is of particular interest and becomes the focus of attention in what follows.

The algorithm has a number of degrees of freedom that have an impact on the performance. Firstly, the determining primitive polynomial plays a role in the behaviour


Figure 7.8: Sample mapping configuration $f=[2,13,16,49]$ for $p(X)=X^{6}+X+1$. The terminal node is indicated through a dashed circle.
of the algorithm. Different polynomials of the same degree achieve different execution times; it is therefore desirable to evaluate over all possible primitive polynomials of a certain degree.

Secondly, the algorithm is greatly influenced by the configuration of the mapping function; it is necessary to take, for each polynomial, all the potential configurations into account to determine the best worst-case runtime. For a polynomial of degree $m$, the mapping function requires $m-2$ parameters. One of those parameters can easily be derived, as it describes the mapping of the reduced vector that belongs to the single chain of length $h=m-2$, onto a vector that belongs to the longest chain. It suffices therefore to set the parameter to the difference in sequence position between the terminating vector and the reduced vector that belongs to the single chain of length $h=m-2$. The remaining parameters can take values in the range between 1 and $2^{m}-2$. This leads to a maximum search space of $\left(2^{m}-2\right)^{m-3}$ parameter combinations.

However, it is not necessary to evaluate every single combination to find the combination that minimises the worst-case runtime. A single mapping parameter is responsible for the mapping of a certain partition of nodes. If a mapping value implies a closed loop within that partition, it can be discarded, as otherwise not all nodes will be able to reach the terminating node. The remaining mapping values will map every source node onto a node outside the partition in a finite number of steps. In the example in Figure 7.8, nodes [ $0-7$ ], which constitute one partition, are mapped in a finite number of steps by mapping value 2 onto the following nodes outside the

Table 7.8: Best achievable worst-case number of mappings steps. For each polynomial degree the number of primitive polynomials and the worst-case numbers of mapping steps are indicated.

| Degree | Primitive Polynomials | Mapping Steps |
| :---: | :---: | :---: |
| 1 | 1 | $1^{*} 0$ |
| 2 | 1 | $1^{*} 0$ |
| 3 | 2 | $2^{*} 1$ |
| 4 | 2 | $2^{*} 1$ |
| 5 | 6 | $6^{*} 1$ |
| 6 | 6 | $6^{*} 2$ |
| 7 | 18 | $6^{*} 2,12^{*} 3$ |
| 8 | 16 | $16^{*} 3$ |
| 9 | 48 | $48^{*} 4$ |
| 10 | 60 | $60^{*} 5$ |
| 11 | 176 | $176^{*} 6$ |
| 12 | 144 | $144^{*} 8$ |
| 13 | 630 | $1^{*} 10^{\mathrm{a}}$ |

${ }^{\text {a }}$ Only the first of the 630 primitive polynomials of degree 13 has been fully evaluated, i.e. $p(X)=$ $X^{13}+X^{4}+X^{3}+X+1$.
partition $[15,15,15,14,12,9,15,15]$. Two mappings are considered to be equivalent, if they map the same source nodes onto the same nodes outside the partition with the same number of steps. It is thereby irrelevant if the intermediate nodes differ in the two mappings. Furthermore, a mapping is considered to be better than another, if it requires fewer steps than the otherwise equivalent mapping. If a single parameter is considered, then its range of $2^{m}-2$ possible mapping values can be reduced, for instance, by eliminating all those mappings that are already covered by an equivalent or better mapping.

A search with a bounding technique over polynomials of the first degrees has been conducted. The results for fully evaluated polynomials are shown in Table 7.8. More details and results on polynomials of higher degree can be found in Appendix A. The best achievable worst-case running time has been recorded as the number of mapping steps that the algorithm needs to undertake. For all primitive polynomials up to degree 12 and the first polynomials of degree 13 and 14, the number of mapping steps for the worst case does not exceed the degree $m$ of the underlying polynomial.

Interestingly, for the considered polynomials, the number of steps seems to be invariant for polynomials of the same degree, except for the case of $m=7$, where two different numbers of mapping steps are obtained. If the reduction step is performed with a binary search, its worst-case computation time equals $\left\lceil\log _{2} m\right\rceil$. The algorithm starts and stops with a reduction step, so that the overall time complexity of the algorithm for evaluated polynomials up to degree 14 is bound by $(m+1)\left\lceil\log _{2} m\right\rceil$.

Primitive polynomials of higher degree could only been analysed to some extent due to the exponential parameter search space. With the available computing power, only a small fraction of the entire parameter space was searched with the help of a genetic algorithm. Results on the best sets of parameters that were obtained for polynomials up to degree 32 together with the corresponding number of mapping steps are provided in Appendix A. However, it is highly likely that those values can be improved on due to the small fraction of search space covered.

### 7.6 Conclusion

The finite field $G F\left(2^{m}\right)$, represented as the polynomial ring over $G F(2)$ modulo a primitive polynomial of degree $m$ over $G F(2)$ has been considered. A new set of properties has been established for the elements of the field.

Based on some of these properties, a novel approach for the solution of the discrete logarithm in the multiplicative group of the finite field has been proposed. This has led to an algorithm with linearithmetic time requirements in the degree of the defining polynomial, for at least all primitive polynomials of degree up to 12 and the first primitive polynomials of degree 13 and 14 . The algorithm requires a set of parameters (mapping configuration) in the order of the polynomial degree causing the space requirements to be linear with respect to the polynomial degree. Due to the fact that the parameter search space grows exponentially $\left(O\left(2^{m^{2}}\right)\right)$, determining the optimal set of parameters for a specific polynomial is currently impractical and the reason why partial results are provided for considered single polynomials of degree 14 to 32 .

A question that directly follows and remains to be answered is the asymptotic runtime behaviour of the algorithm. It is also interesting to speculate whether it can be proven that loop-free mapping configurations exist for all primitive polynomials of degree $m \geq 13$ as is the case for $m \leq 12$. Moreover, an efficient method to determine an optimal mapping configuration that minimises the worst-case runtime for a given generator polynomial, would also be desirable.

The mapping function has been exclusively optimised against the worst-case execution time. As an interesting alternative optimisation goal, the average-case execution time could be targeted.

The algorithm can be modified and extended in many ways. For instance, it might be possible to use a different partitioning of the reduced set of elements for the mapping function that leads to an improvement in execution time. It is also conceivable that more than one terminating vector could be used to further reduce the running time.

## Part IV

Conclusion

## Chapter 8

## Conclusion

Cyclic codes have gained wide popularity as error-detecting codes due to their inherent algebraic properties that permit easy implementation and effective detection of errors. This thesis supports the position of cyclic codes as a powerful class of error-control code whose properties extend beyond simple error detection. The thesis demonstrates contributions in the generation of efficient, programmable, parallel cyclic code circuits, and in the potential of cyclic codes for efficient error correction. These contributions are described in more detail, together with highlighting possible areas for future exploration within this concluding chapter.

### 8.1 Programmable CRC

A cyclic code is characterised through its generator polynomial which influences the specific error detection and correction capabilities of the code, depending on the length of the data that is to be protected, as outlined in Chapter 2. In addition, different applications may run on the same system with completely different cyclic code requirements, as in the case of SpiNNaker which is described in Chapter 4. It was shown how cyclic code circuits can overcome the limitations of a single generator polynomial by allowing the circuit to be flexibly programmed with any polynomial within the design constraints. In Chapter 5 a new method for computing the transition and control matrix of a parallel cyclic code circuit was presented. This method allows the efficient realisation of programmable parallel circuits that operate at high speeds, reconfigure rapidly to new polynomials, require few implementation resources, and are energy-efficient when compared with alternative schemes.

### 8.1.1 Future Work in Programmable CRC

With an efficient programmable cyclic code circuit that can reconfigure rapidly to new generator polynomials, it is feasible to change the polynomial after each processed word of a data stream. The calculation of the next polynomial can be made dependent on factors including the current input data word and the current state of the circuit, i.e. the calculated redundancy and the polynomial. It would be interesting to investigate if a polynomial adjustment algorithm can be devised that has advantages over fixed cyclic code generator polynomials, from both error detection and correction points of view.

### 8.2 Error Correction

The correction of a single-bit error on the basis of a cyclic code requires the computation of the discrete logarithm in finite cyclic groups, represented as the polynomial ring over the binary field modulo the cyclic code generator polynomial, as outlined in Chapter 2. No efficient algorithm is known for the evaluation of the discrete logarithm in these groups and, moreover, it is also widely believed that no such algorithm can be devised as described in Chapter 3. Nonetheless, this work focused on the exploration of new algorithms in the quest for an efficient calculation of discrete logarithms in relevant groups.

A new approach was developed for calculating discrete logarithms in Chapter 6. For groups that have an order equal to a Mersenne number with an exponent of a power of two, a deterministic generic algorithm was devised based on size differences of cyclotomic cosets. The algorithm requires only constant space and exhibits a worstcase asymptotic running time of the square root of the group order. It was shown that the average- and worst-case running times of the algorithm can be improved for certain cases where the discrete logarithm values occur with unequal probabilities. Furthermore, properties were developed or highlighted for relevant sequences that are considered by the algorithm.

For finite fields with binary characteristic, represented as the polynomial ring over the binary field modulo a primitive polynomial, new properties were developed in Chapter 7. On the basis of a subset of these properties, a novel approach was proposed for computing discrete logarithms in the cyclic multiplicative groups of these fields. It resulted in a deterministic algorithm with linear space and linearithmic time requirements in the degree of the defining polynomial, for at least all polynomials
up to degree 12 and the first polynomials of degree 13 and 14. The algorithm requires a set of parameters in the order of the polynomial degree, where the parameter search space grows exponentially in the polynomial degree. For this reason partial results on the running time for single polynomials of higher degrees up to 32 were provided.

### 8.2.1 Future Work in Error Correction

The research conducted on discrete logarithm algorithms generated a number of open questions that present potential future research opportunities:

- Under the assumption of the existence of an efficient algorithm for the computation of the discrete logarithm in finite cyclic groups represented in the ring of polynomials modulo a polynomial, the efficient correction of single-bit errors based on cyclic code is feasible. It needs to be investigated further to determine to what extent this assumption would also enable the efficient correction of multi-bit errors.
- For the proposed algorithm for discrete logarithms for group orders that equal Mersenne numbers with an exponent of a power of two, it may be possible to use the developed sequence properties to improve the algorithm. It may also be the case that new properties can be found that will enable a speed-up of the algorithm. In particular, it needs to be investigated if the proposed algorithm reduces the initial discrete logarithm problem into smaller subgroups, similar to the Silver-Pohlig-Hellman algorithm, as this permits alternative algorithms to be employed in those subgroups.
- The proposed generic algorithm for discrete logarithms was tailored to group orders of special Mersenne numbers. It remains an open question as to whether the algorithm can be generalised to all group orders and what the resulting execution overheads would be.
- An algorithm, efficient in time and space, was proposed for the computation of discrete logarithms in the multiplicative groups of small finite fields represented in the polynomial ring over the binary field modulo a primitive polynomial. It was shown that the algorithm is applicable at least to all defining polynomials up to degree 12, and the first polynomials of degree 13 and 14 . For single polynomials of degree 15 to 32 , it is known that a mapping configuration exists that allows the algorithm to terminate under all conditions, however it is unclear if one exists that also results in an efficient worst-case execution time. It would be interesting to investigate if, for all polynomials with a degree
exceeding 12 , loop-free mapping configurations exist, and also what would be the best asymptotic worst-case runtime behaviour of the algorithm. A related open question concerns the best achievable average asymptotic runtime of the algorithm.
- The determination of the overall best mapping configuration for a specific polynomial and optimisation goal was achieved through a brute force attack which is impractical for polynomials of higher degree due to the exponential search space. It may be the case that an efficient method can be devised that allows the computation of the mapping configuration for at least the best worstor average-case; such a method may also assist in the analysis of the asymptotic runtime behaviours. Alternatively, it may be possible to develop good heuristics to reduce the search space and employ evolutionary algorithms to find a close approximation for the optimal set of values.
- A number of conjectures were established that need analysis to determine if they can be proven. Proofs for the conjectures concerning the $L$ transformation in particular could lead to further insights into the efficient computation of discrete logarithms in the relevant groups.
- The proposed algorithm to compute discrete logarithms in multiplicative groups of small finite fields, represented in the polynomial ring over the binary field modulo a primitive polynomial, might also be easily applicable to defining polynomials that are not primitive, but irreducible. If the defining polynomial is reducible, then it induces only a cyclic group, for which the algorithm might also be easily employed. Moreover, it should be investigated if the algorithm can be adapted to finite fields with a characteristic other than two.
- It was conjectured that, for a defining primitive polynomial of degree $m$ and a parity level $l$ with $1 \leq l \leq\left\lfloor\frac{m-2}{2}\right\rfloor, 2^{l-1}$ linear transformations of the form $T^{x}$ exist such that each odd parity block of size one on level $l$ is mapped by one of the transformations onto an even parity block element on the same level. It would be interesting to investigate whether, for every level $l$, a set of these $2^{l-1}$ transformations exists, such that the transformations can easily be computed from each other. In particular, it is an open question if the displacement $r$ between the two transformations $L_{2 a}$ and $L_{2 b}$ for level two can easily be determined, such that $L_{2 a}=T^{r} L_{2 b}$.
- It is currently unknown if a simple correlation exists between the displacements of equally-sized blocks on a certain parity level. If the displacements can easily
be computed, alignments of different parity vector spaces can simply be obtained and therefore also transformations such as $E_{0}$ and $E_{1}$.


### 8.3 Summary

The work presented in this thesis provides successful solutions for the addressed research objectives:

## Efficient Programmable CRC Circuits

A novel method was proposed for the efficient realisation of programmable parallel cyclic code circuits. The resulting circuits can rapidly be configured with a generator polynomial, exhibit fast operating speeds, have low resource requirements, and are energy-efficient at the same time when compared to alternative solutions.

## Algorithms for Computing Discrete Logarithms

Two new approaches were developed for computing discrete logarithms to facilitate the correction of single-bit errors based on cyclic codes.

The first approach is generic in nature leading to a deterministic algorithm for group orders that equal a Mersenne number with an exponent of a power of two; this algorithm has constant space requirements and runs in the worst case in the order of the square root of the group order. It was shown how the algorithm can be improved if the discrete logarithm values occur with unequal probabilities and that certain properties hold for the associated sequences.

The second approach for the computation of discrete logarithms is based on a subset of newly developed properties for finite fields of binary characteristic represented as the polynomial ring over the binary field modulo a primitive polynomial. For evaluated small fields, a deterministic efficient algorithm with linear space and linearithmic time requirements in the degree of the defining polynomial was devised.

## Appendix A

## Mapping Configurations

In what follows, the best mapping configurations that have been found for primitive polynomials up to degree 32 for the proposed discrete logarithm algorithm in Chapter 7 are reported. Polynomials of degree 1 and 2 are not listed as they exhibit only one chain and, therefore, require no mapping steps. All primitive polynomials from degree 3 to 12 and the first polynomials of degree 13 and 14 are listed. For all higher degree polynomials, up to degree 31, only the first polynomial is reported in each case. The Ethernet polynomial serves as a representative for polynomials of degree 32, due to its significance in the Ethernet technology.

Table entries are all in decimal notation, except for the mapping configurations, whose values are indicated in hexadecimal notation. The first value of a mapping configuration is always to be applied to chains of length one, the second, if it exists, to chains of length two, and so forth. For the mapping, it is assumed that elements have been reduced to the last element of their corresponding chain, however, the configurations can easily be converted to other scenarios. There are, in general, several configurations that lead to one and the same number of steps for each polynomial. For the number of steps for all polynomials of degrees up to and including 13, the configuration with the lowest values is reported, where the first of the values is considered as the most-significant one.

The steps value indicates the worst-case number of mapping steps that is necessary to map a group element to the longest chain for the indicated configuration. For all polynomials up to degree 12 and the first polynomial of degree 13, the best achievable configuration is reported. The steps for all other polynomials are the best ones that have been obtained through an evolutionary search algorithm; they are indicated with a less-than-or-equal sign to emphasise that better configurations may exist.

| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 3 | 3 | 1 | 001 |
| 3 | 5 | 1 | 006 |
| 4 | 3 | 1 | 002 00D |
| 4 | 9 | 1 | 001002 |
| 5 | 5 | 1 | 010005019 |
| 5 | 9 | 1 | 00F 007006 |
| 5 | 15 | 1 | 00A 002 01C |
| 5 | 23 | 1 | 019003 00F |
| 5 | 27 | 1 | 006002010 |
| 5 | 29 | 1 | 015005003 |
| 6 | 3 | 2 | 002 00C 010031 |
| 6 | 27 | 2 | 001013011 01D |
| 6 | 33 | 2 | 008021016 00E |
| 6 | 39 | 2 | 001 03C 006014 |
| 6 | 45 | 2 | 001005 00B 022 |
| 6 | 51 | 2 | 006 00A 006 02B |
| 7 | 3 | 3 | 001015044006055 |
| 7 | 9 | 3 | 001005 OOE 00C 056 |
| 7 | 15 | 3 | 001005004013 04D |
| 7 | 17 | 3 | 001 00B 008 00E 029 |
| 7 | 29 | 2 | 06F 02E 029 05E 03E |
| 7 | 39 | 2 | 061004057 05E 07A |
| 7 | 43 | 3 | 001008 00D 008021 |
| 7 | 57 | 2 | 008037 04C 00D 041 |
| 7 | 63 | 3 | 001 00C 02E 004 02C |
| 7 | 65 | 3 | 001019 02D 04B 02A |
| 7 | 75 | 2 | 032013010021008 |
| 7 | 83 | 2 | 01B 007070004077 |
| 7 | 85 | 3 | 001006 00B 004 05E |
| 7 | 101 | 2 | 00F 008035005005 |
| 7 | 111 | 3 | 001 00B 006 01B 04D |
| 7 | 113 | 3 | 001009071 01A 032 |
| 7 | 119 | 3 | 001002 02F 007032 |
| 7 | 125 | 3 | 001007 01D 005053 |
| 8 | 29 | 3 | 00A 054063 OF2 OB0 06B |


| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 8 | 43 | 3 | 009 04E 047 OF7 06A 098 |
| 8 | 45 | 3 | 008024093091 OA7 OAC |
| 8 | 77 | 3 | 00A 024 OBF 00A 011 ODA |
| 8 | 95 | 3 | 00C 009039 05D 035 03F |
| 8 | 99 | 3 | 00B 06B 05C 028 02C 01F |
| 8 | 101 | 3 | 004 0F3 014 09C 077025 |
| 8 | 105 | 3 | 006 09B 04C 036048053 |
| 8 | 113 | 3 | 00B 011 ODF OA5 01D 094 |
| 8 | 135 | 3 | 00D 006 08C 060 01E 05C |
| 8 | 141 | 3 | 005 01E 018008035 OE0 |
| 8 | 169 | 3 | 003097 08F 008 03A 067 |
| 8 | 195 | 3 | 001 05B 00A 017 043 0A3 |
| 8 | 207 | 3 | 00E 055 04D 016 01A 078 |
| 8 | 231 | 3 | 01A 02F 092 04A 027087 |
| 8 | 245 | 3 | 005 04E OC6 OA2 OC3 OCO |
| 9 | 17 | 4 | 004 OOB OA3 OC3 070010 1E1 |
| 9 | 27 | 4 | O0C 039126 OAB 019006 10C |
| 9 | 33 | 4 | 005 02B 019 13F 048051 01E |
| 9 | 45 | 4 | 005044 02B 033010 04B 158 |
| 9 | 51 | 4 | 004045013056095051 OE8 |
| 9 | 89 | 4 | 005 03D 1DA 07B 039004 04E |
| 9 | 95 | 4 | 004 04C 013 0C9 02C 020 1B2 |
| 9 | 105 | 4 | 005 1DF 022064194058 1B1 |
| 9 | 111 | 4 | 006 04B 108 16B 178114011 |
| 9 | 119 | 4 | 007118 OA7 195 OB5 036 05E |
| 9 | 125 | 4 | 001 OB8 OD2 03B 01D 098037 |
| 9 | 135 | 4 | 001040 09C 00D 093024 1F8 |
| 9 | 149 | 4 | 004075065181 18A 027 1DE |
| 9 | 163 | 4 | 004003 OD2 015154073138 |
| 9 | 165 | 4 | 004 01B 173 02F 071078021 |
| 9 | 175 | 4 | 006160 05E OF5 178029 OE5 |
| 9 | 183 | 4 | 00A OAD 011009 O5B 023 OED |
| 9 | 189 | 4 | 001 11A 03B 040 OD3 104 OE4 |
| 9 | 207 | 4 | 001 11A 02F 070083005 OC0 |
| 9 | 209 | 4 | 004148 0C1 0B9 OB5 07A 0A7 |
| 9 | 219 | 4 | 004 OBA OAA 1AE OFF 03A 130 |


| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 9 | 245 | 4 | 006 OB1 177157 05C 03C 11B |
| 9 | 249 | 4 | 001036168 01C 03A 050 1C8 |
| 9 | 275 | 4 | 004 1BE 19C 058003 OBD 17F |
| 9 | 277 | 4 | 001 1DE OE5 00E OBB OF6 OC7 |
| 9 | 287 | 4 | 006 1EA 173169089 00F 048 |
| 9 | 291 | 4 | 004 O9C OEF 013042024080 |
| 9 | 305 | 4 | 00505902410716 A 093117 |
| 9 | 315 | 4 | OOE OOB OEF OBC OBO 031130 |
| 9 | 335 | 4 | 005029 OD2 07B 00C 006 05D |
| 9 | 347 | 4 | 005 OA5 022171 1A6 096 OD3 |
| 9 | 353 | 4 | 007047015012071 03C OF3 |
| 9 | 363 | 4 | 003055 0E1 1CA 161 06D 12C |
| 9 | 365 | 4 | 003 12F 13D OC7 068 16F OCF |
| 9 | 371 | 4 | 006037 05C 11B 035 06A OCF |
| 9 | 383 | 4 | 001 1FD 0B8 10E 066009 01F |
| 9 | 389 | 4 | 001 13C 138085078 09C 007 |
| 9 | 399 | 4 | 001087 OA8 043 OA4 04D 0BF |
| 9 | 437 | 4 | 009 OEC 088 14A 00C 011112 |
| 9 | 441 | 4 | 005 05D 1EC 0AO 106 00D 1A1 |
| 9 | 455 | 4 | 001031010 1BC 126013140 |
| 9 | 459 | 4 | 004 05C 18D 13D 060 00A 1A2 |
| 9 | 461 | 4 | 001 OBF 04A 00E 0C5 1F9 13F |
| 9 | 469 | 4 | 001 0E4 12B 019 1D8 OAF 11A |
| 9 | 473 | 4 | 006010 13E 14B 053 01E 1EE |
| 9 | 483 | 4 | 006047 09D 08F 17D ODE 1B7 |
| 9 | 489 | 4 | 004149 OC2 05E 01E 015 04D |
| 9 | 507 | 4 | 001 01E 064095005062 1E0 |
| 10 | 9 | 5 | 001 35C 05D 082 2A5 0D3 015 0DD |
| 10 | 27 | 5 | 006 02E 202131307 2F7 032 3AB |
| 10 | 39 | 5 | 0061 C 2 03B OBE 327280 2A1 33A |
| 10 | 45 | 5 | 00B 07F 3A0 226040050059 37F |
| 10 | 101 | 5 | 005243027171 1E0 07C 060 OF5 |
| 10 | 111 | 5 | 001 1ED 110055 13E 15A 2BA 3D4 |
| 10 | 129 | 5 | 008044 1E3 25B 25F 245 2B6 322 |
| 10 | 139 | 5 | 006021028195012 1CC 014 3DD |
| 10 | 197 | 5 | 012078 2EF 05E 09C OAE 027165 |


| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 10 | 215 | 5 | 004 05B 262062045 29B 017307 |
| 10 | 231 | 5 | 0061 AF OB8 026007 OD1 02F 204 |
| 10 | 243 | 5 | 001 ODB 22A OB1 OCF 17F OE6 247 |
| 10 | 255 | 5 | 004 OF5 199 19E ODA 221015 OBF |
| 10 | 269 | 5 | 004 OEB 007 09F 099058053247 |
| 10 | 281 | 5 | 004 OBE 011 28F 1A6 143 11B 29A |
| 10 | 291 | 5 | 001 1E2 06B 0A2 2C7 0D3 2B9 21C |
| 10 | 305 | 5 | 006 OF4 ODA 178 OCE 1BD 032 30A |
| 10 | 317 | 5 | 008 00D 3B0 021 14A 224 OB4 3F1 |
| 10 | 323 | 5 | OOF 176121012 OEC 2AB 03A 072 |
| 10 | 343 | 5 | 011189 11F 019 OA5 2AD 047 04D |
| 10 | 363 | 5 | 007 OC3 063104050 06C OB2 180 |
| 10 | 389 | 5 | 006246 05F 091366 2B0 019 1B8 |
| 10 | 399 | 5 | 005070 3AB OE1 068064014 38E |
| 10 | 407 | 5 | 004 02E 1C0 136264261 04C 2D4 |
| 10 | 417 | 5 | 006216092196076085015080 |
| 10 | 455 | 5 | 008 16C 27D OAF 011140333254 |
| 10 | 485 | 5 | 005 2EE OD1 1FO 130 2A5 OC6 00E |
| 10 | 503 | 5 | 00C 06E 39C 031047267011 1B5 |
| 10 | 507 | 5 | $0050 \mathrm{C8} 103$ 0D8 35C 061017 1D8 |
| 10 | 531 | 5 | 009 OOC 012 1E6 OCE 10E 006087 |
| 10 | 533 | 5 | 016071117 OCB 18E 07A 179 38D |
| 10 | 549 | 5 | 0010650 CO 3673650131461 E 3 |
| 10 | 567 | 5 | 008 18D 15A 39C 017 OFE 014233 |
| 10 | 579 | 5 | $0070861270202 D 7142346378$ |
| 10 | 591 | 5 | 005 1F6 016251 3F5 3C8 05D 313 |
| 10 | 603 | 5 | 007 0A8 16F 2C5 08A 157131 25C |
| 10 | 633 | 5 | 004246 1D5 03E 05A 012049 1B8 |
| 10 | 639 | 5 | 005055038 04B 04F 37A 197150 |
| 10 | 649 | 5 | O0E 016 25E 10C 07F OD7 OC9 022 |
| 10 | 693 | 5 | 009258 OB1 058 OD1 184 34D 27F |
| 10 | 705 | 5 | 00A 1BE 101 37F 3CD 10F 017054 |
| 10 | 723 | 5 | O0E 25B 1C3 102293 1AC 201 1A3 |
| 10 | 735 | 5 | 001 28E 19B 118 ODC 253 3CD 173 |
| 10 | 765 | 5 | 001 2D8 19C OFB OAO 02E 01F 227 |
| 10 | 791 | 5 | 011 OAE 383104068 OOB 014350 |





| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 11 | 1111 | 6 | 280036139 OEA 6EB 112016 OA8 38A |
| 11 | 1121 | 6 | 00C 06E 2BF 23C 349311005077793 |
| 11 | 1131 | 6 | 04B 1161 C 7 6BD 1EB 05D 422034 55F |
| 11 | 1139 | 6 | 054 20D 128348613 63D OE0 065 3B6 |
| 11 | 1157 | 6 | 1F5 048 4B3 08F 172070 2F7 07A 7B6 |
| 11 | 1161 | 6 | 016034 11D 544258 38C 48C 013 1BB |
| 11 | 1175 | 6 | O7C OFE 02D 689144 13D 57E 090543 |
| 11 | 1179 | 6 | 492021178 03A OCO 29C 099032027 |
| 11 | 1181 | 6 | OD9 13A 543126 4D1 7EF 1A2 0B3 5EC |
| 11 | 1203 | 6 | 08E 027 ODD 5D4 OD7 ODO 19D 08F 115 |
| 11 | 1215 | 6 | 01A OBA ODF 15C 170028 OBO 519338 |
| 11 | 1223 | 6 | 4B8 32E 1F2 16B 69E 671047339 1CA |
| 11 | 1229 | 6 | 054099 OC6 546 14F 77B 553 OE2 291 |
| 11 | 1235 | 6 | 18B 028 00E 190724 1E6 04C 05C 6EA |
| 11 | 1237 | 6 | 149 OCF 157196 13C 2AE 082 14A 598 |
| 11 | 1251 | 6 | 115 O2E 58C 5D8 427 OAA 15A 036449 |
| 11 | 1257 | 6 | 277070710 7EC 21E 5DD 2F9 134 6FB |
| 11 | 1271 | 6 | 222081 7CF 028178320 17B 1DD 3C6 |
| 11 | 1283 | 6 | 201520 12B 334 OB5 2BE 437084076 |
| 11 | 1295 | 6 | 173122 OD3 OED 3D1 678648216 3F8 |
| 11 | 1309 | 6 | 026 31F 307165249190 21A OFB 4DF |
| 11 | 1319 | 6 | 024261 06C 20D 1BE 1B9 76A 03E 59D |
| 11 | 1325 | 6 | 06E 104 7B3 446 7E7 4C4 1D9 02B 6FA |
| 11 | 1345 | 6 | 104176 32C 192 7EF 230 OEC 08C 318 |
| 11 | 1351 | 6 | 135209659 18F 7B6 4DF 17C 092 5F5 |
| 11 | 1365 | 6 | 034 3A1 417284 19D 279 08B 270042 |
| 11 | 1369 | 6 | 012 4C3 6F8 188 7F8 4A1 0EE 078 33B |
| 11 | 1379 | 6 | 057224173 32A 728 12D OCA 070 2A0 |
| 11 | 1391 | 6 | 188 2DF OC1 13F 23C 0CA 1F7 09D 51F |
| 11 | 1393 | 6 | 356 06D 4DA 036 6F3 411 43C 6AA 791 |
| 11 | 1427 | 6 | 115026 1F2 65C 436 OCF 575 OE7 7D8 |
| 11 | 1439 | 6 | 147 06B OB2 3C8 325039 33A OC5 793 |
| 11 | 1449 | 6 | 175 33C 2A1 0B8 77F 4FB 203 02D 324 |
| 11 | 1467 | 6 | 029191 1BF 6A7 5A6 557 6F6 14C 46C |
| 11 | 1469 | 6 | 067 03E 121011 35F 0F5 03F 200 57F |
| 11 | 1481 | 6 | 104 04B OBB 585 64C 5E4 601 3A4 0D0 |











| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :---: |
| 28 | 9 | $\leq 8056$ | 03D1B1B6 0581C8A8 01C6B20F 0AB12C82 |
|  |  |  | 0190091A 01327485 080B80DA 04933FDD |
|  |  |  | 0FAC6F2D 09928229 05A3CABF 0C9CD60C |
|  |  |  | 0685578B 022D57D2 00A1C913 049A91BA |
|  |  |  | OB5EC368 0BA2C228 OF16BC50 0E8A8E35 |
|  |  |  | 0F095DE9 02F01D6E 004572A9 0DE3BB1F |
|  |  |  | 04FA6941 04F03280 |
| 29 | 5 | $\leq 9569$ | 08735A86 0AADC14D OECAEA07 110A2F37 |
|  |  |  | 087B7A96 11F7FE1D 12B547A5 1AD81694 |
|  |  |  | 1CF20334 09A9CF23 062AA3A4 0C34124F |
|  |  |  | 10 C 71018 0ABBB01F OD9CF109 0E26ABF0 |
|  |  |  | 1464EB09 146D8DCA 1840FA7C 0CA804F4 |
|  |  |  | 1751221A 08A69531 1E4AB34C 0E1FD109 |
|  |  |  | 02F7B23E 130FAF35 00FB4F3A |
| 30 | 83 | $\leq 10563$ | 3A60662A 2B31E568 293E5022 316F0BF8 |
|  |  |  | 084A9BD0 103089A5 163CB93D 340DCADA |
|  |  |  | ODA71080 35F52834 22A5ACDA 100B0F1E |
|  |  |  | 32EBA53C 24E26D52 10791B3C 3049EF69 |
|  |  |  | 1D2B6168 096B107E 21231947 3C569F92 |
|  |  |  | 3DD1E40F 12DDBF91 3B5C3F4A 1092BFA9 |
|  |  |  | 0E9C70FC 06D9DE07 119571BF 000BC8D2 |
| 31 | 9 | $\leq 21120$ | 6CBBED58 4EEE9371 64511DEC 7362449A |
|  |  |  | 4A04EFE4 OCE28EA9 141BF300 753A2E5F |
|  |  |  | 3F3C890B 217E13D4 28EA7DC3 0B67A71B |
|  |  |  | 01F1628D 335D676F 63109B3F 6D9054D7 |
|  |  |  | 15656497 29DF15BF 030A394F 2CE35F05 |
|  |  |  | 3CE70663 4F2420DA 5F213707 400B55B0 |
|  |  |  | 18773B43 67F6DBE2 68253F43 06A5CA14 |
|  |  |  | 000BFFDE |


| Degree | Polynomial | Steps | Configuration |
| :---: | :---: | :---: | :--- | :--- |
| 32 | 79764919 | s28404 | 29F631DB 69B4AB1E 67B69DF4 263A31FE |
|  | (Ethernet) |  | 4AFB7882 6E433C07 7F66BBF3 32AEBA18 |
|  |  |  | 7663EABE 4F3E5EA1 77AF3DA3 74274A6D |
|  |  |  | 32C01345 621BA771 1C5CBB85 3278ACCE |
|  |  |  | 3819536E 34FCCC0A 61C0187E 74976060 |
|  |  |  | 5BF7155F 7EB1D250 6B954F8D 1D891372 |
|  |  |  | 448D2221 70DF6506 716AA6D2 1DF650D1 |
|  |  |  | 4BA13234 D21F45B0 |

## Bibliography

[AD93] Leonard M. Adleman and Jonathan DeMarrais. "A subexponential algorithm for discrete logarithms over all finite fields". In: Mathematics of Computation 61.203 (Sept. 1993), pp. 1-15. IssN: 0025-5718. Doi: 10.1090/ S0025-5718-1993-1225541-3.
[AD94] Leonard M. Adleman and Jonathan DeMarrais. "A Subexponential Algorithm for Discrete Logarithms over All Finite Fields". In: Advances in Cryptology - CRYPTO '93. Ed. by Douglas R. Stinson. Vol. 773. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, July 1994, pp. 147-158. ISBN: 978-3-540-57766-9. DoI: 10.1007/3-540-48329-2_13.
[Adl79] Leonard M. Adleman. "A subexponential algorithm for the discrete logarithm problem with applications to cryptography". In: 20th Annual Symposium on Foundations of Computer Science. Los Alamitos, CA, USA: IEEE Computer Society, Oct. 1979, pp. 55-60. Dor: 10.1109/SFCS.1979.2.
[Adl94] Leonard M. Adleman. "The function field sieve". In: First International Symposium on Algorithmic Number Theory. Vol. 877. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer-Verlag, 1994, pp. 108-121. ISBN: 3-540-58691-1. DOI: 10.1007/3-540-58691-1_48.
[AH99] Leonard M. Adleman and Ming-Deh A. Huang. "Function Field Sieve Method for Discrete Logarithms over Finite Fields". In: Information and Computation 151.1-2 (May 1999), pp. 5-16. Issn: 08905401. Doi: 10.1006/ inco.1998.2761.
[AS90] G. Albertengo and R. Sisto. "Parallel CRC generation". In: IEEE Micro 10.5 (Oct. 1990), pp. 63-71. IssN: 0272-1732. Doi: 10.1109/40.60527.
[Ber65] Elwyn Berlekamp. "On decoding binary Bose-Chadhuri-Hocquenghem codes". In: IEEE Transactions on Information Theory 11.4 (Oct. 1965), pp. 577-579. IsSN: 0018-9448. Doi: 10.1109/TIT.1965.1053810.
[BF02] J. Bainbridge and S. Furber. "Chain: a delay-insensitive chip area interconnect". In: IEEE Micro 22.5 (Sept. 2002), pp. 16-23. issn: 0272-1732. Doi: 10.1109/MM. 2002.1044296.
[BM77] Garrett Birkhoff and Saunders Mac Lane. A Survey of Modern Algebra. 4th ed. Macmillan Publishing Co., Inc., 1977. ISBN: 0-02-310070-2.
[Bra+96] Michael Braun, Jörg Friedrich, Thomas Grün, and Josef Lembert. "Parallel CRC Computation in FPGAs". In: FPL '96: Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers. Ed. by Reiner Hartenstein and Manfred Glesner. Vol. 1142. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1996, pp. 156-165. IsbN: 978-3-540-61730-3. DoI: 10.1007/3-540-61730-2_16.
[BRC60] R.C. Bose and D.K. Ray-Chaudhuri. "On a class of error correcting binary group codes". In: Information and Control 3.1 (Mar. 1960), pp. 68-79. IssN: 0019-9958. DoI: 10.1016/S0019-9958(60)90287-4.
[BS96] Eric Bach and Jeffrey Outlaw Shallit. Algorithmic Number Theory, Volume I: Efficient Algorithms. Foundations of Computing. Cambridge, Massachusetts, USA: The MIT Press, 1996. Isbn: 0-262-02405-5.
[Buc01] Johannes Buchmann. Computing discrete logarithms in the multiplicative group of a finite field. Tech. rep. CRYPTREC, Dec. 2001.
[Chi64] Robert T. Chien. "Cyclic Decoding Procedures for Bose-ChaudhuriHocquenghem Codes". In: IEEE Transactions on Information Theory 10.4 (Oct. 1964), pp. 357-363. IsSN: 0018-9448. DOI: 10.1109 / TIT . 1964 . 1053699.
[CMP03] Richard E. Crandall, Ernst W. Mayer, and Jason S. Papadopoulos. "The twenty-fourth Fermat number is composite". In: Mathematics of Computation 72.243 (Dec. 2003), pp. 1555-1572. IsSN: 0025-5718. Doi: 10.1090/ S0025-5718-02-01479-5.
[Cop84] Don Coppersmith. "Fast evaluation of logarithms in fields of characteristic two". In: IEEE Transactions on Information Theory 30.4 (July 1984), pp. 587594. ISSN: 0018-9448. Doi: 10.1109/TIT.1984.1056941.
[CP06] C. Cheng and K.K. Parhi. "High-Speed Parallel CRC Implementation Based on Unfolding, Pipelining, and Retiming". In: IEEE Transactions on Circuits and Systems II: Express Briefs 53.10 (Oct. 2006), pp. 1017-1021. IsSN: 1057-7130. Doi: 10.1109/TCSII . 2006.882213.
[CPR03] G. Campobello, G. Patane, and M. Russo. "Parallel CRC realization". In: IEEE Transactions on Computers 52.10 (Oct. 2003), pp. 1312-1319. ISSN: 0018-9340. Doi: 10.1109/TC.2003.1234528.
[CS09] Junho Cho and Wonyong Sung. "Efficient Software-Based Encoding and Decoding of BCH Codes". In: IEEE Transactions on Computers 58.7 (2009), pp. 878-889. IsSN: 0018-9340. Doi: 10.1109/TC. 2009.27.
[CW94] Douglas W. Clark and Lih-Jyh Weng. "Maximal and near-maximal shift register sequences: efficient event counters and easy discrete logarithms". In: IEEE Transactions on Computers 43.5 (May 1994), pp. 560-568. ISSN: 0018-9340. Doi: 10.1109/12.280803.
[DA01] Peter Dayan and L. F. Abbott. Theoretical neuroscience: computational and mathematical modeling of neural systems. Computational neuroscience. The MIT Press, 2001. Isbn: 0-262-04199-5.
[Der01] J.H. Derby. "High-speed CRC computation using state-space transformations". In: GLOBECOM'01. IEEE Global Telecommunications Conference (Cat. No.01CH37270). IEEE, 2001, pp. 166-170. ISBN: 0-7803-7206-9. Doi: 10.1109/GLOCOM. 2001.965100.
[DH76] W. Diffie and M. Hellman. "New directions in cryptography". In: IEEE Transactions on Information Theory 22.6 (Nov. 1976), pp. 644-654. ISSN: 0018-9448. Dor: 10.1109/TIT. 1976.1055638.
[Elg85] T. Elgamal. "A public key cryptosystem and a signature scheme based on discrete logarithms". In: IEEE Transactions on Information Theory 31.4 (July 1985), pp. 469-472. ISSN: 0018-9448. Doi: 10.1109/TIT .1985.1057074.
[FB09] S. Furber and A. Brown. "Biologically-Inspired Massively-Parallel Architectures - Computing Beyond a Million Processors". In: ACSD'09: Proceedings of the 9th International Conference on Application of Concurrency to System Design. 2009, pp. 3-12. ISBN: 978-0-7695-3697-2. DOI: 10.1109/ACSD. 2009. 17.
[FT07] Steve Furber and Steve Temple. "Neural systems engineering". In: Journal of the Royal Society, Interface / the Royal Society 4.13 (Apr. 2007), pp. 193206. ISSN: 1742-5689. DOI: 10.1098/rsif .2006.0177.
[Fur+12] Steve B. Furber, David R. Lester, Luis A. Plana, Jim D. Garside, Eustace Painkras, Steve Temple, and Andrew D. Brown. "Overview of the SpiNNaker System Architecture". In: IEEE Transactions on Computers (2012). Preprint. ISSN: 0018-9340. DOI: 10.1109/TC.2012.142.
[Fur12] Steve B. Furber. "To build a brain". In: IEEE Spectrum 49.8 (Aug. 2012), pp. 45-49. ISSN: 0018-9235. DOI: 10.1109/MSPEC. 2012.6247562.
[Gal02] Joseph A. Gallian. Contemporary Abstract Algebra. 5th ed. Houghton Mifflin, 2002. ISBN: 0-618-12214-1.
[Gal12] Steven D. Galbraith. Mathematics of public key cryptography. Cambridge University Press, 2012. ISBN: 1-107-01392-5.
[GF11] Martin Grymel and Steve B. Furber. "A Novel Programmable Parallel CRC Circuit". In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 19.10 (2011), pp. 1898-1902. ISSN: 1063-8210. DOI: 10 . 1109 / TVLSI. 2010. 2058872.
[GG05] Solomon W. Golomb and Guang Gong. Signal Design for Good Correlation: For Wireless Communication, Cryptography, and Radar. Cambridge, U.K.: Cambridge University Press, 2005. ISBN: 978-0-521-82104-9.
[Gol82] Solomon W. Golomb. Shift Register Sequences. Revised ed. Laguna Hills, CA, USA: Aegean Park Press, 1982. IsbN: 0-89412-048-4.
[Gor93] Daniel M. Gordon. "Discrete Logarithms in $G F(p)$ using the Number Field Sieve". In: SIAM Journal on Discrete Mathematics 6.1 (Jan. 1993), p. 124. ISSN: 0895-4801. DOI: $10.1137 / 0406010$.
[Gra+04] R. Granger, A. Holt, D. Page, N. Smart, and F. Vercauteren. "Function Field Sieve in Characteristic Three". In: Algorithmic Number Theory. Ed. by Duncan Buell. Vol. 3076. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2004, pp. 223-234. ISBN: 978-3-540-22156-2. DoI: 10. 1007/978-3-540-24847-7_16.
[Hoc59] Alexis Hocquenghem. "Codes correcteurs d'erreurs". French. In: Chiffres 2 (Sept. 1959), pp. 147-156.
[JL02] Antoine Joux and Reynald Lercier. "The Function Field Sieve Is Quite Special". In: Algorithmic Number Theory. Ed. by Claus Fieker and David Kohel. Vol. 2369. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2002, pp. 343-356. Doi: 10.1007/3-540-45455-1_34.
[JL03] Antoine Joux and Reynald Lercier. "Improvements to the general number field sieve for discrete logarithms in prime fields. A comparison with the gaussian integer method". In: Mathematics of Computation 72.242 (Nov. 2003), pp. 953-968. ISSN: 0025-5718. DoI: 10 . 1090/S0025-5718-02-01482-5.
[JL06] Antoine Joux and Reynald Lercier. "The Function Field Sieve in the Medium Prime Case". In: Advances in Cryptology - EUROCRYPT 2006. Ed. by Serge Vaudenay. Vol. 4004. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2006, pp. 254-270. IsbN: 978-3-540-34546-6. Doi: 10.1007/11761679_16.
[Jou+06] Antoine Joux, Reynald Lercier, Nigel Smart, and Frederik Vercauteren. "The Number Field Sieve in the Medium Prime Case". In: Advances in Cryptology - CRYPTO 2006. Ed. by Cynthia Dwork. Vol. 4117. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2006, pp. 326344. ISBN: 978-3-540-37432-9. DoI: 10.1007/11818175_19.
[Kan+12] Eric R. Kandel, James H. Schwartz, Thomas M. Jessell, Steven A. Siegelbaum, and A.J. Hudspeth. Principles of neural science. 5th ed. New York: McGraw-Hill Medical, 2012. ISBN: 0-07-139011-1.
[Kas64] Tadao Kasami. "A decoding procedure for multiple-error-correcting cyclic codes". In: IEEE Transactions on Information Theory 10.2 (Apr. 1964), pp. 134-138. IsSN: 0018-9448. Doi: 10.1109/TIT.1964.1053649.
[KC04] Philip Koopman and Tridib Chakravarty. "Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks". In: DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks. Washington, DC, USA: IEEE Computer Society, 2004, pp. 145-154. ISBN: 0-7695-2052-9. DOI: 10.1109/DSN . 2004. 1311885.
[Kel] Wilfrid Keller. Prime factors $k \cdot 2^{n}+1$ of Fermat numbers $F_{m}$ and complete factoring status. URL: http://www.prothsearch.net/fermat.html.
[KLS02] Michal Křížek, Florian Luca, and Lawrence Somer. 17 Lectures on Fermat Numbers: From Number Theory to Geometry. Springer, Feb. 2002. isbn: 0-387-95332-9.
[Knu97] Donald E. Knuth. The Art of Computer Programming, Volume II: Seminumerical Algorithms. 3rd ed. Reading, Massachusetts: Addison Wesley, 1997. ISBN: 0-201-89684-2.
[Koo02] Philip Koopman. "32-bit cyclic redundancy codes for Internet applications". In: DSN '02: Proceedings of the 2002 International Conference on Dependable Systems and Networks. Washington, DC, USA: IEEE Computer Society, 2002, pp. 459-468. ISBN: 0-7695-1597-5. Doi: 10.1109/DSN. 2002. 1028931.
[KRM08] Christopher Kennedy and Arash Reyhani-Masoleh. "High-speed parallel CRC circuits". In: Proceedings of the 42nd Asilomar Conference on Signals, Systems and Computers. Oct. 2008, pp. 1823-1829. ISBN: 978-1-4244-2940-0. DOI: 10.1109/ACSSC. 2008.5074742.
[KRM09] Christopher Kennedy and Arash Reyhani-Masoleh. "High-speed CRC computations using improved state-space transformations". In: 2009 IEEE International Conference on Electro/Information Technology. IEEE, June 2009, pp. 9-14. ISBN: 978-1-4244-3354-4. DoI: 10 . 1109 / EIT . 2009 . 5189575.
[LC83] Shu Lin and Daniel J. Costello. Error Control Coding. Prentice Hall, 1983. ISBN: 0-13-283796-X.
[Len+93] A. Lenstra, H. Lenstra, M. Manasse, and J. Pollard. "The number field sieve". In: The development of the number field sieve. Ed. by Arjen Lenstra and Hendrik Lenstra. Vol. 1554. Lecture Notes in Mathematics. Springer

Berlin / Heidelberg, 1993, pp. 11-42. ISBN: 978-3-540-57013-4. DOI: 10 . 1007/BFb0091537.
[Len91] H. W. Lenstra. "Finding isomorphisms between finite fields". In: Mathematics of Computation 56.193 (Jan. 1991), pp. 329-347. IssN: 0025-5718. doi: 10.1090/S0025-5718-1991-1052099-2.
[LN86] Rudolf Lidl and Harald Niederreiter. Introduction to finite fields and their applications. Revised ed. New York, NY, USA: Cambridge University Press, 1986. ISBN: 0-521-46094-8.
[Lov92] Renet Lovorn. "Rigorous, subexponential algorithms for discrete logarithms over finite fields". PhD thesis. University of Georgia, June 1992.
[LP98] Renet Lovorn Bender and Carl Pomerance. "Rigorous discrete logarithm computations in finite fields via smooth polynomials". In: Computational Perspectives on Number Theory: Proceedings of a Conference in Honor of A.O.L. Atkin. Ed. by Duncan A. Buell and Jeremy T. Teitelbaum. Vol. 7. AMS/IP Studies in Advanced Mathematics. Providence, Rhode Island, USA: American Mathematical Society, 1998, pp. 221-232. ISBn: 0-8218-0880-X.
[Mao+01] Bu-Qing Mao, Farid Hamzei-Sichani, Dmitriy Aronov, Robert C Froemke, and Rafael Yuste. "Dynamics of Spontaneous Activity in Neocortical Slices". In: Neuron 32.5 (Dec. 2001), pp. 883-898. IssN: 08966273. DoI: 10.1016/S0896-6273(01)00518-9.
[Mas69] James Massey. "Shift-register synthesis and BCH decoding". In: IEEE Transactions on Information Theory 15.1 (Jan. 1969), pp. 122-127. IssN: 0018-9448. DoI: 10.1109/TIT. 1969.1054260.
[McC90] Kevin S. McCurley. "The discrete logarithm problem". In: Cryptology and Computational Number Theory (Proceedings of Symposia in Applied Mathematics). Ed. by Carl Pomerance. Vol. 42. Providence, Rhode Island, USA, 1990. Isbn: 0-8218-4310-9.
[Meg61] J. Meggitt. "Error correcting codes and their implementation for data transmission systems". In: IEEE Transactions on Information Theory 7.4 (Oct. 1961), pp. 234-244. ISSN: 0018-9448. DOI: 10 . 1109 /TIT . 1961. 1057659.
[Mos96] Michele Mosca. "Discrete logarithms in finite fields". MA thesis. Wolfson College, University of Oxford, 1996.
[MVO96] Alfred J. Menezes, Scott A. Vanstone, and Paul C. Van Oorschot. Handbook of Applied Cryptography. 1st ed. Boca Raton, FL, USA: CRC Press, Inc., 1996. ISBN: 0-8493-8523-7.
[Nec94] V. I. Nechaev. "Complexity of a deterministic algorithm for the discrete logarithm". In: Mathematical Notes 55.2 (Feb. 1994), pp. 165-172. IssN: 0001-4346. Doi: 10.1007/BF02113297.
[Ngu09] Gam D. Nguyen. "Fast CRCs". In: IEEE Transactions on Computers 58.10 (2009), pp. 1321-1331. ISSN: 0018-9340. DoI: 10.1109/TC.2009.83.
[Odl00] Andrew M. Odlyzko. "Discrete Logarithms: The Past and the Future". In: Designs, Codes and Cryptography 19.2-3 (Mar. 2000), pp. 129-145. IssN: 0925-1022. Doi: 10.1023/A:1008350005447.
[Odl85] Andrew M. Odlyzko. "Discrete logarithms in finite fields and their cryptographic significance". In: Advances in Cryptology: Proc. of Eurocrypt 84, Lecture Notes in Computer Science. Ed. by Thomas Beth, Norbert Cot, and Ingemar Ingemarsson. Vol. 209. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1985, pp. 224-314. IsBN: 978-3-540-16076-2. doi: 10.1007/3-540-39757-4_20.
[PB61] W.W. Peterson and D.T. Brown. "Cyclic Codes for Error Detection". In: Proceedings of the IRE. Vol. 49. 1. Jan. 1961, pp. 228-235. Doi: 10.1109/ JRPROC.1961.287814.
[Pet60] W.W. Peterson. "Encoding and error-correction procedures for the BoseChaudhuri codes". In: IEEE Transactions on Information Theory 6.4 (Sept. 1960), pp. 459-470. IsSN: 0018-9448. DoI: 10.1109/TIT. 1960.1057586.
[PH78] S. Pohlig and M. Hellman. "An improved algorithm for computing logarithms over $G F(p)$ and its cryptographic significance (Corresp.)" In: IEEE Transactions on Information Theory 24.1 (Jan. 1978), pp. 106-110. Issn: 0018-9448. DoI: 10.1109/TIT.1978. 1055817.
[Pla+07] Luis A. Plana, Steve B. Furber, Steve Temple, Mukaram Khan, Yebin Shi, Jian Wu, and Shufan Yang. "A GALS Infrastructure for a Massively Parallel Multiprocessor". In: IEEE Design \& Test of Computers 24.5 (Sept. 2007), pp. 454-463. ISSN: 0740-7475. DOI: 10.1109/MDT. 2007.149.
[Pla+11] Luis A. Plana, David Clark, Simon Davidson, Steve Furber, Jim Garside, Eustace Painkras, Jeffrey Pepper, Steve Temple, and John Bainbridge. "SpiNNaker: Design and Implementation of a GALS Multicore System-onChip". In: ACM Journal on Emerging Technologies in Computing Systems 7.4 (Dec. 2011), pp. 1-18. ISSN: 1550-4832. DoI: 10.1145/2043643.2043647.
[Pol78] J. M. Pollard. "Monte Carlo Methods for Index Computation (mod p)". In: Mathematics of Computation 32.143 (July 1978), p. 918. IsSN: 0025-5718. DOI: $10.2307 / 2006496$.
[Pom87] Carl Pomerance. "Fast, rigorous factorization and discrete logarithm algorithms". In: Discrete Algorithms and Complexity. Ed. by D. S. Johnson, T. Nishizeki, A. Nozaki, and H. S. Wilf. Vol. 15. Orlando, Florida: Academic Press Inc, Apr. 1987, pp. 119-143. ISBN: 0-12-386870-X.
[Pra57] Eugene Prange. Cyclic Error-Correcting Codes in two symbols. Tech. rep. No. AFCRC-TN-57-103, ASTIA Document No. AD133749. Bedford, Massachusetts, USA: Electronics Research Directorate, Air Force Cambridge Research Center, Air Research and Development Command, United States Air Force, Sept. 1957.
[PW72] W.W. Peterson and E.J. Weldon. Error-Correcting Codes. 2nd ed. The MIT Press, 1972. isbn: 0-262-16-039-0.
[PZ92] T.-B. Pei and C. Zukowski. "High-speed parallel CRC circuits in VLSI". In: IEEE Transactions on Communications 40.4 (Apr. 1992), pp. 653-657. ISSN: 0090-6778. Doi: 10.1109/26.141415.
[RF89] T. R. N. Rao and Eiji Fujiwara. Error-Control Coding for Computer Systems. Prentice Hall, 1989, p. 524. ISBN: 0-13-284068-5.
[RG88] Tenkasi V. Ramabadran and Sunil S. Gaitonde. "A tutorial on CRC computations". In: IEEE Micro 8.4 (Aug. 1988), pp. 62-75. IssN: 0272-1732. Doi: 10.1109/40.7773.
[Rie94] Hans Riesel. Prime Numbers and Computer Methods for Factorization. 2nd ed. Birkhäuser Boston, 1994. ISBN: 0-8176-3743-5.
[RM64] L. Rudolph and M. Mitchell. "Implementation of decoders for cyclic codes (Corresp.)" In: IEEE Transactions on Information Theory 10.3 (July 1964), pp. 259-260. ISSN: 0018-9448. DOI: 10.1109/TIT .1964.1053672.
[Ros93] Kenneth H Rosen. Elementary Number Theory and Its Applications. 3rd ed. Addison-Wesley Publishing Company, 1993. ISBN: 0-201-57889-1.
[Sch+96] Oliver Schirokauer, Damian Weber, Thomas Denny, and Henri Cohen. "Discrete logarithms: The effectiveness of the index calculus method". In: Algorithmic Number Theory. Ed. by Henri Cohen. Vol. 1122. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996, pp. 337-361. Isbn: 978-3-540-61581-1. Doi: 10.1007/3-540-615814_66.
[Sch02] Oliver Schirokauer. "The Special Function Field Sieve". In: SIAM fournal on Discrete Mathematics 16.1 (2002), p. 81. IsSN: 0895-4801. Doi: $10.1137 /$ S0895480100372668.
[Sch05] Oliver Schirokauer. "Virtual logarithms". In: Journal of Algorithms 57.2 (Nov. 2005), pp. 140-147. ISSN: 0196-6774. Dor: 10.1016/j.jalgor . 2004. 11.004.
[Sch08] Oliver Schirokauer. "The impact of the number field sieve on the discrete logarithm problem in finite fields". In: Algorithmic Number Theory. Ed. by Joseph P. Buhler and Peter Stevenhagen. Vol. 44. MSRI Publications. Cambridge University Press, 2008, pp. 397-420. ISBN: 0-521-80854-5.
[Sch10] Oliver Schirokauer. "The number field sieve for integers of low weight". In: Mathematics of Computation 79.269 (Jan. 2010), pp. 583-583. IssN: 0025-5718. DoI: 10.1090/S0025-5718-09-02198-X.
[Sch91] C.P. Schnorr. "Efficient signature generation by smart cards". In: Journal of Cryptology 4.3 (1991), pp. 161-174. ISsN: 0933-2790. DoI: 10 . 1007 / BF00196725.
[Sch93] O. Schirokauer. "Discrete Logarithms and Local Units". In: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 345.1676 (Nov. 1993), pp. 409-423. Issn: 1364-503X. Doi: 10.1098/ rsta. 1993.0139.
[Sha48a] Claude E. Shannon. "A Mathematical Theory of Communication (Part I)". In: Bell System Technical fournal 27.3 (July 1948), pp. 379-423.
[Sha48b] Claude E. Shannon. "A Mathematical Theory of Communication (Part II)". In: Bell System Technical fournal 27.4 (Oct. 1948), pp. 623-656.
[Sha71] Daniel Shanks. "Class Number, a Theory of Factorization, and Genera". In: Proceedings of Symposia in Pure Mathematics. Ed. by Donald J. Lewis. Vol. 20. Providence, Rhode Island, USA: American Mathematical Society, 1971, pp. 415-440.
[Shi+01] M.-D. Shieh, M.-H. Sheu, C.-H. Chen, and H.-F. Lo. "A systematic approach for parallel CRC computations". In: Journal of Information Science and Engineering 17.3 (May 2001), pp. 445-461.
[Sho97] Victor Shoup. "Lower Bounds for Discrete Logarithms and Related Problems". In: Advances in Cryptology EuroCrypt 97. Ed. by Walter Fumy. Vol. 1233. Lecture Notes in Computer Science. Springer-Verlag, 1997, pp. 256-266. ISBN: 978-3-540-62975-7. DOI: 10. 1007/3-540-690530_18.
[Spr01] M. Sprachmann. "Automatic generation of parallel CRC circuits". In: IEEE Design \& Test of Computers 18.3 (May 2001), pp. 108-114. Issn: 0740-7475. Doi: 10.1109/54.922807.
[SPW09] Bianca Schroeder, Eduardo Pinheiro, and Wolf-Dietrich Weber. "DRAM errors in the wild". In: SIGMETRICS'09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems. New York, New York, USA: ACM Press, 2009, pp. 193-204. ISBN: 978-1-60558-511-6. DOI: 10.1145/1555349.1555372.
[Sti02] Douglas Robert Stinson. Cryptography : theory and practice. 2nd ed. Boca Raton, FL, USA: Chapman \& Hall/CRC, 2002. ISBN: 1-58488-206-9.
[Tes00] Edlyn Teske. "On random walks for Pollard's rho method". In: Mathematics of Computation 70.234 (Feb. 2000), pp. 809-826. ISSN: 0025-5718. DOI: 10.1090/S0025-5718-00-01213-8.
[TFM96] Simon Thorpe, Denis Fize, and Catherine Marlot. "Speed of processing in the human visual system." In: Nature 381.6582 (June 1996), pp. 520-522. ISSN: 0028-0836. DOI: 10.1038/381520a0.
[Toa+09] Ciaran Toal, Kieran McLaughlin, Sakir Sezer, and Xin Yang. "Design and Implementation of a Field Programmable CRC Circuit Architecture". In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17.8 (Aug. 2009), pp. 1142-1147. ISSN: 1063-8210. DOI: 10.1109/TVLSI 2008. 2008741.
[TW11] Andrew Tanenbaum and David J. Wetherall. Computer Networks. 5th ed. Boston, Massachusetts: Pearson Education, 2011. IsbN: 0-13-255317-1.
[Zie+96] J. F. Ziegler et al. "IBM experiments in soft fails in computer electronics (1978-1994)". In: IBM Journal of Research and Development 40.1 (Jan. 1996), pp. 3-18. IsSN: 0018-8646. DoI: 10.1147/rd.401.0003.
[Zie74] Neal Zierler. "A conversion algorithm for logarithms on $G F\left(2^{n}\right)$ ". In: Journal of Pure and Applied Algebra 4.3 (1974), pp. 353-356. IssN: 00224049. DoI: 10.1016/0022-4049(74) 90016-4.
[Zie96] J. F. Ziegler. "Terrestrial cosmic rays". In: IBM Journal of Research and Development 40.1 (Jan. 1996), pp. 19-39. IssN: 0018-8646. Doi: $10.1147 /$ rd.401.0019.
[ZL79] J. F. Ziegler and W. A. Lanford. "Effect of cosmic rays on computer memories." In: Science (New York, N.Y.) 206.4420 (Nov. 1979), pp. 776-788. ISSN: 0036-8075. DoI: 10.1126/science.206.4420.776.

