Data preprocessing for a vehicle-based localization system used in road traffic applications

This paper presents a fixed-point implementation, on a field programmable gate array (FPGA), of the data preprocessing required for multipath joint angle and delay estimation (JADE) in road traffic applications. This preprocessing lays the foundation for many model-based parameter estimation methods. A simulation of a vehicle-based localization system for protecting vulnerable road users, who are equipped with appropriate transponders, is considered. For such safety-critical applications, the robustness and real-time capability of the localization are particularly important. A further motivation for a fixed-point implementation of the data preprocessing is the limited computing power of a vehicle's head unit. This study aims to process the raw data provided by the localization system considered here. The data preprocessing includes a wideband calibration of the physical localization system, the separation of relevant information from the received sampled signal, and the preparation of the incoming data for further processing. To complete the data preprocessing, a channel matrix estimation was implemented; the channel matrix contains information on channel parameters, e.g., the positions of the objects to be located. For the vehicle-based localization application presented here, we assume an urban environment in which multipath propagation occurs. Since most localization methods assume uncorrelated signals, this must be addressed: the incoming data stream has to be decorrelated before the subsequent localization. This decorrelation was accomplished by considering several snapshots in different time slots. Because fixed-point arithmetic is used, quantization errors are also considered. Finally, the resources and run time of the presented implementation, which are strongly linked to a practical implementation, are discussed.


Introduction
The objective of a localization system is to determine the relative position of an object to be located. Different applications have brought forth several localization methods (van der Veen et al., 1997), most of which are theoretical. These studies focus on evaluating the performance of the localization methods; however, they neglect implementation aspects and the impact of the hardware components within the localization system. These aspects are strongly linked to a practical system design. The present work is motivated by a vehicle-based localization system for protecting vulnerable road users, who are equipped with appropriate transponders. The detection of the vehicle environment is an important step toward autonomous driving. For such safety-critical applications, the robustness and real-time capability of the localization are the focus. In addition, the computing power of a vehicle's head unit is limited. A real system such as the one introduced above exhibits several non-idealities; here, the focus is on the localization unit. Every localization system must be calibrated, and calibration is one of the steps used to compensate for the non-idealities of the hardware components that make up the system. The localization system can be considered a single-input multiple-output system (cf. Sect. 2). In practice, the individual channels of a multi-channel receiver within the localization unit differ in electrical length and attenuation. This deviation from the model affects the subsequent localization. In the case of wideband localization, these effects are additionally frequency dependent. Moreover, mutual coupling of the antenna elements degrades the angle estimation within the joint angle and delay estimation (JADE); it is therefore advisable to include coupling in the model (Friedlander and Weiss, 1991). In addition to the aforementioned effects, other challenges, including the detection of the desired signal in the continuously sampled received data, must also be addressed efficiently. Furthermore, the channel must be estimated from the incoming data, and numerous approaches have been proposed for this purpose. Cozzo and Hughes (2003) consider expectation-maximization-based channel estimation; other approaches to channel estimation can be found in Pal (1992), Yang et al. (2001), Van der Veen et al. (1995) and Biguesh and Gershman (2006). The channel information lays the foundation for JADE and thus for the localization of an object such as a vulnerable road user.
These challenges must be addressed by an efficient implementation. In contrast to previous studies, this study focuses on a practical implementation of the data preprocessing required for JADE in a safety-critical vehicle-based localization system for road traffic applications.
The paper is organized as follows. First, a transponder-based localization system for protecting vulnerable road users is presented, and the boundary conditions are defined, such as the localization throughput required by the application. Next, we study the localization unit and identify the system errors that occur in a practical application, in contrast to theoretical considerations. These errors can be taken into account within the signal processing. For this purpose, a preprocessing step is introduced that accounts for selected identified non-idealities. This preprocessing step is the focus of this paper, and subsequent localization methods can build on it. A hardware architecture for compensating these non-idealities is presented, using a field programmable gate array (FPGA) as the hardware platform. We discuss the key components as well as the resource requirements and run time of the architecture, and assess the architecture with respect to the safety-critical application. Finally, a localization method based on the presented fixed-point preprocessing is applied, which makes it possible to assess the impact of the fixed-point implementation on the subsequent localization result.

Problem formulation
The urban environment is described by a multipath propagation channel, as shown in Fig. 1. The channel model is derived from van der Veen et al. (1997) and Jaafar et al. (2005) and is represented by a complex channel matrix $\mathbf{H}$, which contains information about the positions of objects. For convenience, we consider a single-user mode here; this is without loss of generality because each user is assigned its own time slot. The access time is denoted by $t_a$. One user is equipped with a transponder, which responds to a localization request from the localization unit. The transponder transmits a digitally modulated sequence $s[n]$, where $n$ denotes the symbol index, over the wireless channel. The received baseband signal $x_m(t)$ at the $m$th element of the antenna array at time $t$ is given by the convolution of the transmitted sequence $s[n]$ with the channel impulse response $h(t)$. The channel impulse response comprises $r$ discrete paths. Each path is parameterized by a path attenuation $\beta$, a path delay $\tau$ and a direction of arrival $\theta$.
In compact form, the sampled received signal is represented by

$$\mathbf{X} = \mathbf{H}\mathbf{S} + \mathbf{N}, \tag{1}$$

where $\mathbf{X} \in \mathbb{C}^{MP \times N}$, $\mathbf{H} \in \mathbb{C}^{MP \times L}$, and $\mathbf{S} \in \mathbb{R}^{L \times N}$ contains the transmitted sequence in the form of a Toeplitz matrix. $P$ denotes the oversampling factor and $L$ the physical channel length. Additive white Gaussian noise (AWGN) is modeled by $\mathbf{N}$. A direct estimate of the channel matrix is given by

$$\hat{\mathbf{H}} = \mathbf{X}\mathbf{S}^{\dagger}, \tag{2}$$

where $(\cdot)^{\dagger}$ denotes the pseudo-inverse of the signal matrix. This approach ignores the noise but is nevertheless chosen because it can be implemented easily in hardware. Furthermore, the effect of the noise terms is reduced by averaging several snapshots. The pseudo-inverse of the signal matrix is required because, in general, $L \neq N$, so $\mathbf{S}$ is not square. Equation (2) lays the foundation for the following considerations. Finally, the system requirements are considered. The system should be designed for at least 20 users, since this number reflects the urban environment. In addition, an update rate of 50 Hz is assumed; serving 20 users at 50 Hz requires 1000 queries per second, which results in an access time of $t_a = 1$ ms per user. The data preprocessing must therefore be completed within this time slot. The subsequent localization can be performed in the background while other users are queried.
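As a rough illustration of Eq. (2), the following Python sketch builds a Toeplitz signal matrix from a training sequence, computes its pseudo-inverse once, and recovers the channel with a single matrix product. It assumes $P = 1$ (no oversampling), a random ±1 training sequence, and a channel that stays constant over the averaged snapshots; the helper names `signal_matrix`, `estimate_channel` and `noisy_estimate` are hypothetical and not taken from the paper.

```python
import numpy as np

def signal_matrix(s: np.ndarray, L: int) -> np.ndarray:
    """L x N Toeplitz signal matrix with S[l, n] = s[n - l] (zero for n < l)."""
    N = len(s)
    S = np.zeros((L, N))
    for l in range(L):
        S[l, l:] = s[: N - l]          # row l: training sequence delayed by l samples
    return S

def estimate_channel(X: np.ndarray, S_pinv: np.ndarray) -> np.ndarray:
    """Direct channel estimate H_hat = X S^dagger, cf. Eq. (2)."""
    return X @ S_pinv

rng = np.random.default_rng(0)
M, L, N = 8, 7, 64                       # antennas, channel length, samples (P = 1 assumed)
s = rng.choice([-1.0, 1.0], size=N)      # hypothetical +/-1 training sequence
S = signal_matrix(s, L)
S_pinv = np.linalg.pinv(S)               # depends only on s: computed once and stored

H = rng.normal(size=(M, L)) + 1j * rng.normal(size=(M, L))
H_hat = estimate_channel(H @ S, S_pinv)  # noise-free case: exact recovery

def noisy_estimate(sigma: float) -> np.ndarray:
    """One snapshot with AWGN; the channel H is held constant across snapshots here."""
    noise = sigma * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
    return estimate_channel(H @ S + noise, S_pinv)

err_single = np.linalg.norm(noisy_estimate(0.5) - H)
err_avg = np.linalg.norm(np.mean([noisy_estimate(0.5) for _ in range(16)], axis=0) - H)
print(np.allclose(H_hat, H), err_avg < err_single)
```

Since $\mathbf{S}^{\dagger}$ depends only on the known training sequence, it can be precomputed, so that run time is spent only on the product $\mathbf{X}\mathbf{S}^{\dagger}$.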

System design
This section presents the system design of a cooperative sensor system, focusing on an application from the field of localization. First, a block diagram of the system is shown. The function of each component is summarized, and the differences from a theoretical consideration are identified. Finally, an efficient preprocessing implementation is presented. Figure 2 shows a block diagram of the localization unit of the localization system, which operates at a center frequency of $f_0 = 5.8$ GHz. The application follows the IEEE 802.11p standard, which is intended for traffic applications. On a localization request, the transponder unit responds with a digitally modulated transmission sequence $s[k]$. This sequence is received by a uniform linear antenna array whose elements are spaced $\lambda_0/2$ apart (about 2.6 cm at 5.8 GHz). The radio frequency signal is converted to baseband using a complex demodulation approach. As a result, the required sampling frequency $f_s$ is lower than the sample rate that an intermediate frequency (IF) demodulation would require. The baseband signals are sampled synchronously. Finally, the sampled signals are passed to a digital signal processing unit, which is realized by an FPGA of the Virtex-6 family (Xilinx, 2012b).
In contrast to the theoretical considerations, the localization unit consists of non-ideal components. In general, antenna elements do not have omnidirectional characteristics. In addition, coupling exists between individual antenna elements (Weiss and Friedlander, 1989). The radio frequency front end introduces frequency-selective phase and amplitude errors due to the individual delays and attenuations of the receiving channels. The ADC discretizes the baseband signal in amplitude and time (12 bit, 65 MSps); because of the finite word length, quantization errors occur. The DSP also operates with a finite word length (processing word width of 12 bit). Fixed-point arithmetic is more resource efficient than floating-point arithmetic. These non-idealities affect the quality of the subsequent localization result and must therefore be considered in a preprocessing approach.
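To give a feel for the size of the 12-bit quantization error, the following Python sketch rounds a near-full-scale test tone to a signed fixed-point grid with saturation and measures the signal-to-quantization-noise ratio (SQNR). The number format (one sign bit, fractional bits, range [-1, 1)) and the test tone are assumptions for illustration; the paper does not specify its internal number format.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 12) -> np.ndarray:
    """Round to a signed fixed-point grid on [-1, 1) with `bits` bits, saturating at the ends."""
    scale = 2 ** (bits - 1)                   # one sign bit, bits - 1 fractional bits
    q = np.clip(np.round(x * scale), -scale, scale - 1)
    return q / scale

def sqnr_db(x: np.ndarray, xq: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB."""
    return float(10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2)))

n = np.arange(4096)
x = 0.999 * np.sin(2 * np.pi * 0.0123 * n)   # near-full-scale test tone (illustrative)
for bits in (8, 12, 16):
    print(bits, "bit:", round(sqnr_db(x, quantize(x, bits)), 1), "dB")
```

For a full-scale sinusoid, the textbook estimate is about 6.02·b + 1.76 dB, i.e. roughly 74 dB for b = 12, and the measured values should land close to that.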
Next, an implementation of data preprocessing for a model-based parameter estimation is presented, which lays the foundation for further model-based estimation methods.
Figure 3 shows an implementation of the detector unit. Its objective is to separate the relevant information from the sampled data stream $x_{\mathrm{ref}}$; this process is denoted as step 1. The concept is taken from Schaffer (2014).
The detection is based on a sample-wise comparison of the received input sequence $x_{\mathrm{ref}}[k]$ with a copy of the digitally modulated transmission sequence $s[k]$, which is assumed to be known. First, each sample is rated according to its sign: $x_{\mathrm{ref}}[k] > 0 \rightarrow 1$ and $x_{\mathrm{ref}}[k] \leq 0 \rightarrow 0$. The comparator COMP compares the rated input sequence with the copy of the transmission sequence, and the number of matches is accumulated in the accumulator ACC. When this sum exceeds a threshold $T$, the relevant information is detected, which is indicated by the output RDY. This step is performed in real time. The implementation effort is limited to simple logic elements: one AND logic detects the sign from the first bit of the sampled data stream (sign bit), and another is used as the comparator COMP. The accumulator is a simple full adder composed of basic logic elements, and the comparison of the sum with the threshold $T$ is also implemented using AND logic blocks.
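The detector logic can be mimicked in software. This Python sketch rates each sample by its sign, counts bit-wise matches with the known sequence over a sliding window of the sequence length, and reports the first window whose match count exceeds $T$. The sliding-window reading of ACC, the random 64-bit sequence, the noise level and the threshold of 57 are assumptions for illustration; `detect` is a hypothetical name.

```python
import numpy as np

def detect(x_ref: np.ndarray, s_bits: np.ndarray, threshold: int) -> int:
    """Return the start index of the first window whose match count exceeds T, or -1."""
    bits = (x_ref > 0).astype(np.int8)                       # sign rating: x <= 0 -> 0, x > 0 -> 1
    n_seq = len(s_bits)
    for k in range(len(bits) - n_seq + 1):
        matches = int(np.sum(bits[k:k + n_seq] == s_bits))   # COMP (bit equality) and ACC
        if matches > threshold:                              # RDY
            return k
    return -1

rng = np.random.default_rng(1)
s_bits = rng.integers(0, 2, size=64).astype(np.int8)   # hypothetical known 64-bit sequence
offset = 137
x_ref = 0.3 * rng.normal(size=400)                      # receiver noise only
x_ref[offset:offset + 64] += 2.0 * s_bits - 1.0         # embed the sequence as +/-1 samples
print(detect(x_ref, s_bits, threshold=57))
```

A threshold close to the sequence length keeps false alarms on noise-only windows unlikely, since their match count is binomially distributed around half the sequence length; the price is a higher miss probability at low SNR.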
Next, the calibration of the physical system, denoted as step 2, is considered. Figure 4 presents an implementation of such a calibration for wideband signals. The objective is to account for the frequency-selective phase and amplitude errors that result from the properties of a real receiving channel compared with the theoretical one. In general, a distinction is made between computing the calibration weights (EN = 0) and the actual calibration process (EN = 1); this selection is provided by a multiplexer. Both parts are marked in the figure. First, the focus is on computing the calibration weights, denoted as step 2.1; these weights are collected in a calibration matrix $\mathbf{C} \in \mathbb{C}^{M \times K}$. For the application, computing this calibration matrix is not time critical during run time, because thermal fluctuations are assumed to be negligible during run time (the localization system is in thermal equilibrium), as is the drift of the local oscillator. Therefore, a resource-saving rather than a run-time-saving approach is chosen. For this purpose, a second multiplexer is used: the control input $CH_{\mathrm{sel}}$ selects a single channel, so that the weights can be calculated sequentially, channel by channel. The sampled values are available in the frequency domain. Frequency-selective amplitude and phase values are determined using the Coordinate Rotation Digital Computer (CORDIC) algorithm (Volder, 1959), an efficient iterative algorithm for computing trigonometric functions that also yields the magnitude of a complex number. The calculated calibration weights refer to a reference channel, which is initially chosen arbitrarily (here $x_{\mathrm{ref}} = \mathrm{CH}_1$). Finally, the determined values are converted into component representation, and the calibration matrix is stored in random-access memory (RAM). Next, the actual calibration process, denoted as step 2.2, is considered. Applying the calibration weights $\mathbf{C}$ to an input data stream is time critical at run time, so parallel processing is proposed. First, the uncalibrated values $\mathbf{X}$ are transformed into the frequency domain by an FFT (Xilinx, 2012a), and the weighted frequency bins can be transformed back simultaneously. Different approaches exist for an efficient FFT implementation. The Cooley-Tukey algorithm (Amirfattahi, 2013), most commonly used in its radix-2 form, is the most widespread FFT algorithm. A further optimization is the use of a radix-p algorithm; an implementation is presented in Uzun (2003). In Sect. 4, two architectures are compared: radix-2 and radix-4. The frequency-selective weighting is performed simultaneously with the output of the FFT block, and an inverse FFT transforms the corrected data stream back into the time domain. It is advisable to use the same hardware resources for the FFT and its inverse, since the two transforms differ only in the sign of the exponent. In this way, real-time capability is ensured.
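The two calibration phases can be sketched in Python. The sketch (i) computes the weights offline from a calibration measurement, using a floating-point model of CORDIC in vectoring mode to obtain magnitude and phase and converting the ratio back into component form, and (ii) applies them at run time as FFT, bin-wise weighting and inverse FFT, with one radix-2 routine serving both transforms through a sign parameter. The channel errors are modeled as random per-bin gains and phases under a circular-convolution assumption; all names and sizes are hypothetical, and a hardware design would use fixed-point shift-and-add arithmetic rather than Python floats.

```python
import math
import numpy as np

def fft_radix2(x: np.ndarray, sign: int = -1) -> np.ndarray:
    """Recursive radix-2 decimation-in-time FFT; sign=-1 forward, sign=+1 unscaled inverse."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return np.asarray(x, dtype=complex)
    even = fft_radix2(x[0::2], sign)
    odd = fft_radix2(x[1::2], sign)
    w = np.exp(sign * 2j * np.pi * np.arange(n // 2) / n) * odd   # twiddled odd half
    return np.concatenate([even + w, even - w])

def cordic_polar(z: complex, iterations: int = 40) -> tuple[float, float]:
    """Magnitude and phase of z with CORDIC in vectoring mode (floating-point model)."""
    x, y, angle = z.real, z.imag, 0.0
    if x < 0:                                   # pre-rotate by pi into the right half-plane
        x, y, angle = -x, -y, (math.pi if y >= 0 else -math.pi)
    gain = 1.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0              # rotate towards the positive x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x / gain, angle

def calibration_weights(Y: np.ndarray, ref: int = 0) -> np.ndarray:
    """Step 2.1 (offline): weights C[m, k] mapping channel m onto the reference channel."""
    M, K = Y.shape
    C = np.empty((M, K), dtype=complex)
    for m in range(M):                          # sequential, channel by channel (CH_sel)
        for k in range(K):
            mag_ref, ph_ref = cordic_polar(complex(Y[ref, k]))
            mag, ph = cordic_polar(complex(Y[m, k]))
            g, dphi = mag_ref / mag, ph_ref - ph
            C[m, k] = complex(g * math.cos(dphi), g * math.sin(dphi))  # component form
    return C

def apply_calibration(x: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Step 2.2 (run time): FFT, bin-wise weighting, inverse FFT with the same routine."""
    spectra = np.array([fft_radix2(row, -1) for row in x])
    return np.array([fft_radix2(row, +1) for row in C * spectra]) / x.shape[1]

rng = np.random.default_rng(2)
M, K = 4, 64
G = np.exp(0.2 * rng.normal(size=(M, K)) + 1j * rng.uniform(-np.pi, np.pi, size=(M, K)))
Z = np.fft.fft(rng.normal(size=K) + 1j * rng.normal(size=K))   # calibration signal spectrum
C = calibration_weights(G * Z)                                   # hypothetical channel errors G

d = rng.normal(size=K) + 1j * rng.normal(size=K)                 # new data, identical on all channels
x = np.fft.ifft(G * np.fft.fft(d))                               # distorted channel outputs
x_cal = apply_calibration(x, C)
print(np.allclose(x_cal, x_cal[0]))                              # all channels now match CH1
```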
Finally, the channel matrix is estimated, denoted as step 3. The starting point is Eq. (2). Computing all elements of the channel matrix in parallel, each with its own multiplier, requires at least $2 \cdot M \cdot P \cdot L = 448$ DSP48E1 blocks, whereas the Virtex-6 device XC6VLX75T provides only 288 DSP48E1 blocks (for the parameters, see Sect. 4). The factor of 2 accounts for a semi-complex multiplier ($\mathbf{X}_{\mathrm{Cal.}} \in \mathbb{C}$, $\mathbf{S}^{\dagger} \in \mathbb{R}$), which consists of two real multiplier blocks (DSP48E1). Therefore, a compromise is proposed. Figure 5 illustrates the block diagram of a semi-parallel implementation of the channel matrix estimation. The rearranged signal matrix $\dot{\mathbf{S}} \in \mathbb{R}^{PL \times PN}$ is assumed to be known and has Toeplitz structure, so no calculation is necessary at run time. The required pseudo-inverse $\dot{\mathbf{S}}^{\dagger}$ is stored in RAM in advance (for clarity, the control logic and RAM are not illustrated).
A multiplexer selects the $m$th row of $\mathbf{X}_{\mathrm{Cal.}}$ via $EN_{\mathrm{sel}}$. At the inputs of the complex multiplier bank, the row $\dot{\mathbf{S}}^{\dagger}(m,:)$ is available. The multiplication result is written to the $m$th row of the buffer variable $\dot{\mathbf{H}}_{\mathrm{temp}}$; the pointer (denoted by (2)) specifies the location of the current calculation, i.e., it points to the $m$th row. The result for each sample is accumulated in $\dot{\mathbf{H}}_{\mathrm{temp}}$. Following this principle, the channel matrix is calculated sample by sample for one snapshot. Different approaches exist for implementing matrix multiplication on parallel hardware platforms (Mahendra Vucha, 2011; Qasim et al., 2010; Kumar and Tsai, 1991).
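The sample-wise accumulation can be checked against a plain matrix product with the following Python sketch. It assumes that the rearranged data row of each antenna has $P \cdot N$ samples and that, for each incoming sample, the matching row of $\dot{\mathbf{S}}^{\dagger}$ (length $P \cdot L$) is fed to a bank of $P \cdot L$ semi-complex multipliers; this indexing, $N = 64$, and the random stand-in for the stored pseudo-inverse are assumptions, and `semi_parallel_estimate` is a hypothetical name.

```python
import numpy as np

def semi_parallel_estimate(X_cal: np.ndarray, S_pinv: np.ndarray) -> tuple[np.ndarray, int]:
    """Sample-wise accumulation of H_temp = X_cal @ S_pinv; returns (estimate, steps)."""
    M, N = X_cal.shape
    H_temp = np.zeros((M, S_pinv.shape[1]), dtype=complex)
    steps = 0
    for m in range(M):                                  # multiplexer selects row m
        for n in range(N):                              # samples of row m arrive one by one
            H_temp[m, :] += X_cal[m, n] * S_pinv[n, :]  # multiplier bank + accumulator
            steps += 1
    return H_temp, steps

M, P, L, N = 8, 4, 7, 64                                # parameters from Sect. 4; N = 64 assumed
rng = np.random.default_rng(3)
X_cal = rng.normal(size=(M, P * N)) + 1j * rng.normal(size=(M, P * N))
S_pinv = rng.normal(size=(P * N, P * L))                # stand-in for the stored real pseudo-inverse
H_temp, steps = semi_parallel_estimate(X_cal, S_pinv)

print(np.allclose(H_temp, X_cal @ S_pinv), steps)
print("DSP48E1, fully parallel:", 2 * M * P * L)        # 448, exceeds the 288 of the XC6VLX75T
print("DSP48E1, semi-parallel bank:", 2 * P * L)        # 56
```

Under this reading, the multiplier bank needs $2 \cdot P \cdot L = 56$ DSP48E1 blocks instead of 448, at the cost of $M \cdot P \cdot N$ sequential accumulation steps per snapshot.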
It is assumed that the complex fading amplitudes are constant during one snapshot and vary from one snapshot to the next, which is valid for the assumed scenario. Thus, decorrelation is performed by averaging multiple channel matrices from different snapshots, which are uncorrelated in terms of complex fading. Otherwise, additional techniques such as spatial and temporal smoothing would have to be applied (Pillai and Kwon, 1989). The snapshot averaging requires no additional hardware, except for a counter logic that controls the input $RESET_{\mathrm{av}}$ for resetting $\dot{\mathbf{H}}_{\mathrm{temp}}$. Once the initially set number of desired snapshots has been accumulated, the output RDY is set ($RDY \rightarrow 1$), which yields the averaged channel matrix $\hat{\mathbf{H}}_{\mathrm{av}}$, and $\dot{\mathbf{H}}_{\mathrm{temp}}$ is reset by $RESET_{\mathrm{av}}$. This result forms the starting point for model-based JADE algorithms.
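Why snapshots with independent fading help can be illustrated numerically. The following Python sketch uses the two path angles of the scenario in Sect. 5 (-10° and 20°), an eight-element λ/2 uniform linear array, and random complex fading per snapshot. It averages the per-snapshot correlation matrices $\hat{\mathbf{h}}\hat{\mathbf{h}}^H$, which is an assumption about where the averaging acts; the array model and the names `steering`, `signal_rank` and `snapshot` are illustrative.

```python
import numpy as np

def steering(theta_deg: float, M: int = 8) -> np.ndarray:
    """Response of an M-element uniform linear array with lambda/2 spacing."""
    return np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

def signal_rank(R: np.ndarray, rel_tol: float = 1e-6) -> int:
    """Number of eigenvalues larger than rel_tol times the largest eigenvalue."""
    w = np.linalg.eigvalsh(R)
    return int(np.sum(w > rel_tol * w.max()))

rng = np.random.default_rng(4)
A = np.column_stack([steering(-10.0), steering(20.0)])   # direct path and one multipath

def snapshot() -> np.ndarray:
    """One snapshot: both paths with new independent complex fading amplitudes."""
    beta = rng.normal(size=2) + 1j * rng.normal(size=2)
    return A @ beta

h1 = snapshot()
R_single = np.outer(h1, h1.conj())
R_avg = np.mean([np.outer(hq, hq.conj()) for hq in (snapshot() for _ in range(50))], axis=0)
print(signal_rank(R_single), signal_rank(R_avg))
```

With a single snapshot, the correlation matrix of the two coherent paths has rank one, so subspace methods such as MUSIC cannot separate them; after averaging, the signal subspace has its full dimension of two.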

System performance
In this section, the system performance is discussed. The results are summarized in Table 1. A Virtex-6 FPGA (Xilinx, 2012b) is assumed as the implementation platform. The input data stream is quantized with 12 bit in amplitude (ADC), and the internal processing word width is also 12 bit. The results are specified for an operating frequency of $f_{\max} = 310$ MHz, which is determined by the design. Furthermore, a 64-bit training sequence is assumed. In general, the resource requirements and the latency scale linearly with the input parameters $M$, $P$ and $L$. In the following, the input parameters are $M = 8$, $P = 4$ and $L = 7$, and the FFT is designed using a radix-2 architecture ($K = P \cdot N$).
With respect to the application, the system design is real-time capable. The FFT used in step 2 determines the latency and the run time of the presented data preprocessing. If required, some of the fast DSP48E1 hardware multipliers can be replaced by look-up tables (LUTs); however, this reduces the maximum clock frequency $f_{\max}$.
Table 2 compares different FFT architectures with respect to their impact on resources and run time.
The speed advantage of a radix-4 FFT architecture over a radix-2 architecture comes at the cost of a disproportionately higher resource requirement. In this case, cost and benefit are out of proportion for the application, since both architectures meet the run-time requirement ($t_{\mathrm{radix}} < 1$ ms).
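The trade-off can be made plausible with textbook operation counts. The Python sketch below counts stages and butterflies of a radix-2 and a radix-4 FFT of length $K = P \cdot N$, assuming $N = 64$ from the 64-bit training sequence, and checks how many snapshots of the 9.2 µs processing time reported in the Conclusions fit into the 1 ms access time. These counts are generic and do not reproduce the resource figures of Table 2.

```python
import math

def num_stages(K: int, radix: int) -> int:
    """Number of stages of a radix-r FFT; K must be a power of the radix."""
    stages = 0
    while K > 1:
        if K % radix:
            raise ValueError("K must be a power of the radix")
        K //= radix
        stages += 1
    return stages

def num_butterflies(K: int, radix: int) -> int:
    """Each stage has K / radix butterflies, each combining `radix` inputs."""
    return num_stages(K, radix) * K // radix

P, N = 4, 64                 # oversampling factor; N = 64 assumed from the 64-bit sequence
K = P * N                    # FFT length K = P * N = 256
for radix in (2, 4):
    print(f"radix-{radix}: {num_stages(K, radix)} stages, {num_butterflies(K, radix)} butterflies")

t_a = 1e-3                   # access time per user: 1 ms
t_snapshot = 9.2e-6          # processing time per snapshot reported in the Conclusions
print("snapshots per access slot:", math.floor(t_a / t_snapshot))
```

The radix-4 transform needs half as many stages and a quarter as many butterflies, but each radix-4 butterfly combines four inputs, which is consistent with a faster but larger design.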

Application
This section focuses on the application of the presented data preprocessing to model-based parameter estimation. For this purpose, the incoming data $\mathbf{X}$ are affected by frequency-selective phase and amplitude errors and are discrete in time and amplitude. One approach to source localization in a multipath environment is JADE-multiple signal classification (MUSIC). As in MUSIC, published by Schmidt (1986), the orthogonality between the noise subspace $\mathbf{E}_n$ and the signal subspace $\mathbf{E}_s$ of the correlation matrix $\hat{\mathbf{R}}_{hh} = \hat{\mathbf{h}}\hat{\mathbf{h}}^H$ is exploited, where $(\cdot)^H$ denotes the Hermitian transpose. Forming the correlation matrix from the vectorized estimated channel matrix $\hat{\mathbf{h}} = \mathrm{vec}(\hat{\mathbf{H}})$ relates spatial samples at different times. Assuming a space-time manifold $\mathbf{u}(\theta, \tau)$, the two-dimensional spectral function is

$$P_{\mathrm{MU}}(\theta, \tau) = \frac{1}{\mathbf{u}^H(\theta, \tau)\,\mathbf{E}_n\mathbf{E}_n^H\,\mathbf{u}(\theta, \tau)}. \tag{3}$$

The local maxima provide information on the position $(\hat{\theta}, \hat{\tau})$ of the object to be located. Figure 6 shows the two-dimensional spectral function of the JADE-MUSIC algorithm.
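A compact numerical sketch of the JADE-MUSIC spectral function $P(\theta, \tau) = 1/(\mathbf{u}^H \mathbf{E}_n \mathbf{E}_n^H \mathbf{u})$ follows. It assumes a Kronecker-structured manifold $\mathbf{u}(\theta, \tau) = \mathbf{a}(\theta) \otimes \mathbf{b}(\tau)$, with $\mathbf{a}$ the response of an eight-element λ/2 uniform linear array and $\mathbf{b}$ a linear phase over $K = 16$ frequency bins for a delay in sample periods, and it reuses the path parameters of the scenario described next (-10° at 2 sample periods, 20° at 2.5 sample periods). Noise is omitted and the correlation matrix is averaged over snapshots with independent fading; this frequency-domain model is an illustrative assumption, not the manifold used by the authors.

```python
import numpy as np

M, K = 8, 16                                   # antennas and frequency bins (illustrative)

def manifold(theta_deg: float, tau: float) -> np.ndarray:
    """Space-time manifold u(theta, tau) = a(theta) kron b(tau)."""
    a = np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))   # lambda/2 ULA
    b = np.exp(-2j * np.pi * np.arange(K) * tau / K)                        # delay in samples
    return np.kron(a, b)

rng = np.random.default_rng(6)
paths = [(-10.0, 2.0), (20.0, 2.5)]            # (angle in degrees, delay in sample periods)
U = np.column_stack([manifold(t, d) for t, d in paths])

# Correlation matrix of the channel vector, averaged over snapshots with independent fading.
R = np.zeros((M * K, M * K), dtype=complex)
for _ in range(50):
    beta = rng.normal(size=2) + 1j * rng.normal(size=2)
    h = U @ beta
    R += np.outer(h, h.conj()) / 50

_, V = np.linalg.eigh(R)                       # eigenvalues in ascending order
En = V[:, : M * K - len(paths)]                # noise subspace E_n

thetas = np.arange(-30.0, 31.0, 1.0)
taus = np.arange(0.0, 4.01, 0.5)
spectrum = np.empty((len(thetas), len(taus)))
for i, t in enumerate(thetas):
    for j, d in enumerate(taus):
        proj = En.conj().T @ manifold(t, d)    # E_n^H u(theta, tau)
        spectrum[i, j] = 1.0 / (np.vdot(proj, proj).real + 1e-12)   # guarded MUSIC spectrum

top2 = np.argsort(spectrum, axis=None)[-2:]
rows, cols = np.unravel_index(top2, spectrum.shape)
peaks = sorted((float(thetas[i]), float(taus[j])) for i, j in zip(rows, cols))
print(peaks)
```

The two largest grid values fall on the two true paths, which have similar delays but clearly different angles.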
Data preprocessing is performed as described above. In this case, a direct path and an additional multipath component are assumed. The object is located at $(-10^\circ, 2\,T_s)$ and the multipath component at $(20^\circ, 2.5\,T_s)$. An eight-element antenna array and a 64-bit training sequence are assumed. The signal-to-noise ratio (SNR) is 20 dB, which is consistent with practical measurements. The multipath component is attenuated by 3 dB relative to the direct path. With separate angle and delay estimation, the multipath component cannot be resolved; joint estimation resolves the two paths by exploiting the additional angle information, which is the benefit of joint parameter estimation. The quantization errors of the presented design are negligible. This was determined by a Monte Carlo simulation (500 runs) of the scenario described above, in which the spectrum was computed once with the presented fixed-point preprocessing and once with a 64-bit floating-point implementation. The mean square errors (MSE) between these two implementation variants are approximately zero for both the angle estimates, $\mathrm{MSE}(\hat{\theta}_{\text{fixed-point}}, \hat{\theta}_{\text{floating-point}}) \approx 0$, and the distance estimates, $\mathrm{MSE}(\hat{d}_{\text{fixed-point}}, \hat{d}_{\text{floating-point}}) \approx 0$. For the spectrum, a computation grid with a spacing of 1.0° in angle and 1.0 m in distance was used. In general, the modulation bandwidth of the sequence $s[n]$ and the number of antenna elements have a much larger impact on the resolution of the multipath components than the quantization errors introduced by the presented preprocessing step.
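The fixed-point versus floating-point comparison can be mimicked on the channel estimate of Eq. (2) alone. The Python sketch below quantizes $\mathbf{X}$ and the stored $\mathbf{S}^{\dagger}$ to a signed $b$-bit grid, forms the product with a wide accumulator, rounds the result once to $b$ bits, and reports the relative error against the floating-point estimate. The power-of-two scaling rule, the accumulator model and all sizes are assumptions; the paper does not describe its number formats in this detail, and the sketch does not reproduce its Monte Carlo results.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int, full_scale: float) -> np.ndarray:
    """Round to a signed `bits`-bit grid on [-full_scale, full_scale), saturating.

    Complex values are quantized component-wise (real and imaginary part).
    """
    step = full_scale / 2 ** (bits - 1)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    def q(v: np.ndarray) -> np.ndarray:
        return np.clip(np.round(v / step), lo, hi) * step

    return q(x.real) + 1j * q(x.imag) if np.iscomplexobj(x) else q(x)

def pow2_scale(x: np.ndarray) -> float:
    """Smallest power of two that covers the largest real or imaginary magnitude."""
    peak = max(np.abs(np.real(x)).max(), np.abs(np.imag(x)).max())
    return float(2.0 ** np.ceil(np.log2(peak)))

rng = np.random.default_rng(5)
M, L, N = 8, 7, 64
s = rng.choice([-1.0, 1.0], size=N)                       # hypothetical training sequence
S = np.array([np.concatenate([np.zeros(l), s[:N - l]]) for l in range(L)])
S_pinv = np.linalg.pinv(S)
H = 0.1 * (rng.normal(size=(M, L)) + 1j * rng.normal(size=(M, L)))
X = H @ S                                                 # noise-free, to isolate quantization
H_float = X @ S_pinv                                      # floating-point reference

def fixed_point_estimate(bits: int) -> np.ndarray:
    """Quantized inputs, wide accumulation, one final rounding to `bits` bits."""
    Xq = quantize(X, bits, pow2_scale(X))                 # input word width
    Sq = quantize(S_pinv, bits, pow2_scale(S_pinv))       # stored pseudo-inverse
    acc = Xq @ Sq                                         # accumulator assumed wide enough
    return quantize(acc, bits, pow2_scale(acc))

def rel_err(bits: int) -> float:
    return float(np.linalg.norm(fixed_point_estimate(bits) - H_float) / np.linalg.norm(H_float))

for bits in (8, 12, 16):
    print(bits, "bit:", f"{rel_err(bits):.1e}")
```

Each additional bit roughly halves the quantization step, so the error shrinks by about a factor of two per bit as long as no saturation occurs.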

Conclusions
In this study, a real-time capable system design of the data preprocessing for a vehicle-based localization system used in road traffic applications was presented, with an FPGA as the implementation platform. The objective was to compensate for hardware imperfections compared with the theoretical consideration of a localization system. The preprocessing approach comprises the separation of relevant information from the incoming data stream, a wideband calibration of the physical system and a channel estimation, as well as a decorrelation of the correlated signal components that the incoming data stream contains because the wireless channel is assumed to exhibit multipath propagation. The implementation aspects, including resources and run time, were discussed. It was shown that the FFT used for the frequency-selective calibration is computationally intensive; therefore, two architectures (radix-2 and radix-4) were compared. Depending on the requirements, resources must be balanced against run time. With the presented architecture, real-time capability was achieved: each snapshot required a processing time of approximately 9.2 µs, which is well below the access time of 1 ms.
Because the digital signal processing uses fixed-point arithmetic (processing word width of 12 bit), quantization errors occurred. This word width was found to be sufficient for a subsequent localization. The proposed implementation was shown to be well suited for protecting vulnerable road users. As a next step, it will be useful to account for errors originating from the antenna array, for example, mutual coupling, which will also affect the channel matrix estimation and the localization results.

Data availability
The data used in this paper are available from Timo Patelczyk (timo.patelczyk@tum.de) upon request.

Figure 1.Multipath propagation channel for location applications in the urban environment.

Figure 2. Block diagram of a localization unit as a part of a cooperative sensor system for protection of vulnerable road users.

Figure 3. Block diagram of an implementation of a detector for sequence separation for a localization system.

Figure 4. Block diagram of an implementation of a wideband calibration for a localization system.

Figure 5. Block diagram of the implementation of an efficient channel matrix estimation approach for a localization system. (1) Complex hardware multiplier. (2) Line selection pointer.

Table 1. Performance evaluation for the presented implementation of a real-time capable data preprocessing for model-based parameter estimation.

Table 2. Comparison of different FFT architectures in terms of the performance of the presented implementation of a real-time capable data preprocessing for model-based parameter estimation.
Figure 6. Spectral function of JADE-MUSIC in the case of the presented data preprocessing approach.