Gbit/s real-time test environment for integrated photonic DQPSK receivers

Abstract. In this paper an FPGA-based test system for high-speed transmission experiments with integrated photonic receivers is presented. Pseudorandom binary sequences are generated inside the FPGA and encoded as either differential quadrature phase shift keying (DQPSK) or quadrature phase shift keying (QPSK) signals. The DQPSK encoder uses a 64-fold parallel-prefix-layers architecture for real-time operation which allows for a maximum internal encoder data rate of 64 Gbit/s. Two-fold parallel data streams of I and Q signals suitable for driving an optical IQ-modulator can be transmitted and received by four 12.5 Gbit/s transceivers. Integrated bit error testers are used to determine bit error rates in real-time.


Introduction
To respond to the ever-increasing bandwidth requirements in optical transmission links, the data rate per channel is continuously increased, as is the aggregate data rate per fiber, e.g. by using many channels in wavelength division multiplexing systems. The modulation scheme in most installed commercial transmission lines is still on-off keying (OOK), as simple photodetectors can be used for the opto-electric conversion at the end of the optical link. However, future data links require 40 Gbit/s, 100 Gbit/s or even higher data rates per channel, which is an issue in OOK transmission systems as the pulses become more susceptible to dispersion effects. In particular, polarization mode dispersion disturbs the signal randomly, which makes it difficult to compensate. Furthermore, high power densities in the short pulses give rise to disturbing non-linear effects. Using higher-order modulation formats like quadrature phase shift keying, data rates can be increased while the symbol rate is kept constant. However, as the phase of the optical signal has to be recovered, the receiver architecture becomes more complicated. Due to the constant progress in semiconductor processing, integrated silicon photonic devices can now be fabricated with sufficient accuracy. This platform offers the possibility to integrate complex interferometer-based phase detectors, even in combination with polarization and wavelength division multiplexers, in a CMOS process, prospectively together with electronic signal processing circuitry (Knoll, 2014).

In many cases it is sufficient to characterize single photonic devices with continuous-wave transmission measurements. Some devices need to be embedded into test benches or cascaded to determine their characteristics with high accuracy. For more complex circuits, such as integrated receivers, high-speed data transmission experiments are desired to verify the operation and to obtain device properties from a system-level point of view. This kind of experiment is more complex, as further components such as fast optical modulators and photodiodes are needed, as well as high-speed electronic transceivers and modulator drivers. Many of today's field programmable gate arrays (FPGA) offer several Gbit/s transceivers together with a large number of programmable logic cells, making them a cost-effective solution for Gbit/s transceivers in transmission experiments.

Integrated photonic DQPSK receiver
Differential quadrature phase shift keying is a variation of standard QPSK in which two data bits $b_0(i)$ and $b_1(i)$ are coded in the phase transitions between consecutive symbols instead of in the absolute phase position of the signal. Figure 1 (left) shows a part of the DQPSK constellation diagram together with the phase transitions. The phase of the $i$th symbol can be described with the status bits $d_0(i)$ and $d_1(i)$, which directly correspond to the in-phase (I) and quadrature (Q) signals for an optical IQ modulator.

Published by Copernicus
While QPSK modulated signals require a coherent receiver, DQPSK coded symbols can be retrieved with a direct-detection receiver. Possible implementations consist of either two delay line interferometers with 3-dB couplers or one delay line interferometer with a 90° hybrid (Seimetz, 2009). A block diagram with the principal components of the latter receiver is depicted in Fig. 2. The passive part has been realized as an integrated receiver at the Institute for Electrical and Optical Communications Engineering on a silicon-on-insulator (SOI) wafer.
Figure 3 shows a chip photo of a fabricated device. A grating coupler on the left-hand side is used for the coupling between a standard single-mode fiber and the planar photonic circuit. The 3-dB coupler is realized as a symmetric multimode interference coupler (MMI), which distributes the incoming signal equally to two branches. A delay line is introduced in one of the branches, delaying the signal by one symbol period. Therefore the receiver has to be designed to work at a fixed data rate. The physical length of the delay line also depends on the waveguide geometry, i.e. the waveguide width and thickness. The simulated group index of a typical nanowaveguide with a cross-section of 400 nm × 230 nm is about $n_g = 4.40$ at a wavelength of $\lambda = 1550$ nm. For a target symbol rate of 25 Gbaud, a length of about 2.7 mm is needed.
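The delay line length quoted above can be checked with a short calculation. The following is a minimal sketch, assuming that the delay equals exactly one symbol period and that the light travels at the group velocity $c/n_g$; the function name `delay_line_length` is chosen here for illustration only.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum in m/s


def delay_line_length(symbol_rate_baud: float, group_index: float) -> float:
    """Waveguide length in metres that delays the signal by one symbol period."""
    symbol_period = 1.0 / symbol_rate_baud        # 40 ps at 25 Gbaud
    group_velocity = C_VACUUM / group_index       # propagation speed of the envelope
    return group_velocity * symbol_period


if __name__ == "__main__":
    # Values taken from the text: 25 Gbaud, n_g = 4.40
    length_m = delay_line_length(25e9, 4.40)
    print(f"delay line length: {length_m * 1e3:.2f} mm")
```

With the values from the text this gives a length slightly above 2.7 mm, in line with the stated figure. A different waveguide cross-section changes $n_g$ and therefore the required length.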
The actual detection of the phase difference between two consecutive symbols is done in the 2 × 4 90° hybrid, which is also realized as an MMI. This device has two input and four output ports. The input signals are split equally and interfere at the output ports with a phase difference of 0, 0.5π, π and 1.5π plus a constant offset angle $\varphi_\mathrm{MMI}$ that depends on the device. The signals are coupled out again with grating couplers. The opto-electric conversion can be done with external photodiodes. However, future receivers will include integrated photodiodes. The signal and the delayed signal interfere at the four 90° hybrid outputs with a phase difference of $\varphi + n\pi/2$, where $n$ is an integer between 0 and 3. The constant phase $\varphi = \varphi_\mathrm{MMI} + \varphi_\mathrm{ext}$ is composed of the intrinsic MMI phase offset and an external phase offset defined by the exact length difference between the two interferometer arms. When balanced photodiodes are used and $\varphi$ is set to $3\pi/4$, the receiver provides two output signals that correspond to the transmitted bits $\{b_0, b_1\}$ according to Fig. 1 without requiring further logic operations (Fischer, 2013). This fine adjustment can be achieved e.g. by using heaters on top of the delay line without high power dissipation due to the large waveguide length.
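The effect of the phase offset can be illustrated numerically. The sketch below assumes unit-amplitude fields, an ideal lossless hybrid, and that the outputs with $n = 0, 2$ and $n = 1, 3$ are paired on the two balanced detectors; these pairings and the function name `balanced_outputs` are assumptions made for illustration and are not taken from the receiver design itself.

```python
import cmath
import math


def balanced_outputs(delta_theta: float, phi: float) -> tuple[float, float]:
    """Balanced photocurrents (arbitrary units) for a phase step delta_theta.

    The previous symbol serves as phase reference; the hybrid adds the
    phase phi + n*pi/2 between signal and delayed signal at output n.
    """
    signal = cmath.exp(1j * delta_theta)   # current symbol
    delayed = 1.0 + 0.0j                   # previous symbol (reference)
    powers = [
        abs(signal + delayed * cmath.exp(1j * (phi + n * math.pi / 2))) ** 2
        for n in range(4)
    ]
    # Assumed pairing: outputs 0/2 and 1/3 feed the two balanced detectors
    return powers[0] - powers[2], powers[1] - powers[3]


if __name__ == "__main__":
    phi = 3 * math.pi / 4  # offset stated in the text
    for k in range(4):
        out_a, out_b = balanced_outputs(k * math.pi / 2, phi)
        print(f"phase step {k} x 90 deg: signs ({out_a > 0}, {out_b > 0})")
```

Under these assumptions each of the four phase steps yields a distinct sign pair with equal magnitude on both outputs, which is why a simple threshold decision on the two balanced signals suffices.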

FPGA-based test environment
The principal component of the test environment for the integrated optical receiver is the electrical DQPSK transceiver, where the signals are created and the received signals are evaluated.
A block diagram of the realized transceiver is shown in Fig. 4. The principal functional blocks are the DQPSK encoder and the so-called GTX interfaces, which include pseudorandom binary sequence (PRBS) generators, high-speed electrical interfaces and the bit error tester. Further components such as an overall control block for selecting the PRBS sequence or starting and stopping the transmission, a serial interface for the communication with a PC, memory and a reference decoder complete the transceiver design. A Xilinx Kintex-7 FPGA with 16 high data rate GTX interfaces [...] random-access memory is used. One of the GTX interfaces is available as an electrical contact on the Xilinx evaluation board. Four more are accessible on a separate expansion board and are used to send and receive the data.
For the data generation the GTX built-in PRBS generators are used. Different sequences such as PRBS-7, PRBS-15, PRBS-23 and PRBS-31 can be generated and are shifted out as 64-bit vectors. In this implementation two pattern generators are used, one for the in-phase component and one for the quadrature component. Together they compose vectors with 64 two-bit symbols. A multiplexer allows for bypassing the DQPSK encoder, so that standard QPSK is transmitted.
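The pattern generation described above can be mimicked in software. The following is a minimal sketch of a PRBS-7 generator, assuming the commonly used polynomial $x^7 + x^6 + 1$ and an all-ones seed; the text does not specify the internals of the GTX generators, so the polynomial, the seed and the helper names `prbs7_bits`, `next_word` and `iq_symbols` are illustrative assumptions.

```python
from itertools import islice


def prbs7_bits(seed: int = 0x7F):
    """Endless PRBS-7 bit stream from a 7-bit linear feedback shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the register locks up")
    while True:
        # Feedback from the two oldest register bits (taps 7 and 6)
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def next_word(bit_source, width: int = 64) -> int:
    """Collect `width` bits from the stream into one integer, first bit as MSB."""
    word = 0
    for bit in islice(bit_source, width):
        word = (word << 1) | bit
    return word


def iq_symbols(i_word: int, q_word: int, width: int = 64):
    """Pair the bits of an I word and a Q word into `width` two-bit symbols."""
    return [((i_word >> k) & 1, (q_word >> k) & 1) for k in reversed(range(width))]


if __name__ == "__main__":
    i_gen = prbs7_bits(seed=0x7F)   # hypothetical seed for the I generator
    q_gen = prbs7_bits(seed=0x2A)   # hypothetical seed for the Q generator
    symbols = iq_symbols(next_word(i_gen), next_word(q_gen))
    print(len(symbols), symbols[:4])
```

Two independent generators, one per component, reproduce the structure described in the text: each clock cycle delivers one 64-bit I vector and one 64-bit Q vector, i.e. 64 two-bit symbols.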

DQPSK encoder
The coding of the data bits $b_0(i)$ and $b_1(i)$ into phase transitions can be seen in the DQPSK constellation diagram in Fig. 1 (left). The transmission of the data 00, 10, 11 and 01 corresponds to a phase shift of 0, $\pi/4$, $\pi/2$ and $3\pi/4$. Starting with a reference phase, the encoder therefore has to calculate the new phase position for each symbol, which is a function of the previous phase and the bits to be transmitted. There are different ways to implement this function. One approach with logic operations (Seimetz, 2009) requires very fast logic gates and flip-flops with a low propagation delay, as the status bits must be fed back within one symbol duration. Another way is the use of arithmetic modulo-4 additions. In this approach a different representation of the phase positions and transitions is chosen, as shown in Fig. 1 (right). The status bits $d$ are coded to phase bits $z$ that represent the phase as arithmetic numbers ascending clockwise from 0 to 3. The phase transitions are also coded as arithmetic numbers $c$, representing the phase transition directly as a multiple of 90°. With initial phase bits $z(0)$, the phase bits of the $n$th symbol can then be calculated as

$$z(n) = \left( z(0) + \sum_{i=1}^{n} c(i) \right) \bmod 4 \qquad (1)$$

This calculation can be parallelized in parallel prefix network (PPN) adders (Harris, 2003), which relaxes the timing requirements of the feedback path. The scheme of a PPN adder is shown in Fig. 5 (left). The scheme is limited to an 8-fold parallel structure for clarity. Modulo-4 adders are marked with squares. In the first stage, the sum of $c(i)$ and its successor $c(i+1)$ is calculated for each even index $i$. With these results, all possible sums of $c(i)$ and its successors $c(i+1)$, $c(i+2)$ and $c(i+3)$ are calculated. This can be continued up to the last stage. An $n$-fold parallel prefix network adder therefore needs $\log_2(n)$ stages. At the end, the state $z(n)$ of the last transmitted symbol has to be added to $c(0)$ to maintain the phase difference to the first transmitted symbol. The feedback path is the critical path in this architecture. The propagation delays, which on the Kintex 7 are in the range of about 270 ps to 440 ps depending on the cell type, then still limit the maximum achievable bit rate. To attain a minimum of 50 Gbit/s, at least 256-fold parallelization is needed.

The architecture of a parallel prefix layers (PPL) adder, as presented in (Zhongxia, 2011), allows a more efficient use of the FPGA resources. The scheme of such an adder is shown in Fig. 5 (right). The main difference to a PPN adder is the flip-flops at each stage. Thus the state $z(n)$ only has to be added to all the sums of $c(0)$ to $c(i)$ at the end of the adder structure, which shortens the feedback path and allows for higher clock rates. Comparing both architectures, similar data rates can be attained at a lower degree of parallelization in the PPL architecture due to higher possible clock frequencies, at the expense of a higher number of registers and adders. However, the extra effort for the additional cells is lower than that of a more highly parallelized adder structure. Based on the typical propagation delays, a 64-fold parallel PPL adder is sufficient to achieve a bit rate higher than 100 Gbit/s, leaving sufficient headroom for additional line delays in the final configuration. The Kintex 7 logic cells are organized in slices, with one slice containing four look-up tables and eight flip-flops. Regarding the hardware resources needed, the encoder core with PPL architecture needs at least 289 slices out of about 50,000 slices available. Achieving similar performance with a PPN architecture would require a 512-fold parallelization, which would occupy about ten times more slices. In an implementation for 100 Gbit/s with PPN architecture on a former upper-class FPGA, a 1280-fold parallel encoder is needed due to the lower clock frequency, occupying about 20 times more resources in terms of look-up tables (Winzer, 2008).

The complete encoder is shown in Fig. 6. First the data bits are coded to the bits $c$ by means of logic functions and buffered in registers. The next stage is the PPL adder that calculates the bits $z$. In the last stage the phase bits are translated to the driving signals for the IQ modulator, again using logic functions.
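The idea behind the parallel adder can be sketched in software as a modulo-4 prefix sum. The example below compares a serial reference encoder with a prefix scan that needs $\log_2(n)$ stages. It uses a Kogge-Stone-style scan as a stand-in, which is an assumption; it is not necessarily the exact network of Fig. 5, and it does not model the register stages of the PPL variant. The function names are illustrative.

```python
def serial_encode(c, z_prev):
    """Reference encoder: accumulate the transitions one symbol at a time."""
    phases = []
    z = z_prev
    for transition in c:
        z = (z + transition) % 4
        phases.append(z)
    return phases


def prefix_encode(c, z_prev):
    """Prefix-scan encoder: all running sums in log2(n) modulo-4 adder stages.

    Returns the phase values and the number of stages that were needed.
    """
    sums = list(c)
    n = len(sums)
    shift = 1
    stages = 0
    while shift < n:
        # Every position adds the partial sum located `shift` places earlier
        sums = [
            (sums[i] + sums[i - shift]) % 4 if i >= shift else sums[i]
            for i in range(n)
        ]
        shift *= 2
        stages += 1
    # Only this final addition depends on the state of the previous block
    return [(z_prev + s) % 4 for s in sums], stages


if __name__ == "__main__":
    transitions = [(5 * i + 1) % 4 for i in range(64)]  # arbitrary test pattern
    phases, stages = prefix_encode(transitions, z_prev=2)
    assert phases == serial_encode(transitions, z_prev=2)
    print("stages:", stages, "last phase:", phases[-1])
```

The last phase value of one block serves as `z_prev` for the next block. In the sketch this is the only data dependency between blocks, which mirrors the short feedback path that the text attributes to the PPL structure.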
The synthesized encoder is tested in combination with a slow reference decoder, with alternating encoding and decoding runs. A BER measurement of the final encoder layout is shown in Fig. 7. Up to a system clock of 500 MHz no bit errors are detected in a sequence of $10^7$ bits. As 64 two-bit words are encoded in each cycle, a total data rate of 64 Gbit/s is achieved. The architecture can be expanded to higher degrees of parallelization, as the encoder only uses a small portion of the FPGA resources. For real-time transmission experiments the GTX built-in bit error testers are used. No hardware decoder is needed, as the integrated photonic receiver already decodes the data inherently. The received data is checked against the selected PRBS sequence in blocks of 64 bits, i.e. the bit error counter increments once if one or more bits in a block are erroneous. For small actual bit error rates (BER), the measured BER is in the worst case higher by a factor of 64, depending on whether random or burst errors occur.
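Both the data rate and the factor of 64 follow from simple arithmetic, sketched below. The example assumes that the measured value is the number of erroneous blocks divided by the number of blocks; the function names and the error positions are hypothetical and serve only as an illustration.

```python
import math


def encoder_data_rate(parallel_symbols: int, bits_per_symbol: int, clock_hz: float) -> float:
    """Encoder throughput in bit/s for a given degree of parallelization."""
    return parallel_symbols * bits_per_symbol * clock_hz


def bit_error_ratio(error_positions, total_bits: int) -> float:
    """True BER: erroneous bits divided by all transmitted bits."""
    return len(set(error_positions)) / total_bits


def block_error_ratio(error_positions, total_bits: int, block_size: int = 64) -> float:
    """Block-based ratio: a block counts once, however many bits in it are wrong."""
    bad_blocks = {position // block_size for position in error_positions}
    return len(bad_blocks) / (total_bits // block_size)


if __name__ == "__main__":
    print(encoder_data_rate(64, 2, 500e6) / 1e9, "Gbit/s")

    total_bits = 64 * 1000
    isolated = [64 * k for k in range(10)]   # ten single errors in ten different blocks
    burst = list(range(64))                  # one block with all 64 bits wrong
    for name, errors in (("isolated", isolated), ("burst", burst)):
        factor = block_error_ratio(errors, total_bits) / bit_error_ratio(errors, total_bits)
        print(name, "overestimation factor:", round(factor))
```

Isolated single-bit errors give the worst-case overestimation of 64, whereas a burst that corrupts a whole block gives a factor of one.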
The serial interfaces are limited to a maximum data rate of 12.5 Gbit/s, therefore two transmitters are used for the I and Q signal, respectively.The signals are interleaved with external multiplexers at the transmitter side and de-interleaved at A common issue in FPGA-based transmitters with several 240 parallel output channels is the synchronization.After each reset, the interfaces initialize and generally start to transmit the data at a random time, thus data streams are shifted against each other by a multiple of the bit duration.Furthermore, sub-bit delays can occur when the interfaces are clocked with 245 different clock sources.In this design, one clock source is used for all the four GTX transceivers.However, different lengths in the clock paths result in sub-bit delays that have to be compensated externally with time delays.As a common clock source is used, the phase relations do not change 250 even after a reset.Therefore this compensation has to be done only once, whereas the bit-wise synchronization has to be performed after each system reset.The complete transmission system with its main components is shown in Fig. 8, with the FPGA test environment 255 acting as transmitter and receiver.External multiplexers and demultiplexers allow for doubling the symbol rate.The transmitted signals have to be amplified to drive the IQ-modulator and are then sent over an optical fiber link to the integrated receiver.After the opto-electric conversion the data is fed back 260 to the FPGA board where the BER is measured.

Conclusions
This work presents the possibility to use an FPGA as costeffective test environment for high-speed optical transmission systems.A 64-fold parallel real-time DQPSK encoder 265 has been implemented together with electrical interfaces and bit error testers.It works stable up to a clock frequency of about 500 MHz, allowing for 64 Gbit/s operation.The encoder PPL structure works more efficient than the more common PPN architecture and occupies ten times less resources 270 for the encoder core.Even higher data rates can be attained with higher degree of parallelization, external multiplexers and an FPGA board with more electrical GTX interfaces.The external multiplexers are connected to the FPGA by the GTX interfaces and are followed by amplifiers to provide sufficient signal swing to drive the IQ-modulator.The signal is transmitted over a fiber link to the input of the integrated receiver.After the opto-electric conversion in the balanced photodetectors, the signal is de-interlaced by the demultiplexers and fed back to the transceiver by the GTX interfaces.
the receiver side with demultiplexers.For this purpose commercial Hittite ICs are used on printed circuit boards with Taconic high frequency substrate.
A common issue in FPGA-based transmitters with several parallel output channels is the synchronization.After each reset, the interfaces initialize and generally start to transmit the data at a random time, thus data streams are shifted against each other by a multiple of the bit duration.Furthermore, sub-bit delays can occur when the interfaces are clocked with different clock sources.In this design, one clock source is used for all the four GTX transceivers.However, different lengths in the clock paths result in sub-bit delays that have to be compensated externally with time delays.As a common clock source is used, the phase relations do not change even after a reset.Therefore this compensation has to be done only once, whereas the bit-wise synchronization has to be performed after each system reset.
The complete transmission system with its main components is shown in Fig. 8, with the FPGA test environment acting as transmitter and receiver. External multiplexers and demultiplexers allow for doubling the symbol rate. The transmitted signals are amplified to drive the IQ-modulator and are then sent over an optical fiber link to the integrated receiver. After the opto-electric conversion, the data is fed back to the FPGA board, where the BER is measured.
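The BER measurement at the end of this loop amounts to comparing the received bits with the known transmitted pattern and counting mismatches. A minimal Python sketch follows; the class name `BitErrorTester`, the block-wise interface and the test pattern are assumptions made for illustration and do not describe the tester integrated in the FPGA design. The streams are assumed to be already aligned.

```python
class BitErrorTester:
    """Accumulates bit errors over successive blocks of aligned data."""

    def __init__(self):
        self.bits = 0
        self.errors = 0

    def update(self, reference_block, received_block):
        """Compare one received block against the reference pattern."""
        if len(reference_block) != len(received_block):
            raise ValueError("blocks must have equal length")
        self.bits += len(reference_block)
        self.errors += sum(a != b for a, b in zip(reference_block, received_block))

    @property
    def ber(self):
        """Bit error rate over all bits compared so far."""
        if self.bits == 0:
            raise ValueError("no bits compared yet")
        return self.errors / self.bits


# Hypothetical loopback: a simple test pattern with two artificially flipped bits.
pattern = [i % 2 for i in range(1000)]
received = list(pattern)
received[10] ^= 1
received[500] ^= 1

tester = BitErrorTester()
tester.update(pattern[:500], received[:500])
tester.update(pattern[500:], received[500:])
```

Keeping running totals rather than storing the streams mirrors how a real-time tester can report the BER while the transmission is still running.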

Conclusions
This work presents the possibility of using an FPGA as a cost-effective test environment for high-speed optical transmission systems. A 64-fold parallel real-time DQPSK encoder has been implemented together with electrical interfaces and bit error testers. It operates stably up to a clock frequency of about 500 MHz, allowing for 64 Gbit/s operation. The PPL encoder structure works more efficiently than the more common PPN architecture and occupies ten times fewer resources for the encoder core. Even higher data rates can be attained with a higher degree of parallelization, external multiplexers and an FPGA board with more electrical GTX interfaces.
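To make the parallel encoding idea concrete, the sketch below encodes a block of 64 symbols with a modulo-4 prefix sum and checks it against a symbol-by-symbol encoder. It rests on several assumptions: the layering is a generic log-depth (Kogge-Stone style) prefix sum, not the PPL wiring of the actual design; the Gray mappings in `INCREMENT` and `IQ_BITS` are chosen for illustration; and "64-fold" is read as 64 two-bit symbols per clock cycle, which is what reproduces the 64 Gbit/s figure at 500 MHz.

```python
import random

# Assumed Gray mapping of a bit pair (b0, b1) to a phase increment in units of pi/2.
INCREMENT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
# Assumed mapping of the absolute phase state (0..3) to the (I, Q) drive bits.
IQ_BITS = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}


def encode_serial(pairs, state=0):
    """Reference encoder: add one phase increment per symbol, modulo 4."""
    out = []
    for pair in pairs:
        state = (state + INCREMENT[pair]) % 4
        out.append(IQ_BITS[state])
    return out, state


def encode_block(pairs, state=0):
    """Encode a whole block with a log-depth inclusive prefix sum modulo 4."""
    sums = [INCREMENT[pair] for pair in pairs]
    step = 1
    while step < len(sums):
        # One layer: each position adds the partial sum `step` places to its left.
        sums = [
            (value + sums[i - step]) % 4 if i >= step else value
            for i, value in enumerate(sums)
        ]
        step *= 2
    states = [(state + s) % 4 for s in sums]
    return [IQ_BITS[s] for s in states], (states[-1] if states else state)


random.seed(1)
block = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(64)]
parallel_result = encode_block(block, state=2)
serial_result = encode_serial(block, state=2)

# Assumed reading of "64-fold": 64 symbols per clock, 2 bits each, 500 MHz clock.
encoder_rate_bit_per_s = 64 * 2 * 500e6
```

For a block of 64 symbols this prefix sum needs six layers of 2-bit additions, whereas the serial reference chains 64 dependent additions, which is why a parallel structure is needed for real-time operation.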
Edited by: D. Killat
Reviewed by: R. Kunkel and two anonymous referees


Figure 1. Left: Part of a DQPSK constellation diagram with initial phase status {d_0, d_1} = 10 and corresponding phase transitions for the transmission of the symbol {b_0, b_1} to neighboring states. Right: Corresponding representation of phase states and transitions with arithmetic numbers for the modulo-4 based encoder.


Figure 2. Block diagram of a direct receiver with delay line and 90° hybrid for DQPSK signals. As the optical signal has to be delayed by one symbol duration T_Sym, the length of the delay line depends on the desired data rate.

T. Föhn et al.: 50 Gbit/s real-time test environment for integrated photonic DQPSK receivers


Figure 3. Photograph of an integrated DQPSK receiver in silicon-on-insulator without photodetectors. The footprint is about 250 µm × 2000 µm, mainly due to the delay line and the 500 µm long tapers between the grating couplers and single-mode waveguides. Smaller footprints can be achieved with focusing grating couplers and meander-shaped delay lines.


Figure 4. Basic structure of the DQPSK transceiver including the basic functional blocks of the FPGA design. The transceiver can be controlled externally over a serial interface. The input and output data are transmitted by the physical GTX interfaces.


Figure 5. Example of an 8-fold parallel prefix network (PPN) adder (left) and parallel prefix layers (PPL) adder (right). 2-bit adders are represented by squares. The long feedback path in the PPN architecture over three adder stages is significantly shortened in the PPL architecture.


Figure 6. Complete architecture of the 64-fold parallel PPL DQPSK encoder. The conversion from binary data to arithmetic increments is done in the first stage. Phase differences are added up in the following PPL adder and are converted to I and Q signals in the last stage. Two register stages are inserted for buffering.


Figure 7. Measured bit error rate of the 64-fold parallel DQPSK encoder for different encoder clock frequencies. The BER increases suddenly at about 500 MHz, due to timing issues inside the design.


Figure 8. Block diagram of the transmission system. The external multiplexers are connected to the FPGA by the GTX interfaces and are followed by amplifiers to provide sufficient signal swing to drive the IQ-modulator. The optical signal is transmitted over a fiber link to the input of the integrated receiver. After the opto-electric conversion in the balanced photodetectors, the signal is de-interlaced by the demultiplexers and fed back to the transceiver by the GTX interfaces.