Application Modelling and Mapping onto Multiprocessor System-on-Chip Platforms

MPSoCMap

''Dr.-Ing. Leandro Soares Indrusiak'' ''Prof. Dr. Dr. h.c. mult. Manfred Glesner''

Summary
The number of embedded systems composed of multiple interconnected processing units (PUs) is increasing, mainly for two distinct reasons: first, the need to distribute subsystems spatially in order to cope with the requirements of ubiquitous computing; and second, the limits of CMOS scaling, which restrict how much faster integrated circuits can operate and thus require increased PU parallelism as compensation in order to deliver the performance improvements expected by the electronic systems market. As neither trend is likely to change anytime soon, communication-centric multiprocessor embedded systems are expected to become even more common in the near future. While plenty of research has been done in recent decades on general-purpose multiprocessing systems (such as symmetric multiprocessor systems and clusters), there is still a lack of languages, methods and techniques supporting the specification, design and optimization of multiprocessor systems that are targeted specifically at a given application (or family of applications). Furthermore, a programming model that would allow developers to deal with the system as a whole (instead of dealing with each processing unit individually) is still largely an open question.

MPSoCMap aims to deliver a framework supporting the specification and design of application-specific systems based on multiple interconnected processing units. The framework will be based on the joint application of actor-orientation and type systems (materialized through the use of UML diagrams). While taking advantage of state-of-the-art techniques, the project goal is to produce a designer-centric approach that can be easily extended and eventually adopted by industry in a short time frame. A basic programming model will be proposed, based on behavioral patterns, and will be used within the delivered framework to allow the validation of different system properties under realistic application-specific scenarios.

To show the applicability of the proposed approach, the project will apply the proposed programming model to the design of systems composed of multiple processing units interconnected through a packet-switched network, specifically multiprocessor systems based on networks-on-chip. The programming model, together with its modeling and execution framework, will support the analysis of properties that are relevant to this type of system (e.g. throughput, average and worst-case latency, power consumption) and will also be used to evaluate the impact of heterogeneous and reconfigurable PUs on the overall system performance by exploring polymorphic type systems.

System Specification and Design
The specification of a system comprises the creation of models that describe its functionality and support a trade-off analysis that can help designers better architect such functionality in order to satisfy specific functional constraints. The system specification activity sits at the top of a design flow that transforms the specification models into progressively less abstract ones, towards an implementation. Due to the complexity of current hardware/software systems, such a design flow is based on a stack of well-defined abstraction levels, so one can argue that the system model at a given level is the specification model of the subsequent level. In the scope of this proposal, the concept of system specification is restricted to the most abstract layer of the design flow, where the functionality of the system is modeled in an implementation-independent way, as reported in [1] and [2]. Likewise, the design process considered here is restricted to the model transformation methods used to map such functional specifications into implementation-dependent models that are supported by commercially available design automation tools (for instance retargetable compilers or RTL HDL synthesis tools), so that an implementation of the specified system can be produced. The following paragraphs present research initiatives that share the aforementioned concepts of specification and design but address the fundamental problems of modeling, model analysis and model transformation in different ways.

SystemC, which comprises a set of conceptual extensions to the C++ language and is organized as a class library [3], is currently the most widely accepted system specification language [4]. Its modeling constructs support the description of hardware subsystems at the RT and behavioral levels, which can be executed together with testbenches and models of the software subsystems implemented in plain C++, thus allowing for higher productivity. Furthermore, significant simulation performance improvements can be achieved in comparison to other HDLs like VHDL and Verilog, as the SystemC simulation model, which is based on discrete-event semantics, is compiled into native machine code before execution. A number of SystemC synthesis tools are already commercially available, but they support only RTL descriptions and a limited set of behavioral specification constructs. Further shortcomings of the core language, which have been partially addressed in the upcoming version 3.0 [5] as well as in independently developed extensions [6], include the lack of facilities for process control (for example, primitives to suspend, resume, kill or reset processes), which are critical to support the modeling of operating systems and hardware-dependent software.

One way to support the efficient creation and automatic synthesis of SystemC specification models at higher levels is to restrict the modeling style to match standardized models of computation (MoC), so that the concurrent execution of SystemC modules and the communication activities they perform are unambiguously defined. By relying on formalized models of computation, the automatic translation and mapping of specification models into implementation-dependent scheduling, synchronization and communication structures can be made easier. However, current systems are too complex and heterogeneous to be efficiently specified using a single model of computation [7, 8]. For instance, at the earlier stages of the design it may be desirable to use an untimed MoC allowing fully asynchronous communication among subsystems based on unbounded FIFOs, while later when many implementation trade-offs are already resolved it may be necessary to use a cycle-accurate MoC where some communication channels are still asynchronous, but based on finite-sized FIFOs, while other channels may communicate synchronously as they are targeted to be implemented within the same clock domain. Besides the advantages of using different MoCs at different stages of the design flow, there may be cases where the nature of the application may be better suited for a given MoC. A typical example is the design of signal processing subsystems using synchronous dataflow semantics, which allows for implementations that are statically schedulable and that have guaranteed bounded buffers at the inputs of each module. The possibility of creating SystemC specifications using multiple MoCs is the target of state-of-the-art research presented in [9] and [4]. 
In [9], Patel and Shukla (Virginia Tech, USA) propose additional simulation kernels for the SystemC library, so that SystemC models can be executed according to the semantics of synchronous dataflow (SDF), finite state machines (FSM) and communicating sequential processes (CSP). They also provide extensions to support interoperability between the proposed kernels and the original discrete-event simulation kernel. The approach presented in [4] by Herrera and Villar (University of Cantabria, Spain) follows a different direction and tries to support multiple MoCs without changing the original SystemC simulation kernel. Instead, they focus on providing an additional library of channel models that support the communication style of different MoCs, such as rendezvous communication for CSP models or channels based on infinite FIFOs for process networks (PN).
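The static schedulability of synchronous dataflow mentioned above can be illustrated with a small sketch. It solves the SDF balance equations (tokens produced on each edge per schedule period must equal tokens consumed) to obtain how often each actor fires per period. The function name, the edge format and the example graph are illustrative assumptions and are not taken from [9] or [4]; consistent rates are only a necessary condition for a bounded-buffer periodic schedule, and deadlock analysis is not covered here.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations.

    actors: list of actor names (hypothetical example format).
    edges:  list of (src, tokens_produced, dst, tokens_consumed).
    Returns the smallest positive integer firing counts per actor.
    """
    rate = {actors[0]: Fraction(1)}  # fix one actor, derive the others
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must balance, otherwise some buffer grows without bound.
    for src, prod, dst, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates: no periodic schedule")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    common = gcd(*counts.values())
    return {a: n // common for a, n in counts.items()}


# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
example = repetition_vector(
    ["A", "B", "C"],
    [("A", 2, "B", 3), ("B", 1, "C", 2)],
)
print(example)
```

In this example A must fire three times and B twice for every firing of C, so a compile-time schedule such as A A A B B C returns every buffer to its initial fill level.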

Besides SystemC, other emerging executable specification languages also follow the trend of increased abstraction, aiming at higher productivity, shorter design cycles and faster simulation runs. Previously regarded only as design visualization interfaces, block-based visual modeling tools are gaining acceptance within industry and academia for the design of dataflow-dominated systems-on-chip in application domains such as digital signal processing (DSP). Examples include design tools such as System Generator from Xilinx, SPW from CoWare, Advanced Design System from Agilent and Synplify DSP from Synplicity, as well as the general-purpose simulation platform Simulink from MathWorks. Such tools follow a simple scheme for system modeling, using hierarchical blocks that can be interconnected with other blocks through ports. Rich block libraries are provided, so designers can build complex systems by instantiating and connecting library elements. The underlying simulation semantics of such environments is often transparent to the users, who concentrate mainly on structuring and parametrizing the system. However, as noted above with reference to [7, 8], a particular simulation semantics can facilitate or prevent the efficient and accurate modeling of a given system, especially when it comes to modeling concurrency. To allow for full exploration of different simulation semantics, Lee (UC Berkeley, USA) [10] extended the concept of actor orientation, which was previously coined by Hewitt and refined by Agha (MIT, USA) [11], and proposed that the simulation semantics should also be part of the model, allowing for complex models using distinct semantics in different subsystems. The components in actor-oriented models are objects which are conceptually concurrent and communicate via message passing (messages are modeled as immutable objects called tokens). 
A distinct feature of actor-oriented models is the formalization of the concurrent behavior of model components using well-defined MoCs, which are implemented within components called directors. While many actor-oriented tools like Simulink or LabVIEW rely on a single MoC, the Ptolemy II project [12] has provided a platform for experimenting with heterogeneous models combining different MoCs. It allows the full exploration of actor-based models in domains other than the dataflow-oriented ones mentioned earlier in this paragraph, such as control-flow-oriented systems and event-based systems. Furthermore, the support for multiple MoCs in a single system model also allows for the coexistence of different abstraction levels, for instance a physical subsystem represented using continuous-time semantics and the digital hardware that controls that physical subsystem, modeled using discrete-event semantics. A number of academic and commercial initiatives have explored the features of actor orientation in the design of hardware/software systems. A design flow is presented in [13], relying on the multi-MoC nature of actor orientation to support design based on partial refinement of the system. The presented approach points out that subsystems can be isolated from a highly abstract model and refined, so that they are modeled in more detail, but can still be validated together with the rest of the system through multi-MoC simulation. Several case studies following such a flow were performed, such as those reported in [14]. Further academic work was reported on the design of neural network hardware [15] and communication systems [16]. Commercial tool vendors also provide rich actor-oriented modeling environments such as VisualSim [17], or add design capabilities to existing environments, such as the Xilinx System Generator [18] plug-in for the MathWorks Simulink simulator.
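The separation between actors, tokens and directors described above can be sketched in a few lines. This is a minimal illustration only: the class names (`Token`, `Actor`, `DataDrivenDirector` and so on) are hypothetical and do not reproduce the Ptolemy II API, and the director shown implements just one simple data-driven firing rule chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Immutable message exchanged between actors."""
    value: int


class Actor:
    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # tokens waiting to be consumed
        self.outputs = []      # downstream actors

    def connect(self, other):
        self.outputs.append(other)

    def send(self, token):
        for target in self.outputs:
            target.inbox.append(token)

    def can_fire(self):
        return bool(self.inbox)

    def fire(self):
        raise NotImplementedError


class Source(Actor):
    def __init__(self, name, values):
        super().__init__(name)
        self.values = deque(values)

    def can_fire(self):
        return bool(self.values)

    def fire(self):
        self.send(Token(self.values.popleft()))


class Scale(Actor):
    def __init__(self, name, factor):
        super().__init__(name)
        self.factor = factor

    def fire(self):
        self.send(Token(self.inbox.popleft().value * self.factor))


class Sink(Actor):
    def __init__(self, name):
        super().__init__(name)
        self.received = []

    def fire(self):
        self.received.append(self.inbox.popleft().value)


class DataDrivenDirector:
    """Owns the execution semantics: fire any actor that has input."""

    def __init__(self, actors):
        self.actors = list(actors)

    def run(self):
        fired = True
        while fired:
            fired = False
            for actor in self.actors:
                if actor.can_fire():
                    actor.fire()
                    fired = True


def run_demo():
    src, scale, sink = Source("src", [1, 2, 3]), Scale("scale", 10), Sink("sink")
    src.connect(scale)
    scale.connect(sink)
    DataDrivenDirector([src, scale, sink]).run()
    return sink.received


print(run_demo())
```

The actors contain no scheduling logic of their own, so the same three actors could in principle be executed by a different director implementing another MoC, which is the property that makes heterogeneous multi-MoC models possible.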

Unlike the previous approaches, the Unified Modeling Language (UML) was not originally an executable/simulatable specification language. It comprises a family of visual notations [19] derived from a number of previously existing modeling languages from the software engineering domain [20]. Due to the unification of different modeling approaches, which include several structural and behavioral views, the language achieved a level of expressiveness that allowed its usage in application domains other than software design. Furthermore, UML was designed to be extensible through stereotypes and tagged values (which can be grouped in domain-specific profiles) so that it can be adapted to unforeseen modeling needs.

Most of the state-of-the-art approaches using UML as a hardware/software system-level specification language address either static analysis or code generation methods. Oliveira et al. [21] use static analysis to evaluate performance, memory footprint and power consumption of alternative embedded software behavioral patterns modeled using UML sequence diagrams. In [22], UML class diagrams are used as templates for design space delimitation and exploration. Different alternatives for code generation techniques based on UML diagrams can be found in a number of research initiatives (mainly driven by industry). Many of them advocate the joint usage of UML and SystemC. In [23], a team of researchers from the University of Catania and ST Microelectronics stated that UML should be seen as a high-level modeling language and SystemC as a low-level system language. In order to allow the interoperability of both languages, they proposed a set of stereotypes that allow the modeling of SystemC concepts using UML diagrams. The stereotypes were grouped together in the so-called UML 2.0 profile for SystemC. Similar approaches, with different strategies for the definition of stereotypes, were presented by researchers from Politecnico di Milano and Siemens ICM [24] and Fujitsu [25]. In all three cases, the stereotypes were created to support code generators, which should be able to generate SystemC code out of UML models. Code generation is also explored in [26] and [27]. The former approach targets the automatic generation of component wrappers according to UML models, while the latter, developed at C-Lab/Paderborn University, addresses modeling and code generation targeting a subset of the C programming language that can be synthesized as a configuration of an FPGA-based execution platform.

Aiming to better fit the current practice in systems design, which allows specification models to be simulated and profiled over time, Mellor and Balcer (Mentor Graphics) [28] introduced executable UML (xUML). It adopts only three of the constructs from the UML 2.0 specification: class diagrams, statecharts and actions, the latter materialized through a non-standardized action language. By relying on those constructs, developers should be able to build models that can be automatically compiled into executable code tailored to a given execution platform. Few model compilers following this approach are available so far, which restricts the adoption of xUML in different domains. Furthermore, the problem of co-simulating xUML models with legacy code or other system models is still an open issue, preventing advanced application within a system-level design flow, as system designers very often depend on the reuse of previously developed intellectual property cores.

Multi-Processor Systems-on-Chip (MPSoC)
The ever-increasing functionality density in SoC platforms currently allows designers to create multiprocessing systems on a single chip. Multiprocessing systems demand an optimized communication architecture in order to ensure different levels of quality-of-service (QoS) according to the application demands [35]. Besides that, constraints such as performance, latency and power must be observed, and requirements such as reliability, fault tolerance, correctness (data ordering) and completion (no data loss) must be met. Several research groups have reported the application of Networks-on-Chip (NoCs) as a promising approach to the problem of interconnecting the heterogeneous set of components in multiprocessor systems [36]. As a simplified definition, a NoC is a network-like structure where data packets are routed through nodes, allowing the system components connected to those nodes to communicate asynchronously in a message-passing fashion. Such an architecture is suitable for the design of multiprocessor systems because of its potential to tackle quality-of-service issues by allowing the definition of protocol-based communication between processing elements. Furthermore, NoCs have a high potential for reusability and scalability, which are particularly important features in the design of product families integrating processor cores, memory, reconfigurable logic, DSP cores and application-specific logic. Finally, on-chip networks enforce communication on a local basis, effectively implementing Globally Asynchronous, Locally Synchronous (GALS) communication schemes, which can reduce the clock distribution problems that are common in deep submicron technologies [37].
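As a rough illustration of how a packet is routed through the nodes of such a network, the sketch below computes the path of a packet on a 2D mesh. Both the mesh topology and the dimension-ordered (XY) routing policy are assumptions made for this example; the text above does not prescribe a topology or routing algorithm, and the function names are hypothetical.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst.

    Dimension-ordered routing: travel along X until the destination
    column is reached, then travel along Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of links traversed, i.e. path length minus the source node."""
    return len(xy_route(src, dst)) - 1


route = xy_route((0, 0), (2, 1))
print(route, hop_count((0, 0), (2, 1)))
```

Because every packet resolves the X offset before the Y offset, the route depends only on source and destination, which keeps the router logic simple; the hop count gives a first-order estimate of the latency a programming model has to tolerate.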

The application of communication-centric architectures such as NoCs raises a number of challenges regarding the programming model. Some of the open questions addressed by state-of-the-art research include how to partition the application among multiple processing units, how to share data among them [38] and which models of computation are adequate to deal with the latency imposed by packet-switched communication [39]. While similar problems were addressed by the distributed and parallel computing community, the strict efficiency requirements and the possibility of exploring application-specific trade-offs put the problems in a different light. As pointed out by Jerraya et al. (TIMA Grenoble, France) in [40], parallel computing solutions such as the Message Passing Interface (MPI) abstract the hardware/software interfaces with a standard API, allowing software components to be developed against the API and thus to be portable to any hardware platform that implements it. In such cases, the overall system performance can only be assessed after hardware/software integration and depends on the particular implementation of the API on the chosen hardware platform. On the other hand, they state that application-specific systems based on on-chip multiprocessor solutions need a better match between application and platform, requiring customized hardware/software interfaces that provide access to the specific hardware features needed to ensure that the performance requirements are met for that particular application.

System-level design based on UML and Actor-orientation
The previous work done by the system-level design group at the MES Institute was the basis for the identification of the needs, techniques and requirements that will be addressed by the proposed project. The group was created to address design techniques for SoCs and realized that current and state-of-the-art design flows had shortcomings when addressing critical issues like the heterogeneity of the modules integrated on a single chip and the need to validate the system using realistic application scenarios. The most evident approach to overcome such shortcomings is to support a higher level of abstraction for the system specification, as this is important to tackle the complexity of the system being designed and is especially critical to properly model the application scenario in which the system will be deployed. A second approach investigated by the group aimed to support heterogeneity in the system specification and design: complex SoCs must integrate different subsystems that are better described using different languages and tools. More importantly, such languages and tools do not always share the same notions of modularization, composition, time, concurrency, etc. Thus, a novel design flow had to be introduced in order to support heterogeneous design specification at higher levels, keeping in mind that such a flow must be integrated with current flows for hardware synthesis and embedded software compilation. A number of case studies were performed in order to identify, validate and refine such a flow. The first two addressed SoCs in the telecommunications domain. The first was based on an actor-oriented specification within Ptolemy II and co-simulation with an HDL simulator for the design and analysis of a WCDMA receiver [IPG2005]. 
As the system model was implemented at a high level of abstraction, a detailed analysis could be done regarding the different usage modes for an equalizer according to the different wireless channel conditions (Doppler effects, multi-user interference, etc.). By extending Ptolemy II to support HDL co-simulation, the actual cycle-accurate RTL implementation of the receiver could be validated together with the same system model and the same kind of analysis could be done [RP2006]. The second case study concentrated on the usage of commercial tools, aiming to design a reconfigurable noise generator by using the MathWorks Simulink and Xilinx System Generator tools [ESJ2005]. In this case, the facilities for model transformations were already available, so the design flow was much simpler, but on the other hand Simulink provides a single model of time and concurrency, rendering this approach suitable for homogeneous systems only. As a complement to the first two case studies, another extension to Ptolemy II was added to support the hardware-in-the-loop simulation of subsystems prototyped in FPGA platforms [JO2005]. The experiences obtained from the reported case studies were then collected and organized as a well-defined, though still experimental, design flow [LSI2006] that allows the successive refinement of an actor-oriented model of the complete system into a final implementation in RTL HDL, which in turn can be validated together with testbenches (also modeled as actors) through co-simulation or emulated within FPGA platforms. Based on the identified design flow, a number of additional experiments and improvements were performed. [JCP2006] applied the flow to support the analysis of coding techniques aiming to reduce power consumption in network-on-chip architectures. In [HZ2006], the flow was used to investigate out-of-order execution in microprocessors containing one or more reconfigurable functional units. 
To simplify the transition from actor-based models into cycle-accurate HDL models, which had to be done by hand in some of the previous case studies, a first step towards implementing code generation techniques was taken in [FM2006]. Concurrently with the development of the reported actor-based design flow, the MES system-level design group also investigated the possibilities of using UML to raise the level of abstraction of the system specification. UML was being considered by many industry and academic researchers as a promising SoC specification language, and the group already had previous experience in using UML for internal software development and for SoC design space exploration [IGK2004]. The first case studies produced mixed results: while the expressiveness of UML supported well the modeling of multiple aspects of complex SoC systems, its lack of execution/simulation semantics made it difficult to integrate within a simulation-based design flow (current research reviewed in section 2.1.1 relies on code generation out of UML models in order to support functional validation). Furthermore, the adoption of UML by SoC designers accustomed to validating functionality through simulation is a potential problem that was already hinted at by the performed case studies. While UML lacks execution semantics, actor orientation has the coexistence of multiple execution semantics as its major feature. It is then natural to foresee a joint approach combining both techniques, so that the shortcomings of one are compensated by the strengths of the other. Another reason to combine both approaches is that many companies have had to integrate different departments in order to cope with the complexity of the design of state-of-the-art systems. 
In such cases, as is nowadays the rule in systems design, new designs will reuse a large quantity of previously developed solutions, and in such integrated departments a number of solutions may be implemented in UML while others follow the actor-oriented paradigm. Keeping that in mind, some initial research was performed on the possibilities of joint usage of UML and actor orientation. The encapsulation of UML sequence diagrams as actors, which is one of the basic ideas behind the framework to be developed within the proposed project, was explored in [AT2006]. Early results from a case study with a UML sequence diagram modeling the Chandra-Toueg distributed consensus algorithm [63] within a distributed sensing application modeled as a set of actors were already reported in [ITG2007]. Furthermore, the initial ideas, results and potential of using UML, actor orientation and the combination of both in system-level design were already presented as invited talks at events like NORCHIP 2005 [LSI2005], SoC 2006 [LSI2006a] and BEC 2006 [LSI2006b], which may be an indicator of the potential and relevance of, and the need for, such techniques in industrial and academic circles.

NoC-based MPSoCs


The design space of MPSoC architectures based on NoCs is very large, allowing designers to customize and optimize a large number of aspects such as buffering schemes, routing, data packet size and format, data coding, transmission modes and QoS strategies. Most of the research activities done within the MES Institute address ways to explore this design space and validate possible solutions under different application requirements, such as those reported in [IGK2004] for routing and arbitration and in [ZZHG2005] for buffering.

A number of advances were made in the design and validation of irregular NoC platforms. Such platforms are required when the processing cores needed by a given application cannot be placed evenly in a mesh-like structure. [HLM2003] presented a general outline of such a platform, together with a strategy to support the testing of the processing cores by reusing the NoC infrastructure. In [SHZG2005], novel routing and placement algorithms were explored, aiming to support deadlock-free communication among processing cores in such irregular platforms.

Besides the inherent design complexity, one of the major factors preventing the wide adoption of NoC-based architectures is the large power consumption overhead they cause. Relying on previous research that pointed out the correlation between signal transition activity and power consumption in interconnects [GO2004], a number of case studies assessed the potential of reducing the power consumption by coding the data packets sent across the network so that the transition activity on the interconnects is minimized [PIM2006a]. Promising results were found for very simple coding schemes, because the power consumption increase due to coders and decoders was sufficiently low to be compensated by the savings from the reduced transition activity [PIM2006b]. The effectiveness of such schemes tends to grow as networks increase in size (the more hops a packet takes to reach its destination, the better the codec overhead is amortized, because coding and decoding are done only at the first and last switches) and as the dimensions of the interconnects shrink (because the capacitances between neighboring wires increase).
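The idea of coding data to reduce transition activity can be made concrete with a small sketch. The scheme used here is bus-invert coding, chosen only because it is a well-known simple example; the text above does not state which coding schemes were evaluated in [PIM2006a] and [PIM2006b], and all function names and the sample data are illustrative assumptions. One extra wire signals whether a word was sent inverted.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def count_transitions(words, start=0):
    """Bit flips on an uncoded bus that initially carries `start`."""
    total, prev = 0, start
    for w in words:
        total += hamming(prev, w)
        prev = w
    return total


def bus_invert_encode(words, width):
    """Send the inverted word whenever that flips fewer wires."""
    mask = (1 << width) - 1
    prev_bus, encoded = 0, []
    for w in words:
        if hamming(prev_bus, w) > width / 2:
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def encoded_transitions(encoded):
    """Bit flips on the data wires plus the extra invert wire."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += hamming(prev_bus, bus) + (prev_inv ^ inv)
        prev_bus, prev_inv = bus, inv
    return total


words = [0xFF, 0x00, 0xFF, 0x0F]   # hypothetical 8-bit payloads
coded = bus_invert_encode(words, 8)
print(count_transitions(words), encoded_transitions(coded))
```

For this worst-case-style sequence the uncoded bus toggles 28 wires while the coded bus, including its invert line, toggles 8. Whether such a saving outweighs the power drawn by the coder and decoder is exactly the trade-off discussed above.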

Further activities include close cooperation with two internationally active groups in this area: LIRMM, University of Montpellier II (France) and the GAPH group at the Catholic University of Rio Grande do Sul (Brazil). The cooperation with LIRMM is partially funded by the DFG within an international Graduate School and by DAAD within German-French cooperation projects, while the cooperation with GAPH is funded by DAAD/CNPq scholarships. The tri-national cooperation addresses regular mesh-like networks, and most of the development is done on top of the HERMES platform, initially developed by the GAPH group and currently extended by all partners. For instance, the power reduction approach of [PIM2006a,b] has already been ported to that platform successfully. Current and future work, which is very relevant to this proposal, includes the hardware infrastructure to support the migration of software and hardware tasks across processing nodes interconnected by a NoC. To that end, a combined solution for an on-chip distributed operating system and for software-controlled node reconfiguration is being pursued. Early results were already shown in [SZA2006] and in some of the presentations of the ReCoSoC workshop series [GS2005,GS2006].

Publications
[AT2006]	A. Thuy, L. S. Indrusiak, and M. Glesner. Applying Communication Patterns to Actor-Oriented Models with UML Sequence Diagrams. In: ECSI Forum on Design Languages (FDL), 2006.

[EO2006]	E. Ochirsuren. Programmability support in a LEON2-based wireless sensor network node. Master Thesis, International Master Program in Information and Communication Engineering, TU Darmstadt, 2006.

[ESJ2005]	E. C. D. Silva Junior, L. S. Indrusiak, and M. Glesner. Non-Linear Addressing Scheme for a Lookup-Based Transformation Function in a Reconfigurable Noise Generator. In: Symp. on Integrated Circuits and Systems Design, 2005, ACM Press, p. 242-247.

[FM2006]	F. Markert. Unterstützung paralleler Befehlausführung in rekonfigurierbarer Hardware durch die Verwendung codegenerierender Actor-Bibliotheken. Diplomarbeit, Elektrotechnik und Informationstechnik, TU Darmstadt, 2006.

[GHIZ2004]	M. Glesner, T. Hollstein, L. S. Indrusiak, P. Zipf, T. Pionteck, M. Petrov, H. Zimmer, and T. Murgan. Reconfigurable Platforms for Ubiquitous Computing. In: Proc. ACM Conf. on Computing Frontiers, 2004, p. 377-389.

[GO2004]	A. Garcia Ortiz. Stochastic Data Models for Power Estimation at High Levels of Abstraction. Doctoral Dissertation. Shaker Verlag, 2004.

[GS2005]	G. Sassatelli, M. Glesner, L. Torres, L. S. Indrusiak, T. Hollstein (Eds.). Proc. 1st Int. Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2005.

[GS2006]	G. Sassatelli, L. S. Indrusiak, M. Glesner, L. Torres (Eds.). Proc. 2nd Int. Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2006.

[HH2005]	H. Hinkelmann, T. Pionteck, O. Kleine, and M. Glesner. Prozessorintegration und Speicheranbindung dynamisch rekonfigurierbarer Funktionseinheiten. In: 18th Int. Conf. Architecture of Computing Systems (ARCS 2005), Workshop on Dynamically Reconfigurable Systems, Innsbruck, 2005.

[HH2006a]	H. Hinkelmann, P. Zipf, and M. Glesner. Design Concepts for a Dynamically Reconfigurable Wireless Sensor Node. In: Proc. NASA/ESA Conf. on Adaptive Hardware and Systems (AHS 2006), Istanbul, 2006.

[HH2006b]	H. Hinkelmann, A. Gunberg, P. Zipf, L. S. Indrusiak, and M. Glesner. Multitasking Support for Dynamically Reconfigurable Systems. In: Proc. 16th Int. Conf. on Field Programmable Logic and Applications (FPL 2006), Madrid, 2006.

[HLM2003]	T. Hollstein, R. Ludewig, C. Mager, P. Zipf, and M. Glesner. A hierarchical generic approach for on-chip communication, testing and debugging of SoCs. In: Proc. IFIP Int. Conf. on VLSI-SoC, 2003. p. 44–49.

[HZ2006]	H. Zhong, L. S. Indrusiak, H. Hinkelmann, and M. Glesner. Exploring Functional Unit Parallelism in Reconfigurable Computing Platforms. In: Int. Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2006, p. 160-167.

[IGK2004]	L. S. Indrusiak, M. Glesner, M. E. Kreutz, A. A. Susin, and R. A. L. Reis. UML-Driven Design Space Delimitation and Exploration: A Case Study on Networks-on-Chip. In: Proc. IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, 2004, p. 5-12.

[IPG2005]	L. S. Indrusiak, R. B. Prudencio, and M. Glesner. Modeling and Prototyping of Communication Systems using Java: a Case Study. In: IEEE Int. Workshop on Rapid System Prototyping, 2005, p. 225-231.

[ITG2007]	L. S. Indrusiak, A. Thuy, and M. Glesner. Executable system-level specification models containing UML-based behavioral patterns. In: IEEE/ACM Design Automation and Test in Europe (DATE), 2007. (to appear)

[JB2001]	J. Becker and M. Glesner. A Parallel Dynamically Reconfigurable Architecture for Flexible Application-Tailored Hardware/Software Systems in Future Mobile Communication. Journal of Supercomputing, Kluwer Academic Publishers, 2001.

[JCP2006]	J. C. S. Palma, L. S. Indrusiak, F. G. Moraes, A. Garcia Ortiz, M. Glesner, R. A. L. Reis. Adaptive Coding in Networks-on-Chip: Transition Activity Reduction Versus Power Overhead of the Codec Circuitry. Lecture Notes in Computer Science, v. 4148, p. 603-613, 2006.

[JO2005]	D. F. Jimenez Orostegui, L. S. Indrusiak, and M. Glesner. Proxy-based Integration of Reconfigurable Hardware within Simulation Environments. In: IEEE Int. Conf. on Microelectronic Systems Educ., 2005, p. 59-60.

[LSI2005]	L. S. Indrusiak. A Pragmatic Perspective on UML for System-on-Chip Design. In: 23rd IEEE Norchip Conference, Finland, 2005, p. 169-171.

[LSI2006]	L. S. Indrusiak and M. Glesner. An Actor-Oriented Model-Based Design Flow for Systems-on-Chip. In: Tagungsband des Dagstuhl-Workshops Modellbasierte Entwicklung eingebetteter Systeme II (MBEES, Schloß Dagstuhl, Germany), 2006. p. 65-73.

[LSI2006a]	L. S. Indrusiak. Exploring Application-level Concurrency in SoC Design. In: International Symposium on System-on-Chip, 2006, Tampere, Finland. Piscataway: IEEE, 2006. p. 69-72.

[LSI2006b]	L. S. Indrusiak and M. Glesner. SoC Specification using UML and Actor-Oriented Modeling. In: International Baltic Electronics Conference, 2006, Tallinn, Estonia. Proceedings. Piscataway: IEEE, 2006. p. 31-36.

[MH2005]	M. N. Huda. Real-time Operating System Support for LEON based Reconfigurable Hardware. Master Thesis, International Master Program in Information and Communication Engineering, TU Darmstadt, 2005.

[MPM2004a]	T. Murgan, M. Petrov, M. Majer, P. Zipf, M. Glesner, and U. Heinkel. Flexible Overhead Processing Architectures for G.709 Optical Transport Networks. In: GI/ITG/GMM Workshop on "Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen", 2004.

[MPM2004b]	T. Murgan, M. Petrov, M. Majer, P. Zipf, M. Glesner, U. Heinkel, J. Pleickhardt, and B. Bleisteiner. Adaptive Architectures for an OTN Processor: Reducing Design Costs Through Reconfigurability and Multiprocessing. In: Proc. ACM Conf. on Computing Frontiers, 2004.

[PIM2006a]	J. C. S. Palma, L. S. Indrusiak, F. G. Moraes, A. Garcia Ortiz, M. Glesner and R. A. L. Reis. Evaluating the Impact of Data Encoding Techniques on the Power Consumption in Networks-on-Chip. In: IEEE Comp. Soc. Annual Symposium on VLSI (ISVLSI), 2006. p. 426-427.

[PIM2006b]	J. C. S. Palma, L. S. Indrusiak, F. G. Moraes, A. Garcia Ortiz, M. Glesner and R. A. L. Reis. Adaptive Coding in Networks-on-Chip: Transition Activity Reduction Versus Power Overhead of the Codec Circuitry. In: Int. Workshop on Power And Timing Modeling Optimization and Simulation (PATMOS), 2006. Lecture Notes in Computer Science 4148, Springer. p. 603-613.

[RP2006]	R. B. Prudencio, L. S. Indrusiak, and M. Glesner. An efficient hardware implementation of a self-adaptable equalizer for WCDMA downlink UMTS standard. In: Proc. IEEE CS Annual Symp. on VLSI (ISVLSI), 2006. p. 77-81.

[SHZG2005]	M. K. F. Schafer, T. Hollstein, H. Zimmer, and M. Glesner. Deadlock-free routing and component placement for irregular mesh-based networks-on-chip. In: IEEE/ACM Int. Conf. Comp. Aided Design (ICCAD), 2005. p. 238-245.

[SIG2006]	C. Spies, L. S. Indrusiak, and M. Glesner. Comparative Analysis of Multitask Scheduling Algorithms for Reconfigurable Computing regarding Context Switches and Configuration Cache Usage. In: Proc. 3rd Southern Conf. on Programmable Logic (SPL), Mar del Plata, 2007.

[SZA2006]	S. Z. Ahmed. Automatic Placement and Routing on Self Reconfigurable Coarse Grained Reconfigurable Architectures. Master Thesis, International Master Program in Information and Communication Engineering, TU Darmstadt, 2006.

[ZZHG2005]	H. Zimmer, S. Zink, T. Hollstein, and M. Glesner. Buffer-Architecture Exploration for Routers in a Hierarchical Network-on-Chip. In: IEEE Int. Parallel and Distributed Processing Symposium (IPDPS) - Reconfigurable Architecture Workshop, 2005.