AV日韩欧观看,www.573096.cn,国产精彩刺激真实伦对白

新聞中心

這里有您想知道的互聯(lián)網(wǎng)營銷解決方案

淺嘗.NET 4并行計算效率已大幅提升

隨著Visual Studio 2010正式版的發(fā)布，我們已經(jīng)可以用上.NET 4的所有功能。那么對于并行計算的嘗試，是本文的重點。向您推薦《F#函數(shù)式編程語言》，以便于您全方位了解.NET 4中不同的部分。

我們都知道CPU的性能至關(guān)重要，但主頻已經(jīng)越來越難以提升，縱向發(fā)展受限的情況下，橫向發(fā)展成為必然——核心數(shù)開始越來越多。然而多核心的利用、并行計算一直是編程中的難題，大的不說，就說代碼的編寫，程序員大多都有過痛苦的經(jīng)歷：多線程的程序代碼量大，編寫復(fù)雜，容易出錯，并且實際運行效率是否理想也較難保證。

為改善這種狀況，.NET 4.0中引入了 TPL(任務(wù)并行庫)，關(guān)于TPL，MSDN的簡介是：

任務(wù)并行庫 (TPL) 的設(shè)計是為了能更簡單地編寫可自動使用多處理器的托管代碼。使用該庫，您可以非常方便地用現(xiàn)有序列代碼表達潛在并行性，這樣序列代碼中公開的并行任務(wù)將會在所有可用的處理器上同時運行。通常這會大大提高速度。

簡而言之，TPL提供了一系列的類庫，可以使編寫并行運算的代碼更簡單和方便。

說起來很簡單，我們來看點例子：

 
 
 
   
  
  
  void ThreadpoolMatrixMult(int size, double[,] m1, double[,] m2,     
  
  
       double[,] result)    
  
  
   {    
  
  
     int N = size;                               
  
  
  int P = 2 * Environment.ProcessorCount; // assume twice the procs for     
  
  
                                         // good work distribution    
  
  
     int Chunk = N / P;                  // size of a work chunk    
  
  
     AutoResetEvent signal = new AutoResetEvent(false);     
  
  
     int counter = P;                      // use a counter to reduce     
  
  
                                          // kernel transitions        
  
  
    for (int c = 0; c < P; c++) {         // for each chunk    
  
  
      ThreadPool.QueueUserWorkItem(delegate(Object o)    
  
  
      {    
  
  
        int lc = (int)o;    
  
  
    for (int i = lc * Chunk;           // iterate through a work chunk    
  
  
       i < (lc + 1 == P ? N : (lc + 1) * Chunk); // respect upper     
  
  
                                                 // bound    
  
  
             i++) {    
  
  
          // original inner loop body    
  
  
          for (int j = 0; j < size; j++) {    
  
  
            result[i, j] = 0;    
  
  
            for (int k = 0; k < size; k++) {    
  
  
              result[i, j] += m1[i, k] * m2[k, j];    
  
  
            }    
  
  
          }    
  
  
        }    
  
  
     if (Interlocked.Decrement(ref counter) == 0) { // use efficient     
  
  
                                                  // interlocked     
  
  
                                                 // instructions          
  
  
          signal.Set();  // and kernel transition only when done    
  
  
        }    
  
  
      }, c);     
  
  
    }    
  
  
    signal.WaitOne();    
  
  
  }

很眼熟但同時看著也很心煩的代碼吧。在換用TPL后，上面的代碼就可以簡化為：

   
 
 
     
  
  
  void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)      
  
  
   {      
  
  
     Parallel.For( 0, size, delegate(int i) {      
  
  
       for (int j = 0; j < size; j++) {      
  
  
        result[i, j] = 0;      
  
  
         for (int k = 0; k < size; k++) {      
  
  
           result[i, j] += m1[i, k] * m2[k, j];      
  
  
         }      
  
  
       }      
  
  
    });      
  
  
  }

舒服多了吧？具體的內(nèi)容請見MSDN的文章優(yōu)化多核計算機的托管代碼。

裝好正式版的VS2010以后，寫了段代碼來測試下，TPL究竟好不好用。

代碼很簡單，拿一條字符串和一堆字符串里的每一條分別用LevenshteinDistance算法做字符串相似程度比對。先用傳統(tǒng)的順序執(zhí)行的代碼跑一遍，記錄下時間；再換用TPL的并行代碼跑一遍，記錄下時間。然后比對兩次運行的時間差異。

 
 
 
   
  
  
  using System;    
  
  
    using System.Collections.Generic;    
  
  
    using System.Linq;    
  
  
    using System.Text;    
  
  
    using System.Threading.Tasks;    
  
  
    using System.Diagnostics;    
  
  
        
  
  
    namespace ParallelLevenshteinDistance    
  
  
    {    
  
  
       class Program    
  
  
       {    
  
  
           static void Main(string[] args)    
  
  
           {    
  
  
               Stopwatch sw;    
  
  
       
  
  
               int length;    
  
  
               int count;    
  
  
               string[] strlist;    
  
  
               int[] steps;    
  
  
               string comparestring;    
  
  
       
  
  
               Console.WriteLine("Input string lenth:");    
  
  
               length = int.Parse(Console.ReadLine());    
  
  
       
  
  
               Console.WriteLine("Input string list count:");    
  
  
               count = int.Parse(Console.ReadLine());    
  
  
       
  
  
               comparestring = GenerateRandomString(length);    
  
  
               strlist = new string[count];    
  
  
               steps = new int[count];    
  
  
       
  
  
               // prepare string[] for comparison    
  
  
               Parallel.For(0, count, delegate(int i)    
  
  
               {    
  
  
                   strlist[i] = GenerateRandomString(length);    
  
  
               });    
  
  
       
  
  
               Console.WriteLine("{0}Computing...{0}", Environment.NewLine);    
  
  
       
  
  
               // sequential comparison    
  
  
               sw = Stopwatch.StartNew();    
  
  
               for (int i = 0; i < count; i++)    
  
  
               {    
  
  
                   steps[i] = LevenshteinDistance(comparestring, strlist[i]);    
  
  
               }    
  
  
               sw.Stop();    
  
  
               Console.WriteLine("[Sequential] Elapsed:");    
  
  
               Console.WriteLine(sw.Elapsed.ToString());    
  
  
       
  
  
               // parallel comparison    
  
  
               sw = Stopwatch.StartNew();    
  
  
               Parallel.For(0, count, delegate(int i)    
  
  
               {    
  
  
                   steps[i] = LevenshteinDistance(comparestring, strlist[i]);    
  
  
               });    
  
  
               sw.Stop();    
  
  
               Console.WriteLine("[Parallel] Elapsed:");    
  
  
               Console.WriteLine(sw.Elapsed.ToString());    
  
  
                               
  
  
               Console.ReadLine();    
  
  
           }    
  
  
       
  
  
           private static string GenerateRandomString(int length)    
  
  
           {    
  
  
               Random r = new Random((int)DateTime.Now.Ticks);    
  
  
               StringBuilder sb = new StringBuilder(length);    
  
  
               for (int i = 0; i < length; i++)    
  
  
               {    
  
  
                  int c = r.Next(97, 123);    
  
  
                   sb.Append(Char.ConvertFromUtf32(c));    
  
  
               }    
  
  
               return sb.ToString();    
  
  
           }    
  
  
       
  
  
           private static int LevenshteinDistance(string str1, string str2)    
  
  
           {    
  
  
               int[,] scratchDistanceMatrix = new int[str1.Length + 1, str2.Length + 1];    
  
  
               // distance matrix contains one extra row and column for the seed values                
  
  
               for (int i = 0; i <= str1.Length; i++) scratchDistanceMatrix[i, 0] = i;    
  
  
               for (int j = 0; j <= str2.Length; j++) scratchDistanceMatrix[0, j] = j;    
  
  
       
  
  
               for (int i = 1; i <= str1.Length; i++)    
  
  
               {    
  
  
                   int str1Index = i - 1;    
  
  
                   for (int j = 1; j <= str2.Length; j++)    
  
  
                   {    
  
  
                       int str2Index = j - 1;    
  
  
                       var cost = (str1[str1Index] == str2[str2Index]) ? 0 : 1;    
  
  
       
  
  
                       int deletion = (i == 0) ? 1 : scratchDistanceMatrix[i - 1, j] + 1;    
  
  
                       int insertion = (j == 0) ? 1 : scratchDistanceMatrix[i, j - 1] + 1;    
  
  
                 int substitution = (i == 0 || j == 0) ? cost : scratchDistanceMatrix[i - 1, j - 1] + cost;    
  
  
       
  
  
                       scratchDistanceMatrix[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);    
  
  
       
  
  
                       // Check for Transposition    
  
  
    if (i > 1 && j > 1 && (str1[str1Index] == str2[str2Index - 1]) && (str1[str1Index - 1] == str2[str2Index]))    
  
  
                       {    
  
  
  scratchDistanceMatrix[i, j] = Math.Min(scratchDistanceMatrix[i, j], scratchDistanceMatrix[i - 2, j - 2] + cost);    
  
  
                      }    
  
  
                  }    
  
  
              }    
  
  
     
  
  
              // Levenshtein distance is the bottom right element    
  
  
              return scratchDistanceMatrix[str1.Length, str2.Length];    
  
  
          }    
  
  
     
  
  
      }    
  
  
  }

這里只用了最簡單的 Parallel.For 方法，代碼很簡單和隨意，但是看看效果還是可以的。

測試機找了不少，喜歡硬件的朋友興許也能找到你感興趣的:P

Intel Core i7 920 (4物理核心8邏輯核心，2.66G) + DDR3 1600 @ 7-7-7-24

AMD Athlon II X4 630 (4物理核心，2.8G) + DDR3 1600 @ 8-8-8-24

AMD Athlon II X2 240 (2物理核心，2.8G) + DDR2 667

Intel Core E5300 (2物理核心，2.33G) + DDR2 667

Intel Atom N270 (1物理核心2邏輯核心，1.6G) + DDR2 667

還在VM workstation里跑過，分別VM了XP和WIN7，都跑在上述i7的機器里，各自分配了2個核心。

程序設(shè)置每個字符串長1000個字符，共1000條字符串。

每個機器上程序都跑了3遍，取平均成績，得到下表：

CPU	Core	Time_Sequential(s)	Time_Parallel(s)	S/P(%)
Intel Core i7 920	4 Cores, 8 Threads, 2.6G	55.132634	14.645687	376.44%
AMD AthlonII X4 630	4 Cores, 4 Threads, 2.8G	58.10592	17.152494	338.76%
AMD AthlonII X2 240	2 Cores, 2 Threads, 2.8G	66.159735	32.293972	204.87%
Intel E5300	2 Cores, 2 Threads, 2.3G	70.827157	38.50654	183.94%
Intel Atom N270	1 Cores, 2 Threads, 1.6G	208.47852	157.27869	132.55%
VMWin7(2 logic core)	2 Cores, 2 Threads	56.965068	33.069084	172.26%
VMXP(2 logic core)	2 Cores, 2 Threads	59.799399	35.35805	169.13%

可見，在多核心處理器上，并行計算的執(zhí)行速度都得到了大幅提升，即便是在單核心超線程出2個邏輯核的Atom N270上亦縮短了32.55%的運行時間。在A240上并行計算的效率竟然是順序計算的204.87% ？！而同樣是4核心，i7 920在超線程的幫助下，并行執(zhí)行效率提升明顯高過A630。最后VM里的測試，是否也可以在某種程度上佐證在多核心的調(diào)度上，Win7要強過XP呢（純猜測）？順帶可以看到，同樣是i7的硬件環(huán)境，單線程宿主OS(Win7)里執(zhí)行花費55.133秒，VM(Win7)里56.965秒，速度上有約3%的差異。

另外，針對性能較強的i7處理器，加大程序中的2個變量后再做測試，并行執(zhí)行的效率比得到了進一步的提升。應(yīng)該是因為創(chuàng)建/管理/銷毀多線程的開銷被進一步的攤平的緣故。例如在每字符串2000個字符，共2000條字符串的情況下，順序執(zhí)行和并行執(zhí)行的時間分別是07:20.9679066和01:47.7059225，消耗時間比達到了409.42%。

來幾張截圖：

從截圖中可以發(fā)現(xiàn)，這段測試程序在順序執(zhí)行的部分，內(nèi)存占用相對平穩(wěn)，CPU則大部分核心處在比較空閑的狀態(tài)。到了并行執(zhí)行部分，所有核心都如預(yù)期般被調(diào)動起來，同時內(nèi)存占用開始出現(xiàn)明顯波動。附圖是在每字符串2000個字符，共2000條字符串的情況下得到的。

這里只是非常局部和簡單的一個測試，目的是希望能帶來一個直觀的體驗。微軟已經(jīng)提供了一組不錯的例子可供參考 Samples for Parallel Programming with the .NET Framework 4

文章名稱：淺嘗.NET 4并行計算效率已大幅提升
本文網(wǎng)址：http://www.dlmjj.cn/article/cocicgs.html

日本综合一区二区|亚洲中文天堂综合|日韩欧美自拍一区|男女精品天堂一区|欧美自拍第6页亚洲成人精品一区|亚洲黄色天堂一区二区成人|超碰91偷拍第一页|日韩av夜夜嗨中文字幕|久久蜜综合视频官网|精美人妻一区二区三区

新聞中心

其他資訊