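This wiki (マイクロソフト系技術情報 Wiki) is run by Open棟梁Project and the OSSコンソーシアム .NET開発基盤部会 (the .NET development infrastructure working group of the OSS Consortium).
I worked through the following tutorials.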
Get started in 10 minutes
≒ .NET for Apache Spark 101-α
≒ .NET for Apache Spark Guide-α
≠ github.com...README.md#get-started
C:\prog\dev\spark\spark-2.4.1-bin-hadoop2.7
C:\prog\dev\spark\Microsoft.Spark.Worker-1.0.0
C:\prog\dev\spark\spark-2.4.1-bin-hadoop2.7\bin
C:\prog\dev\spark\spark-3.0.1-bin-hadoop2.7
C:\prog\dev\spark\Microsoft.Spark.Worker-2.0.0
Also bump the version of <application-jar>:
microsoft-spark-2-4_2.11-1.0.0.jar → microsoft-spark-3-0_2.12-2.0.0.jar
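The two jar names above appear to follow one pattern: Spark major and minor version, then the Scala version, then the Microsoft.Spark version. The following is a minimal sketch of that pattern, inferred only from the two file names on this page; the function `jar_name` is a hypothetical helper and not part of any Spark or Microsoft.Spark tooling.

```python
def jar_name(spark_version: str, scala_version: str, microsoft_spark_version: str) -> str:
    """Build a jar file name in the pattern seen on this page (assumed, not official).

    Only the major and minor parts of the Spark version are used, joined by "-".
    """
    major, minor = spark_version.split(".")[:2]
    return f"microsoft-spark-{major}-{minor}_{scala_version}-{microsoft_spark_version}.jar"


if __name__ == "__main__":
    # Spark 2.4.1 with Microsoft.Spark 1.0.0, then Spark 3.0.1 with Microsoft.Spark 2.0.0
    print(jar_name("2.4.1", "2.11", "1.0.0"))
    print(jar_name("3.0.1", "2.12", "2.0.0"))
```

The Scala version has to match the one the Spark distribution was built with; `spark-submit --version` prints it.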
setx /M HADOOP_HOME C:\prog\dev\spark\spark-2.4.1-bin-hadoop2.7\
setx /M SPARK_HOME C:\prog\dev\spark\spark-2.4.1-bin-hadoop2.7\
setx /M PATH "%PATH%;%HADOOP_HOME%;%SPARK_HOME%bin"
Note: SPARK_HOME already ends with a backslash, so writing \bin here produces spark-3.0.1-bin-hadoop2.7\\bin, which does not work.
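The trailing-backslash pitfall is plain string concatenation. As an illustration only (the page itself uses cmd and setx, not Python), this sketch shows how appending `\bin` to a value that already ends in a backslash doubles the separator, and how a path-aware join avoids it. The helper names are hypothetical.

```python
import ntpath  # Windows path rules, usable on any OS

SPARK_HOME = "C:\\prog\\dev\\spark\\spark-3.0.1-bin-hadoop2.7\\"  # note the trailing backslash


def naive_bin_dir(spark_home: str) -> str:
    """What %SPARK_HOME%\\bin expands to: blind concatenation."""
    return spark_home + "\\bin"


def safe_bin_dir(spark_home: str) -> str:
    """Join with Windows path rules so a trailing backslash is not doubled."""
    return ntpath.normpath(ntpath.join(spark_home, "bin"))


if __name__ == "__main__":
    print(naive_bin_dir(SPARK_HOME))  # ends with a doubled backslash before bin
    print(safe_bin_dir(SPARK_HOME))   # single backslash before bin
```

In cmd itself the equivalent choice is either to store SPARK_HOME with the trailing backslash and append `bin`, as done above, or to store it without one and append `\bin`.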
>spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_201
Branch
Compiled by user on 2019-03-26T22:44:44Z
Revision
Url
Type --help for more information.
setx /M DOTNET_WORKER_DIR C:\prog\dev\spark\Microsoft.Spark.Worker-1.0.0
\MySparkApp\bin\Debug\netcoreapp3.1>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-2-4_2.11-1.0.0.jar ^
  dotnet MySparkApp.dll input.txt
...
20/11/19 12:39:29 INFO CodeGenerator: Code generated in 10.4659 ms
+------+-----+
|  word|count|
+------+-----+
|  .NET|    3|
|Apache|    2|
|   app|    2|
|  This|    2|
| Spark|    2|
| World|    1|
|counts|    1|
|   for|    1|
| words|    1|
|  with|    1|
| Hello|    1|
|  uses|    1|
+------+-----+
20/11/19 12:39:29 INFO SparkUI: Stopped Spark web UI at http://nishi.mshome.net:4040
...
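For reference, the word count above is a split, a group-by and a descending sort. The following is a plain-Python sketch of that logic, not the C# program that was actually run. The contents of input.txt are not shown on this page, so the three lines below are an assumption, chosen because they reproduce the counts in the table.

```python
from collections import Counter

# Assumed contents of input.txt (not shown on this page).
LINES = [
    "Hello World",
    "This .NET app uses .NET for Apache Spark",
    "This .NET app counts words with Apache Spark",
]


def word_counts(lines):
    """Split each line on spaces, count each word, and sort by count, highest first."""
    counts = Counter(word for line in lines for word in line.split(" "))
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for word, count in word_counts(LINES):
        print(f"|{word:>6}|{count:>5}|")
```

The order of words that share the same count is not defined by the sort key, so it may differ from the Spark output.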
>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-2-4_2.11-1.0.0.jar ^
  dotnet mySparkBatchApp.dll projects_smaller.csv
...
+----+--------------------+--------+--------------------+--------------------+--------+---------------+-----------+-------+-------------------+
|  id|                 url|owner_id|                name|          descriptor|language|     created_at|forked_from|deleted|         updated_at|
+----+--------------------+--------+--------------------+--------------------+--------+---------------+-----------+-------+-------------------+
|   1|https://api.githu...|       1|           ruote-kit|RESTful wrapper f...|    Ruby|12/8/2009 10:17|          2|      0|     11/5/2015 1:15|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|   4|https://api.githu...|      24|             basemap|                null|     C++|6/14/2012 14:14|          3|      1|0000-00-00 00:00:00|
|   5|https://api.githu...|      28|           cocos2d-x|Port of cocos2d-i...|     C++|3/12/2012 16:48|          6|      0|   10/22/2015 17:36|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|   7|https://api.githu...|      42|           cocos2d-x|Port of cocos2d-i...|       C|4/23/2012 10:20|          6|      0|    11/1/2015 17:32|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|   9|https://api.githu...|      66|       rake-compiler|Provide a standar...|    Ruby| 8/1/2012 18:33|   14556189|      0|    11/3/2015 19:30|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|  12|https://api.githu...|      70|heroku-buildpack-...|                null|   Shell| 8/2/2012 12:50|         11|      0|     11/2/2015 7:34|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
|null|                null|    null|                null|                null|    null|           null|       null|   null|               null|
+----+--------------------+--------+--------------------+--------------------+--------+---------------+-----------+-------+-------------------+
only showing top 20 rows

+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+
|                name|          descriptor|   language|      created_at|forked_from|deleted|         updated_at|
+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+
|           ruote-kit|RESTful wrapper f...|       Ruby| 12/8/2009 10:17|          2|      0|     11/5/2015 1:15|
|           cocos2d-x|Port of cocos2d-i...|        C++| 3/12/2012 16:48|          6|      0|   10/22/2015 17:36|
|           cocos2d-x|Port of cocos2d-i...|          C| 4/23/2012 10:20|          6|      0|    11/1/2015 17:32|
|       rake-compiler|Provide a standar...|       Ruby|  8/1/2012 18:33|   14556189|      0|    11/3/2015 19:30|
|    cobertura-plugin|Jenkins cobertura...|       Java| 7/26/2012 18:46|     193522|      0|    11/1/2015 19:55|
|     scala-vs-erlang|A performance tes...|     Erlang|12/25/2011 13:51|    1262879|      0|    10/22/2015 4:50|
|              opencv|OpenCV GitHub Mirror|        C++|  8/2/2012 12:50|         29|      0|    10/26/2015 6:44|
| redmine_git_hosting|A ChiliProject/Re...|       Ruby| 7/30/2012 12:53|         42|      0|   10/28/2015 10:54|
| redmine_git_hosting|A ChiliProject/Re...|       Ruby|10/26/2011 23:17|     207450|      0|   10/27/2015 22:43|
|          OpenCV-iOS|This project is a...|Objective-C|  8/2/2012 12:55|         39|      1|0000-00-00 00:00:00|
|           browserid|A secure, distrib...| JavaScript| 6/30/2012 22:35|       1589|      0|    10/10/2015 0:34|
|      protobuf-cmake|CMake build suppo...|         \N|  8/2/2012 14:35|         61|      0|    10/31/2015 1:22|
|                loso|Chinese segmentat...|     Python|  8/2/2012 12:57|         67|      1|0000-00-00 00:00:00|
|                yui3| YUI 3.x Source Tree| JavaScript| 7/13/2012 14:48|         55|      1|0000-00-00 00:00:00|
|         doctag_java|Java library for ...|       Java|  8/2/2012 12:57|         70|      1|0000-00-00 00:00:00|
|willdurand.github...|        My new blog!| JavaScript|  8/2/2012 12:06|         84|      0|     11/4/2015 9:15|
|            manaserv|A flexible 2D MMO...|        C++|  8/1/2011 17:05|         90|      0|    10/10/2015 4:42|
|            manaserv|A flexible 2D MMO...|        C++| 3/24/2011 17:38|         90|      0|   10/16/2015 18:29|
|               libuv|platform layer fo...|          C|  8/2/2012 12:57|         74|      0|    10/31/2015 8:21|
|         cucumber-js|Pure Javascript C...| JavaScript| 5/28/2012 15:47|      10457|      1|0000-00-00 00:00:00|
+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+
only showing top 20 rows

+------------+------------------+
|    language|  avg(forked_from)|
+------------+------------------+
|      Racket|       1.2550711E7|
|    Makefile|         3611689.0|
|ActionScript|        2474502.75|
|      Erlang|         1262879.0|
|          C#|        914767.625|
|         PHP| 617219.4333333333|
|         C++| 448911.1538461539|
|        Ruby|349311.23214285716|
|        Perl|          298380.0|
|      Puppet|          253680.5|
|  JavaScript| 240718.6494845361|
|        Java|      149923.71875|
|      Python| 76043.26190476191|
|       Shell|           13276.6|
| Objective-C|2281.4761904761904|
|          \N| 1995.095238095238|
|           C|1576.4705882352941|
|       Scala|            1358.5|
|      Groovy|            1049.0|
|        HaXe|             829.0|
+------------+------------------+
only showing top 20 rows

[2021-06-25T05:26:32.6714551Z] [NISHI] [Debug] [ConfigurationService] Using the environment variable to construct .NET worker path: ...\spark\Microsoft.Spark.Worker-1.0.0\Microsoft.Spark.Worker.exe.
+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+----------+
|                name|          descriptor|   language|      created_at|forked_from|deleted|         updated_at|datebefore|
+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+----------+
|           ruote-kit|RESTful wrapper f...|       Ruby| 12/8/2009 10:17|          2|      0|     11/5/2015 1:15|      true|
|           cocos2d-x|Port of cocos2d-i...|        C++| 3/12/2012 16:48|          6|      0|   10/22/2015 17:36|      true|
|           cocos2d-x|Port of cocos2d-i...|          C| 4/23/2012 10:20|          6|      0|    11/1/2015 17:32|      true|
|       rake-compiler|Provide a standar...|       Ruby|  8/1/2012 18:33|   14556189|      0|    11/3/2015 19:30|      true|
|    cobertura-plugin|Jenkins cobertura...|       Java| 7/26/2012 18:46|     193522|      0|    11/1/2015 19:55|      true|
|     scala-vs-erlang|A performance tes...|     Erlang|12/25/2011 13:51|    1262879|      0|    10/22/2015 4:50|      true|
|              opencv|OpenCV GitHub Mirror|        C++|  8/2/2012 12:50|         29|      0|    10/26/2015 6:44|      true|
| redmine_git_hosting|A ChiliProject/Re...|       Ruby| 7/30/2012 12:53|         42|      0|   10/28/2015 10:54|      true|
| redmine_git_hosting|A ChiliProject/Re...|       Ruby|10/26/2011 23:17|     207450|      0|   10/27/2015 22:43|      true|
|          OpenCV-iOS|This project is a...|Objective-C|  8/2/2012 12:55|         39|      1|0000-00-00 00:00:00|     false|
|           browserid|A secure, distrib...| JavaScript| 6/30/2012 22:35|       1589|      0|    10/10/2015 0:34|     false|
|      protobuf-cmake|CMake build suppo...|         \N|  8/2/2012 14:35|         61|      0|    10/31/2015 1:22|      true|
|                loso|Chinese segmentat...|     Python|  8/2/2012 12:57|         67|      1|0000-00-00 00:00:00|     false|
|                yui3| YUI 3.x Source Tree| JavaScript| 7/13/2012 14:48|         55|      1|0000-00-00 00:00:00|     false|
|         doctag_java|Java library for ...|       Java|  8/2/2012 12:57|         70|      1|0000-00-00 00:00:00|     false|
|willdurand.github...|        My new blog!| JavaScript|  8/2/2012 12:06|         84|      0|     11/4/2015 9:15|      true|
|            manaserv|A flexible 2D MMO...|        C++|  8/1/2011 17:05|         90|      0|    10/10/2015 4:42|     false|
|            manaserv|A flexible 2D MMO...|        C++| 3/24/2011 17:38|         90|      0|   10/16/2015 18:29|     false|
|               libuv|platform layer fo...|          C|  8/2/2012 12:57|         74|      0|    10/31/2015 8:21|      true|
|         cucumber-js|Pure Javascript C...| JavaScript| 5/28/2012 15:47|      10457|      1|0000-00-00 00:00:00|     false|
+--------------------+--------------------+-----------+----------------+-----------+-------+-------------------+----------+
only showing top 20 rows
...
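Judging from the rows above, the datebefore flag is true when updated_at parses as a date later than some cutoff in late October 2015, and false when it is earlier or cannot be parsed (the 0000-00-00 00:00:00 rows). The sketch below reproduces that behaviour in plain Python. The cutoff of 2015-10-20 and the date format are assumptions that fit the output; they are not stated on this page, and the function name is hypothetical.

```python
from datetime import datetime

# Assumed cutoff: any value between 10/16/2015 18:29 and 10/22/2015 4:50 fits the output.
REFERENCE_DATE = datetime(2015, 10, 20)


def updated_after_reference(updated_at: str) -> bool:
    """Return True if updated_at parses as month/day/year hour:minute and is later than the cutoff.

    Unparseable values such as "0000-00-00 00:00:00" count as False.
    """
    try:
        parsed = datetime.strptime(updated_at, "%m/%d/%Y %H:%M")
    except ValueError:
        return False
    return parsed > REFERENCE_DATE


if __name__ == "__main__":
    for value in ["11/5/2015 1:15", "10/10/2015 0:34", "0000-00-00 00:00:00"]:
        print(value, updated_after_reference(value))
```

In the Spark job the same check would be registered as a UDF and applied to the updated_at column; treating parse failures as false keeps the placeholder dates from aborting the job.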
$ nc -lk 9999
abcdefg
>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-2-4_2.11-1.0.0.jar ^
  dotnet mySparkStreamingApp.dll localhost 9999
...
-------------------------------------------
Batch: x
-------------------------------------------
+---------+
|      col|
+---------+
|  abcdefg|
|abcdefg 7|
+---------+
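The batch output suggests that each incoming line is mapped to two values, the line itself and the line followed by its length, and that the resulting array is then exploded into one row per element. The following plain-Python sketch shows that transformation under this reading of the output; `with_length` and `explode` are hypothetical names, and the real application does this with a Spark UDF in C#.

```python
def with_length(line: str) -> list:
    """Map one input line to [line, "line <length>"]."""
    return [line, f"{line} {len(line)}"]


def explode(arrays) -> list:
    """Flatten a list of arrays into one row per element, like Spark's explode."""
    return [item for array in arrays for item in array]


if __name__ == "__main__":
    received = ["abcdefg"]  # the text typed into nc
    for row in explode(with_length(line) for line in received):
        print(row)
```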
From this point on, Microsoft.Spark is upgraded to version 2.0.0.
GPU Service not found. Falling back to CPU AutoML Service.
|     Trainer                              MicroAccuracy MacroAccuracy Duration #Iteration                       |
|1    AveragedPerceptronOva                       0.7885        0.7924      5.0          1                       |
|2    SdcaMaximumEntropyMulti                     0.7966        0.7675      7.2          2                       |
|3    LightGbmMulti                               0.7442        0.7365     11.4          3                       |
|4    SymbolicSgdLogisticRegressionOva            0.7115        0.7144      3.4          4                       |

===============================================Experiment Results=================================================
------------------------------------------------------------------------------------------------------------------
|                                                    Summary                                                     |
------------------------------------------------------------------------------------------------------------------
|ML Task: multiclass-classification                                                                              |
|Dataset: ...\MLSparkModel\yelptrain.csv                                                                         |
|Label : Sentiment                                                                                               |
|Total experiment time : 27.0050593 Secs                                                                         |
|Total number of models explored: 4                                                                              |
------------------------------------------------------------------------------------------------------------------
|                                             Top 4 models explored                                              |
------------------------------------------------------------------------------------------------------------------
|     Trainer                              MicroAccuracy MacroAccuracy Duration #Iteration                       |
|1    SdcaMaximumEntropyMulti                     0.7966        0.7675      7.2          1                       |
|2    AveragedPerceptronOva                       0.7885        0.7924      5.0          2                       |
|3    LightGbmMulti                               0.7442        0.7365     11.4          3                       |
|4    SymbolicSgdLogisticRegressionOva            0.7115        0.7144      3.4          4                       |
------------------------------------------------------------------------------------------------------------------
Code Generated
>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-3-0_2.12-2.0.0.jar ^
  dotnet MLSparkModelML.ConsoleApp.dll yelptest.csv MLModel.zip
...
+--------------------+---------+
|          ReviewText|Sentiment|
+--------------------+---------+
|Waitress was swee...|        1|
|I also had to tas...|        1|
|I'd rather eat ai...|        0|
|Cant say enough g...|        1|
|The ambiance was ...|        1|
|The waitress and ...|        1|
|I would not recom...|        0|
|Overall I wasn't ...|        0|
|My gyro was basic...|        0|
|   Terrible service!|        0|
|Thoroughly disapp...|        0|
|I don't each much...|        1|
|Give it a try, yo...|        1|
|By far the BEST c...|        1|
|Reasonably priced...|        1|
|Everything was pe...|        1|
|The food is very ...|        1|
|it was a drive to...|        0|
|At first glance i...|        1|
|Anyway, I do not ...|        0|
+--------------------+---------+
only showing top 20 rows
Note: I tried various things, but could not get this to work:
- calling show after wiring in the UDF raised an error, and
- at other times the error Could not load type 'Microsoft.ML.Data.DataViewTypeAttribute' appeared.
$ nc -lk 9999
abcdefg
>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-3-0_2.12-2.0.0.jar ^
  dotnet StructuredNetworkWordCount.dll localhost 9999
...
>spark-submit ^
  --class org.apache.spark.deploy.dotnet.DotnetRunner ^
  --master local ^
  microsoft-spark-3-0_2.12-2.0.0.jar ^
  dotnet StructuredNetworkWordCountWindowed.dll localhost 9999
...
As the linked page shows, the differences are small.
Do not write this as a loop.
The tutorial part of the .NET for Apache Spark guide is also available as videos.
https://dotnet.microsoft.com/learn/data/spark-tutorial/intro
Tags: :Cloud, :Azure, :.NET Development, :.NET Core, :.NET Standard