

Crowdsourcing is a popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we aim to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately.
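The two-stage setting above can be sketched in a few lines. This is an illustrative toy, not the paper's actual mechanism: the revision rule (keep your answer unless the reference disagrees and you were unsure) and the confidence threshold are invented for the example.

```python
def second_stage(first_answer, confidence, reference, threshold=0.7):
    """Toy second-stage rule: revise the first-stage answer only when the
    worker is unconfident and the (noisy) reference answer disagrees."""
    if first_answer != reference and confidence < threshold:
        return reference
    return first_answer

assert second_stage("cat", 0.9, "dog") == "cat"   # confident: keep own answer
assert second_stage("cat", 0.4, "dog") == "dog"   # unsure: adopt the reference
assert second_stage("cat", 0.4, "cat") == "cat"   # agreement: nothing changes
```

The incentive design in the paper is precisely about making such revision behavior honest rather than blind copying.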
Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and to refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal the significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.

There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models, including the BTL and Thurstone models, as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide a poor fit. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.
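To make the parametric special case concrete, here is a minimal sketch of the BTL model and the stochastic transitivity property that the broader class retains; the item scores are invented for illustration.

```python
def btl_prob(w_i, w_j):
    """P(item i beats item j) under the Bradley-Terry-Luce model,
    where each item has a positive latent score w."""
    return w_i / (w_i + w_j)

# Made-up scores for three items.
w = {"a": 3.0, "b": 2.0, "c": 1.0}

p_ab = btl_prob(w["a"], w["b"])   # 3/5
p_bc = btl_prob(w["b"], w["c"])   # 2/3
p_ac = btl_prob(w["a"], w["c"])   # 3/4

# Strong stochastic transitivity: if a beats b and b beats c (each with
# probability >= 1/2), then P(a beats c) >= max(P(a beats b), P(b beats c)).
assert p_ab >= 0.5 and p_bc >= 0.5
assert p_ac >= max(p_ab, p_bc)
```

The stochastically transitive class studied in the abstract keeps only this ordering constraint and drops the specific functional form of `btl_prob`.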
We show how binary pairwise models may be lifted to fully symmetric models, wherein the original singleton potentials are transformed into potentials on edges to an added variable, and then rendered into a new model on the original number of variables. The new model is essentially equivalent to the original, with the same partition function and allowing recovery of the original marginals or a MAP configuration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, may be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope.

We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to address the more general case, where the classifiers may strongly violate the conditional independence assumption, we propose to apply RBM-based Deep Neural Networks (DNNs). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.

Revisiting Semi-Supervised Learning with Graph Embeddings. Zhilin Yang, Carnegie Mellon University. William Cohen, CMU. Ruslan Salakhutdinov, U. of Toronto. Paper Abstract: We present a semi-supervised learning framework based on graph embeddings.
Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and the input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many existing models.

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and that encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
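A minimal sketch of the MaxEnt IOC objective the abstract refers to: under the maximum-entropy model, a demonstration is exponentially more likely when its cost is low, p(tau) proportional to exp(-cost(tau)), and the intractable partition function is approximated from sampled trajectories. All costs below are made-up numbers, not outputs of a learned cost function.

```python
import math

def neg_log_likelihood(demo_cost, sampled_costs):
    """Sample-based estimate of -log p(demonstration) under the MaxEnt
    model: demo cost plus a Monte Carlo estimate of log Z."""
    z_hat = sum(math.exp(-c) for c in sampled_costs) / len(sampled_costs)
    return demo_cost + math.log(z_hat)

# Hypothetical costs of trajectories sampled from the current policy.
samples = [1.5, 2.0, 3.0, 1.2]

# A cheaper demonstration is more likely under the model (lower NLL),
# which is the signal that drives the cost-learning updates.
assert neg_log_likelihood(0.5, samples) < neg_log_likelihood(1.0, samples)
```

In the paper's setting the cost is a neural network and the samples come from the controller being optimized; here both are placeholders.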
In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and to shrink model size without sacrificing modeling power. Various studies have been done to "diversify" LVMs, aiming to learn diverse latent components. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors, which assign larger density to components with larger mutual angles, based on Bayesian networks and the von Mises-Fisher distribution, and to use these priors to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distributions of the components. These two methods are applied to the Bayesian mixture of experts model to encourage the "experts" to be diverse, and experimental results demonstrate the effectiveness and efficiency of our methods.

High-dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially on dimension. A popular strategy to alleviate this curse of dimensionality is to use first-order additive models, which model the regression function as a sum of independent functions on each dimension.
While useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models, which frequently have large variance, and first-order additive models, which have large bias, there has been little work to exploit the trade-off in between via additive models of intermediate order. In this work, we propose SALSA, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. SALSA minimizes the residual sum of squares with squared RKHS norm penalties. The algorithm can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via comparisons on 15 real datasets, we show that our method is competitive against 21 other alternatives.

We propose an extension of the Hawkes process by treating the self-excitation rate as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination of Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.
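For reference, the classical constant-excitation Hawkes process that the abstract extends can be simulated by Ogata-style thinning. This is a baseline sketch, not the paper's stochastic-excitation algorithm; in the extension, the excitation level alpha would itself follow a stochastic differential equation. All parameter values are illustrative.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Ogata thinning for a Hawkes process with exponential kernel:
    lambda(t) = mu + alpha * sum_i exp(-beta * (t - t_i))."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so the current
        # intensity is a valid upper bound for the thinning step.
        lam_bar = mu + alpha * sum(math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t >= T:
            break
        lam_t = mu + alpha * sum(math.exp(-beta * (t - s)) for s in events)
        if rng.random() * lam_bar <= lam_t:
            events.append(t)   # accepted: intensity jumps by alpha here
    return events

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=50.0)
assert all(a < b for a, b in zip(ev, ev[1:]))   # event times strictly increase
assert all(0.0 < t < 50.0 for t in ev)
```

Since alpha/beta < 1 here, the process is stable; the paper's method notably does not require such stationarity.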
Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking. Individual preferences are broken into pairwise comparisons and then efficient algorithms tailored for independent pairwise comparisons are applied. However, due to the ignored dependencies, naive rank-breaking approaches can result in inconsistent estimates. The key idea to produce unbiased and accurate estimates is to treat the paired comparison outcomes unequally, depending on the topology of the collected data. In this paper, we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in several canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of the corresponding comparison graph.

Dropout Distillation. Samuel Rota Bulò, FBK. Lorenzo Porzi, FBK. Peter Kontschieder, Microsoft Research Cambridge. Paper Abstract: Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parameters, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule, called "standard dropout", is efficient, but might degrade the accuracy of the prediction.
In this work, we introduce a novel approach, coined "dropout distillation", that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping its computational efficiency under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.

Metadata-conscious Anonymous Messaging. Giulia Fanti, UIUC. Peter Kairouz, UIUC. Sewoong Oh, UIUC. Kannan Ramchandran, UC Berkeley. Pramod Viswanath, UIUC. Paper Abstract: Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g., a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al., 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks.

The Teaching Dimension of Linear Learners. Ji Liu, University of Rochester.
Xiaojin Zhu, University of Wisconsin. Hurst Ohannessian, University of Wisconsin-Madison. Paper Abstract: Teaching dimension is a learning-theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners, which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.

Truthful Univariate Estimators. Ioannis Caragiannis, University of Patras. Ariel Procaccia, Carnegie Mellon University. Nisarg Shah, Carnegie Mellon University. Paper Abstract: We revisit the classic problem of estimating the population mean of an unknown single-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support.

Why Regularized Auto-Encoders Learn Sparse Representation. Devansh Arpit, SUNY Buffalo. Yingbo Zhou, SUNY Buffalo. Hung Ngo, SUNY Buffalo.
Venu Govindaraju, SUNY Buffalo. Paper Abstract: Sparse distributed representation is the key to learning useful features in deep learning algorithms, because not only is it an efficient mode of data representation, but also, more importantly, it captures the generation process of most real-world data. While a number of regularized auto-encoders enforce sparsity explicitly in their learned representation and others don't, there has been little formal analysis on what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that several popular models (de-noising and contractive auto-encoders, e.g.) and activations (rectified linear and sigmoid, e.g.) satisfy these conditions; thus, our conditions help explain sparsity in their learned representations. Our theoretical and empirical analysis together thereby shed light on the properties of regularization that are conducive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework.

k-variates++: more pluses in the k-means++. Richard Nock, NICTA & ANU. Raphael Canyasse, Ecole Polytechnique and The Technion. Roksana Boreli, Data61. Frank Nielsen, Ecole Polytechnique and Sony CS Labs Inc. Paper Abstract: k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalization of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalization of the well-known Arthur-Vassilvitskii (AV) guarantee, in the form of a bias+variance approximation bound of the global optimum.
This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased seeding) clustering algorithms tailored to specific frameworks, including distributed, streaming and on-line clustering, with direct approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here or the differential privacy setting, there is little to no prior result on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed-form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performance versus the state of the art.

Multi-Player Bandits, a Musical Chairs Approach. Jonathan Rosenski, Weizmann Institute of Science. Ohad Shamir, Weizmann Institute of Science. Liran Szlak, Weizmann Institute of Science. Paper Abstract: We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting is motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) that attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game.
Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.

The Information Sieve. Greg Ver Steeg, Information Sciences Institute. Aram Galstyan, Information Sciences Institute. Paper Abstract: We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data, plus remainder information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning, including independent component analysis, lossy and lossless compression, and predicting missing values in data.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. Dario Amodei. Rishita Anubhai. Eric Battenberg. Carl Case. Jared Casper. Bryan Catanzaro. JingDong Chen. Mike Chrzanowski, Baidu USA, Inc. Adam Coates. Greg Diamos, Baidu USA, Inc. Erich Elsen, Baidu USA, Inc. Jesse Engel. Linxi Fan. Christopher Fougner. Awni Hannun, Baidu USA, Inc. Billy Jun. Tony Han. Patrick LeGresley. Xiangang Li, Baidu. Libby Lin. Sharan Narang. Andrew Ng. Sherjil Ozair. Ryan Prenger. Sheng Qian, Baidu. Jonathan Raiman. Sanjeev Satheesh, Baidu SVAIL. David Seetapun. Shubho Sengupta. Chong Wang. Yi Wang. Zhiqian Wang. Bo Xiao. Yan Xie, Baidu. Dani Yogatama. Jun Zhan.
Zhenyao Zhu. Paper Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

An important question in feature selection is whether a selection strategy recovers the "true" set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario when the model is misspecified, so that the learned model is linear while the underlying real target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso is still able to recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds.

We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization.
MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.

CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. Ran Gilad-Bachrach, Microsoft Research. Nathan Dowlin, Princeton. Kim Laine, Microsoft Research. Kristin Lauter, Microsoft Research. Michael Naehrig, Microsoft Research. John Wernsing, Microsoft Research. Paper Abstract: Applying machine learning to a problem which involves medical, financial, or other types of sensitive data not only requires accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we will show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form.
These encrypted predictions can be sent back to the owner of the secret key, who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the predictions it makes. We demonstrate CryptoNets on the MNIST optical character recognition task. CryptoNets achieve 99% accuracy and can make around 59000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions.

Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but which achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and the quality of the resulting solution.

As a widely used non-linear activation, the Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias. However, we argue that the classification of noise and signal depends not only on the magnitude of responses, but also on the context of how the feature responses would be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitudes in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters.
In this paper, we propose a multi-bias non-linear activation (MBA) layer to explore the information hidden in the magnitudes of responses. It is placed after the convolution layer to decouple the responses to a convolution kernel into multiple maps by multi-thresholding magnitudes, thus generating more patterns in the feature space at low computational cost. It provides great flexibility for selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. This simple yet effective scheme achieves state-of-the-art performance on several benchmarks.

We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of the individual task losses, which we refer to as Asymmetric Multi-Task Learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph, which enforces each task parameter to be reconstructed as a sparse combination of other tasks, selected based on the task-wise loss. We present two different algorithms for solving this joint learning of the task predictors and the regularization graph. The first algorithm solves the original learning objective using alternating optimization, and the second algorithm solves an approximation of it using a curriculum learning strategy that learns one task at a time. We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over single-task learning and symmetric multi-task learning baselines.

This paper illustrates a novel approach to the estimation of the generalization error of decision tree classifiers.
We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small-sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well-known K-fold cross-validation method in terms of robustness and accuracy. Moreover, it is orders of magnitude more efficient than cross-validation methods.

We study the convergence properties of the recently introduced VR-PCA algorithm for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what are the convexity and non-convexity properties of the underlying optimization problem.

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary.
In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in prior work. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data Andrew Lan Rice University. Tom Goldstein University of Maryland. Richard Baraniuk Rice University. Christoph Studer Cornell University Paper Abstract: Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students' knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student's success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e. the dealbreaker) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.
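The dealbreaker idea above, where a student's success probability is governed by their weakest concept mastery, can be sketched in a few lines. This is a minimal illustration only; the function names, the per-concept difficulty parameters, and the simple logistic link are assumptions for the example, not the paper's actual parameterization or inference procedure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dealbreaker_prob(masteries, difficulties):
    # Success is governed by the single weakest concept relative to the
    # question's per-concept difficulty (the "dealbreaker"), passed
    # through a logistic link.
    weakest = min(m - d for m, d in zip(masteries, difficulties))
    return sigmoid(weakest)

# A student strong on concept 1 but weak on concept 2: the weak concept
# alone caps the success probability, no matter how strong concept 1 is.
strong_weak = dealbreaker_prob([3.0, -1.0], [0.0, 0.0])
very_strong_weak = dealbreaker_prob([9.0, -1.0], [0.0, 0.0])
strong_strong = dealbreaker_prob([3.0, 1.0], [0.0, 0.0])
```

Note the contrast with the affine (IRT-style) models mentioned in the abstract: there, raising any latent factor raises the success probability, whereas here only improving the weakest concept helps.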
We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity and the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.

Variable Elimination in the Fourier Domain Yexiang Xue Cornell University . Stefano Ermon . Ronan Le Bras Cornell University . Carla . Bart Paper Abstract: The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, incomplete and noisy, introduces challenges to algorithm stability: small changes in the training data may significantly change the models.
As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in the recommendation task.

Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP), and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (e.g. logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest.
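The DRE-CPE connection described above rests on the classical density-ratio trick: a probabilistic classifier trained to separate samples from p (label 1) and q (label 0) yields, with equal sample sizes, p(x)/q(x) as the classifier's odds. The sketch below, with hypothetical names, fits a plain logistic (CPE) loss by batch gradient descent on 1-D Gaussian data to illustrate the general idea; it is not the KLIEP or LSIF estimator from the abstract.

```python
import math
import random

def fit_logistic_1d(xs, ys, lr=0.1, epochs=200):
    # Plain batch gradient descent on the logistic loss (a CPE loss).
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

random.seed(0)
# Samples from p = N(1, 1) get label 1, samples from q = N(0, 1) label 0.
xp = [random.gauss(1.0, 1.0) for _ in range(2000)]
xq = [random.gauss(0.0, 1.0) for _ in range(2000)]
w, b = fit_logistic_1d(xp + xq, [1] * 2000 + [0] * 2000)

def ratio_estimate(x):
    # With equal sample sizes, the classifier's odds estimate p(x)/q(x).
    # For these two Gaussians the true log-ratio is x - 0.5.
    return math.exp(w * x + b)
```

The estimate should be larger where p dominates (large x) and smaller where q dominates, matching the true ratio exp(x - 0.5) qualitatively.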
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.

Hierarchical Variational Models Rajesh Ranganath . Dustin Tran Columbia University . David Blei Columbia Paper Abstract: Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.

The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors.
In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF.

Binary embeddings with structured hashed projections Anna Choromanska Courant Institute, NYU . Krzysztof Choromanski Google Research NYC . Mariusz Bojarski NVIDIA . Tony Jebara Columbia . Sanjiv Kumar . Yann Paper Abstract: We consider the hashing mechanism for constructing binary embeddings, which involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudorandom projection is described by a matrix where not all entries are independent random variables; instead, a fixed budget of randomness is distributed across the matrix. Such matrices can be efficiently stored in sub-quadratic or even linear space, provide a reduction in randomness usage (i.e. the number of required random values), and very often lead to computational speed-ups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these results are the first that give theoretical grounds for the use of general structured matrices in the nonlinear setting.
In particular, they generalize previous extensions of the Johnson-Lindenstrauss lemma and prove the plausibility of the approach that was so far only heuristically confirmed for some special structured matrices. Consequently, we show that many structured matrices can be used as an efficient information compression mechanism. Our findings build a better understanding of certain deep architectures, which contain randomly weighted and untrained layers, and yet achieve high performance on different learning tasks. We empirically verify our theoretical findings and show how learning via structured hashed projections affects the performance of a neural network as well as a nearest neighbor classifier.

A Variational Analysis of Stochastic Gradient Algorithms Stephan Mandt Columbia University . Matthew Hoffman Adobe Research . David Blei Columbia Paper Abstract: Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective.
We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side-information associated with the instances (e.g. class-labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution fixed before training. This results in two problems: a) any distribution that is set a priori is independent of how the optimization progresses, and b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a higher gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.

Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on or suffer from memory bottlenecks. In this paper, we introduce a subsampled tensor projected gradient method to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem.
We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.

This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of the correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and the covariance function of the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.

Online Stochastic Linear Optimization under One-bit Feedback Lijun Zhang Nanjing University . Tianbao Yang University of Iowa . Rong Jin Alibaba Group . Yichi Xiao Nanjing University . Zhi-hua Zhou Paper Abstract: In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for the generalized linear bandit can be applied to our problem, its high computational cost makes it impractical for real-world applications. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model.
Specifically, we adopt an online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of O(d sqrt(T)), which matches the optimal result for stochastic linear bandits.

We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds. For some user-defined trade-off parameter beta in (0, 1), the proposed algorithm achieves cumulative regret bounds of O(T^max(beta, 1-beta)) and O(T^(1-beta/2)), respectively, for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints, and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively O(T^(1/2)) and O(T^(3/4)) for general convex domains, and respectively O(T^(2/3)) and O(T^(2/3)) when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice.

Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e. pairwise distances among a set of n items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item for another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting the structural constraints of the hemimetric polytope, learning the distances between each pair of items requires Theta(n^2) queries.
We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into K tight clusters, the sample complexity of our algorithm reduces to O(nK). Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis.

We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements, and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning.

Learning Physical Intuition of Block Towers by Example Adam Lerer Facebook AI Research . Sam Gross Facebook AI Research . Rob Fergus Facebook AI Research Paper Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized, and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the trajectories of the blocks.
The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block, and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.

Structure Learning of Partitioned Markov Networks Song Liu The Inst. of Stats. Math. . Taiji Suzuki . Masashi Sugiyama University of Tokyo . Kenji Fukumizu The Institute of Statistical Mathematics Paper Abstract: We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between the two groups directly may be a preferable option. We introduce a novel concept called the partitioned ratio, whose factorization directly associates with the Markovian properties of random variables across the two groups. A simple one-shot convex optimization procedure is proposed for learning the factorizations of the partitioned ratio, and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with state-of-the-art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in the US Congress and pairwise DNA time-series alignments are also reported.

This work focuses on the dynamic regret of online convex optimization, which compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e. the minimizers change slowly), we present several improved variation-based upper bounds on the dynamic regret under true and noisy gradient feedback, which are optimal in light of the presented lower bounds.
The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, which we refer to as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches that achieved with full information.

Beyond CCA: Moment Matching for Multi-View Models Anastasia Podosinnikova INRIA - ENS . Francis Bach Inria . Simon Lacoste-Julien INRIA Paper Abstract: We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method and orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate the performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets.
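Returning to the dynamic-regret abstract above: the path variation it relies on is simply the accumulated movement of the clairvoyant's per-round minimizers. A minimal sketch (the function name and the choice of Euclidean distance are illustrative assumptions):

```python
import math

def path_variation(minimizers):
    # Accumulated Euclidean movement of the clairvoyant's per-round
    # minimizers x*_1, ..., x*_T.
    return sum(math.dist(a, b) for a, b in zip(minimizers, minimizers[1:]))

# A slowly moving clairvoyant has small path variation, so
# variation-based regret bounds are correspondingly small.
slow = path_variation([(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)])
fast = path_variation([(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)])
```

The slow sequence above accumulates a total movement of 0.2, while the fast one accumulates 10, which is the sense in which a "slowly moving" clairvoyant admits tighter dynamic regret.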
We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.

Unsupervised Deep Embedding for Clustering Analysis Junyuan Xie University of Washington . Ross Girshick Facebook . Ali Farhadi University of Washington Paper Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.

Dimensionality reduction is a popular approach for dealing with high dimensional data that leads to substantial computational savings.
Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). Empirical risk minimization (ERM) is a fundamental technique in statistical machine learning that forms the basis for various learning algorithms. Starting from the results of Chaudhuri et al. (NIPS 2009, JMLR 2011), there is a long line of work in designing differentially private algorithms for empirical risk minimization problems that operate in the original data space. We ask: is it possible to design differentially private algorithms with small excess risk given access to only projected data? In this paper, we answer this question in the affirmative, by showing that for the class of generalized linear functions, we can obtain excess risk bounds of O(w(Theta) n ) under eps-differential privacy, and O((w(Theta)n) ) under (eps,delta)-differential privacy, given only the projected data and the projection matrix. Here n is the sample size and w(Theta) is the Gaussian width of the parameter space that we optimize over. Our strategy is based on adding noise for privacy in the projected subspace and then lifting the solution to the original space by using high-dimensional estimation techniques. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e. with access to the original data), under eps-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014).

We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes.
These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g. when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets.

Large-Margin Softmax Loss for Convolutional Neural Networks Weiyang Liu Peking University . Yandong Wen South China University of Technology . Zhiding Yu Carnegie Mellon University . Meng Yang Shenzhen University Paper Abstract: Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.
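To make the margin idea behind L-Softmax concrete: the paper's loss imposes a multiplicative angular margin (roughly, replacing cos(theta) by cos(m*theta) in the target-class logit). As a simplified stand-in, the sketch below applies an additive handicap to the target logit before the softmax; this is not the paper's exact formulation, but it shows why a margin makes the supervision stricter than plain softmax cross-entropy.

```python
import math

def softmax_cross_entropy(logits, target):
    # Numerically stable log-sum-exp minus the target logit.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def margin_softmax_loss(logits, target, margin=1.0):
    # Handicap the target-class logit before the softmax: the network
    # must now beat the other classes by at least `margin` to reach the
    # same loss, pushing the learned classes further apart.
    adjusted = list(logits)
    adjusted[target] -= margin
    return softmax_cross_entropy(adjusted, target)

plain = softmax_cross_entropy([2.0, 0.5, 0.1], 0)
margined = margin_softmax_loss([2.0, 0.5, 0.1], 0, margin=1.0)
```

Because lowering the target logit strictly increases the cross-entropy, the margined loss is always at least the plain loss, which is the stricter supervision signal the abstract refers to.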
A Random Matrix Approach to Echo-State Neural Networks Romain Couillet CentraleSupelec . Gilles Wainrib ENS Ulm, Paris, France . Hafiz Tiomoko Ali CentraleSupelec, Gif-sur-Yvette, France . Harry Sevi ENS Lyon, Lyon, Paris Paper Abstract: Recurrent neural networks, especially in their linear version, have provided many qualitative insights on their performance under different configurations. This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding of the core mechanism at play for both training and testing.

One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of 'text region embedding + pooling'. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets.

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even non-paid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model.
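An echo-state network of the kind analyzed above keeps a fixed random reservoir (rescaled to a chosen spectral radius) and trains only a linear readout. A minimal sketch, with illustrative sizes and scalings:

```python
import numpy as np

def esn_fit(u, y, n_res=100, rho=0.9, ridge=1e-6, seed=0):
    """Minimal echo-state network: fixed random reservoir, ridge-trained readout."""
    rng = np.random.default_rng(seed)
    W_in = 0.5 * rng.normal(size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.abs(np.linalg.eigvals(W)).max()   # set spectral radius to rho
    x = np.zeros(n_res)
    states = []
    for u_t in u:                                   # drive the reservoir
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    X = np.array(states)
    # Ridge regression for the linear readout (the only trained part)
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
    return X @ W_out
```

Fitting a one-step-delayed version of the input signal is a standard sanity check: the reservoir's fading memory makes the delayed target linearly recoverable from the states.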
To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. In the general setting, when more than two tasks are assigned to each worker, we establish the dominance result on BP that it outperforms other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performance.

Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state space region, where the closed-loop system is provably stable. In the second case, it is well known that infinite horizon stability guarantees cannot exist. Instead, our tool analyzes finite time stability. Empirical evaluations on simulated benchmark problems support our theoretical results.

Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications.
How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data? We propose to transfer the knowledge of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then training a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε⁻²M⁻²). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.

Network Morphism Tao Wei University at Buffalo . Changhu Wang Microsoft Research . Yong Rui Microsoft Research . Chang Wen Chen Paper Abstract: We present a systematic study on how to morph a well-trained neural network into a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also to have the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network.
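A function-preserving morphing step of the kind described above can be illustrated by widening a hidden layer: duplicate a unit and split its outgoing weights, leaving the composed function unchanged. This is a Net2Net-style sketch of the width-morphing idea, not the paper's full morphism equations:

```python
import numpy as np

def widen(W1, b1, W2, idx):
    """Function-preserving widening of a 2-layer net y = W2 @ relu(W1 @ x + b1).

    Duplicates hidden unit `idx` and halves both copies' outgoing weights,
    so the network computes exactly the same function after morphing.
    """
    W1n = np.vstack([W1, W1[idx:idx + 1]])       # copy the unit's incoming weights
    b1n = np.append(b1, b1[idx])                 # and its bias
    W2n = np.hstack([W2, W2[:, idx:idx + 1]])    # copy the outgoing column...
    W2n[:, idx] /= 2.0                           # ...then split it between
    W2n[:, -1] /= 2.0                            # the original and the clone
    return W1n, b1n, W2n
```

Checking outputs before and after the morph on a random input confirms the function is preserved exactly, which is what lets the child network keep training from the parent's knowledge.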
We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.

Budget-constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high dimensional machine learning and statistics.
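The efficient inversion enjoyed by Kronecker-factored curvature blocks, as in K-FAC and KFC above, rests on the identity (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹: one large blockwise solve reduces to two small ones. An illustrative numpy sketch (row-major vec convention; this is the algebraic trick, not the KFC update itself):

```python
import numpy as np

def kron_solve(A, B, v):
    """Solve (A kron B) x = v using only the small factors.

    With numpy's row-major flattening, (A kron B) @ vec(X) = vec(A @ X @ B.T),
    so the solve is X = A^{-1} V B^{-T}: an (m x m) and an (n x n) solve
    instead of one (m*n x m*n) solve.
    """
    m, n = A.shape[0], B.shape[0]
    V = v.reshape(m, n)
    Y = np.linalg.solve(A, V)          # A^{-1} V
    X = np.linalg.solve(B, Y.T).T      # ... B^{-T} applied on the right
    return X.reshape(-1)
```

For m = n = 1000, the dense inverse is 10^6 x 10^6 while the factored solve touches only two 1000 x 1000 matrices, which is the source of K-FAC-style efficiency.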
In this work, we study experimental design for the setting where the underlying regression model is characterized by an ℓ1-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem; our results also hold for a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future.

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs Anton Osokin . Jean-Baptiste Alayrac ENS . Isabella Lukasewitz INRIA . Puneet Dokania INRIA and Ecole Centrale Paris . Simon Lacoste-Julien INRIA Paper Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013), recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality, which can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.

Exact Exponent in Optimal Rates for Crowdsourcing Chao Gao Yale University .
Yu Lu Yale University . Dengyong Zhou Microsoft Research Paper Abstract: Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(π), where m is the number of workers and I(π) is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m ≥ log(1/ε)/I(π) in order to achieve an ε misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.

Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed "what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design.
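For intuition about the error exponent in the crowdsourcing result above: the Chernoff information between two distributions can be computed directly in simple cases. A grid-search sketch for Bernoulli observations (illustrative only; the paper's I(π) averages over the worker accuracy distribution):

```python
import numpy as np

def chernoff_information(p, q, grid=10001):
    """Chernoff information between Bernoulli(p) and Bernoulli(q):

        C = -min_{0 <= lam <= 1} log( p^lam q^(1-lam)
                                      + (1-p)^lam (1-q)^(1-lam) ).

    A dense grid over lam suffices for this scalar illustration.
    """
    lam = np.linspace(0.0, 1.0, grid)
    mix = p**lam * q**(1 - lam) + (1 - p)**lam * (1 - q)**(1 - lam)
    return -np.log(mix.min())
```

In the symmetric case p = 0.8, q = 0.2 the minimizer is lam = 1/2, giving C = -log(2 * sqrt(0.8 * 0.2)) = -log(0.8); identical distributions give C = 0, i.e. no discriminating power.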
Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin.

Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n²) to O(pd), with p being the ambient dimension and d being some estimated rank (d ≪ n). […] 20 reduction in the model size without any loss in accuracy on the CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error rate on the CIFAR-10 benchmark.

Provable Algorithms for Inference in Topic Models Sanjeev Arora Princeton University . Rong Ge . Frederic Koehler Princeton University . Tengyu Ma Princeton University . Ankur Moitra Paper Abstract: Recently, there has been considerable progress on designing algorithms with provable guarantees (typically using linear algebraic methods) for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We show lower bounds that for shorter documents it can be information-theoretically impossible to find the hidden topics.
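The fixed-point conversion behind results like the 6.78% CIFAR-10 error rate above boils down, at its core, to scaling, rounding, and clipping weights to a signed Qm.n format. A minimal sketch (format parameters are illustrative; real pipelines also quantize activations and fine-tune):

```python
import numpy as np

def quantize_fixed_point(w, frac_bits=4, total_bits=8):
    """Quantize weights to signed fixed point with `frac_bits` fractional bits.

    Values are scaled by 2**frac_bits, rounded to integers, clipped to the
    representable signed range, and rescaled back to floats.
    """
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(w * scale), lo, hi)
    return q / scale
```

For example, 0.1 rounds to 2/16 = 0.125, and 100.0 saturates at the largest representable value 127/16 = 7.9375; choosing the fractional bit-width per layer trades resolution against range.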
Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.

This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically, we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order of magnitude speedups over existing general-purpose optimization solvers.

We study the fixed design segmented regression problem: given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample near-linear time algorithms for the problem that (while not being minimax optimal) achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude.

Energetic Natural Gradient Descent Philip Thomas CMU . Bruno Castro da Silva . Christoph Dann Carnegie Mellon University .
Emma Paper Abstract: We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.

Partition Functions from Rao-Blackwellized Tempered Sampling David Carlson Columbia University . Patrick Stinson Columbia University . Ari Pakman Columbia University . Liam Paper Abstract: Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.
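Why partition function estimators like the one above matter is easiest to see from the exact computation: the partition function is a sum over exponentially many configurations. A brute-force sketch for a tiny Ising model, feasible only at toy sizes (illustrative):

```python
import itertools
import numpy as np

def partition_function(J, h):
    """Exact partition function of a small Ising model by enumeration:

        Z = sum over s in {-1, +1}^n of exp( s^T J s / 2 + h^T s ).

    Cost is 2^n, which is exactly why sampling-based estimators are needed
    beyond a handful of variables.
    """
    n = len(h)
    Z = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=n):
        s = np.array(s)
        Z += np.exp(0.5 * s @ J @ s + h @ s)
    return Z
```

For two coupled spins with unit coupling and no field, the four configurations give Z = 2e + 2/e, which a tempering-based estimator should reproduce.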
In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any k ≥ 2, the mixture of k Plackett-Luce models for no more than 2k−1 alternatives is non-identifiable, and this bound is tight for k = 2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if k ≤ ⌊(m−2)/2⌋!. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency.

The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.

Power of Ordered Hypothesis Testing Lihua Lei . William Fithian UC Berkeley, Department of Statistics Paper Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising.
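A Plackett-Luce model, the building block of the mixtures above, assigns a full ranking the product of sequential choice probabilities: items are picked one at a time with probability proportional to their (exponentiated) scores among the remaining items. A minimal likelihood sketch (illustrative; not the paper's GMM estimator):

```python
import numpy as np

def plackett_luce_prob(ranking, theta):
    """Probability of a full ranking (best item first) under Plackett-Luce
    with item scores theta."""
    w = np.exp(np.asarray(theta, dtype=float))
    prob = 1.0
    remaining = list(ranking)
    for item in ranking:
        prob *= w[item] / w[remaining].sum()   # choose among remaining items
        remaining.remove(item)
    return prob
```

Sanity checks: with equal scores every ranking of 3 items has probability 1/6, and the probabilities of all m! rankings sum to one for any score vector.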
We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candes (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by Li & Barber (2015) and find Adaptive SeqStep has favorable performance for both good and bad prior orderings.

PHOG: Probabilistic Model for Code Pavol Bielik ETH Zurich . Veselin Raychev ETH Zurich . Martin Vechev ETH Zurich Paper Abstract: We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.

We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors.
Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems.

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions. The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding compared to similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models.

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form.
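Inside the mirror descent framework mentioned above, choosing the entropy mirror map on the probability simplex yields the classic exponentiated-gradient (multiplicative-weights) update. A minimal sketch of that special case (illustrative; step size is arbitrary):

```python
import numpy as np

def exponentiated_gradient(grads, x0, eta=0.1):
    """Mirror descent on the simplex with the entropy mirror map.

    Each round multiplies the weights by exp(-eta * gradient) and
    renormalizes, so iterates stay on the simplex automatically.
    """
    x = x0.copy()
    for g in grads:
        x = x * np.exp(-eta * g)
        x /= x.sum()
    return x
```

With a constant loss gradient that penalizes all but one coordinate, the weight mass concentrates exponentially fast on the best coordinate, which is the behavior the regret bounds quantify.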
Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered as the ideal solutions. It is however well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset nor the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results.

Horizontally Scalable Submodular Maximization Mario Lucic ETH Zurich . Olivier Bachem ETH Zurich . Morteza Zadimoghaddam Google Research . Andreas Krause Paper Abstract: A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: the capacity (the number of instances that can fit in memory) must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity.
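The centralized baseline that distributed submodular maximization schemes are measured against is plain greedy, which for monotone submodular objectives such as coverage attains a (1 − 1/e) approximation. A minimal max-coverage sketch (illustrative):

```python
def greedy_max_coverage(sets, k):
    """Greedy maximization of the submodular coverage function: repeatedly
    pick the set covering the most not-yet-covered elements.

    For monotone submodular objectives under a cardinality constraint this
    achieves a (1 - 1/e) approximation of the optimum.
    """
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

The marginal gain len(sets[i] - covered) shrinks as more sets are chosen; that diminishing-returns property is exactly submodularity.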
We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution.

Group Equivariant Convolutional Networks Taco Cohen University of Amsterdam . Max Welling University of Amsterdam CIFAR Paper Abstract: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state-of-the-art results on CIFAR10 and rotated MNIST.

The partition function is fundamental for probabilistic graphical models: it is required for inference, parameter estimation, and model selection. Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near minimax optimal polynomial approximation to the potential function and a Clenshaw-Curtis style quadrature. Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices.
Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error.

Correcting Forecasts with Multifactor Neural Attention Matthew Riemer IBM . Aditya Vempaty IBM . Flavio Calmon IBM . Fenno Heath IBM . Richard Hull IBM . Elham Khabiri IBM Paper Abstract: Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model.

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning.
In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically, using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.

We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference.
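A composite covariance of the flavor ABCD composes can be as simple as summing an RBF kernel (smooth trend) with a periodic kernel (seasonal component). A GP posterior-mean sketch (all lengthscales, periods, and noise levels here are illustrative choices, not learned):

```python
import numpy as np

def gp_predict(X, y, Xs, kernel, noise=1e-4):
    """GP posterior mean at test points Xs given training data (X, y)."""
    K = kernel(X[:, None], X[None, :]) + noise * np.eye(len(X))
    Ks = kernel(Xs[:, None], X[None, :])
    return Ks @ np.linalg.solve(K, y)

def composite_kernel(a, b):
    """Sum of an RBF component and a period-1 periodic component."""
    rbf = np.exp(-0.5 * (a - b) ** 2 / 2.0 ** 2)
    per = np.exp(-2.0 * np.sin(np.pi * (a - b)) ** 2 / 0.5 ** 2)
    return rbf + per
```

Because the kernel is a sum, each additive component can be attributed a share of the fit, which is what lets an ABCD-style system translate the learned structure into sentences like "a smooth trend plus a yearly cycle".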
The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.

Slice Sampling on Hamiltonian Trajectories Benjamin Bloem-Reddy Columbia University . John Cunningham Columbia University Paper Abstract: Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.

Noisy Activation Functions Caglar Gulcehre . Marcin Moczulski . Misha Denil . Yoshua Bengio U. of Montreal Paper Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that the gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more.
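Basic univariate slice sampling, which the Hamiltonian variant above builds on, needs only log-density evaluations and no gradients. A stepping-out/shrinkage sketch in the style of Neal (2003) (illustrative; width is the one tuning knob):

```python
import numpy as np

def slice_sample(logp, x0, n, width=1.0, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage.

    Each iteration draws a height under the density, steps out an interval
    to bracket the slice, then shrinks it until a point on the slice is hit.
    """
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n):
        logy = logp(x) + np.log(rng.uniform())    # slice height
        lo = x - width * rng.uniform()            # random initial bracket
        hi = lo + width
        while logp(lo) > logy:                    # step out left
            lo -= width
        while logp(hi) > logy:                    # step out right
            hi += width
        while True:                               # shrink until accepted
            x1 = rng.uniform(lo, hi)
            if logp(x1) > logy:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        out.append(x)
    return np.array(out)
```

Run on a standard normal log-density, the sample mean and standard deviation land near 0 and 1, with no step-size or mass-matrix tuning of the kind HMC requires.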
By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing, when the amount of noise is annealed down, making it easier to optimize hard objective functions. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and task, especially when training seems to be the most difficult, e.g. when curriculum learning is necessary to obtain good results. PD-Sparse. A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification Ian En-Hsu Yen University of Texas at Austin . Xiangru Huang UTaustin . Pradeep Ravikumar UT Austin . Kai Zhong ICES department, University of Texas at Austin . Inderjit Paper AbstractWe consider Multiclass and Multilabel classification with extremely large number of classes, of which only few are labeled to each instance. In such setting, standard methods that have training, prediction cost linear to the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels under assumption that the similarity between labels can be captured by structures such as low-rank matrix or balanced tree. However, as the diversity of labels increases in the feature space, structural assumption can be easily violated, which leads to degrade in the testing performance. In this work, we show that a margin-maximizing loss with l1 penalty, in case of Extreme Classification, yields extremely sparse solution both in primal and in dual without sacrificing the expressive power of predictor. 
As more people become interested in Lean ideas and their application to knowledge work and project management, it's helpful to find ways that make it easier to get started, or learn a few basic concepts that can lead to deeper insights later. For those that are curious about kanban in an office context, it's not unusual to find people who are either currently using Scrum, or have some understanding of Scrum as representative of Agile thinking. One way or another, Scrum users are an important constituency of the Kanban audience. Since Scrum can be described as a statement in the language we use to describe kanban systems, it is also fairly easy to elaborate on that case in order to describe Scrum-Kanban hybrids. This can be useful for existing Scrum teams who are looking to improve their scale or capability. It can also be useful for more cautious new users who find comfort in an "established" method [1].

The idea of using a simple task board with index cards or sticky notes is as old as Agile itself. A simple variation of this is a task board with a simple Pending - In Process - Complete workflow. The cards represent work items that are in the current scope of work. Names can be associated with the cards to indicate who's working on what. Agile teams have been using this sort of method for a long time, and a few people pointed out early on that this had some resemblance to the notion of kanban in lean systems.
Of course, a variety of electronic tools exist that perform these functions, but the simple task board represents a couple of lean principles that I find very valuable: simple technology and visual control. The utility of such a simple method of workflow management is that it is easy to manage, and more importantly, it is easy to change. Huddling around a computer monitor, even a very large one, is in no way a substitute for the tactile and social interactivity that accompanies manipulating a large task board. Maybe someday it will be. Not today. What electronic tools are good for is managing lists of things, like backlogs and bugs, and producing reports. Simple tools can be a difficult concept to explain to technology fanatics, but then, so can value.

A problem with the basic index-card task board is that there is nothing to prevent you from accumulating a big pile of work in process. Time-boxing, by its nature, sets a bound on how much WIP that can be, but it can still allow much more than would be desirable. If a kanban is a token that represents a work request, and our task board can still get out of control, then what is the problem here? The problem is that a kanban is more than just a work request on a card, and putting sticky notes on a whiteboard is not enough to implement a pull system.

A kanban is more than an index card

In a modern economy, the production and distribution of scarce goods and services are regulated by a system of money and prices. Money can be represented by currency notes, which have little intrinsic value, but which, by agreement, can be exchanged for real goods and services. The existence of a neutral medium of exchange makes possible a system of economic calculation of the relative scarcity of the supply of goods in an economy. Such a system of prices is a market. Markets communicate the value of economic production and distribution to their participants.
If a currency note can be exchanged for an object of real value, then there must be some way to enforce the scarcity of the notes in a way that corresponds to the scarcity of real value in the economy. In practice, some kind of institution must enforce this scarcity. The health of a market economy depends greatly on the ability of its monetary institution to coordinate the supply of money with the supply of goods and services. In an unhealthy economy, unstable prices make economic calculation difficult and disrupt the communication between producers and consumers needed for efficient production and distribution.

A kanban represents a portion of the productive capacity of some closed internal economy. It is a medium of exchange for the goods and services provided by the operations of a system of productive resources. The supply of kanban in circulation is controlled by some regulatory function that enforces its value. That is, a kanban is a kind of private currency, and the shop floor manager is the bank that issues it, for the purpose of economic calculation.

If you carry the currency analogy further, then you might say that kanban is not about the cards at all. Just like money is not about the bills. Kanban is all about the limits: the quantity in circulation. How that is represented in a transaction is mostly incidental. A simple rule for understanding all of this might be: a task card without a limit is not a kanban, in the same way that a photocopy of a dollar bill is not money. If you use a durable token like a plastic card, then this is easy to manage: control the number of cards in circulation. If all of the available cards are already in circulation, then the next person who comes looking for one is just going to have to wait until one returns. This is the very purpose of the kanban system.
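The durable-card scheme described above, a fixed number of cards in circulation with latecomers waiting for a card to return, behaves like a counting semaphore. Here is a minimal Python sketch of that idea; the class name and card count are invented for illustration, not taken from the article:

```python
import threading

class KanbanCardPool:
    """A fixed set of durable kanban cards: take one to start work,
    return it when done. If none are free, the caller waits, and that
    waiting is itself the pull signal."""

    def __init__(self, cards_in_circulation):
        # The semaphore's counter is the number of cards still available.
        self._tokens = threading.Semaphore(cards_in_circulation)

    def take_card(self, timeout=None):
        # Blocks until a card returns to circulation (or the timeout expires).
        return self._tokens.acquire(timeout=timeout)

    def return_card(self):
        self._tokens.release()

pool = KanbanCardPool(cards_in_circulation=3)
assert pool.take_card(timeout=0.1)      # card 1 goes out
assert pool.take_card(timeout=0.1)      # card 2 goes out
assert pool.take_card(timeout=0.1)      # card 3 goes out
assert not pool.take_card(timeout=0.1)  # all cards out: the next worker waits
pool.return_card()
assert pool.take_card(timeout=0.1)      # a returned card can be taken again
```

Note that nothing about the card says what the work is; the scarcity of cards alone is what regulates the system.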
However, if you use a more disposable medium like index cards or sticky notes, then you need another mechanism to regulate the "money supply." In our case, we simply write the quantity of kanban in circulation on the task board, and allocate new cards according to that limit. This means that a kanban serves two functions: it is a request to do something in particular, but it is also permission to do something in general. That second notion of permission is where people who are new to lean thinking tend to struggle. But this is precisely how we can "optimize the whole" or "subordinate to the constraint."

Crunchy on the outside, chewy on the inside

Just as an unregulated index card on a cork board is not a kanban, time-boxed iteration planning is not pull. No reasonable interpretation of Lean involves building to a one-month forecast unless the cycle time for each work order is also a month. One month's worth of stuff in process is certainly a much smaller batch size than 3 months or 18 months, but if your iteration backlog contains 20 work items, then that's still about 19 more than it needs to be a pull system. Nonetheless, it is not difficult to augment Scrum with a few simple practices that move us towards a more recognizably lean workflow. The most obvious is the reduction of iteration length, although this is not without problems [2]. As we'll see, it's possible to incrementally enhance Scrum with more and more pull-like features until all that remains of the original process is vestigial scaffolding.

The simple approach is to start with Scrum-like iterations and an iteration planning process, and begin to add pull features to the team's internal process. One simple technique that brings us much closer to our kanban definition is to set a multitasking limit for individuals.
You might have a simple principle like: prefer completing work to starting new work. Or you might express that as a rule: try to work on only one item at a time, but if you are blocked, then you can work on a second item, but no more. In our example, that rule gives us an effective WIP limit of 6. Another common technique is the late binding of tasks to owners. Some teams will pre-assign all of the known tasks during iteration planning. That's generally not a good idea, because it artificially creates a critical path. Waiting until the "last responsible moment" to assign tasks to people maximizes knowledge and brings you closer to pull.

Just because anybody can have more than one item in process doesn't mean that everybody should have more than one item in process. A problem with our multitasking rule is that it locally optimizes with no consideration of the whole. An implicit total WIP limit of 6 is still more WIP than we should probably tolerate for our three workers. A limit of 4 or 5 total items in process at one time still allows for some multitasking exceptions, but disallows the obviously dysfunctional behavior of everybody carrying two items. At this step, we have moved beyond a rule about individuals and have made a rule about the task cards themselves. That is, we have made our cards into kanban.

Another enhancement we can make to our previous board is to add a ready queue between the backlog and work-in-process. The ready queue contains items that are pending from the backlog, but have high priority. We still haven't bound any individual to these tasks, but as soon as somebody becomes available, they should take one of these tasks instead of picking something out of the general backlog. This enables us to decouple the process of assigning work from the process of prioritizing work, and it simplifies assignment.
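The two rules above, the personal multitasking limit and the total board limit, can be captured in one small predicate. This is only an illustrative sketch; the function name and the default limit of 5 are assumptions chosen to match the example, not anything the article prescribes:

```python
def may_pull(my_wip, i_am_blocked, board_wip, total_limit=5):
    """Decide whether a worker may start another item.
    Personal rule: one item at a time, a second only if the first is blocked.
    Board rule: the total number of items in process never exceeds the limit."""
    if board_wip >= total_limit:      # the cards themselves are kanban
        return False
    if my_wip == 0:
        return True
    if my_wip == 1 and i_am_blocked:  # the "blocked" exception
        return True
    return False

# Three workers each carrying two items would push board WIP to 6,
# which the total limit of 5 forbids even though each personal rule allows it.
assert may_pull(my_wip=0, i_am_blocked=False, board_wip=4)
assert may_pull(my_wip=1, i_am_blocked=True, board_wip=4)
assert not may_pull(my_wip=1, i_am_blocked=False, board_wip=3)
assert not may_pull(my_wip=1, i_am_blocked=True, board_wip=5)
```

The point of the second check is exactly the move described above: the constraint shifts from a rule about individuals to a rule about the cards.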
The ready queue also has a kanban limit, and it should be a small limit, since its only purpose is to indicate which work item should be started next. Now we can begin to see some of the mechanics of pull and flow:

1. David completes a task and moves it into the "done" column.
2. David pulls a new kanban from the ready queue and begins working.
3. The team responds to the pull event and selects the next priority item to go into the ready queue.

At this point, we are now operating a simple kanban pull system. We still have our time-boxed iteration and planning cycle, so perhaps we might call such a thing a Scrumban system. Now that we have a sense of capacity and pull, it's natural to think about flow. Breaking up our nebulous "in process" state into better-defined states can give everybody more visibility into the strengths, weaknesses, and overall health of the team. Even Agile workflows like Extreme Programming have relatively well-defined roles and states, and a smooth flow of work between those states is just as important as a smooth flow of work through the process overall.

Here we've broken down in-process into two states: specify and execute. Specify is about defining whatever criteria are necessary to determine when the work item can be considered complete. Execute is about doing the work necessary to bring that work item into a state which satisfies those criteria. We have split our previous WIP limit of 5 across these two states. Specify is considered to take less time in this case, so it is given a limit of 2. Execute consumes the remaining limit of 3. We might change this ratio as time goes on and our performance changes.

Since we are now thinking more about flow, the additional workflow detail strongly suggests using a Cumulative Flow Diagram to track the work and measure our performance. A simple burndown tells you something about whether or not you are delivering value, but not very much about why.
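The three-step pull event enumerated above can be sketched as a tiny simulation. The item names and the ready-queue limit of 2 are hypothetical:

```python
from collections import deque

def complete_and_pull(wip, ready, backlog, ready_limit=2):
    """One pull event on the board: finish an item, pull the next kanban
    from the ready queue, then let the team refill the queue from the backlog."""
    done = wip.popleft()                 # 1. item moves to the "done" column
    if ready:
        wip.append(ready.popleft())      # 2. worker pulls the next kanban
    while len(ready) < ready_limit and backlog:
        ready.append(backlog.popleft())  # 3. team refills the ready queue
    return done

wip = deque(["A", "B", "C"])
ready = deque(["D", "E"])
backlog = deque(["F", "G", "H"])
finished = complete_and_pull(wip, ready, backlog)
assert finished == "A"                # "A" is done
assert "D" in wip                     # "D" was pulled into process
assert list(ready) == ["E", "F"]      # "F" was promoted from the backlog
```

Each completion propagates a signal upstream: the worker pulls from the ready queue, and the team, in turn, pulls from the backlog.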
The CFD communicates a lot of additional information about lead times and inventories that can diagnose problems, or even prevent them. By defining our workflow a little better, we can also account for some functional specialization. In this case, it might be a soft specialization, where some of us prefer doing one type of work more than another, even though we're capable of doing it all. It's important to understand that this kind of pull workflow system allows specialization but does not enforce specialization. The team owns the work and the workflow, and it's up to the team to figure out how to get it done efficiently.

If we let the person who's best at performing the "specify" function handle more of that work, then we may also need to coordinate handoffs between ourselves. Adding the specify-complete column communicates to the team that a work item which was previously in the specify state is now ready to be pulled by anyone who wants to move it to the execute state. Work that is still in the specify state is not eligible to be pulled yet. If the owner of a ticket in the specify state wants to hand it off, he can put it in the complete buffer. If he doesn't want to hand it off, he can move it directly into the execute state as long as capacity is available. It might be that the execute state is full, and the only eligible work is to pull another ticket from the ready queue into specify.

Since we have added a new column for our handoff buffer, we are also increasing the WIP limit by a small amount. The tradeoff is that the increase in lead time due to the new inventory should be offset by the decrease in lead time due to the advantage of specialization. We also mitigate the impact of that new inventory by pooling the WIP limit across the preceding state. This has the very beneficial consequence of making the specify-complete buffer a variable throttle for the preceding station.
The more work that piles up in the specify-complete buffer, the less work can be in process in the specify state, until specify is shut down entirely. But we see it coming; it doesn't "just happen."

If we're going to allow workflow specialization and the handoffs that result, then we will also need some agreement about what results to expect at each handoff. We can do that by defining some simple work standards or standard procedures for each state. These do not have to be complicated or exhaustive. Here, they are simple bullets or checklists drawn directly on the task board. They only need to be sufficient to avoid misunderstanding between producers and consumers. These standards are themselves made and owned by the team, and they can change them as necessary according to the practice of kaizen. Putting them in a soft medium like a whiteboard or a wiki reinforces the notion of team ownership.

Level 2 Scrumban

In the basic version of Scrumban described so far, the iteration review and planning cycle happens just as it does in ordinary Scrum. But as our production process has matured, we have also given ourselves some tools to make the planning process more efficient, more responsive, and better integrated with the business that it serves. With the pull system in place, our flow will become smoother as our process capability improves. We can use our inter-process buffers and flow diagrams to show us our process weaknesses and opportunities for kaizen. As we get closer to level production, we will start to become less concerned with burndown and more concerned with cycle time, as one is the effect and the other is the cause. Average lead time and cycle time will become the primary focus of performance. If cycle time is under control and the team capacity is balanced against demand, then lead time will also be under control. If cycle time is under control, then burndowns are predictable and uninteresting.
If burndowns are uninteresting, then goal-setting and risky heroic efforts are unnecessary. If burndowns are uninteresting, then iteration backlogs are just inventory for the purpose of planning regularity and feeding the pull system. As such, they should be the smallest inventories possible that optimize planning cost. Since the team now pulls work into a small ready queue before pulling it into WIP, then from the team's perspective, the utility of the iteration backlog is that it always contains something that is worth doing next. Therefore, we should use the least wasteful mechanism that will satisfy that simple condition.

A simple mechanism that fits the bill is a size limit for the iteration backlog. Rather than go through the trouble of estimating a scope of work for every iteration, just make the backlog a fixed size that will occasionally run to zero before the planning interval ends. That's a simple calculation. It's just the average number of things released per iteration, which in turn is just a multiple of average cycle time. So if you have 5 things in process, on average, and it takes 5 days to complete something, on average, then you'll finish 1 thing per day, on average. If your iteration interval is two work weeks, or 10 work days, then the iteration backlog should be 10. You can add one or two for padding if you worry about running out.

This might be a point that's been lost on the Scrum community: it's never necessary to estimate the particular sizes of things in the backlog. It's only necessary to estimate the average size of things in the backlog. Most of the effort spent estimating in Scrum is waste. In our final incarnation of Scrumban, iteration planning still happens at a regular interval, synchronized with review and retrospective, but the goal of planning is to fill the slots available, not fill all of the slots, and certainly not determine the number of slots.
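The backlog-sizing arithmetic above is an application of Little's Law: throughput equals average WIP divided by average cycle time. A small sketch using the article's own numbers; the function name is invented:

```python
def iteration_backlog_size(avg_wip, avg_cycle_time_days, iteration_days, padding=0):
    """Little's Law: throughput = WIP / cycle time. The iteration backlog
    only needs to cover the expected throughput for one planning interval."""
    throughput_per_day = avg_wip / avg_cycle_time_days
    return round(throughput_per_day * iteration_days) + padding

# The article's numbers: 5 items in process, 5-day average cycle time,
# a two-week (10 working day) iteration.
assert iteration_backlog_size(5, 5, 10) == 10
assert iteration_backlog_size(5, 5, 10, padding=2) == 12  # with padding
```

No per-item estimation appears anywhere in the calculation; only the averages matter.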
This greatly reduces the overhead and ceremony of iteration planning. Time spent batch processing for iteration planning estimation can be replaced with a quality control inspection at the time that work is promoted to the ready queue. If a work item is ill-formed, then it gets bounced, and repeat offenders get a root cause analysis.

Off with the training wheels

If you have made it this far in your evolution, you will probably realize that the original mechanisms of Scrum are no longer doing much for you. Scrum can be a useful scaffold to hold a team together while you erect a more optimized solution in place. At some point you can slough off the cocoon and allow the pull system to spread its wings and take flight.

The first step beyond Scrum is to decouple the planning and release periods. There may be a convenient interval to batch up features to release, and there may be a convenient interval to get people together to plan. If we have a leaner, more pull-driven planning method, there's really no reason why those two intervals should be the same. Your operations team might like to release once a month, and your product managers might like to establish a weekly prioritization routine. There is no reason not to accommodate them.

Once you've broken up the timebox, you can start to get leaner about the construction of the backlog. Agility implies an ability to respond to demand. The backlog should reflect the current understanding of business circumstances as often as possible, which is to say, the backlog should be event-driven. Timeboxed backlog planning is just that, where the event is a timer; but once we see it that way, we can imagine other sorts of events that allow us to respond more quickly to emerging priorities. Since our system already demonstrates pull and flow, that increased responsiveness should come at no cost to our current efficiency.
The problem we are trying to solve is: the ideal work planning process should always provide the development team with the best thing to work on next, no more and no less. Further planning beyond this does not add value and is therefore waste. Scrum-style timeboxed planning usually provides a much bigger backlog than what is strictly necessary to pick the next work item, and as such, it is unnecessary inventory and therefore unnecessary waste.

The next event we might consider for scheduling planning activities is the concept of an order point. An order point is an inventory level that triggers a process to order new materials. As we pull items from the backlog into the process, the backlog will diminish until the number of items remaining drops below the order point. When this happens, a notice goes out to the responsible parties to organize the next planning meeting. If our current backlog is 10, our throughput is 1/day, and we set an order point at 5, then this planning will happen about once a week. Once a week might be reasonable if people are hard to schedule or need some lead time in order to prioritize. However, if they are more available than that, then we can set the order point lower. If the planners can respond within a day, then perhaps we can set the order point at 2. If the order point is 2, then there may be no need to keep a backlog of 10. Perhaps we can reduce the backlog to 4, and reduce our lead time by 6 days in the process.

The end state of this evolution is pull, or prioritization-on-demand. If the planners can make a good decision quickly enough, and there is no economy of scale in batching priority decisions together, then the size of the backlog only needs to be 1. At the moment the item is pulled by the development team, the planning team is signaled to begin selecting the next item. If the planning team is fast enough in their response, then the development team will never stall.
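The order-point arithmetic above can be checked with a one-line function; the function name is illustrative:

```python
def days_between_planning(backlog_size, order_point, throughput_per_day):
    """With a reorder trigger, planning happens each time the backlog
    drains from its full size down to the order point."""
    return (backlog_size - order_point) / throughput_per_day

# The article's example: backlog of 10, throughput of 1/day, order point of 5.
assert days_between_planning(10, 5, 1.0) == 5.0  # roughly once a work week

# Faster planners allow a lower order point and a leaner backlog.
assert days_between_planning(4, 2, 1.0) == 2.0
```

Shrinking the backlog from 10 to 4 removes 6 items of inventory, which at a throughput of 1/day is the 6 days of lead time the article mentions.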
If there is some variation or delay in response, then a backlog of 2 might be necessary to prevent stalls. But 2 is still a lot smaller and leaner than 10. Or 20. Or 50, which is something I've seen more often than I would like.

The same kind of logic can be applied to the release interval. There is an optimum batch size for releases, and we should first try to find it, and then try to improve it. The result of our efforts will ultimately be features-on-demand. Even at this level, we still have a fairly basic kanban system. From here we can add work item decomposition (swimlanes), or structural dependency flow for additional scale. Along with an enlightened division of labor, this is how we believe that Lean shows the way to scale Agile up to the enterprise.

1. In spite of the fact that the kanban idea is at least 40 years older.
2. Which I'll probably write about in another post sometime.

An excellent article and some food for thought. Thank you. Your post is really interesting, and I've done exactly what you've described above with one of my clients, who found it massively useful, despite being really hesitant at first. They were also doing Scrum (loosely, and in inverted commas) and were struggling, so I got them back to Scrum proper in the fall of last year, and then introduced the kanban ideas almost the way you described above over a month or so, with great results. Couple of things I wanted to get your take on: One thing that I didn't get round to changing with them was how they sized their stories - they're using Story Points, but I wondered what your take on that would be. Also, how does this pull-based work fit into a release plan? Scrum-based teams have their velocity based on completed story points and can then do some basic release management work (e.g. it'll be roughly 4 iterations, or sprints, before all these stories would be completed), and you can draw the burndown from this information.
If you're not doing development iterations anymore, what happens to velocity, and to the way Scrum teams traditionally use it to track progress? I mention this last question as it is one that will come up a lot with Scrum teams and may be worth amending the article to cover.

While I 99.44% agree with this post, I think there is still some detail that needs to be filled in. For example: "It's only necessary to estimate the average size of things in the backlog. Most of the effort spent estimating in Scrum is waste." I would say this is true if there is managed variation in the size of the backlog. For example, all PBIs could be constrained to be small (e.g. everything is "3 ideal days or less") and all PBIs bigger than that broken into smaller PBIs (I think this helps support single-piece flow). At some point the product owner needs to judge PBIs based on value (e.g. ROI). How can they do that if they are just basing each PBI's value on an average?

First, it's really great to hear about your results. We love hearing kanban stories from the field. I think story point limits are a legitimate variation on kanban limits. It adds a little complexity to do it that way, but I wouldn't object if somebody felt that was best for them. I knew somebody would call me on the release planning question. I will be writing more about that in the near future, and we'll be discussing it at the APLN conference in Seattle next week. Throughput is continuously calculated as work items complete. We're managing throughput directly by managing work-in-process and cycle time. Kanban is all about fixing work-in-process, and that leaves us with cycle time, which we manage with value stream and theory of constraints methods and such. Release planning consumes the historical throughput metric, and can apply methods like Minimum Marketable Features, Staged Delivery, and Rolling Wave Planning. That's what we recommend: a rolling wave planning event on a regular cadence.
"Toyota naturally makes production schedules. Just because we produce just-in-time in response to market needs does not mean we can operate without planning. First, the Toyota Motor Company has an annual plan. This means the rough number of cars to be produced and sold during the current year. Next, there is the monthly production schedule. Based on these plans, the daily production schedule is established in detail and includes production leveling."

Limiting the size of work items for downstream scheduling is one of the things we recommend. A kanban workflow can extend the value stream before and after the typical boundaries of a Scrum team, so that some of the work that the Product Owner does is pulled into the workflow and managed accordingly. Size and effort estimates can be produced as a natural consequence of analysis, since that analysis has to be done anyway.

Heaven knows I don't expect anybody to agree with me 100%. You rightly point out the significance of work item sizing to single-piece flow, and that is spot on. Mostly, I want to help facilitate a conversation about the implications of pull and flow for software development. That doesn't mean that pull and flow are "right" in some absolute sense, although I personally find them to be extremely compelling. But if we decide that pull and flow are the right answer, then this blog is mostly about figuring out what that really means, and I very much appreciate the feedback that I get here.

Scrum works flawlessly for all of my teams. Why overly complicate things by adding kanban constraints? From my KISS perspective, Scrum is "leaner" than kanban will ever be.

Lean processes are built for competition. If "good enough" process is good enough for you, then perhaps textbook Scrum is adequate for your purposes. Comfortable is a luxury, so I hope you enjoy it. Maybe you are not in a competitive situation. Maybe your competitors are inept.
If, however, you are under any pressure for systematic performance improvement, then the suggestions in this article address inefficiencies that are built into Scrum.

Thanks for your post. Through years of using Scrum and tweaking to make it more lean, my current team is using techniques very similar to the ones you describe. You've given me some more ideas for future kaizen meetings to further tweak the process.

It looks like it's getting closer to a production-like system. Do you believe that creating software is a production-like activity?

No, I believe that creating software is often a workflow-like activity.

In a pull system, where does one schedule retrospectives? Does the team decide how often they should occur? It seems like everything else doesn't have a particular schedule.
Firstly, a pull system creates what I'd call an "event-rich environment," which means there is a great deal of context and opportunity for introspection and process improvement. The pull system is giving you permission to not wait until your next "official" retrospective to change something. "Pink tickets" or "Andon lights" ought to trigger a process that can lead to a root-cause analysis and process improvement. Secondly, there should often be some kind of planning process or rhythm above the level of individual work item scheduling. This particular article showed how you could use the Scrum framework for that purpose. You could also schedule retrospectives around lower-frequency events like the release of an MMF. If your MMFs are sufficiently "M" then you might use their natural rhythm to trigger planning events. Or you could schedule a regular event that may or may not coincide with some other planning or release event. One suggestion is a 2-week integration/release cadence, with a 6-week (or semi-quarterly) rolling wave planning event.

Corey, I'm not familiar with using "Pink tickets" or "Andon lights", but I understand the purpose. Heh, what is an MMF? More jargon for me to learn.

Great article. I'm applying this to game development, where there is a transition from exploring the game mechanics (fun) using Scrum to the production phases, where we develop 8-12 hours of assets using a Lean-Kanban approach. Your description of Scrumban is perfect for transitioning the team. The main thing that would prevent us from trying Scrumban across the whole project is the concern of losing a major benefit of the iteration. Preproduction (Scrum) iterations are ideal for a "unifying audacious goal" for the team. We don't know all of our tasks (often only 50%), so we leave room in the schedule for exploration.
Leveling development, decoupling iteration planning from review… this seems to deprecate audacious iteration goals. Am I overlooking something? Thanks, Clint.

Firstly, thank you very much! Part of the thinking behind the Scrumban approach is that it allows you to keep old practices that have value to you, add new practices that have value to you, and drop ones that don't. I told one story here about an evolution of a process, but there are other stories that could also be told. You could do a lot of the things that are in the article without giving up iterations. And the point was meant to be that you would only give them up if/when you recognized that they no longer have value for you. If that never happens, then you wouldn't give them up! One thing I left out of this article was project or product planning, which can provide additional context and motivation. It seems like I only write about this in the comments… but I will have an article soon about the relationship between Minimum Marketable Features, Rolling Wave Planning, Real Options, and Kanban.

Corey Ladas has written an interesting paper titled Scrum-ban in which he describes how a Scrum team…

…we set up the Scrum board. In this video I present our "Scrum-ban", commenting on the materials used to assemble it and what information it shows…

…Scrum-ban | Lean Software Engineering…

I still don't totally get specify and execute. Can you give a specific example?

Specify/execute is only an example. The boundary was meant to approximate what-to-build vs. how-to-build-it. Specify is meant to be the "operational definition of the problem statement". That could be things like requirements specifications, test cases, and wireframes. In turn, these could be represented by things like user stories and automated acceptance tests. Execute could be schematics, design verification tests, source code, integration, acceptance testing, and other V&V.
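Corey's specify/execute boundary can be illustrated as a two-phase workflow. The sketch below is a minimal, hypothetical Python model: the state names follow the board described in the article, while the artifact sets and class names are illustrative, not from the original.

```python
# The "specify" phase produces the operational definition of the problem;
# the "execute" phase produces the solution and its verification (V&V).
SPECIFY_ARTIFACTS = {"requirements spec", "test cases", "wireframes"}   # what to build
EXECUTE_ARTIFACTS = {"source code", "integration", "acceptance tests"}  # how to build it

class WorkItem:
    ORDER = ["backlog", "specify", "execute", "done"]

    def __init__(self, name):
        self.name = name
        self.state = "backlog"

    def advance(self):
        """Move the item one step along the workflow, stopping at 'done'."""
        i = self.ORDER.index(self.state)
        if i < len(self.ORDER) - 1:
            self.state = self.ORDER[i + 1]

item = WorkItem("login-story")
item.advance()                 # backlog -> specify: define the problem
assert item.state == "specify"
item.advance()                 # specify -> execute: build and verify
assert item.state == "execute"
```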
Corey, the article written by you gave a lot of insights to me. Many more concepts and flaws in the industry need to be cleared up to make it leaner and give value to the customer.

…rapid delivery, regular inspection, adaptation, customer alignment, quality, etc. by other means (e.g. Kanban, FDD, etc.), then they pass the test. I'd perhaps change this to "Can teams effectively…

Thanks for this article, Corey. I am leading a Production Support/Minor Enhancement team in a development shop where all the feature teams are using Scrum, and I've been looking for a way to implement an Agile/Lean methodology which would tolerate the constant interruptions inherent to application support and maintenance. Your approach is the most promising I've found so far. Now if you'll excuse me, I'm off to fight our other Scrum teams for some wall space.

…different direction, but I'm going to read up on Lean/Kanban some more (offhand, Scrum-ban may prove helpful), and see if there are tweaks that can be borrowed from those ideas that would…

Kanban – Pulling Value From The Supplier… Before I start talking about how our team is going about our implementations of Lean and Kanban, I wanted…

…morphing Scrum into Kanban.

Corey, do you have any productivity data to compare the improvements of a project using Scrum-ban vs. textbook Scrum?

It's still pretty early, so much of the evidence is still anecdotal or speculative. David Anderson has real data on pull-system performance in general. Clinton Keith has specific data on Kanban vs. Scrum. There may be some others, possibly Dave Laribee or Karl Scotland. A good place to ask would be the Yahoo kanbandev group.

I like Twitter. A lot. Twitter has helped me connect with a diverse group of people, particularly in the Agile community.
I consider myself fortunate to chat with and learn from people like Lisa Crispin (lisacrispin), Brian Marick (marick), and Esther Derby (estherderby), as well as lesser-knowns like myself. For instance, I learned about Lisa's upcoming book on Agile Testing (pre-ordered) and Esther's love of gardening and good food, and I have a ring-side seat between the always colourful Ron Jeffries (RonJeffries) and Bob Martin (unclebobmartin). I also discovered some great articles, including Cory Ladas' Scrumban.

Another batch of interesting Scrum and Agile blog posts:…

…(thinking) tools that can aid in tuning current Agile practices. A perfect example is Corey Ladas's Scrum-ban approach, a way to upgrade the default Scrum task board using Lean tools. Given the rising…

Thanks for this excellent article. We're running with Scrum but need to be that little bit more agile, so I am looking into Kanban. This article is good at helping map out a transition approach. Cheers.

…Following Skype, Twitter, and e-mail discussions with Rick Cogley, looking for example at Scrum-ban, we thought about how to improve the current Scrum module, creating a more "visual"…

Improving Speed and Quality… Vision: Customers are satisfied with the output and quality of the department's projects. "In preparing for battle I have always found that plans are useless, but planning is indispensable"…

Interesting article about how to evolve Kanban… Scrum & Kanban discussion on the 17/12. Goal: investigate how netkerfi and agangkerfi would start to use a Kanban system (with Scrum). Main items discussed: Kanban flow optimization, just-in-time process, reducing the planning process, Kanban…

…for a summary, you can read this post on his…

…Scrumban is such an approach; read more here…

…this one is not quite fresh anymore, but still worth reading: Scrum-ban by Corey Ladas.
…It is about the combination of Scrum and the approach originally from the…

We're implementing kanban for our IT Operations, still in the prototyping phase, but it's helped so far to visualize our bottlenecks. I've posted a few pictures on my blog with little explanation and hope to expound in the future. Just a quick note: on page 55 of your book, I found the passing reference to Axiomatic Design to require a bit more explanation. It comes a bit out of nowhere and I'd love to have more details.

I agree about some of the references in the Scrumban book. I'm sure I drop a couple of random TRIZ references as well. I do have a couple of related articles here, and more will be coming.

Kanban: Some Kanban Resources…

…are designed to reduce multi-tasking, maximize throughput, and enhance teamwork. Scrum-ban: leansoftwareengineeringkssescrum-ban. A kanban serves two functions: it is a request to do something in particular, but it is also…

…anyone that hasn't read it: Corey Ladas' blog post on Scrumban is truly worth the read. It describes how to evolve from Scrum into a more Kanban-esque process.

…Scrum-ban: leansoftwareengineeringkssescrum-ban. "The idea of using a simple task board with index cards or sticky notes is as old as Agile…

…conclusion that the best course of action would be to not break the stories down at all, as seen in Scrum-ban, but we're not ready yet, and I'm not sure we will be in this project. So in the meantime…

…Scrum-ban. An interesting attempt to mix Scrum and Kanban, taking the best from both worlds. Kanban with iterations is possible.

…on is Kanban. I believe Kanban is a great fit for many teams and situations. Specifically, doing Scrumban is a great way to get the benefits of Agile project management together with Lean Flow Kanban…

…Scrum-ban | Lean Software Engineering…

…to you in order to improve an existing Scrum team, or as a step in moving towards Kanban (see also Scrumban by Corey…
IT (Kanban Development)…

…in Miami is available at seplk2009, with talks from, among others, Corey Ladas on Scrumban, Alan Shalloway on going beyond Toyota, David Anderson on Kanban, Karl Scotland on Kanban flow, and…

…anyone who wants to inform themselves in advance can read Henrik Kniberg, Boris Gloger, or Corey Ladas (author of the book…

If you use the post-its with the sticky side at the bottom edge instead of the top edge, then it's easy to see the writing at the top of each note, even if crowded.

…Kanban – Feature development is streamlined by moving features through a kanban "pull" system. Kanban systems can take many forms. Most kanbans are comprised of two primary components: units (i.e. goals) and cards (i.e. features and user stories). For more information about using kanban systems for software development, check out Scrum-ban at Lean Software Engineering.

…Scrum-ban | Lean Software Engineering (tags: scrum kanban)…

Cory – would like to watch your video on Scrumban at seplk2009core-evolution, but noticed it never displays for me… seeing the same thing?

Hi Brian. It doesn't work for me either.

Kanban In Time-Boxes: The Cadence of WIP and Sprints… A comment that was left on a previous post, and a response that I made to the comment, got me thinking…

…started throwing out the idea of Kanban instead of Scrum. Really, they are wanting to start with Scrumban, where they ease into Kanban and, as Rex likes to say, "kick the ends out of…

…phase ourselves into XP principles that suited us when I came across a blog by Corey Ladas called Scrum-ban, and now I'm more confused and intrigued than…

…(Kanban resources) Kanban vs. Scrum (friendly comparison), One Day in Kanban Land (Kanban cartoon), Scrumban (fully…
I also reference this post/site: leansoftwareengineeringkssescrum-ban, which was our original introduction and starting point. What follows is the original description with…

…Corey Ladas on Scrumban…

…prioritization in any process, but particularly in the most flexible Agile processes… like ScrumBan, of which I am a…

…to never work on more than a certain limited number of items at any given time. Kanban, CONWIP, and Scrum-ban are similar techniques to achieve this. I won't go into detail about these techniques in this…

Hi Corey, a wonderful article and a great way to simplify Kanban learning. I don't get this line: "This might be a point that's been lost on the Scrum community: it's never necessary to estimate the particular sizes of things in the backlog. It's only necessary to estimate the average size of things in the backlog." What is meant by estimating the average size of things? All things are not the same. Let us say I have 5 MMFs in process (Specify), and my initial task is to find the lead time so that I can identify the limits for my backlog. I find that the lead time for 5 MMFs is 20 days. This means 1 MMF takes 4 days. If my sprint is 10 business days, I could complete approx. 3 MMFs. This would mean my backlog limit could be 3. But these 5 MMFs have varying sizes. If I have to identify MMFs of similar sizes to fit in my backlog (3), I would have to spend considerable time planning, breaking down stories, etc. In the process, if I don't find MMFs that could fit the 3-item backlog limit, then what happens?

…Scrumban (Scrum enriched with Kanban, incl. a super-duper Kanban board)…

…vs Scrum. An interesting read on InfoQ. Henrik provides more info. Also worth reading: David's…
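The arithmetic in the question above is essentially Little's Law: throughput equals WIP divided by lead time, and it depends only on averages, never on the size of any individual item. That is why estimating only the average size of backlog items is enough to forecast. The sketch below mirrors the commenter's hypothetical numbers; all values are illustrative.

```python
# Little's Law with the commenter's hypothetical numbers:
# 5 MMFs finished over a 20-day observation window.
completed = 5
window_days = 20
throughput = completed / window_days            # 0.25 MMF/day on average

sprint_days = 10
expected_per_sprint = throughput * sprint_days  # 2.5 MMFs per 10-day sprint

# Individual sizes can vary freely; the forecast uses only the average.
sizes_days = [2, 3, 4, 5, 6]                    # hypothetical per-MMF durations
avg_size = sum(sizes_days) / len(sizes_days)    # average is still 4 days
assert sprint_days / avg_size == expected_per_sprint
```

No time needs to be spent matching individually-sized MMFs to the limit: the limit is derived from the measured average alone.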
A guide to incrementally introduce Kanban ideas into a Scrum environment, known as Scrum-ban.

…Scrumban is a great transitional step for teams and clients when trying to go from Scrum/Agile or, shudder, Waterfall, to Kanban.

Process description: This page contains information on the majority of processes that could be applied to an IT project. Here are the most well-known processes in PPT presentation format: SDLC models.ppt…

…to you in order to improve an existing Scrum team, or as a step in moving towards Kanban (see also Scrumban by Corey Ladas). Daily…

Lean Software Engineering offers a good overview of how ScrumBan differs from Scrum – essentially improving the speed of time-to-market. Agile…

…and if you then want to confuse yourself, it's worth getting to know a new concept, scrumban. The most important thing is not the technique but the benefit gained from it. So in each case the best…

Great post! I also really like the Scrumban book… I used similar principles myself in the role of a scrum master (after studying Theory of Constraints and Lean Software Development) and it greatly helped to make a good team even better. I am glad you provided this nice introduction and motivation article.

This is a great mechanism for working. This is not at all to say this can not work, but one contextual item where a pull system may NOT work is if, during Sprint Planning, you estimate when you need a particular user to help with refining the requirements (story). You can't just pull, because you "scheduled" that person at a particular time, most likely for a reason (perhaps they weren't available otherwise). At the very least, it could become a problem, as you would be giving them last-minute notice.
Again, this is not to say this technique can't work, but just to provide a case where it may be more difficult, as a consideration for those thinking of using it. Unless the user (and perhaps it is the product owner, but often he or she may not be) can be totally dedicated to the team, or at least to that Sprint, this could be problematic. Great post, and I'll be looking for where I can apply these concepts.

ClearWorks New Version 2.4 Released – Many of our customers are asking us about enhancements, and we are doing our best to provide the requested features and functionality. Today's release is a big update of the current Sevenuc best-selling product (also known as an agile lifecycle tool for hardware & software projects), and at the same time integrates with other software configuration management tools, and more automation test tools and build servers. The update contains more elements for the Lean R&D real-time collaboration platform and reflects the latest innovations in Lean Kanban created by Sevenuc and other platform vendors. What's new in 2.4: workflow definition for different projects with Lean stage management; an event- and status-driven mechanism using triggers; email classification for effective customer-request life-cycle management; complete release support for Lean agile projects; Lean R&D behavior improvements for all types of statistics charts.

…can be combined with, say, good old waterfall or Scrum. No wonder the term Scrumban immediately arose from this…

"The idea of using a simple task board with index cards or sticky notes is as…"… The idea of using a simple task board with index cards or sticky notes is as old as Agile itself. A simple variation of this is a task board with a simple Pending -> In Process -> Complete workflow…

…best of both systems you'll knock the rough edges off of both. (Some people use the term Scrum-ban.) One of the best influences Kanban can have on Scrum is to put the concept of a sprint…
Kanban and Scrum – literature and links… Quoted from Wikipedia: Kanban in IT…

Once you get work showing on your Kanban board, you will see where work is piling up. Excess work in process raises a number of challenges. It increases the time a new item will take to travel through the system. It indicates the likelihood of an overburden on the current performers or the next performers. For example, in Scrum iterations, when there is a lot of work in development that all moves to test at the end of the iteration, this is undesirable. You can limit WIP in a number of ways. In Scrum, we limit WIP at the iteration boundary, and some teams limit WIP by limiting the number of work items that can be active at any one time. The Kanban board calls for explicit WIP limits and also recommends buffer columns to mitigate the impact of variation and keep work flowing through the system. You can certainly explicitly limit WIP and include a buffer or buffers on a normal Agile board. Here is an essay from Lean Software Engineering showing ScrumBan.

Interesting Agile-related links… Here you will find links to various blog articles and training opportunities outside of DRW. Articles, blogs, random musings on Agile. Agile development is more culture than process…

…Scrum ceremony? I'm not entirely sure. It seems to be more of a Scrum/Kanban mix for now (Scrumban), and I don't see them discarding the rest of the Scrum ceremony anytime…

Artificial Critical Paths… I was reading through Corey's post on Scrum-ban again, and I really liked his point that assigning all…

…Scrum-ban | Lean Software Engineering…

…on the Kanban road needs to remember the importance of the Kanban board. If you are new to Kanban, there is a very good minibook available here. Kanban is probably…

Great article.
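The explicit WIP limits described above can be sketched in a few lines. This is a minimal, hypothetical Python model (the column names and limits are illustrative, not from the article): a pull into a column simply fails when that column's limit is reached, which is what makes upstream pile-ups visible.

```python
# Minimal sketch of a board with explicit per-column WIP limits.
class Board:
    def __init__(self, limits):
        self.limits = limits                         # e.g. {"specify": 2, ...}
        self.columns = {name: [] for name in limits}

    def pull(self, item, column):
        """Pull an item into a column only if its WIP limit allows it."""
        if len(self.columns[column]) >= self.limits[column]:
            return False       # limit reached: work queues upstream instead
        self.columns[column].append(item)
        return True

board = Board({"specify": 2, "execute": 3, "done": 999})
assert board.pull("MMF-1", "specify")
assert board.pull("MMF-2", "specify")
assert not board.pull("MMF-3", "specify")   # the WIP limit of 2 blocks the pull
```

In this model, a buffer column is just another entry in `limits`, sized to absorb variation between stages.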
We are doing Kanban for administration teams and maintenance teams, and for software architecture teams soon as well. It's really successful and we like it. XING AG, xing, Susanne.

…useful for getting the entire team on the same page, but it's painful. Now I prefer something more scrum-ban style and hold weekly 30-45 minute planning sessions. Just enough to fill out the backlog for a…

Scrum meeting with Raj – September 28th, 2010… Raj is fixing the row-total calculation issue in the Cargotec PDF (OFF18). Raj can contact Mikko or Jukka for help if needed. There is also a new field required for the Parma Email Form layout; Tommi will create a new issue for this. Some discussion about Scrum,…

Kanban – Scrumban… Sources on Kanban, among others: Kanban and Scrum – Making the Most of Both (eBook as PDF), KanbanAndScrumInfoQVersionFINAL.pdf, Kanban in IT (Wikipedia)…

What are the metrics by which you measure your agile development process?… You introduced some good points, Parker. I would offer that both volatility and velocity (and your mention of story-point inflation) measure story grooming, which is not always a proper measurement. It may be that minimal volatility indicates that a…

…new ScrumBan development…

…agile systems and creating their continuously improving Scrum system, which is also called Scrumban. It is basically an evolution of Scrum using the concepts of the Japanese lean Kanban methods.

February 2010… I really like this one video: an interview with Henrik Kniberg…

…can be used in order to create hybrid Agile-Kanban systems. One example is Scrumban, fully discussed by Corey Ladas in his 2008 paper, in which he describes Scrumban as incrementally enhancing Scrum with more and…

…Kind of a simple way to manage things when you think of it. Corey Ladas explains this well in his Scrumban article and…

…releases: You can filter stories and bugs on the Kanban Board by release and iterations now.
It enables a Scrum-ban development process and makes the Kanban Board useful for iterative development activities like release…

…Others have made a blended process, often Scrum/Lean-kanban hybrids, with the Lean/kanban part being brought in from the ops side and merged with more of a love for Scrum from the dev side. Though some folks say Scrum and kanban are more in opposition, others posit a combined "Scrumban."

I work with a team that does both new feature development and production support. Where the "pull" concept seems to break down for us is when a support issue needs to be addressed immediately, which requires reallocating individuals from their current WIP task to a support task. This type of scenario "pushes" a high-priority item into the flow and could take us beyond the WIP limits that we are trying to adhere to. Maybe this is just an acceptable situation where an item is pushed into the flow instead of pulled by a team member who is available for work. I would be interested to hear from anyone with similar experiences and how they may have adapted their process to create pull out of a push situation.

…has not succumbed to the open-space fashion. Everyone has their own Kanban or a derivative (Scrumban), from the designer to the salesperson to the developer. So we are devotees of the Post-It,…

…"Lean Software Engineering" by Corey Ladas and Bernie Thompson: the two share their experiences from teams at Microsoft, IBM, and elsewhere. Their articles receive a lot of attention and regularly spark extensive discussions. Did you like this post? I'd be happy to get a comment. You can also subscribe to my RSS feed or follow me on Twitter: pherwarth. Photo: by RafaEU.

…I'm reading the Scrumban book by Corey Ladas. One thing Corey says is that Test-Driven Development is good, but not as good…
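One common adaptation to the push-vs-pull problem described in the comment above (not something the article itself prescribes) is an "expedite" class of service: urgent support items get their own lane with a very small limit, usually one, so they can bypass the normal queue without silently inflating the regular WIP limit. A hypothetical sketch:

```python
# Hypothetical expedite lane: urgent items bypass the standard queue,
# but the lane's own limit (1) still forces a real priority decision.
class Lane:
    def __init__(self, limit):
        self.limit = limit
        self.items = []

    def pull(self, item):
        if len(self.items) >= self.limit:
            return False
        self.items.append(item)
        return True

standard = Lane(limit=3)   # normal feature work
expedite = Lane(limit=1)   # at most one urgent support item at a time

assert expedite.pull("urgent-bugfix")
assert not expedite.pull("second-urgent")   # one emergency at a time
```

Because the expedite lane is bounded, a second emergency cannot simply be pushed in; someone has to decide whether it truly outranks the one already in flight.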
…software development) and how it can be better scheduled and delivered. One particular post on scrumban is recommended, as it builds a complex visual management board step by step from a simple three…

Your article is a must-read and a must-try! I have translated it into French: fabrice-aimetti.fr. Thank you, you're great!

Corey, I have a hard time accepting your premise that "The ideal work planning process should always provide the development team with the best thing to work on next, no more and no less." In a manufacturing environment, you don't need to know anything other than, "How fast are we putting out widgets?" But in software development, the 20 items that you deployed to production last month may be completely unlike the 20 items you deploy this month. If you have vendors or customers who need to integrate with your application, users who need training, salespeople who need updated presentation materials, etc., you need to be able to tell people what features will be available when. I'm not saying there has to be a year-long unalterable roadmap, but there are valid reasons to want to know more than, "What are the next five things we're doing?" How do you square your statement about planning with all these competing needs for more planning?

When Is A Sprint A Failure… Update: this blog is no longer active. For new posts and RSS subscriptions, please go to saintgimp…

…team is mature enough to make the most of it; if you're not sure, then try transitioning via scrum-ban. It'll help you see the benefits and enable you to get better at the things which make kanban…
This page is an old draft. Introduction: This docume…

Articles: There are some articles that the whole world should read. List them here, in an appropriate topic section. If there isn't one, add one. Try to keep topics alpha-ordered…

…Story: Scrum-ban | Lean Software Engineering… How Yahoo…

Although this post was published several years ago, I only had the chance to read it today. A very interesting concept. I wonder what happened to the concept of Scrumban now. It doesn't seem that it actually worked, since no one is using it or speaking about it… PM HUT

You are incorrect. I and many others are doing versions of Cory's idea. For example, it seems the majority of those using AgileZen (agilezen) are using it – see the discussion boards.

…The Scrum-ban technique is being adopted by many. It is a combination of the Scrum and Kanban methods. It mainly categorizes support tasks into Not Started, In Progress, and Done on a whiteboard. Post-its with task descriptions are used to categorize the current pool of tasks. For more, do check this nice article.

ScrumBan is a combination of Scrum and Kanban, which is highly efficient for teams that perform product support and maintenance work…

Development Processes & Tools… Introduction: This document gives a general overvie…

Your article is great.
Is there a way of buying the Scrumban book in a DRM-free version for Kindle? (I assume that the one sold on Amazon has DRM.)

…Scrum-ban by Lean Software Engineering…

…looked around at what others have done. On my ScrumMaster training I was introduced to the idea of Scrum-ban (thanks to Corey Ladas), and to this excellent text by Henrik Kniberg and Mattias Skarin. Both…

…combined the fundamentals of Scrum that are optimal for us with those of the Kanban principle, via a Scrum-ban board…

…the use of a backlog: ready, specify, complete, execute, done. Further reading can be found in this blog post about Scrum-ban by Corey Ladas. Possible Scrumban…

I adopted Scrumban for a maintenance team which handles new features as well as bugs. Our Scrumban board comprises To Do, Development, Testing, Deploying, and Done columns. We often have bugs escalated by the support team which need to be handled urgently. The escalations could involve either development and testing together or just testing (for verification) alone. Escalations cause the current work to either sit idle or be moved back to the To Do column (if the columns have reached the WIP limit), which then increases the lead time. I am guessing this is something normal and acceptable. I would like to hear from anyone with similar experience and how they tackle this situation.

Super article about scrumban. Seems to be a good approach for (software) product development.

…with the emergence of agile approaches, one of these firms has now adopted an agile model similar to Scrum-ban for projects that fit the agile sweet spot. For other types of projects, NANW continues to be a…

I. What is Scrum? A lightweight frame…