Archives: big data

column-oriented DB

Free and open-source columnar databases:

Database name | Language implemented in | Notes
Apache Druid | Java | started in 2011 for low-latency massive ingestion and queries
Apache Kudu | C++ | released in 2016 to complete the Apache Hadoop ecosystem
Apache Pinot | Java | open sourced in 2015 for real-time low-latency analytics
Calpont InfiniDB | C++ |
ClickHouse | C++ | released in 2016 to …


Cassandra Query

Time series DB using Cassandra: https://docs.datastax.com/en/tutorials/Time_Series.pdf
Try it out: https://www.datastax.com/try-it-out

cqlsh:demo> CREATE TABLE demo.users2 (lastname text, firstname text, time timestamp, PRIMARY KEY (lastname, time));
cqlsh:demo> INSERT INTO demo.users2 (lastname, firstname, time) VALUES ('test1', 'testfir', 164447) USING TTL 20;
cqlsh:demo> SELECT firstname FROM demo.users2;

Data Modeling
However, in Cassandra, the data access queries …
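To make the primary key and TTL in the session above more concrete, here is a minimal in-memory Python sketch of how a table with PRIMARY KEY (lastname, time) can be pictured: `lastname` is the partition key that groups rows, `time` is the clustering column that keeps rows sorted (ascending) inside a partition, and USING TTL 20 makes a row expire 20 seconds after it is written. The class `ToyTable` and its methods are hypothetical illustrations, not the Cassandra driver API, and the model ignores replication, tombstones and on-disk storage.

```python
import bisect
import time as _time
from collections import defaultdict
from typing import Optional


class ToyTable:
    """Hypothetical toy model of a Cassandra table with PRIMARY KEY (lastname, time)."""

    def __init__(self, clock=_time.time):
        self._clock = clock  # injectable clock so TTL expiry can be simulated
        # partition key (lastname) -> list of (time, firstname, expires_at or None),
        # kept sorted by the clustering column (time)
        self._partitions = defaultdict(list)

    def insert(self, lastname: str, firstname: str, ts: int, ttl: Optional[int] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        rows = self._partitions[lastname]
        keys = [r[0] for r in rows]
        i = bisect.bisect_left(keys, ts)
        if i < len(rows) and rows[i][0] == ts:
            # Same full primary key: a CQL INSERT is an upsert, so overwrite the row.
            rows[i] = (ts, firstname, expires_at)
        else:
            rows.insert(i, (ts, firstname, expires_at))

    def _live(self, row) -> bool:
        # A row with a TTL disappears once the clock reaches its expiry time.
        return row[2] is None or self._clock() < row[2]

    def select_firstnames(self, lastname: Optional[str] = None) -> list:
        """SELECT firstname, optionally restricted to one partition."""
        parts = [lastname] if lastname is not None else list(self._partitions)
        return [row[1] for p in parts for row in self._partitions.get(p, []) if self._live(row)]


now = [0.0]
table = ToyTable(clock=lambda: now[0])
table.insert("test1", "testfir", 164447, ttl=20)
table.insert("test1", "other", 100)
print(table.select_firstnames("test1"))  # ['other', 'testfir'] (sorted by time)
now[0] = 21.0
print(table.select_firstnames("test1"))  # ['other'] (the TTL row has expired)
```

The design choice that matters for time series is the clustering column: because rows inside a partition are already ordered by `time`, reading "the latest N events for one key" is a contiguous slice rather than a sort.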


bigdata OLTP, OLAP

Row-based vs column-based DBs (or file formats):
row-based –> good for OLTP (transactions), e.g. Cassandra
column-based –> good for OLAP (analytics: an aggregation over one column only has to read that column's values), e.g. Druid
Parquet (column-based data format): https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/ https://www.upsolver.com/blog/apache-parquet-why-use
Hadoop: big data storage; what are the alternatives? S3 in the cloud?
https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/
https://stackoverflow.com/questions/31011078/data-retention-in-hadoop-hdfs
Pinot vs Cassandra …
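The row-based vs column-based distinction can be sketched in a few lines of Python: the same three records are stored once as a list of rows and once as a dict of columns. Summing one field touches every full record in the row layout but only one list in the column layout, which is the access pattern that makes columnar engines and formats (Druid, Parquet) a good fit for OLAP, while a whole record is kept together in the row layout, which suits OLTP-style reads and writes. The field names and values are made up for illustration; real engines add compression, encoding and indexes on top.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: (order_id, customer, amount)
ROWS: List[Tuple[int, str, float]] = [
    (1, "alice", 10.0),
    (2, "bob", 25.5),
    (3, "alice", 4.5),
]

FIELDS = ("order_id", "customer", "amount")


def to_columns(rows: List[Tuple[int, str, float]]) -> Dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout."""
    return {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def sum_amount_row_layout(rows) -> Tuple[float, int]:
    """OLAP-style aggregate over rows: every value of every record is read."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole record is read to reach one field
        total += row[2]
    return total, values_read


def sum_amount_column_layout(columns) -> Tuple[float, int]:
    """The same aggregate over columns: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


def get_order_row_layout(rows, order_id: int):
    """OLTP-style lookup: the row layout returns the record in one piece
    (a column layout would have to reassemble it from every column)."""
    return next(row for row in rows if row[0] == order_id)


columns = to_columns(ROWS)
print(sum_amount_row_layout(ROWS))        # (40.0, 9)
print(sum_amount_column_layout(columns))  # (40.0, 3)
print(get_order_row_layout(ROWS, 2))      # (2, 'bob', 25.5)
```

The `values_read` counter is only a rough stand-in for I/O: with three fields the column layout reads a third of the values, and the gap grows with wider tables.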


bigdata platform with Kubernetes or Hadoop

Hadoop vs Kubernetes:

Hadoop | Kubernetes
MapReduce | Spark on K8s, Flink (stream)
HDFS | S3? Any better one?
Resource manager: YARN/Mesos | K8s itself

During its evolution phase, Hadoop provided three main functionalities that made it a Big Data-ready solution: a distributed compute mechanism (MapReduce), a robust data storage (HDFS), and a resource manager (YARN/Mesos). But modern technologies now …
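The MapReduce model mentioned above (which Spark and Flink generalize) can be illustrated with the classic word count. The sketch below runs the three phases (map, shuffle/group by key, reduce) in a single Python process; it only models the programming model, not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done over the network in a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In a real cluster each input split could be mapped on a different node.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data on Hadoop", "big data on Kubernetes"]))
# {'big': 2, 'data': 2, 'on': 2, 'hadoop': 1, 'kubernetes': 1}
```

Because mappers and reducers only see their own input and one key's values, the framework is free to spread them across machines; that is the property Hadoop, Spark on K8s and Flink all rely on.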



Apache Kafka big picture and quick start

What is Apache Kafka? (big picture)
I found that the article http://www.confluent.io/blog/stream-data-platform-1/ (by Jay Kreps) presents a very good big picture of what Kafka is supposed to do: you can use Kafka to build a stream data platform. Here are the pictures from that article. The big idea is simple: many business processes can be modeled …
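The stream data platform idea rests on Kafka's core abstraction: an append-only, ordered log in which every record gets an offset, and each consumer keeps track of its own read position. The toy Python class below models just that idea in memory (single partition, no brokers, no persistence, no replication); `ToyLog`, `append` and `poll` are hypothetical names, not the Kafka client API.

```python
from typing import Dict, List, Tuple


class ToyLog:
    """Hypothetical single-partition, in-memory append-only log."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._positions: Dict[str, int] = {}  # consumer name -> next offset to read

    def append(self, record: str) -> int:
        """Producer side: append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> List[Tuple[int, str]]:
        """Consumer side: read from this consumer's position, then advance it.
        Records are never removed, so other consumers can still read them."""
        start = self._positions.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._positions[consumer] = start + len(batch)
        return [(start + i, rec) for i, rec in enumerate(batch)]


log = ToyLog()
for event in ("order_created", "order_paid", "order_shipped"):
    log.append(event)

# Two independent consumers read the same stream at their own pace.
print(log.poll("billing", max_records=2))  # [(0, 'order_created'), (1, 'order_paid')]
print(log.poll("analytics"))               # [(0, 'order_created'), (1, 'order_paid'), (2, 'order_shipped')]
print(log.poll("billing"))                 # [(2, 'order_shipped')]
```

Unlike a traditional queue, reading does not delete anything, so a new downstream system can be added later and replay the stream from offset 0.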